source: doc/proposals/ctordtor/ctor.tex @ b2052f7

Last change on this file since b2052f7 was 2298a7b8, checked in by Rob Schluntz <rschlunt@…>, 8 years ago

added ctor and tuple design docs

  • Property mode set to 100644
File size: 18.9 KB
1% inline code ©...© (copyright symbol) emacs: C-q M-)
2% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
3% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
4% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
5% LaTex escape §...§ (section symbol) emacs: C-q M-'
6% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
7% math escape $...$ (dollar symbol)
13% Latex packages used in the document (copied from CFA user manual).
14\usepackage[T1]{fontenc}                                % allow Latin1 (extended ASCII) characters
19\usepackage{upquote}                                                                    % switch curled `'" to straight
23\usepackage{varioref}                                                                   % extended references
24\usepackage{listings}                                                                   % format program code
25\usepackage[flushmargin]{footmisc}                                              % support label/reference in footnote
26\usepackage{latexsym}                                   % \Box glyph
27\usepackage{mathptmx}                                   % better math font with "times"
31\input{common}                                          % bespoke macros used in the document
36\setlength{\topmargin}{-0.45in}                                                 % move running title into header
47\Huge \vspace*{1in} Constructors and Destructors in \CFA \\
52\huge Rob Schluntz \\
53\Large \vspace*{0.1in} \texttt{} \\
54\Large Cheriton School of Computer Science \\
55\Large University of Waterloo
64\newcommand{\bigO}[1]{O\!\left( #1 \right)}
68% changed after setting pagestyle
69\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
70\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
72% \linenumbers                                            % comment out to turn off line numbering
89Unguided manual resource management is difficult.
90Part of the difficulty results from not having any guarantees about the current state of an object.
91Objects can be internally composed of pointers which may reference resources that may or may not need to be manually released, and keeping track of that state for each object can be difficult for the end user.
93Constructors and destructors provide a mechanism which bookend the lifetime of an object, allowing the designer of an object to establish invariants for objects of a specific type.
94Constructors guarantee that object initialization code is run before the object can be used, while destructors provide a mechanism that is guaranteed to be run immediately before an object's lifetime ends.
95Constructors and destructors can help to simplify resource management when used in a disciplined way.
96In particular, when all resources are acquired in a constructor, and all resources are released in a destructor, no resource leaks are possible.
97This pattern is a popular idiom in several languages, such as \CC, known as RAII (Resource Acquisition Is Initialization).
102In designing constructors and destructors for \CFA, the primary goals were ease of use and maintaining backwards compatibility.
104In C, when a variable is defined, its value is initially undefined unless it is explicitly initialized or allocated in the static area.
106int main() {
107  int x;
108  int y = 5;
109  x = y;
112In the example above, ©x© is defined and left uninitialized, while ©y© is defined and initialized to 5.
113In the last line, ©x© is assigned the value of ©y©.
114The key difference between assignment and initialization being that assignment occurs on a live object (i.e. an object that contains data).
115It is important to note that this means ©x© could have been used uninitialized prior to being assigned, while ©y© could not be used uninitialized.
116Use of uninitialized variables commonly yields undefined behaviour, which is a common source of bugs in C programs. % TODO: *citation*
118A constructor is a special function which provides a way of ensuring that some form of object initialization is performed.
119This goal is achieved through a guarantee that a constructor will be called implicitly on every object that is allocated, immediately after the object's definition.
120Since a constructor is called on every object, it is impossible to forget to initialize a single object, as long as all constructors perform some sensible form of initialization.
122In \CFA, a constructor is a function with the name ©?{}©.
123Every constructor must have a return type of ©void© and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming languages.
124The ©this© parameter must have a pointer type, whose base type is the type of object that the function constructs.
125There is currently a proposal to add reference types to \CFA.
126Once this proposal has been implemented, the ©this© parameter should instead be a reference type with the same restrictions.
128% \begin{lstlisting}
129% void ?{}(int * i) {
130%   *i = 0;
131% }
132% int x, y = 1, z;
133% \end{lstlisting}
134% In this example, a constructor is defined which initializes variables of type ©int© to 0.
136Consider the definition of a simple type which encapsulates a dynamic array of ©int©s.
139struct Array {
140  int * data;
141  int len;
145In C, if the user creates an ©Array© object, the fields ©data© and ©len© will be uninitialized by default.
146It is the user's responsibility to remember to initialize both of the fields to sensible values.
147In \CFA, the user can define a constructor to handle initialization of ©Array© objects.
150void ?{}(Array * arr){
151  arr->len = 10;
152  arr->data = malloc(sizeof(int)*len);
153  for (int i = 0; i < arr->len; ++i) {
154    arr->data[i] = 0;
155  }
157Array x;
160This constructor will initialize ©x© so that its ©length© field has the value 10, and its ©data© field holds a pointer to a block of memory large enough to hold 10 ©int©s, and sets the value of each element of the array to 0.
161This particular form of constructor is called the \emph{default constructor}, because it is called an object defined without an initializer.
162A default constructor is a constructor which takes a single argument, the ©this© parameter.
164In \CFA, a destructor is a function much like a constructor, except that its name is ©^?{}©.
165A destructor for the ©Array© type can be defined as such.
167void ^?{}(Array * arr) {
168  free(arr->data);
171Since the destructor is automatically called for all objects of type ©Array©, the memory associated with an ©Array© will automatically be freed when the object's lifetime ends.
172% The exact guarantees made by \CFA with respect to the calling of destructors will be discussed in detail [later].
174As discussed previously, the distinction between initialization and assignment is important.
175Consider the following example.
177Array x;
178Array y;
179Array z = x;
180y = x;
182By the previous definition of the default constructor for ©Array©, ©x© and ©y© are initialized to valid arrays of length 10 after their respective definitions.
183On line 3, ©z© is initialized with the value of ©x©, while on line ©4©, ©y© is assigned the value of ©x©.
184The key distinction between initialization and assignment is that a value about to be initialized does not hold any meaningful values, whereas an object about to be assigned might.
185In particular, these cases cannot be handled the same way because in the former case ©z© does not currently own an array, while ©y© does.
188void ?{}(Array * arr, Array other) {
189  arr->len = other.len;
190  arr->data = malloc(sizeof(int)*arr->len)
191  for (int i = 0; i < arr->len; ++i) {
192    arr->data[i] =[i];
193  }
195Array ?=?(Array * arr, Array other) {
196  ^?{}(arr);
197  ?{}(arr, other);
198  return *arr;
201The two functions above handle these cases.
202The first function is called a \emph{copy constructor}, because it constructs its argument from a single value of the same type.
203The second function is the standard copy assignment operator.
204These four functions are special in that they control the state of most objects.
206% TODO: start new section here? this is where the definition of the rules begins to become more formal, whereas everything leading up to this was mostly exposition by example
207\subsection{Function Generation Details}
208By default, every type is defined to have the core set of functions described previously.
209To mimic the behaviour of plain C, the default constructor and destructor for all of the basic types are defined to do nothing, while the copy constructor and assignment operator perform a bitwise copy of the source parameter.
211There are several options for user-defined types: structures, unions, and enumerations.
212To aid in ease of use, the standard set of four functions is automatically generated for a user-defined type after its definition is completed.
214The generated functions for enumerations are the simplest.
215Since enumerations in C are essentially just another integral type, the generated functions behave in the same way that the builtin functions for the basic types work.
217For structures, the situation is more complicated.
218For a structure ©S© with members ©M$_0$©, ©M$_1$©, ... ©M$_{N-1}$©, each function ©f© in the standard set will call ©f(s->M$_i$, ...)© for each ©$i$©.
219That is, a default constructor for ©S© will default construct the members of ©S©, the copy constructor with copy construct them, and so on.
220 % TODO: description VERY weak. Flesh out details
221For example given the struct definition
223struct A {
224  B b;
225  C c;
228The following functions are implicitly generated.
230void ?{}(A * this) {
231  ?{}(&this->b);
232  ?{}(&this->c);
234void ?{}(A * this, A other) {
235  ?{}(&this->b, other.b);
236  ?{}(&this->c, other.c);
238A ?=?(A * this, A other) {
239  ?=?(&this->b, other.b);
240  ?=?(&this->c, other.c);
242void ^?{}(A * this) {
243  ^?{}(&this->b);
244  ^?{}(&this->c);
248In addition to the standard set, a set of \emph{field constructors} is also generated for structures.
249The field constructors are constructors that consume a prefix of the struct's member list.
250That is, $N$ constructors are built of the form ©void ?{}(S *, T$_{\text{M}_0}$)©, ©void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)©, ..., ©void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)©, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise.
251The addition of field constructors allows structs in \CFA to be used naturally in the same ways that they could be used in C (i.e. to initialize any prefix of the struct).
252Extending the previous example, the following constructors are implicitly generated for ©A©.
254void ?{}(A * this, B b) {
255  ?{}(&this->b, b);
256  ?{}(&this->c);
258void ?{}(A * this, B b, C c) {
259  ?{}(&this->b, b);
260  ?{}(&this->c, c);
264For unions, the default constructor and destructor do nothing, as it's not obvious which member if any should be constructed.
265For copy constructor and assignment operations, a bitwise ©memcpy© is applied.
266An alterantive to this design is to always construct and destruct the first member of a union, to match with the C semantics of initializing the first member of the union.
267This approach ultimately feels subtle and unsafe.
268Another option is to, like \CC, disallow unions from containing members which are themselves managed types.
269This is a reasonable approach from a safety standpoint, but is not very C-like.
270Since the primary purpose of a union is to provide low-level memory optimization, it is assumed that the user has a certain level of maturity.
271It is therefore the responsibility of the user to define the special functions explicitly if they are appropriate, since it is impossible to accurately predict the ways that a union is intended to be used at compile-time.
273Arrays are a special case in the C type system.
274C arrays do not carry around their size, making it impossible to write a standalone \CFA function which constructs or destructs an array while maintaining the standard interface for constructors and destructors.
275Instead, \CFA defines the initialization and destruction of an array recursively.
276That is, when an array is defined, each of its elements will be constructed in order from element 0 up to element $n-1$.
277When an array is to be implicitly destructed, each of its elements is destructed in reverse order from element $n-1$ down to element 0.
279\subsection{Using Constructors and Destructors}
280Implicitly generated constructor and destructor calls ignore the outermost type qualifiers on a type by way of a cast on the first argument to the function.
281This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects.
282Since this applies only to implicitly generated constructor calls, the language will not allow qualified objects to be re-initialized with a constructor without an explicit cast.
284Unlike \CC, \CFA provides an escape hatch that allows a user to decide at an object's definition whether it should be managed or not.
285An object initialized with \lstinline{@=} is guaranteed to be initialized like a C object, and will not be implicitly destructed.
286This feature provides all of the freedom that C programmers are used to having to optimize the program, while maintaining safety as a sensible default.
287In addition to freedom, \lstinline{@=} provides a simple path to migrating legacy C code to Cforall, in that objects can be moved from C-style initialization to \CFA gradually and individually.
288It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants will be established on an unmanaged object.
289It is recommended that most objects are managed by sensible constructors and destructors, except where absolutely necessary.
291When the user defines any constructor, the intrinsic/generated functions become invisible.
292In the current implementation, the default constructor and copy constructor are only hidden when explicitly overriden, since the derefence operator ©*?© is currently an ©otype© function, which would make it impossible to override a constructor, due to the lack of assertion-satifying funcitons.
293There is a proposal which decouples size/alignment type information from ©otype©, and implementing this would allow constructors and destructors to be hidden by the same rules that \CC uses.
295% TODO discuss compile time "checks" for subobjects when defining ctor/dtor for struct
296When defining a constructor or destructor for a struct ©S©, any members that are not explicitly constructed or destructed will be implicitly constructed or destructed automatically.
297If an explicit call is present, then that call is taken in preference to any implicitly generated call.
298A consequence of this rule is that it is possible, unlike \CC, to precisely control the order of construction and destruction of subobjects on a per-constructor basis, whereas in \CC subobject initialization and destruction is always performed based on the declaration order.
299Finally, it is illegal for a subobject to be explicitly constructed after it is used for the first time.
300If the translator cannot be reasonably sure that an object is constructed prior to its first use, but may be constructed afterward, and error is emitted.
301To override this rule, \lstinline{@=} can be used to force the translator to trust the programmer's discretion.
302This form of \lstinline{@=} is not yet implemented.
304% TODO discuss error if initializer nesting is too deep or contains designations
305Despite great effort, some forms of C syntax do not work well with constructors in \CFA.
306In particular, constructor calls cannot contain designations, since this is equivalent to allowing designations on the arguments to arbitrary function calls.
307In C, function prototypes are permitted to have arbitrary parameter names, including no names at all, which may have no connection to the actual names used at function definition.
308Furthermore, a function prototype can be repeated an arbitrary number of times, each time using different names.
309As a result, it was decided that any attempt to resolve designated function calls with C's function prototype rules would be brittle, and thus it is not sensible to allow designations in constructor calls.
310In addition, constructor calls cannot have a nesting depth greater than the number of array components in the type of the initialized object, plus one.
311In C, having a greater nesting depth would mean that the programmer intends to initialize subobjects with the nested initializer.
312The reason for this omission is to both simplify the mental model for using constructors, and to make the case of initialization simpler for the resolver.
313If this were allowed, it would be necessary for the expression resolver to decide whether each argument to the constructor call could initialize to some argument to one of the available constructors, making the problem highly recursive and potentially much more expensive.
314It should be noted that if an object does not have a non-trivial constructor, it can still make use of designations and nested initializers in \CFA.
316% TODO section on where destruction occurs - probably belongs in implementation section? or part of it does, anyway
317Destructors are automatically called at the end of the block in which the object was declared.
318In addition to this, destructors are automatically called when statements manipulate control flow to leave the block in which the object is declared, e.g. with return, break, continue, and goto statements.
320% TODO discuss copy construction for function parameters and return values
321When a function is called, the arguments supplied to the call are subject to implicit copy construction, and the return value is subject to destruction.
322When a value is returned from a function, the copy constructor is called to pass the value back to the call site.
323Exempt from these rules are intrinsic and builtin functions.
325% TODO discuss ©= + copy construction?
327% \subsection{Implementation}
328% Discuss the implementation details of constructors and destructors.
Note: See TracBrowser for help on using the repository browser.