source: doc/proposals/ctordtor/ctor.tex @ bc179fd3

ADTarm-ehast-experimentalenumforall-pointer-decayjacob/cs343-translationnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since bc179fd3 was 484ee53, checked in by Peter A. Buhr <pabuhr@…>, 6 years ago

update Makefiles so ${Build} is order only

  • Property mode set to 100644
File size: 18.9 KB
Line 
1\documentclass[twoside,11pt]{article}
2
3%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
4
5% Latex packages used in the document (copied from CFA user manual).
6\usepackage[T1]{fontenc}                                % allow Latin1 (extended ASCII) characters
7\usepackage{textcomp}
8\usepackage[latin1]{inputenc}
9
10\usepackage{fullpage,times,comment}
11\usepackage{epic,eepic}
12\usepackage{upquote}                                    % switch curled `'" to straight
13\usepackage{calc}
14\usepackage{xspace}
15\usepackage{graphicx}
16\usepackage{varioref}                                   % extended references
17\usepackage{listings}                                   % format program code
18\usepackage[flushmargin]{footmisc}                      % support label/reference in footnote
19\usepackage{latexsym}                                   % \Box glyph
20\usepackage{mathptmx}                                   % better math font with "times"
21\usepackage[usenames]{color}
22\usepackage[pagewise]{lineno}
23\renewcommand{\linenumberfont}{\scriptsize\sffamily}
24\input{common}                                          % bespoke macros used in the document
25\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
26\usepackage{breakurl}
27\renewcommand{\UrlFont}{\small\sf}
28
29\setlength{\topmargin}{-0.45in}                         % move running title into header
30\setlength{\headsep}{0.25in}
31
32\usepackage{caption}
33\usepackage{subcaption}
34\usepackage{bigfoot}
35\usepackage{amsmath}
36
37\interfootnotelinepenalty=10000
38
39\CFAStyle                                               % use default CFA format-style
40% inline code ©...© (copyright symbol) emacs: C-q M-)
41% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
42% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
43% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
44% LaTex escape §...§ (section symbol) emacs: C-q M-'
45% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
46% math escape $...$ (dollar symbol)
47
48
49\title{
50\Huge \vspace*{1in} Constructors and Destructors in \CFA \\
51\vspace*{1in}
52}
53
54\author{
55\huge Rob Schluntz \\
56\Large \vspace*{0.1in} \texttt{rschlunt@uwaterloo.ca} \\
57\Large Cheriton School of Computer Science \\
58\Large University of Waterloo
59}
60
61\date{
62\today
63}
64
65%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
66
67\newcommand{\bigO}[1]{O\!\left( #1 \right)}
68
69\begin{document}
70\pagestyle{headings}
71% changed after setting pagestyle
72\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
73\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
74\pagenumbering{roman}
75% \linenumbers                                            % comment out to turn off line numbering
76
77\maketitle
78\thispagestyle{empty}
79
80\clearpage
81\thispagestyle{plain}
82\pdfbookmark[1]{Contents}{section}
83\tableofcontents
84
85\clearpage
86\thispagestyle{plain}
87\pagenumbering{arabic}
88
89
90\section{Introduction}
91Unguided manual resource management is difficult.
92Part of the difficulty results from not having any guarantees about the current state of an object.
93Objects can be internally composed of pointers which may reference resources that may or may not need to be manually released, and keeping track of that state for each object can be difficult for the end user.
94
95Constructors and destructors provide a mechanism which bookend the lifetime of an object, allowing the designer of an object to establish invariants for objects of a specific type.
96Constructors guarantee that object initialization code is run before the object can be used, while destructors provide a mechanism that is guaranteed to be run immediately before an object's lifetime ends.
97Constructors and destructors can help to simplify resource management when used in a disciplined way.
98In particular, when all resources are acquired in a constructor, and all resources are released in a destructor, no resource leaks are possible.
99This pattern is a popular idiom in several languages, such as \CC, known as RAII (Resource Acquisition Is Initialization).
100
101
102\section{Design}
103\label{s:Design}
104In designing constructors and destructors for \CFA, the primary goals were ease of use and maintaining backwards compatibility.
105
106In C, when a variable is defined, its value is initially undefined unless it is explicitly initialized or allocated in the static area.
107\begin{lstlisting}
108int main() {
109  int x;
110  int y = 5;
111  x = y;
112}
113\end{lstlisting}
114In the example above, ©x© is defined and left uninitialized, while ©y© is defined and initialized to 5.
115In the last line, ©x© is assigned the value of ©y©.
116The key difference between assignment and initialization being that assignment occurs on a live object (i.e. an object that contains data).
117It is important to note that this means ©x© could have been used uninitialized prior to being assigned, while ©y© could not be used uninitialized.
118Use of uninitialized variables commonly yields undefined behaviour, which is a common source of bugs in C programs. % TODO: *citation*
119
120A constructor is a special function which provides a way of ensuring that some form of object initialization is performed.
121This goal is achieved through a guarantee that a constructor will be called implicitly on every object that is allocated, immediately after the object's definition.
122Since a constructor is called on every object, it is impossible to forget to initialize a single object, as long as all constructors perform some sensible form of initialization.
123
124In \CFA, a constructor is a function with the name ©?{}©.
125Every constructor must have a return type of ©void© and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming languages.
126The ©this© parameter must have a pointer type, whose base type is the type of object that the function constructs.
127There is currently a proposal to add reference types to \CFA.
128Once this proposal has been implemented, the ©this© parameter should instead be a reference type with the same restrictions.
129
130% \begin{lstlisting}
131% void ?{}(int * i) {
132%   *i = 0;
133% }
134% int x, y = 1, z;
135% \end{lstlisting}
136% In this example, a constructor is defined which initializes variables of type ©int© to 0.
137
138Consider the definition of a simple type which encapsulates a dynamic array of ©int©s.
139
140\begin{lstlisting}
141struct Array {
142  int * data;
143  int len;
144}
145\end{lstlisting}
146
147In C, if the user creates an ©Array© object, the fields ©data© and ©len© will be uninitialized by default.
148It is the user's responsibility to remember to initialize both of the fields to sensible values.
149In \CFA, the user can define a constructor to handle initialization of ©Array© objects.
150
151\begin{lstlisting}
152void ?{}(Array * arr){
153  arr->len = 10;
154  arr->data = malloc(sizeof(int)*len);
155  for (int i = 0; i < arr->len; ++i) {
156    arr->data[i] = 0;
157  }
158}
159Array x;
160\end{lstlisting}
161
162This constructor will initialize ©x© so that its ©length© field has the value 10, and its ©data© field holds a pointer to a block of memory large enough to hold 10 ©int©s, and sets the value of each element of the array to 0.
163This particular form of constructor is called the \emph{default constructor}, because it is called an object defined without an initializer.
164A default constructor is a constructor which takes a single argument, the ©this© parameter.
165
166In \CFA, a destructor is a function much like a constructor, except that its name is ©^?{}©.
167A destructor for the ©Array© type can be defined as such.
168\begin{lstlisting}
169void ^?{}(Array * arr) {
170  free(arr->data);
171}
172\end{lstlisting}
173Since the destructor is automatically called for all objects of type ©Array©, the memory associated with an ©Array© will automatically be freed when the object's lifetime ends.
174% The exact guarantees made by \CFA with respect to the calling of destructors will be discussed in detail [later].
175
176As discussed previously, the distinction between initialization and assignment is important.
177Consider the following example.
178\begin{lstlisting}
179Array x;
180Array y;
181Array z = x;
182y = x;
183\end{lstlisting}
184By the previous definition of the default constructor for ©Array©, ©x© and ©y© are initialized to valid arrays of length 10 after their respective definitions.
185On line 3, ©z© is initialized with the value of ©x©, while on line ©4©, ©y© is assigned the value of ©x©.
186The key distinction between initialization and assignment is that a value about to be initialized does not hold any meaningful values, whereas an object about to be assigned might.
187In particular, these cases cannot be handled the same way because in the former case ©z© does not currently own an array, while ©y© does.
188
189\begin{lstlisting}
190void ?{}(Array * arr, Array other) {
191  arr->len = other.len;
192  arr->data = malloc(sizeof(int)*arr->len)
193  for (int i = 0; i < arr->len; ++i) {
194    arr->data[i] = other.data[i];
195  }
196}
197Array ?=?(Array * arr, Array other) {
198  ^?{}(arr);
199  ?{}(arr, other);
200  return *arr;
201}
202\end{lstlisting}
203The two functions above handle these cases.
204The first function is called a \emph{copy constructor}, because it constructs its argument from a single value of the same type.
205The second function is the standard copy assignment operator.
206These four functions are special in that they control the state of most objects.
207
208% TODO: start new section here? this is where the definition of the rules begins to become more formal, whereas everything leading up to this was mostly exposition by example
209\subsection{Function Generation Details}
210By default, every type is defined to have the core set of functions described previously.
211To mimic the behaviour of plain C, the default constructor and destructor for all of the basic types are defined to do nothing, while the copy constructor and assignment operator perform a bitwise copy of the source parameter.
212
213There are several options for user-defined types: structures, unions, and enumerations.
214To aid in ease of use, the standard set of four functions is automatically generated for a user-defined type after its definition is completed.
215
216The generated functions for enumerations are the simplest.
217Since enumerations in C are essentially just another integral type, the generated functions behave in the same way that the builtin functions for the basic types work.
218
219For structures, the situation is more complicated.
220For a structure ©S© with members ©M$_0$©, ©M$_1$©, ... ©M$_{N-1}$©, each function ©f© in the standard set will call ©f(s->M$_i$, ...)© for each ©$i$©.
221That is, a default constructor for ©S© will default construct the members of ©S©, the copy constructor with copy construct them, and so on.
222 % TODO: description VERY weak. Flesh out details
223For example given the struct definition
224\begin{lstlisting}
225struct A {
226  B b;
227  C c;
228}
229\end{lstlisting}
230The following functions are implicitly generated.
231\begin{lstlisting}
232void ?{}(A * this) {
233  ?{}(&this->b);
234  ?{}(&this->c);
235}
236void ?{}(A * this, A other) {
237  ?{}(&this->b, other.b);
238  ?{}(&this->c, other.c);
239}
240A ?=?(A * this, A other) {
241  ?=?(&this->b, other.b);
242  ?=?(&this->c, other.c);
243}
244void ^?{}(A * this) {
245  ^?{}(&this->b);
246  ^?{}(&this->c);
247}
248\end{lstlisting}
249
250In addition to the standard set, a set of \emph{field constructors} is also generated for structures.
251The field constructors are constructors that consume a prefix of the struct's member list.
252That is, $N$ constructors are built of the form ©void ?{}(S *, T$_{\text{M}_0}$)©, ©void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)©, ..., ©void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)©, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise.
253The addition of field constructors allows structs in \CFA to be used naturally in the same ways that they could be used in C (i.e. to initialize any prefix of the struct).
254Extending the previous example, the following constructors are implicitly generated for ©A©.
255\begin{lstlisting}
256void ?{}(A * this, B b) {
257  ?{}(&this->b, b);
258  ?{}(&this->c);
259}
260void ?{}(A * this, B b, C c) {
261  ?{}(&this->b, b);
262  ?{}(&this->c, c);
263}
264\end{lstlisting}
265
266For unions, the default constructor and destructor do nothing, as it's not obvious which member if any should be constructed.
267For copy constructor and assignment operations, a bitwise ©memcpy© is applied.
268An alterantive to this design is to always construct and destruct the first member of a union, to match with the C semantics of initializing the first member of the union.
269This approach ultimately feels subtle and unsafe.
270Another option is to, like \CC, disallow unions from containing members which are themselves managed types.
271This is a reasonable approach from a safety standpoint, but is not very C-like.
272Since the primary purpose of a union is to provide low-level memory optimization, it is assumed that the user has a certain level of maturity.
273It is therefore the responsibility of the user to define the special functions explicitly if they are appropriate, since it is impossible to accurately predict the ways that a union is intended to be used at compile-time.
274
275Arrays are a special case in the C type system.
276C arrays do not carry around their size, making it impossible to write a standalone \CFA function which constructs or destructs an array while maintaining the standard interface for constructors and destructors.
277Instead, \CFA defines the initialization and destruction of an array recursively.
278That is, when an array is defined, each of its elements will be constructed in order from element 0 up to element $n-1$.
279When an array is to be implicitly destructed, each of its elements is destructed in reverse order from element $n-1$ down to element 0.
280
281\subsection{Using Constructors and Destructors}
282Implicitly generated constructor and destructor calls ignore the outermost type qualifiers on a type by way of a cast on the first argument to the function.
283This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects.
284Since this applies only to implicitly generated constructor calls, the language will not allow qualified objects to be re-initialized with a constructor without an explicit cast.
285
286Unlike \CC, \CFA provides an escape hatch that allows a user to decide at an object's definition whether it should be managed or not.
287An object initialized with \lstinline{@=} is guaranteed to be initialized like a C object, and will not be implicitly destructed.
288This feature provides all of the freedom that C programmers are used to having to optimize the program, while maintaining safety as a sensible default.
289In addition to freedom, \lstinline{@=} provides a simple path to migrating legacy C code to Cforall, in that objects can be moved from C-style initialization to \CFA gradually and individually.
290It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants will be established on an unmanaged object.
291It is recommended that most objects are managed by sensible constructors and destructors, except where absolutely necessary.
292
293When the user defines any constructor, the intrinsic/generated functions become invisible.
294In the current implementation, the default constructor and copy constructor are only hidden when explicitly overriden, since the derefence operator ©*?© is currently an ©otype© function, which would make it impossible to override a constructor, due to the lack of assertion-satifying funcitons.
295There is a proposal which decouples size/alignment type information from ©otype©, and implementing this would allow constructors and destructors to be hidden by the same rules that \CC uses.
296
297% TODO discuss compile time "checks" for subobjects when defining ctor/dtor for struct
298When defining a constructor or destructor for a struct ©S©, any members that are not explicitly constructed or destructed will be implicitly constructed or destructed automatically.
299If an explicit call is present, then that call is taken in preference to any implicitly generated call.
300A consequence of this rule is that it is possible, unlike \CC, to precisely control the order of construction and destruction of subobjects on a per-constructor basis, whereas in \CC subobject initialization and destruction is always performed based on the declaration order.
301Finally, it is illegal for a subobject to be explicitly constructed after it is used for the first time.
302If the translator cannot be reasonably sure that an object is constructed prior to its first use, but may be constructed afterward, and error is emitted.
303To override this rule, \lstinline{@=} can be used to force the translator to trust the programmer's discretion.
304This form of \lstinline{@=} is not yet implemented.
305
306% TODO discuss error if initializer nesting is too deep or contains designations
307Despite great effort, some forms of C syntax do not work well with constructors in \CFA.
308In particular, constructor calls cannot contain designations, since this is equivalent to allowing designations on the arguments to arbitrary function calls.
309In C, function prototypes are permitted to have arbitrary parameter names, including no names at all, which may have no connection to the actual names used at function definition.
310Furthermore, a function prototype can be repeated an arbitrary number of times, each time using different names.
311As a result, it was decided that any attempt to resolve designated function calls with C's function prototype rules would be brittle, and thus it is not sensible to allow designations in constructor calls.
312In addition, constructor calls cannot have a nesting depth greater than the number of array components in the type of the initialized object, plus one.
313In C, having a greater nesting depth would mean that the programmer intends to initialize subobjects with the nested initializer.
314The reason for this omission is to both simplify the mental model for using constructors, and to make the case of initialization simpler for the resolver.
315If this were allowed, it would be necessary for the expression resolver to decide whether each argument to the constructor call could initialize to some argument to one of the available constructors, making the problem highly recursive and potentially much more expensive.
316It should be noted that if an object does not have a non-trivial constructor, it can still make use of designations and nested initializers in \CFA.
317
318% TODO section on where destruction occurs - probably belongs in implementation section? or part of it does, anyway
319Destructors are automatically called at the end of the block in which the object was declared.
320In addition to this, destructors are automatically called when statements manipulate control flow to leave the block in which the object is declared, e.g. with return, break, continue, and goto statements.
321
322% TODO discuss copy construction for function parameters and return values
323When a function is called, the arguments supplied to the call are subject to implicit copy construction, and the return value is subject to destruction.
324When a value is returned from a function, the copy constructor is called to pass the value back to the call site.
325Exempt from these rules are intrinsic and builtin functions.
326
327% TODO discuss ©= + copy construction?
328
329% \subsection{Implementation}
330% Discuss the implementation details of constructors and destructors.
331
332\end{document}
Note: See TracBrowser for help on using the repository browser.