% inline code ©...© (copyright symbol) emacs: C-q M-)
% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
% LaTex escape §...§ (section symbol) emacs: C-q M-'
% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
% math escape $...$ (dollar symbol)

\documentclass[twoside,11pt]{article}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Latex packages used in the document (copied from CFA user manual).
\usepackage[T1]{fontenc} % allow Latin1 (extended ASCII) characters
\usepackage{textcomp}
\usepackage[latin1]{inputenc}
\usepackage{fullpage,times,comment}
\usepackage{epic,eepic}
\usepackage{upquote} % switch curled `'" to straight
\usepackage{calc}
\usepackage{xspace}
\usepackage{graphicx}
\usepackage{varioref} % extended references
\usepackage{listings} % format program code
\usepackage[flushmargin]{footmisc} % support label/reference in footnote
\usepackage{latexsym} % \Box glyph
\usepackage{mathptmx} % better math font with "times"
\usepackage[usenames]{color}
\usepackage[pagewise]{lineno}
\renewcommand{\linenumberfont}{\scriptsize\sffamily}
\input{common} % bespoke macros used in the document
\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
\usepackage{breakurl}
\renewcommand{\UrlFont}{\small\sf}

\setlength{\topmargin}{-0.45in} % move running title into header
\setlength{\headsep}{0.25in}

\usepackage{caption}
\usepackage{subcaption}
\usepackage{bigfoot}
\usepackage{amsmath}

\interfootnotelinepenalty=10000

\title{
\Huge \vspace*{1in} Constructors and Destructors in \CFA \\
\vspace*{1in}
}

\author{
\huge Rob Schluntz \\
\Large \vspace*{0.1in} \texttt{rschlunt@uwaterloo.ca} \\
\Large Cheriton School of Computer Science \\
\Large University of Waterloo
}

\date{
\today
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\bigO}[1]{O\!\left( #1 \right)}

\begin{document}
\pagestyle{headings}
% changed after setting pagestyle
\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
\pagenumbering{roman}
% \linenumbers % comment out to turn off line numbering

\maketitle
\thispagestyle{empty}

\clearpage
\thispagestyle{plain}
\pdfbookmark[1]{Contents}{section}
\tableofcontents

\clearpage
\thispagestyle{plain}
\pagenumbering{arabic}

\section{Introduction}
Unguided manual resource management is difficult.
Part of the difficulty results from not having any guarantees about the current state of an object.
Objects can be internally composed of pointers which may reference resources that may or may not need to be manually released, and keeping track of that state for each object can be difficult for the end user.

Constructors and destructors provide a mechanism that bookends the lifetime of an object, allowing the designer of an object to establish invariants for objects of a specific type.
Constructors guarantee that object initialization code is run before the object can be used, while destructors provide a mechanism that is guaranteed to be run immediately before an object's lifetime ends.
Constructors and destructors can help to simplify resource management when used in a disciplined way.
In particular, when all resources are acquired in a constructor, and all resources are released in a destructor, no resource leaks are possible.
This pattern, known as RAII (Resource Acquisition Is Initialization), is a popular idiom in several languages, such as \CC.

\section{Design}
\label{s:Design}
In designing constructors and destructors for \CFA, the primary goals were ease of use and maintaining backwards compatibility.
In C, when a variable is defined, its value is initially undefined unless it is explicitly initialized or allocated in the static area.
\begin{lstlisting}
int main() {
  int x;
  int y = 5;
  x = y;
}
\end{lstlisting}
In the example above, ©x© is defined and left uninitialized, while ©y© is defined and initialized to 5.
In the last line, ©x© is assigned the value of ©y©.
The key difference between assignment and initialization is that assignment occurs on a live object (i.e. an object that contains data).
It is important to note that this means ©x© could have been used uninitialized prior to being assigned, while ©y© could not be used uninitialized.
Use of uninitialized variables often yields undefined behaviour, which is a common source of bugs in C programs. % TODO: *citation*

A constructor is a special function which provides a way of ensuring that some form of object initialization is performed.
This goal is achieved through a guarantee that a constructor will be called implicitly on every object that is allocated, immediately after the object's definition.
Since a constructor is called on every object, it is impossible to forget to initialize a single object, as long as all constructors perform some sensible form of initialization.

In \CFA, a constructor is a function with the name ©?{}©.
Every constructor must have a return type of ©void© and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming languages.
The ©this© parameter must have a pointer type, whose base type is the type of object that the function constructs.
There is currently a proposal to add reference types to \CFA.
Once this proposal has been implemented, the ©this© parameter should instead be a reference type with the same restrictions.
% \begin{lstlisting}
% void ?{}(int * i) {
%   *i = 0;
% }
% int x, y = 1, z;
% \end{lstlisting}
% In this example, a constructor is defined which initializes variables of type ©int© to 0.

Consider the definition of a simple type which encapsulates a dynamic array of ©int©s.
\begin{lstlisting}
struct Array {
  int * data;
  int len;
};
\end{lstlisting}
In C, if the user creates an ©Array© object, the fields ©data© and ©len© will be uninitialized by default.
It is the user's responsibility to remember to initialize both of the fields to sensible values.
In \CFA, the user can define a constructor to handle initialization of ©Array© objects.
\begin{lstlisting}
void ?{}(Array * arr){
  arr->len = 10;
  arr->data = malloc(sizeof(int)*arr->len);
  for (int i = 0; i < arr->len; ++i) {
    arr->data[i] = 0;
  }
}
Array x;
\end{lstlisting}
This constructor initializes ©x© so that its ©len© field has the value 10 and its ©data© field holds a pointer to a block of memory large enough to hold 10 ©int©s, and it sets the value of each element of the array to 0.
This particular form of constructor is called the \emph{default constructor}, because it is called on an object defined without an initializer.
A default constructor is a constructor which takes a single argument, the ©this© parameter.

In \CFA, a destructor is a function much like a constructor, except that its name is ©^?{}©.
A destructor for the ©Array© type can be defined as follows.
\begin{lstlisting}
void ^?{}(Array * arr) {
  free(arr->data);
}
\end{lstlisting}
Since the destructor is automatically called for all objects of type ©Array©, the memory associated with an ©Array© will automatically be freed when the object's lifetime ends.
% The exact guarantees made by \CFA with respect to the calling of destructors will be discussed in detail [later].

As discussed previously, the distinction between initialization and assignment is important.
Consider the following example.
\begin{lstlisting}
Array x;
Array y;
Array z = x;
y = x;
\end{lstlisting}
By the previous definition of the default constructor for ©Array©, ©x© and ©y© are initialized to valid arrays of length 10 after their respective definitions.
On line 3, ©z© is initialized with the value of ©x©, while on line 4, ©y© is assigned the value of ©x©.
The key distinction between initialization and assignment is that an object about to be initialized does not hold any meaningful value, whereas an object about to be assigned might.
In particular, these cases cannot be handled the same way because in the former case ©z© does not currently own an array, while ©y© does.

\begin{lstlisting}
void ?{}(Array * arr, Array other) {
  arr->len = other.len;
  arr->data = malloc(sizeof(int)*arr->len);
  for (int i = 0; i < arr->len; ++i) {
    arr->data[i] = other.data[i];
  }
}
Array ?=?(Array * arr, Array other) {
  ^?{}(arr);
  ?{}(arr, other);
  return *arr;
}
\end{lstlisting}
The two functions above handle these cases.
The first function is called a \emph{copy constructor}, because it constructs the object referred to by its ©this© parameter from a single value of the same type.
The second function is the standard copy assignment operator.
These four functions (the default constructor, copy constructor, assignment operator, and destructor) are special in that they control the state of most objects.

% TODO: start new section here? this is where the definition of the rules begins to become more formal, whereas everything leading up to this was mostly exposition by example
\subsection{Function Generation Details}
By default, every type is defined to have the core set of functions described previously.
To mimic the behaviour of plain C, the default constructor and destructor for all of the basic types are defined to do nothing, while the copy constructor and assignment operator perform a bitwise copy of the source parameter.

There are several options for user-defined types: structures, unions, and enumerations.
To aid in ease of use, the standard set of four functions is automatically generated for a user-defined type after its definition is completed.

The generated functions for enumerations are the simplest.
Since enumerations in C are essentially just another integral type, the generated functions behave in the same way as the builtin functions for the basic types.

For structures, the situation is more complicated.
For a structure ©S© with members ©M$_0$©, ©M$_1$©, ..., ©M$_{N-1}$©, each function ©f© in the standard set will call ©f(s->M$_i$, ...)© for each ©$i$©.
That is, a default constructor for ©S© will default construct the members of ©S©, the copy constructor will copy construct them, and so on.
% TODO: description VERY weak. Flesh out details
For example, given the struct definition
\begin{lstlisting}
struct A {
  B b;
  C c;
};
\end{lstlisting}
the following functions are implicitly generated.
\begin{lstlisting}
void ?{}(A * this) {
  ?{}(&this->b);
  ?{}(&this->c);
}
void ?{}(A * this, A other) {
  ?{}(&this->b, other.b);
  ?{}(&this->c, other.c);
}
A ?=?(A * this, A other) {
  ?=?(&this->b, other.b);
  ?=?(&this->c, other.c);
  return *this;
}
void ^?{}(A * this) {
  ^?{}(&this->b);
  ^?{}(&this->c);
}
\end{lstlisting}
In addition to the standard set, a set of \emph{field constructors} is also generated for structures.
The field constructors are constructors that consume a prefix of the struct's member list.
That is, $N$ constructors are built of the form ©void ?{}(S *, T$_{\text{M}_0}$)©, ©void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)©, ..., ©void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)©, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise.
The addition of field constructors allows structs in \CFA to be used naturally in the same ways that they could be used in C (i.e. to initialize any prefix of the struct).
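For comparison, the plain C behaviour that the field constructors mirror can be sketched as follows.
The type ©Triple© and the helper functions are hypothetical names introduced only for this illustration.
Note that C zero-initializes the members that are not named in the initializer, whereas the generated \CFA field constructors default construct them.
\begin{lstlisting}
#include <assert.h>

/* Hypothetical three-member struct, used only for illustration. */
typedef struct { int x; int y; int z; } Triple;

/* Each helper initializes a different prefix of the member list;
   C sets the remaining members to zero. */
static Triple init_one(void)   { Triple t = { 1 };       return t; }
static Triple init_two(void)   { Triple t = { 1, 2 };    return t; }
static Triple init_three(void) { Triple t = { 1, 2, 3 }; return t; }

/* Checks that only the initialized prefix is non-zero. */
static void check_prefix_init(void) {
    Triple a = init_one();
    Triple b = init_two();
    Triple c = init_three();
    assert(a.x == 1 && a.y == 0 && a.z == 0);
    assert(b.x == 1 && b.y == 2 && b.z == 0);
    assert(c.x == 1 && c.y == 2 && c.z == 3);
}
\end{lstlisting}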
Extending the previous example, the following constructors are implicitly generated for ©A©.
\begin{lstlisting}
void ?{}(A * this, B b) {
  ?{}(&this->b, b);
  ?{}(&this->c);
}
void ?{}(A * this, B b, C c) {
  ?{}(&this->b, b);
  ?{}(&this->c, c);
}
\end{lstlisting}

For unions, the default constructor and destructor do nothing, as it is not obvious which member, if any, should be constructed.
For copy constructor and assignment operations, a bitwise ©memcpy© is applied.
An alternative to this design is to always construct and destruct the first member of a union, to match the C semantics of initializing the first member of the union.
This approach ultimately feels subtle and unsafe.
Another option is to, like \CC, disallow unions from containing members which are themselves managed types.
This is a reasonable approach from a safety standpoint, but is not very C-like.
Since the primary purpose of a union is to provide low-level memory optimization, it is assumed that the user has a certain level of maturity.
It is therefore the responsibility of the user to define the special functions explicitly if they are appropriate, since it is impossible to accurately predict at compile-time the ways that a union is intended to be used.

Arrays are a special case in the C type system.
C arrays do not carry around their size, making it impossible to write a standalone \CFA function which constructs or destructs an array while maintaining the standard interface for constructors and destructors.
Instead, \CFA defines the initialization and destruction of an array recursively.
That is, when an array is defined, each of its elements is constructed in order from element 0 up to element $n-1$.
When an array is to be implicitly destructed, each of its elements is destructed in reverse order from element $n-1$ down to element 0.
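The ordering rule for arrays can be modelled in plain C as a pair of loops.
This is only a sketch under stated assumptions: ©Elem©, ©elem_ctor©, ©elem_dtor©, ©array_ctor©, ©array_dtor©, and the call log are hypothetical stand-ins, and the code is not the output of the \CFA translator.
\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type; all names here are illustrative only. */
typedef struct { int id; } Elem;

enum { LOG_CAP = 16 };
static int log_buf[LOG_CAP];  /* records the order of the calls */
static size_t log_len = 0;

static void record(int v) {
    assert(log_len < LOG_CAP);
    log_buf[log_len++] = v;
}

/* Stand-in for a constructor: logs id+1 as a positive entry. */
static void elem_ctor(Elem *e, int id) { e->id = id; record(id + 1); }
/* Stand-in for a destructor: logs -(id+1) as a negative entry. */
static void elem_dtor(Elem *e) { record(-(e->id + 1)); }

/* Construct elements 0 .. n-1 in ascending order. */
static void array_ctor(Elem *arr, size_t n) {
    for (size_t i = 0; i < n; ++i) elem_ctor(&arr[i], (int)i);
}

/* Destruct elements n-1 .. 0 in descending order. */
static void array_dtor(Elem *arr, size_t n) {
    for (size_t i = n; i-- > 0; ) elem_dtor(&arr[i]);
}
\end{lstlisting}
Calling ©array_ctor© and then ©array_dtor© on a three-element array leaves the entries 1, 2, 3, -3, -2, -1 in the log, mirroring the construction and destruction order described above.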
\subsection{Using Constructors and Destructors}
Implicitly generated constructor and destructor calls ignore the outermost type qualifiers on a type by way of a cast on the first argument to the function.
This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects.
Since this applies only to implicitly generated constructor calls, the language will not allow qualified objects to be re-initialized with a constructor without an explicit cast.

Unlike \CC, \CFA provides an escape hatch that allows a user to decide at an object's definition whether it should be managed or not.
An object initialized with \lstinline{@=} is guaranteed to be initialized like a C object, and will not be implicitly destructed.
This feature provides all of the freedom that C programmers are used to having to optimize the program, while maintaining safety as a sensible default.
In addition to freedom, \lstinline{@=} provides a simple path to migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually.
It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants will be established on an unmanaged object.
It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary.

When the user defines any constructor, the intrinsic/generated functions become invisible.
In the current implementation, the default constructor and copy constructor are only hidden when explicitly overridden, since the dereference operator ©*?© is currently an ©otype© function, which would make it impossible to override a constructor, due to the lack of assertion-satisfying functions.
There is a proposal which decouples size/alignment type information from ©otype©, and implementing this would allow constructors and destructors to be hidden by the same rules that \CC uses.
% TODO discuss compile time "checks" for subobjects when defining ctor/dtor for struct
When defining a constructor or destructor for a struct ©S©, any members that are not explicitly constructed or destructed will be implicitly constructed or destructed automatically.
If an explicit call is present, then that call is taken in preference to any implicitly generated call.
A consequence of this rule is that, unlike in \CC, it is possible to precisely control the order of construction and destruction of subobjects on a per-constructor basis, whereas in \CC subobject initialization and destruction is always performed based on the declaration order.
Finally, it is illegal for a subobject to be explicitly constructed after it is used for the first time.
If the translator cannot be reasonably sure that an object is constructed prior to its first use, but may be constructed afterward, an error is emitted.
To override this rule, \lstinline{@=} can be used to force the translator to trust the programmer's discretion.
This form of \lstinline{@=} is not yet implemented.

% TODO discuss error if initializer nesting is too deep or contains designations
Despite great effort, some forms of C syntax do not work well with constructors in \CFA.
In particular, constructor calls cannot contain designations, since this is equivalent to allowing designations on the arguments to arbitrary function calls.
In C, function prototypes are permitted to have arbitrary parameter names, including no names at all, which may have no connection to the actual names used at function definition.
Furthermore, a function prototype can be repeated an arbitrary number of times, each time using different names.
As a result, it was decided that any attempt to resolve designated function calls with C's function prototype rules would be brittle, and thus it is not sensible to allow designations in constructor calls.
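The prototype rules in question can be seen in a few lines of plain C.
The function ©diff© and its parameter names are hypothetical and chosen only for illustration.
All three declarations below are compatible, so a hypothetical designated call that named a parameter ©from© or ©to© would change meaning depending on which declaration the resolver consulted, while the positional call is unambiguous.
\begin{lstlisting}
#include <assert.h>

/* Three compatible declarations of the same hypothetical function. */
int diff(int from, int to);
int diff(int to, int from);  /* names swapped relative to the first */
int diff(int, int);          /* no parameter names at all */

/* The definition may use yet another set of names. */
int diff(int a, int b) { return b - a; }

/* Positional arguments bind by position, regardless of the names. */
static void check_positional(void) {
    assert(diff(3, 10) == 7);
    assert(diff(10, 3) == -7);
}
\end{lstlisting}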
In addition, constructor calls cannot have a nesting depth greater than the number of array components in the type of the initialized object, plus one.
In C, having a greater nesting depth would mean that the programmer intends to initialize subobjects with the nested initializer.
This restriction both simplifies the mental model for using constructors and makes initialization simpler for the resolver.
If deeper nesting were allowed, it would be necessary for the expression resolver to decide whether each argument to the constructor call could initialize some parameter of one of the available constructors, making the problem highly recursive and potentially much more expensive.
It should be noted that if an object does not have a non-trivial constructor, it can still make use of designations and nested initializers in \CFA.

% TODO section on where destruction occurs - probably belongs in implementation section? or part of it does, anyway
Destructors are automatically called at the end of the block in which the object was declared.
In addition to this, destructors are automatically called when statements manipulate control flow to leave the block in which the object is declared, e.g. with ©return©, ©break©, ©continue©, and ©goto© statements.

% TODO discuss copy construction for function parameters and return values
When a function is called, the arguments supplied to the call are subject to implicit copy construction, and the return value is subject to destruction.
When a value is returned from a function, the copy constructor is called to pass the value back to the call site.
Exempt from these rules are intrinsic and builtin functions.

% TODO discuss ©= + copy construction?

% \subsection{Implementation}
% Discuss the implementation details of constructors and destructors.

\end{document}