Timestamp:
Apr 3, 2017, 7:04:30 PM (5 years ago)
Author:
Rob Schluntz <rschlunt@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc
Children:
fbd7ad6
Parents:
ae6cc8b
Message:

incorporate Peter's feedback, handle many TODOs

File:
1 edited

  • doc/rob_thesis/ctordtor.tex

    rae6cc8b r7493339  
    22\chapter{Constructors and Destructors}
    33%======================================================================
    4 
    5 % TODO: discuss move semantics; they haven't been implemented, but could be. Currently looking at alternative models. (future work)
    64
    75% TODO: as an experiment, implement Andrei Alexandrescu's ScopeGuard http://www.drdobbs.com/cpp/generic-change-the-way-you-write-excepti/184403758?pgno=2
     
    553551% // and so on
    554552
    555 
    556 
    557 % TODO: talk somewhere about compound literals?
    558 
    559553Since \CFA is a true systems language, it does not provide a garbage collector.
    560 As well, \CFA is not an object-oriented programming language, i.e. structures cannot have routine members.
     554As well, \CFA is not an object-oriented programming language, i.e., structures cannot have routine members.
    561555Nevertheless, one important goal is to reduce programming complexity and increase safety.
    562556To that end, \CFA provides support for implicit pre/post-execution of routines for objects, via constructors and destructors.
    563 
    564 % TODO: this is old. remove or refactor
    565 % Manual resource management is difficult.
    566 % Part of the difficulty results from not having any guarantees about the current state of an object.
    567 % Objects can be internally composed of pointers that may reference resources which may or may not need to be manually released, and keeping track of that state for each object can be difficult for the end user.
    568 
    569 % Constructors and destructors provide a mechanism to bookend the lifetime of an object, allowing the designer of a type to establish invariants for objects of that type.
    570 % Constructors guarantee that object initialization code is run before the object can be used, while destructors provide a mechanism that is guaranteed to be run immediately before an object's lifetime ends.
    571 % Constructors and destructors can help to simplify resource management when used in a disciplined way.
    572 % In particular, when all resources are acquired in a constructor, and all resources are released in a destructor, no resource leaks are possible.
    573 % This pattern is a popular idiom in several languages, such as \CC, known as RAII (Resource Acquisition Is Initialization).
    574557
    575558This chapter details the design of constructors and destructors in \CFA, along with their current implementation in the translator.
     
    592575Next, @x@ is assigned the value of @y@.
    593576In the last line, @z@ is implicitly initialized to 0 since it is marked @static@.
    594 The key difference between assignment and initialization being that assignment occurs on a live object (i.e. an object that contains data).
     577The key difference between assignment and initialization is that assignment occurs on a live object (i.e., an object that contains data).
    595578It is important to note that this means @x@ could have been used uninitialized prior to being assigned, while @y@ could not be used uninitialized.
    596 Use of uninitialized variables yields undefined behaviour, which is a common source of errors in C programs. % TODO: *citation*
    597 
    598 Declaration initialization is insufficient, because it permits uninitialized variables to exist and because it does not allow for the insertion of arbitrary code before the variable is live.
    599 Many C compilers give good warnings most of the time, but they cannot in all cases.
    600 \begin{cfacode}
    601 int f(int *);  // never reads the parameter, only writes
    602 int g(int *);  // reads the parameter - expects an initialized variable
     579Use of uninitialized variables yields undefined behaviour, which is a common source of errors in C programs.
     580
     581Declaration initialization is insufficient, because it permits uninitialized variables to exist and because it does not allow for the insertion of arbitrary code before a variable is live.
     582Many C compilers give good warnings for uninitialized variables most of the time, but they cannot in all cases.
     583\begin{cfacode}
     584int f(int *);  // output parameter: never reads, only writes
     585int g(int *);  // input parameter: never writes, only reads,
     586               // so requires initialized variable
    603587
    604588int x, y;
    605589f(&x);  // okay - only writes to x
    606 g(&y);  // will use y uninitialized
    607 \end{cfacode}
    608 Other languages are able to give errors in the case of uninitialized variable use, but due to backwards compatibility concerns, this cannot be the case in \CFA.
     590g(&y);  // uses y uninitialized
     591\end{cfacode}
     592Other languages are able to give errors in the case of uninitialized variable use, but due to backwards compatibility concerns, this is not the case in \CFA.
    609593
    610594In C, constructors and destructors are often mimicked by providing routines that create and teardown objects, where the teardown function is typically only necessary if the type modifies the execution environment.
     
    614598};
    615599struct array_int create_array(int sz) {
    616   return (struct array_int) { malloc(sizeof(int)*sz) };
     600  return (struct array_int) { calloc(sz, sizeof(int)) };
    617601}
    618602void destroy_rh(struct resource_holder * rh) {
     
    639623
    640624In \CFA, a constructor is a function with the name @?{}@.
     625Like other operators in \CFA, the name represents the syntax used to call the constructor, e.g., @struct S s = { ... };@.
    641626Every constructor must have a return type of @void@ and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming-languages (however, a programmer can give it an arbitrary name).
    642627The @this@ parameter must have a pointer type, whose base type is the type of object that the function constructs.
     
    655640
    656641In C, if the user creates an @Array@ object, the fields @data@ and @len@ are uninitialized, unless an explicit initializer list is present.
    657 It is the user's responsibility to remember to initialize both of the fields to sensible values.
     642It is the user's responsibility to remember to initialize both of the fields to sensible values, since there are no implicit checks for invalid values or reasonable defaults.
    658643In \CFA, the user can define a constructor to handle initialization of @Array@ objects.
    659644
     
    671656This constructor initializes @x@ so that its @length@ field has the value 10, and its @data@ field holds a pointer to a block of memory large enough to hold 10 @int@s, and sets the value of each element of the array to 0.
    672657This particular form of constructor is called the \emph{default constructor}, because it is called on an object defined without an initializer.
    673 In other words, a default constructor is a constructor that takes a single argument, the @this@ parameter.
     658In other words, a default constructor is a constructor that takes a single argument: the @this@ parameter.
    674659
    675660In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}!.
     
    680665}
    681666\end{cfacode}
    682 Since the destructor is automatically called at deallocation for all objects of type @Array@, the memory associated with an @Array@ is automatically freed when the object's lifetime ends.
     667The destructor is automatically called at deallocation for all objects of type @Array@.
     668Hence, the memory associated with an @Array@ is automatically freed when the object's lifetime ends.
    683669The exact guarantees made by \CFA with respect to the calling of destructors are discussed in section \ref{sub:implicit_dtor}.
    684670
     
    691677\end{cfacode}
    692678By the previous definition of the default constructor for @Array@, @x@ and @y@ are initialized to valid arrays of length 10 after their respective definitions.
    693 On line 3, @z@ is initialized with the value of @x@, while on line @4@, @y@ is assigned the value of @x@.
     679On line 2, @z@ is initialized with the value of @x@, while on line 3, @y@ is assigned the value of @x@.
    694680The key distinction between initialization and assignment is that an object to be initialized does not hold any meaningful values, whereas an object to be assigned might.
    695681In particular, these cases cannot be handled the same way because in the former case @z@ does not currently own an array, while @y@ does.
     
    712698The first function is called a \emph{copy constructor}, because it constructs its argument by copying the values from another object of the same type.
    713699The second function is the standard copy-assignment operator.
    714 These four functions are special in that they control the state of most objects.
     700The four functions (default constructor, destructor, copy constructor, and assignment operator) are special in that they safely control the state of most objects.
    715701
    716702It is possible to define a constructor that takes any combination of parameters to provide additional initialization options.
     
    729715Array x, y = { 20, 0xdeadbeef }, z = y;
    730716\end{cfacode}
     717
    731718In \CFA, constructor calls look just like C initializers, which allows them to be inserted into legacy C code with minimal code changes, and also provides a very simple syntax that veteran C programmers are familiar with.
    732719One downside of reusing C initialization syntax is that it isn't possible to determine whether an object is constructed just by looking at its declaration, since that requires knowledge of whether the type is managed at that point.
     
    748735Destructors are implicitly called in reverse declaration-order so that objects with dependencies are destructed before the objects they are dependent on.
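The reverse-order rule can be illustrated in plain C, where the teardown calls that \CFA inserts implicitly must be written by hand.
The following is only a sketch: the types @Buffer@ and @View@, their init/destroy routines, and the @teardown_order@ helper are hypothetical names introduced for illustration and are not part of \CFA or the translator.
The dependent object (@View@) is declared last and destroyed first, so its teardown can still safely inspect the object it depends on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types: a Buffer owns memory, a View borrows a Buffer. */
struct Buffer { int *data; int live; };
struct View   { struct Buffer *target; };

static char order_log[8];     /* records the order in which teardown runs */
static size_t order_len = 0;

static void buffer_init(struct Buffer *b, size_t len) {
    b->data = calloc(len, sizeof(int));
    b->live = 1;
}

static void buffer_destroy(struct Buffer *b) {
    order_log[order_len++] = 'B';
    free(b->data);
    b->data = NULL;
    b->live = 0;
}

static void view_init(struct View *v, struct Buffer *b) {
    v->target = b;
}

static void view_destroy(struct View *v) {
    /* The view depends on its buffer, so the buffer must still be live. */
    assert(v->target->live);
    order_log[order_len++] = 'V';
    v->target = NULL;
}

/* Builds two dependent objects and tears them down in reverse
   declaration order; returns the recorded teardown order. */
const char *teardown_order(void) {
    order_len = 0;
    memset(order_log, 0, sizeof order_log);

    struct Buffer buf;                 /* declared first */
    buffer_init(&buf, 4);
    struct View view;                  /* declared second, depends on buf */
    view_init(&view, &buf);

    view_destroy(&view);               /* destroyed first */
    buffer_destroy(&buf);              /* destroyed last */
    return order_log;
}
```

Calling @teardown_order()@ records the view's teardown before the buffer's; swapping the two destroy calls would trip the assertion in @view_destroy@, which is the kind of error the implicit ordering is meant to rule out.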
    749736
    750 \subsection{Syntax}
    751 \label{sub:syntax} % TODO: finish this section
     737\subsection{Calling Syntax}
     738\label{sub:syntax}
    752739There are several ways to construct an object in \CFA.
    753740As previously introduced, every variable is automatically constructed at its definition, which is the most natural way to construct an object.
     
    773760A * y = malloc();  // copy construct: ?{}(&y, malloc())
    774761
    775 ?{}(&x);    // explicit construct x
    776 ?{}(y, x);  // explit construct y from x
    777 ^?{}(&x);   // explicit destroy x
     762?{}(&x);    // explicit construct x, second construction
     763?{}(y, x);  // explicit construct y from x, second construction
     764^?{}(&x);   // explicit destroy x, in different order
    778765^?{}(y);    // explicit destroy y
    779766
     
    781768// implicit ^?{}(&x);
    782769\end{cfacode}
    783 Calling a constructor or destructor directly is a flexible feature that allows complete control over the management of a piece of storage.
     770Calling a constructor or destructor directly is a flexible feature that allows complete control over the management of storage.
    784771In particular, constructors double as a placement syntax.
    785772\begin{cfacode}
     
    804791Finally, constructors and destructors support \emph{operator syntax}.
    805792Like other operators in \CFA, the function name mirrors the use-case, in that the first $N$ arguments fill in the place of the question mark.
     793This syntactic form is similar to the new initialization syntax in \CCeleven, except that it is used in expression contexts, rather than declaration contexts.
    806794\begin{cfacode}
    807795struct A { ... };
     
    822810Destructor operator syntax is actually a statement, and requires parentheses for symmetry with constructor syntax.
    823811
     812One of these three syntactic forms should appeal to either C or \CC programmers using \CFA.
     813
    824814\subsection{Function Generation}
    825815In \CFA, every type is defined to have the core set of four functions described previously.
     
    833823There are several options for user-defined types: structures, unions, and enumerations.
    834824To aid in ease of use, the standard set of four functions is automatically generated for a user-defined type after its definition is completed.
    835 By auto-generating these functions, it is ensured that legacy C code will continue to work correctly in every context where \CFA expects these functions to exist, since they are generated for every complete type.
     825By auto-generating these functions, it is ensured that legacy C code continues to work correctly in every context where \CFA expects these functions to exist, since they are generated for every complete type.
    836826
    837827The generated functions for enumerations are the simplest.
    838828Since enumerations in C are essentially just another integral type, the generated functions behave in the same way that the builtin functions for the basic types work.
    839 % TODO: examples for enums
    840829For example, given the enumeration
    841830\begin{cfacode}
     
    860849\end{cfacode}
    861850In the future, \CFA will introduce strongly-typed enumerations, like those in \CC.
    862 The existing generated routines will be sufficient to express this restriction, since they are currently set up to take in values of that enumeration type.
     851The existing generated routines are sufficient to express this restriction, since they are currently set up to take in values of that enumeration type.
    863852Changes related to this feature only need to affect the expression resolution phase, where more strict rules will be applied to prevent implicit conversions from integral types to enumeration types, but should continue to permit conversions from enumeration types to @int@.
    864 In this way, it will still be possible to add an @int@ to an enumeration, but the resulting value will be an @int@, meaning that it won't be possible to reassign the value into an enumeration without a cast.
     853In this way, it is still possible to add an @int@ to an enumeration, but the resulting value is an @int@, meaning it cannot be reassigned to an enumeration without a cast.
    865854
    866855For structures, the situation is more complicated.
    867 For a structure @S@ with members @M$_0$@, @M$_1$@, ... @M$_{N-1}$@, each function @f@ in the standard set calls \lstinline{f(s->M$_i$, ...)} for each @$i$@.
    868 That is, a default constructor for @S@ default constructs the members of @S@, the copy constructor with copy construct them, and so on.
    869 For example given the struct definition
     856Given a structure @S@ with members @M$_0$@, @M$_1$@, ... @M$_{N-1}$@, each function @f@ in the standard set calls \lstinline{f(s->M$_i$, ...)} for each @$i$@.
     857That is, a default constructor for @S@ default constructs the members of @S@, the copy constructor copy constructs them, and so on.
     858For example, given the structure definition
    870859\begin{cfacode}
    871860struct A {
     
    893882}
    894883\end{cfacode}
    895 It is important to note that the destructors are called in reverse declaration order to resolve conflicts in the event there are dependencies among members.
     884It is important to note that the destructors are called in reverse declaration order to prevent conflicts in the event there are dependencies among members.
    896885
    897886In addition to the standard set, a set of \emph{field constructors} is also generated for structures.
    898 The field constructors are constructors that consume a prefix of the struct's member list.
     887The field constructors are constructors that consume a prefix of the structure's member-list.
    899888That is, $N$ constructors are built of the form @void ?{}(S *, T$_{\text{M}_0}$)@, @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)@, ..., @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)@, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise.
    900 The addition of field constructors allows structs in \CFA to be used naturally in the same ways that they could be used in C (i.e. to initialize any prefix of the struct), e.g., @A a0 = { b }, a1 = { b, c }@.
     889The addition of field constructors allows structures in \CFA to be used naturally in the same ways as used in C (i.e., to initialize any prefix of the structure), e.g., @A a0 = { b }, a1 = { b, c }@.
    901890Extending the previous example, the following constructors are implicitly generated for @A@.
    902891\begin{cfacode}
     
    911900\end{cfacode}
    912901
    913 For unions, the default constructor and destructor do nothing, as it is not obvious which member if any should be constructed.
     902For unions, the default constructor and destructor do nothing, as it is not obvious which member, if any, should be constructed.
    914903For copy constructor and assignment operations, a bitwise @memcpy@ is applied.
    915904In standard C, a union can also be initialized using a value of the same type as its first member, and so a corresponding field constructor is generated to perform a bitwise @memcpy@ of the object.
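The bitwise behaviour described for unions can be approximated in plain C.
This is a minimal sketch under stated assumptions: @value_copy@ and @value_init_first@ are hypothetical stand-ins for the generated copy operations and the first-member field constructor, and they do not show the code the translator actually emits.

```c
#include <assert.h>
#include <string.h>

union Value { int i; float f; };

/* Hypothetical stand-in for the generated copy constructor and
   assignment operator: a bitwise copy of the whole union. */
void value_copy(union Value *dst, const union Value *src) {
    memcpy(dst, src, sizeof(union Value));
}

/* Hypothetical stand-in for the generated field constructor, which
   accepts a value of the first member's type. */
void value_init_first(union Value *dst, int first) {
    memcpy(dst, &first, sizeof first);
}

/* Returns 1 if both operations preserve the stored value. */
int union_copy_demo(void) {
    union Value a, b, c;
    a.f = 1.5f;
    value_copy(&b, &a);         /* b holds the same bits as a */
    value_init_first(&c, 42);   /* c is initialized through its first member */
    return b.f == 1.5f && c.i == 42;
}
```

Because the copy is bitwise, it does not need to know which member is currently active, which is why no member-specific constructor is called.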
     
    947936
    948937% This feature works in the \CFA model, since constructors are simply special functions and can be called explicitly, unlike in \CC. % this sentence isn't really true => placement new
    949 In \CCeleven, this restriction has been loosened to allow unions with managed members, with the caveat that any if there are any members with a user-defined operation, then that operation is not implicitly defined, forcing the user to define the operation if necessary.
     938In \CCeleven, unions may have managed members, with the caveat that if there are any members with a user-defined operation, then that operation is not implicitly defined, forcing the user to define the operation if necessary.
    950939This restriction could easily be added into \CFA once \emph{deleted} functions are added.
    951940
     
    970959Here, @&s@ and @&s2@ are cast to unqualified pointer types.
    971960This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects.
    972 Since this applies only to implicitly generated constructor calls, the language does not allow qualified objects to be re-initialized with a constructor without an explicit cast.
     961This applies only to implicitly generated constructor calls.
     962Hence, explicitly re-initializing qualified objects with a constructor requires an explicit cast.
     963
     964As discussed in Section \ref{sub:c_background}, compound literals create unnamed objects.
     965This mechanism can continue to be used seamlessly in \CFA with managed types to create temporary objects.
     966The object created by a compound literal is constructed using the provided brace-enclosed initializer-list, and is destructed at the end of the scope it is used in.
     967For example,
     968\begin{cfacode}
     969struct A { int x, y; };
     970void ?{}(A *, int, int);
     971{
     972  int x = (A){ 10, 20 }.x;
     973}
     974\end{cfacode}
     975is equivalent to
     976\begin{cfacode}
     977struct A { int x, y; };
     978void ?{}(A *, int, int);
     979{
     980  A _tmp;
     981  ?{}(&_tmp, 10, 20);
     982  int x = _tmp.x;
     983  ^?{}(&_tmp);
     984}
     985\end{cfacode}
    973986
    974987Unlike \CC, \CFA provides an escape hatch that allows a user to decide at an object's definition whether it should be managed or not.
     
    984997A a2 @= { 0 };  // unmanaged
    985998\end{cfacode}
    986 In this example, @a1@ is a managed object, and thus is default constructed and destructed at the end of @a1@'s lifetime, while @a2@ is an unmanaged object and is not implicitly constructed or destructed.
    987 Instead, @a2->x@ is initialized to @0@ as if it were a C object, due to the explicit initializer.
    988 Existing constructors are ignored when \ateq is used, so that any valid C initializer is able to initialize the object.
     999In this example, @a1@ is a managed object, and thus is default constructed and destructed at the start/end of @a1@'s lifetime, while @a2@ is an unmanaged object and is not implicitly constructed or destructed.
     1000Instead, @a2.x@ is initialized to @0@ as if it were a C object, because of the explicit initializer.
    9891001
    9901002In addition to freedom, \ateq provides a simple path to migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually.
     
    9921004It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary.
    9931005
    994 When the user declares any constructor or destructor, the corresponding intrinsic/generated function and all field constructors for that type are hidden, so that they will not be found during expression resolution unless the user-defined function goes out of scope.
    995 Furthermore, if the user declares any constructor, then the intrinsic/generated default constructor is also hidden, making it so that objects of a type may not be default constructable.
    996 This closely mirrors the rule for implicit declaration of constructors in \CC, wherein the default constructor is implicitly declared if there is no user-declared constructor. % TODO: cite C++98 page 186??
     1006When a user declares any constructor or destructor, the corresponding intrinsic/generated function and all field constructors for that type are hidden, so that they are not found during expression resolution until the user-defined function goes out of scope.
     1007Furthermore, if the user declares any constructor, then the intrinsic/generated default constructor is also hidden, precluding default construction.
     1008These semantics closely mirror the rule for implicit declaration of constructors in \CC, wherein the default constructor is implicitly declared if there is no user-declared constructor \cite[p.~186]{ANSI98:C++}.
    9971009\begin{cfacode}
    9981010struct S { int x, y; };
     
    10011013  S s0, s1 = { 0 }, s2 = { 0, 2 }, s3 = s2;  // okay
    10021014  {
    1003     void ?{}(S * s, int i) { s->x = i*2; }
     1015    void ?{}(S * s, int i) { s->x = i*2; } // locally hide autogen constructors
    10041016    S s4;  // error
    10051017    S s5 = { 3 };  // okay
     
    10581070} // z, y, w implicitly destructed, in this order
    10591071\end{cfacode}
    1060 If at any point, the @this@ parameter is passed directly as the target of another constructor, then it is assumed that constructor handles the initialization of all of the object's members and no implicit constructor calls are added. % TODO: confirm that this is correct. It might be possible to get subtle errors if you initialize some members then call another constructor... -- in fact, this is basically always wrong. if anything, I should check that such a constructor does not initialize any members, otherwise it'll always initialize the member twice (once locally, once by the called constructor).
     1072If at any point, the @this@ parameter is passed directly as the target of another constructor, then it is assumed that constructor handles the initialization of all of the object's members and no implicit constructor calls are added. % TODO: this is basically always wrong. if anything, I should check that such a constructor does not initialize any members, otherwise it'll always initialize the member twice (once locally, once by the called constructor). This might be okay in some situations, but it deserves a warning at the very least.
    10611073To override this rule, \ateq can be used to force the translator to trust the programmer's discretion.
    10621074This form of \ateq is not yet implemented.
     
    10641076Despite great effort, some forms of C syntax do not work well with constructors in \CFA.
    10651077In particular, constructor calls cannot contain designations (see \ref{sub:c_background}), since this is equivalent to allowing designations on the arguments to arbitrary function calls.
    1066 In C, function prototypes are permitted to have arbitrary parameter names, including no names at all, which may have no connection to the actual names used at function definition.
    1067 Furthermore, a function prototype can be repeated an arbitrary number of times, each time using different names.
    10681078\begin{cfacode}
    10691079// all legal forward declarations in C
     
    10761086f(b:10, a:20, c:30);  // which parameter is which?
    10771087\end{cfacode}
     1088In C, function prototypes are permitted to have arbitrary parameter names, including no names at all, which may have no connection to the actual names used at function definition.
     1089Furthermore, a function prototype can be repeated an arbitrary number of times, each time using different names.
    10781090As a result, it was decided that any attempt to resolve designated function calls with C's function prototype rules would be brittle, and thus it is not sensible to allow designations in constructor calls.
    1079 % Many other languages do allow named arguments, such as Python and Scala, but they do not allow multiple arbitrarily named forward declarations of a function.
    1080 
    1081 In addition, constructor calls cannot have a nesting depth greater than the number of array components in the type of the initialized object, plus one.
     1091
     1092In addition, constructor calls do not support unnamed nesting.
     1093\begin{cfacode}
     1094struct B { int x; };
     1095struct C { int y; };
     1096struct A { B b; C c; };
     1097void ?{}(A *, B);
     1098void ?{}(A *, C);
     1099
     1100A a = {
     1101  { 10 },  // construct B? - invalid
     1102};
     1103\end{cfacode}
     1104In C, nesting initializers means that the programmer intends to initialize subobjects with the nested initializers.
     1105The reason for this omission is to both simplify the mental model for using constructors, and to make initialization simpler for the expression resolver.
     1106If this were allowed, it would be necessary for the expression resolver to decide whether each argument to the constructor call could initialize to some argument in one of the available constructors, making the problem highly recursive and potentially much more expensive.
     1107That is, in the previous example the line marked as invalid could mean construct using @?{}(A *, B)@ or with @?{}(A *, C)@, since the inner initializer @{ 10 }@ could be taken as an intermediate object of type @B@ or @C@.
     1108In practice, however, there could be many objects that can be constructed from a given @int@ (or, indeed, any arbitrary parameter list), and thus a complete solution to this problem would require fully exploring all possibilities.
     1109
     1110More precisely, constructor calls cannot have a nesting depth greater than the number of array components in the type of the initialized object, plus one.
    10821111For example,
    10831112\begin{cfacode}
     
    10981127% TODO: in CFA if the array dimension is empty, no object constructors are added -- need to fix this.
    10991128The body of @A@ has been omitted, since only the constructor interfaces are important.
    1100 In C, having a greater nesting depth means that the programmer intends to initialize subobjects with the nested initializer.
    1101 The reason for this omission is to both simplify the mental model for using constructors, and to make initialization simpler for the expression resolver.
    1102 If this were allowed, it would be necessary for the expression resolver to decide whether each argument to the constructor call could initialize to some argument in one of the available constructors, making the problem highly recursive and potentially much more expensive.
    1103 That is, in the previous example the line marked as an error could mean construct using @?{}(A *, A, A)@, since the inner initializer @{ 11 }@ could be taken as an intermediate object of type @A@ constructed with @?{}(A *, int)@.
    1104 In practice, however, there could be many objects that can be constructed from a given @int@ (or, indeed, any arbitrary parameter list), and thus a complete solution to this problem would require fully exploring all possibilities.
     1129
    11051130It should be noted that unmanaged objects can still make use of designations and nested initializers in \CFA.
     1131It is simple to overcome this limitation for managed objects by making use of compound literals, so that the arguments to the constructor call are explicitly typed.
    11061132
    11071133\subsection{Implicit Destructors}
     
    11301156\end{cfacode}
    11311157
    1132 %% having this feels excessive, but it's here if necessary
    1133 % This procedure generates the following code.
    1134 % \begin{cfacode}
    1135 % void f(int i){
    1136 %   struct A x;
    1137 %   ?{}(&x);
    1138 %   {
    1139 %     struct A y;
    1140 %     ?{}(&y);
    1141 %     {
    1142 %       struct A z;
    1143 %       ?{}(&z);
    1144 %       {
    1145 %         if ((i==0)!=0) {
    1146 %           ^?{}(&z);
    1147 %           ^?{}(&y);
    1148 %           ^?{}(&x);
    1149 %           return;
    1150 %         }
    1151 %       }
    1152 %       if (((i==1)!=0) {
    1153 %           ^?{}(&z);
    1154 %           ^?{}(&y);
    1155 %           ^?{}(&x);
    1156 %           return ;
    1157 %       }
    1158 %       ^?{}(&z);
    1159 %     }
    1160 
    1161 %     if ((i==2)!=0) {
    1162 %       ^?{}(&y);
    1163 %       ^?{}(&x);
    1164 %       return;
    1165 %     }
    1166 %     ^?{}(&y);
    1167 %   }
    1168 
    1169 %   ^?{}(&x);
    1170 % }
    1171 % \end{cfacode}
    1172 
    11731158The next example illustrates the use of simple continue and break statements and the manner that they interact with implicit destructors.
    11741159\begin{cfacode}
     
    11831168\end{cfacode}
    11841169Since a destructor call is automatically inserted at the end of the block, nothing special needs to happen to destruct @x@ in the case where control reaches the end of the loop.
    1185 In the case where @i@ is @2@, the continue statement runs the loop update expression and attemps to begin the next iteration of the loop.
     1170In the case where @i@ is @2@, the continue statement runs the loop update expression and attempts to begin the next iteration of the loop.
    11861171Since continue is a C statement, which does not understand destructors, a destructor call is added just before the continue statement to ensure that @x@ is destructed.
    11871172When @i@ is @3@, the break statement moves control to just past the end of the loop.
     
    11931178L1: for (int i = 0; i < 10; i++) {
    11941179  A x;
    1195   L2: for (int j = 0; j < 10; j++) {
     1180  for (int j = 0; j < 10; j++) {
    11961181    A y;
    1197     if (j == 0) {
    1198       continue;    // destruct y
    1199     } else if (j == 1) {
    1200       break;       // destruct y
    1201     } else if (i == 1) {
     1182    if (i == 1) {
    12021183      continue L1; // destruct y
    12031184    } else if (i == 2) {
     
    12091190The statement @continue L1@ begins the next iteration of the outer for-loop.
    12101191Since the semantics of continue require the loop update expression to execute, control branches to the \emph{end} of the outer for loop, meaning that the block destructor for @x@ can be reused, and it is only necessary to generate the destructor for @y@.
     1192% TODO: "why not do this all the time? fix or justify"
    12111193Break, on the other hand, requires jumping out of the loop, so the destructors for both @x@ and @y@ are generated and inserted before the @break L1@ statement.
    12121194
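The placement of destructor calls around labelled `continue` and `break` can be sketched in plain C. This is a hand-written approximation, not the translator's actual output: the type `A`, the helpers `A_ctor` and `A_dtor`, the counters, and the labels `L1_continue` and `L1_break` are all hypothetical names introduced for illustration, and `goto` stands in for the labelled branches.

```c
#include <assert.h>

/* Hypothetical stand-in for a managed type; the "constructor" and
   "destructor" only track how many objects are currently alive. */
typedef struct { int id; } A;

static int live_objects = 0;
static int total_destructed = 0;

static void A_ctor(A *a, int id) { a->id = id; live_objects++; }

static void A_dtor(A *a) {
    (void)a;
    assert(live_objects > 0);   /* never destruct more than was constructed */
    live_objects--;
    total_destructed++;
}

/* Hand-lowered form of a nested loop containing `continue L1` and `break L1`. */
static int run_nested_loops(void) {
    int iterations = 0;
    for (int i = 0; i < 10; i++) {
        A x;
        A_ctor(&x, i);
        for (int j = 0; j < 10; j++) {
            A y;
            A_ctor(&y, j);
            iterations++;
            if (i == 1) {
                A_dtor(&y);        /* continue L1: only y is destructed here */
                goto L1_continue;  /* branch to the end of the outer body */
            } else if (i == 2) {
                A_dtor(&y);        /* break L1: both y and x are destructed */
                A_dtor(&x);
                goto L1_break;
            }
            A_dtor(&y);            /* normal end of the inner block */
        }
    L1_continue:
        A_dtor(&x);                /* block-end destructor, reused by continue L1 */
    }
L1_break:
    return iterations;
}
```

In this sketch every constructed object is destructed exactly once on each path, which is the property the inserted destructor calls are meant to preserve.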
     
    12771259Exempt from these rules are intrinsic and builtin functions.
    12781260It should be noted that unmanaged objects are subject to copy constructor calls when passed as arguments to a function or when returned from a function, since they are not the \emph{target} of the copy constructor call.
     1261That is, since the parameter is not marked as an unmanaged object using \ateq, it will be copy constructed if it is returned by value or passed as an argument to another function, so to guarantee consistent behaviour, unmanaged objects must be copy constructed when passed as arguments.
    12791262This is an important detail to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed.
    12801263\begin{cfacode}
     
    12841267void ^?{}(A *);
    12851268
    1286 A f(A x) {
    1287   return x;
     1269A identity(A x) { // pass by value => need local copy
     1270  return x;       // return by value => make call-site copy
    12881271}
    12891272
    12901273A y, z @= {};
    1291 identity(y);
    1292 identity(z);
     1274identity(y);  // copy construct y into x
     1275identity(z);  // copy construct z into x
    12931276\end{cfacode}
    12941277Note that @z@ is copy constructed into a temporary variable to be passed as an argument, which is also destructed after the call.
    1295 A special syntactic form, such as a variant of \ateq, could be implemented to specify at the call site that an argument should not be copy constructed, to regain some control for the C programmer.
    12961278
    12971279This generates the following code:
    12981280\begin{cfacode}
    12991281struct A f(struct A x){
    1300   struct A _retval_f;
    1301   ?{}((&_retval_f), x);
     1282  struct A _retval_f;    // return value
     1283  ?{}((&_retval_f), x);  // copy construct return value
    13021284  return _retval_f;
    13031285}
    13041286
    13051287struct A y;
    1306 ?{}(&y);
    1307 struct A z = { 0 };
    1308 
    1309 struct A _tmp_cp1;     // argument 1
    1310 struct A _tmp_cp_ret0; // return value
    1311 _tmp_cp_ret0=f((?{}(&_tmp_cp1, y) , _tmp_cp1)), _tmp_cp_ret0;
    1312 ^?{}(&_tmp_cp_ret0);   // return value
    1313 ^?{}(&_tmp_cp1);       // argument 1
    1314 
    1315 struct A _tmp_cp2;     // argument 1
    1316 struct A _tmp_cp_ret1; // return value
    1317 _tmp_cp_ret1=f((?{}(&_tmp_cp2, z), _tmp_cp2)), _tmp_cp_ret1;
    1318 ^?{}(&_tmp_cp_ret1);   // return value
    1319 ^?{}(&_tmp_cp2);       // argument 1
     1288?{}(&y);                 // default construct
     1289struct A z = { 0 };      // C default
     1290
     1291struct A _tmp_cp1;       // argument 1
     1292struct A _tmp_cp_ret0;   // return value
     1293_tmp_cp_ret0=f(
     1294  (?{}(&_tmp_cp1, y) , _tmp_cp1)  // argument is a comma expression
     1295), _tmp_cp_ret0;         // return value for cascading
     1296^?{}(&_tmp_cp_ret0);     // destruct return value
     1297^?{}(&_tmp_cp1);         // destruct argument 1
     1298
     1299struct A _tmp_cp2;       // argument 1
     1300struct A _tmp_cp_ret1;   // return value
     1301_tmp_cp_ret1=f(
     1302  (?{}(&_tmp_cp2, z), _tmp_cp2)  // argument is a comma expression
     1303), _tmp_cp_ret1;         // return value for cascading
     1304^?{}(&_tmp_cp_ret1);     // destruct return value
     1305^?{}(&_tmp_cp2);         // destruct argument 1
    13201306^?{}(&y);
    13211307\end{cfacode}
     1308
     1309A special syntactic form, such as a variant of \ateq, can be implemented to specify at the call site that an argument should not be copy constructed, to regain some control for the C programmer.
     1310\begin{cfacode}
     1311identity(z@);  // do not copy construct argument
     1312               // - will copy construct/destruct return value
     1313A@ identity_nocopy(A @ x) {  // argument not copy constructed or destructed
     1314  return x;  // not copy constructed
     1315             // return type marked @ => not destructed
     1316}
     1317\end{cfacode}
     1318It should be noted that reference types allow specifying that a value does not need to be copied; however, reference types do not provide a means of preventing implicit copy construction from uses of the reference, so the problem is still present when passing or returning the reference by value.
    13221319
    13231320A known issue with this implementation is that the return value of a function is not guaranteed to have the same address for its entire lifetime.
    13241321Specifically, since @_retval_f@ is allocated and constructed in @f@ then returned by value, the internal data is bitwise copied into the caller's stack frame.
    13251322This approach works out most of the time, because destructors typically need only access the fields of the object and recursively destroy them.
    1326 It is currently the case that constructors and destructors which use the @this@ pointer as a unique identifier to store data externally will not work correctly for return value objects.
    1327 Thus is it not safe to rely on an object's @this@ pointer to remain constant throughout execution of the program.
     1323It is currently the case that constructors and destructors that use the @this@ pointer as a unique identifier to store data externally do not work correctly for return value objects.
     1324Thus, it is not safe to rely on an object's @this@ pointer to remain constant throughout execution of the program.
    13281325\begin{cfacode}
    13291326A * external_data[32];
     
    13411338  }
    13421339}
     1340
     1341A makeA() {
     1342  A x;  // stores &x in external_data
     1343  return x;
     1344}
     1345makeA();  // return temporary has a different address than x
     1346// equivalent to:
     1347//   A _tmp;
     1348//   _tmp = makeA(), _tmp;
     1349//   ^?{}(&_tmp);
    13431350\end{cfacode}
    13441351In the above example, a global array of pointers is used to keep track of all of the allocated @A@ objects.
    1345 Due to copying on return, the current object being destructed will not exist in the array if an @A@ object is ever returned by value from a function.
    1346 
    1347 This problem could be solved in the translator by mutating the function signatures so that the return value is moved into the parameter list.
     1352Due to copying on return, the current object being destructed does not exist in the array if an @A@ object is ever returned by value from a function.
     1353
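The address problem can be reproduced in plain C, together with one possible remedy in which the caller passes the destination address explicitly. This is only a sketch under stated assumptions: `A_ctor`, `registered_addr`, `make_by_value`, `make_in_place`, and the two checking functions are hypothetical names, and the recorded address stands in for an entry in an external table such as @external_data@.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int v; } A;

/* Address recorded by the hypothetical "constructor"; kept as an integer
   so it can still be compared after the constructed object is gone. */
static uintptr_t registered_addr = 0;

static void A_ctor(A *a) {
    assert(a != 0);
    a->v = 42;
    registered_addr = (uintptr_t)a;
}

/* Return by value: x is constructed in the callee, then copied bitwise
   into the caller's storage, so the recorded address refers to x. */
static A make_by_value(void) {
    A x;
    A_ctor(&x);
    return x;
}

/* Out-parameter form: the caller supplies the destination, so the
   object is constructed at its final address. */
static void make_in_place(A *ret) {
    A_ctor(ret);
}

/* Returns 1 if the recorded address matches the caller's object. */
static int address_matches_by_value(void) {
    A tmp = make_by_value();
    return registered_addr == (uintptr_t)&tmp;
}

static int address_matches_in_place(void) {
    A tmp;
    make_in_place(&tmp);
    return registered_addr == (uintptr_t)&tmp;
}
```

With by-value return, the recorded address belongs to the callee's local object rather than to the caller's copy, so a destructor that looks up its own address in an external table would not find it.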
     1354This problem could be solved in the translator by changing the function signatures so that the return value is moved into the parameter list.
    13481355For example, the translator could restructure the code like so
    13491356\begin{cfacode}
     
    13631370\end{cfacode}
    13641371This transformation provides @f@ with the address of the return variable so that it can be constructed into directly.
    1365 It is worth pointing out that this kind of signature rewriting already occurs in polymorphic functions which return by value, as discussed in \cite{Bilson03}.
     1372It is worth pointing out that this kind of signature rewriting already occurs in polymorphic functions that return by value, as discussed in \cite{Bilson03}.
    13661373A key difference in this case is that every function would need to be rewritten like this, since types can switch between managed and unmanaged at different scope levels, e.g.
    13671374\begin{cfacode}
    13681375struct A { int v; };
    1369 A x; // unmanaged
     1376A x; // unmanaged, since only trivial constructors are available
    13701377{
    13711378  void ?{}(A * a) { ... }
     
    13751382A z; // unmanaged
    13761383\end{cfacode}
    1377 Hence there is not enough information to determine at function declaration to determine whether a type is managed or not, and thus it is the case that all signatures have to be rewritten to account for possible copy constructor and destructor calls.
     1384Hence there is not enough information to determine at function declaration whether a type is managed or not, and thus it is the case that all signatures have to be rewritten to account for possible copy constructor and destructor calls.
    13781385Even with this change, it would still be possible to declare backwards compatible function prototypes with an @extern "C"@ block, which allows for the definition of C-compatible functions within \CFA code; however, this would require actual changes to the way code inside of an @extern "C"@ function is generated as compared with normal code generation.
    1379 Furthermore, it isn't possible to overload C functions, so using @extern "C"@ to declare functions is of limited use.
    1380 
    1381 It would be possible to regain some control by adding an attribute to structs which specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type.
     1386Furthermore, it is not possible to overload C functions, so using @extern "C"@ to declare functions is of limited use.
     1387
     1388It would be possible to regain some control by adding an attribute to structs that specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type.
    13821389Ideally, structs should be manageable by default, since otherwise the default case becomes more verbose.
    13831390This means that in general, function signatures would have to be rewritten, and in a select few cases the signatures would not be rewritten.
     
    14081415\section{Implementation}
    14091416\subsection{Array Initialization}
    1410 Arrays are a special case in the C type system.
     1417Arrays are a special case in the C type-system.
    14111418C arrays do not carry around their size, making it impossible to write a standalone \CFA function that constructs or destructs an array while maintaining the standard interface for constructors and destructors.
    14121419Instead, \CFA defines the initialization and destruction of an array recursively.
     
    15251532By default, objects within a translation unit are constructed in declaration order, and destructed in the reverse order.
    15261533The default order of construction of objects amongst translation units is unspecified.
    1527 % TODO: not yet implemented, but g++ provides attribute init_priority, which allows specifying the order of global construction on a per object basis
    1528 %   https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html#C_002b_002b-Attributes
    1529 % suggestion: implement this in CFA by picking objects with a specified priority and pulling them into their own init functions (could even group them by priority level -> map<int, list<ObjectDecl*>>) and pull init_priority forward into constructor and destructor attributes with the same priority level
    15301534It is, however, guaranteed that any global objects in the standard library are initialized prior to the initialization of any object in the user program.
    15311535
    1532 This feature is implemented in the \CFA translator by grouping every global constructor call into a function with the GCC attribute \emph{constructor}, which performs most of the heavy lifting. % CITE: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
     1536This feature is implemented in the \CFA translator by grouping every global constructor call into a function with the GCC attribute \emph{constructor}, which performs most of the heavy lifting. % TODO: CITE: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
    15331537A similar function is generated with the \emph{destructor} attribute, which handles all global destructor calls.
    15341538At the time of writing, initialization routines in the library are specified with priority \emph{101}, which is the highest priority level that GCC allows, whereas initialization routines in the user's code are implicitly given the default priority level, which ensures they have a lower priority than any code with a specified priority level.
     
    15591563\end{cfacode}
    15601564
     1565%   https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html#C_002b_002b-Attributes
     1566% suggestion: implement this in CFA by picking objects with a specified priority and pulling them into their own init functions (could even group them by priority level -> map<int, list<ObjectDecl*>>) and pull init_priority forward into constructor and destructor attributes with the same priority level
     1567GCC provides an attribute @init_priority@, which allows specifying the relative priority for initialization of global objects on a per-object basis in \CC.
     1568A similar attribute can be implemented in \CFA by pulling marked objects into global constructor/destructor-attribute functions with the specified priority.
     1569For example,
     1570\begin{cfacode}
     1571struct A { ... };
     1572void ?{}(A *, int);
     1573void ^?{}(A *);
     1574__attribute__((init_priority(200))) A x = { 123 };
     1575\end{cfacode}
     1576would generate
     1577\begin{cfacode}
     1578A x;
     1579__attribute__((constructor(200))) void __init_x() {
     1580  ?{}(&x, 123);  // construct x with priority 200
     1581}
     1582__attribute__((destructor(200))) void __destroy_x() {
     1583  ^?{}(&x);      // destruct x with priority 200
     1584}
     1585\end{cfacode}
     1586
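The priority mechanism that this scheme relies on can be observed directly in C with the GCC `constructor` attribute. The following is a minimal sketch, assuming a GCC-compatible compiler; the function names, the tags, and the `init_order` log are hypothetical and serve only to record the order in which the routines run before `main`.

```c
#include <assert.h>

/* Hypothetical log of the order in which the start-up routines ran. */
static int init_order[3];
static int init_count = 0;

static void record(int tag) {
    assert(init_count < 3);
    init_order[init_count++] = tag;
}

/* Declared first, but has no explicit priority, so it runs after
   every routine that has one. */
__attribute__((constructor)) static void init_default(void) { record(3); }

/* Stand-in for a per-object routine generated for priority 200. */
__attribute__((constructor(200))) static void init_x(void) { record(2); }

/* Stand-in for library initialization at priority 101. */
__attribute__((constructor(101))) static void init_library(void) { record(1); }
```

Lower numbers run earlier, and routines without an explicit priority run last, regardless of declaration order. This is why library routines at priority 101 are initialized before user code.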
    15611587\subsection{Static Local Variables}
    15621588In standard C, it is possible to mark variables that are local to a function with the @static@ storage class.
    15631589Unlike normal local variables, a @static@ local variable is defined to live for the entire duration of the program, so that each call to the function has access to the same variable with the same address and value as it had in the previous call to the function. % TODO: mention dynamic loading caveat??
    1564 Much like global variables, in C @static@ variables must be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it before the program begins running.
     1590Much like global variables, in C @static@ variables can only be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it at compile-time.
    15651591
    15661592Yet again, this rule is too restrictive for a language with constructors and destructors.
     
    15731599Construction of @static@ local objects is implemented via an accompanying @static bool@ variable, which records whether the variable has already been constructed.
    15741600A conditional branch checks the value of the companion @bool@, and if the variable has not yet been constructed then the object is constructed.
    1575 The object's destructor is scheduled to be run when the program terminates using @atexit@, and the companion @bool@'s value is set so that subsequent invocations of the function will not reconstruct the object.
     1601The object's destructor is scheduled to be run when the program terminates using @atexit@, and the companion @bool@'s value is set so that subsequent invocations of the function do not reconstruct the object.
    15761602Since the parameter to @atexit@ is a parameter-less function, some additional tweaking is required.
    15771603First, the @static@ variable must be hoisted up to global scope and uniquely renamed to prevent name clashes with other global objects.
     
    16301656\end{cfacode}
    16311657
     1658% TODO: move this section forward?? maybe just after constructor syntax? would need to remove _tmp_cp_ret0, since copy constructors are not discussed yet, but this might not be a big issue.
    16321659\subsection{Constructor Expressions}
    16331660In \CFA, it is possible to use a constructor as an expression.
    16341661Like other operators, the function name @?{}@ matches its operator syntax.
    16351662For example, @(&x){}@ calls the default constructor on the variable @x@, and produces @&x@ as a result.
    1636 The significance of constructors as expressions rather than as statements is that the result of a constructor expression can be used as part of a larger expression.
    1637 A key example is the use of constructor expressions to initialize the result of a call to standard C routine @malloc@.
     1663A key example of this capability is the use of constructor expressions to initialize the result of a call to standard C routine @malloc@.
    16381664\begin{cfacode}
    16391665struct X { ... };