Changeset 12d3187 for doc/rob_thesis

Ignore:
Timestamp:
May 1, 2017, 1:39:55 PM (6 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
2055098
Parents:
b3d70eba
Message:

final revisions to thesis

Location:
doc/rob_thesis
Files:
8 edited

Unmodified
Removed
• doc/rob_thesis/cfa-format.tex

• doc/rob_thesis/conclusions.tex

 rb3d70eba On the surface, the work may appear as a rehash of similar mechanisms in \CC. However, every added feature is different than its \CC counterpart, often with extended functionality, better integration with C and its programmers, and always supports separate compilation. All of these new features are being used by the \CFA development-team to build the \CFA runtime system. All of these new features are being used extensively by the \CFA development-team to build the \CFA runtime system. In particular, the concurrency system is built on top of RAII, library functions @new@ and @delete@ are used to manage dynamically allocated objects, and tuples are used to provide uniform interfaces to C library routines such as @div@ and @remquo@. \section{Constructors and Destructors} That is, structure fields can be accessed and modified by any block of code without restriction, so while it is possible to ensure that an object is initially set to a valid state, it is not possible to ensure that it remains in a consistent state throughout its lifetime. A popular technique for ensuring consistency in object-oriented programming languages is to provide access modifiers such as @private@, which provides compile-time checks that only privileged code accesses private data. This approach could be added to \CFA, but it requires an idiomatic way of specifying what code is privileged. This approach could be added to \CFA, but it requires an idiomatic way of specifying what code is privileged and what data is protected. One possibility is to tie access control into an eventual module system. \begin{sloppypar} The current implementation of implicit subobject-construction is currently an all-or-nothing check. That is, if a subobject is conditionally constructed, \eg within an if-statement, no implicit constructors for that object are added. \begin{cfacode} struct A { ... }; void ?{}(A * a) { ... } struct B { A a; }; void ?{}(B * b) { if (...) { (&b->a){};  // explicitly constructed } // does not construct in else case } \end{cfacode} This behaviour is unsafe and breaks the guarantee that constructors fully initialize objects. This situation should be properly handled, either by examining all paths and inserting implicit constructor calls only in the paths missing construction, or by emitting an error or warning. \end{sloppypar} \subsection{Tuples} This feature ties nicely into named tuples, as seen in D and Swift. Currently, tuple flattening and structuring conversions are 0-cost. Currently, tuple flattening and structuring conversions are 0-cost conversions in the resolution algorithm. This makes tuples conceptually very simple to work with, but easily causes unnecessary ambiguity in situations where the type system should be able to differentiate between alternatives. Adding an appropriate cost function to tuple conversions will allow tuples to interact with the rest of the programming language more cohesively.
• doc/rob_thesis/ctordtor.tex

 rb3d70eba % doesn't seem possible to do this without allowing ttype on generic structs? Since \CFA is a true systems language, it does not provide a garbage collector. As well, \CFA is not an object-oriented programming language, \ie, structures cannot have routine members. Since \CFA is a true systems language, it does not require a garbage collector. As well, \CFA is not an object-oriented programming language, \ie, structures cannot have methods. While structures can have function pointer members, this is different from methods, since methods have implicit access to structure members and methods cannot be reassigned. Nevertheless, one important goal is to reduce programming complexity and increase safety. To that end, \CFA provides support for implicit pre/post-execution of routines for objects, via constructors and destructors. The key difference between assignment and initialization being that assignment occurs on a live object (\ie, an object that contains data). It is important to note that this means @x@ could have been used uninitialized prior to being assigned, while @y@ could not be used uninitialized. Use of uninitialized variables yields undefined behaviour, which is a common source of errors in C programs. Use of uninitialized variables yields undefined behaviour \cite[p.~558]{C11}, which is a common source of errors in C programs. Initialization of a declaration is strictly optional, permitting uninitialized variables to exist. int x2 = opaque_get(x, 2); \end{cfacode} This pattern is cumbersome to use since every access becomes a function call. This pattern is cumbersome to use since every access becomes a function call, requiring awkward syntax and a performance cost. While useful in some situations, this compromise is too restrictive. Furthermore, even with this idiom it is easy to make mistakes, such as forgetting to destroy an object or destroying it multiple times. A constructor provides a way of ensuring that the necessary aspects of object initialization is performed, from setting up invariants to providing compile- and run-time checks for appropriate initialization parameters. This goal is achieved through a guarantee that a constructor is called implicitly after every object is allocated from a type with associated constructors, as part of an object's definition. Since a constructor is called on every object of a managed type, it is impossible to forget to initialize such objects, as long as all constructors perform some sensible form of initialization. This goal is achieved through a \emph{guarantee} that a constructor is called \emph{implicitly} after every object is allocated from a type with associated constructors, as part of an object's \emph{definition}. Since a constructor is called on every object of a managed type, it is \emph{impossible} to forget to initialize such objects, as long as all constructors perform some sensible form of initialization. In \CFA, a constructor is a function with the name @?{}@. In other words, a default constructor is a constructor that takes a single argument: the @this@ parameter. In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! and it takes only one argument. In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! \footnote{Originally, the name @~?{}@ was chosen for destructors, to provide familiarity to \CC programmers. Unforunately, this name causes parsing conflicts with the bitwise-not operator when used with operator syntax (see section \ref{sub:syntax}.)} and it takes only one argument. A destructor for the @Array@ type can be defined as: \begin{cfacode} On line 2, @z@ is initialized with the value of @x@, while on line 3, @y@ is assigned the value of @x@. The key distinction between initialization and assignment is that a value to be initialized does not hold any meaningful values, whereas an object to be assigned might. In particular, these cases cannot be handled the same way because in the former case @z@ does not currently own an array, while @y@ does. In particular, these cases cannot be handled the same way because in the former case @z@ has no array, while @y@ does. A \emph{copy constructor} is used to perform initialization using another object of the same type. \begin{cfacode}[emph={other}, emphstyle=\color{red}] } \end{cfacode} The two functions above handle these cases. The first function is called a \emph{copy constructor}, because it constructs its argument by copying the values from another object of the same type. The two functions above handle the cases of initialization and assignment. The first function is called a copy constructor, because it constructs its argument by copying the values from another object of the same type. The second function is the standard copy-assignment operator. \CFA does not currently have the concept of reference types, so the most appropriate type for the source object in copy constructors and assignment operators is a value type. Appropriate care is taken in the implementation to avoid recursive calls to the copy constructor. The four functions (default constructor, destructor, copy constructor, and assignment operator) are special in that they safely control the state of most objects. A * y = malloc();  // copy construct: ?{}(&y, malloc()) ^?{}(&x);   // explicit destroy x, in different order ?{}(&x);    // explicit construct x, second construction ^?{}(y);    // explicit destroy y ?{}(y, x);  // explit construct y from x, second construction ^?{}(&x);   // explicit destroy x, in different order ^?{}(y);    // explicit destroy y // implicit ^?{}(&y); \end{cfacode} In this example, @malloc@ dynamically allocates storage and initializes it using a constructor, all before assigning it into the variable @x@. Intuitively, the expression-resolver determines that @malloc@ returns some type @T *@, as does the constructor expression since it returns the type of its argument. This type flows outwards to the declaration site where the expected type is known to be @X *@, thus the first argument to the constructor must be @X *@, narrowing the search space. If this extension is not present, constructing dynamically allocated objects is much more cumbersome, requiring separate initialization of the pointer and initialization of the pointed-to memory. \begin{cfacode} It should be noted that this technique is not exclusive to @malloc@, and allows a user to write a custom allocator that can be idiomatically used in much the same way as a constructed @malloc@ call. It should be noted that while it is possible to use operator syntax with destructors, destructors invalidate their argument, thus operator syntax with destructors is a statement and does not produce a value. While it is possible to use operator syntax with destructors, destructors invalidate their argument, thus operator syntax with destructors is void-typed expression. \subsection{Function Generation} If the translator can expect these functions to exist, then it can unconditionally attempt to resolve them. Moreover, the existence of a standard interface allows polymorphic code to interoperate with new types seamlessly. While automatic generation of assignment functions is present in previous versions of \CFA, the the implementation has been largely rewritten to accomodate constructors and destructors. To mimic the behaviour of standard C, the default constructor and destructor for all of the basic types and for all pointer types are defined to do nothing, while the copy constructor and assignment operator perform a bitwise copy of the source parameter (as in \CC). This default is intended to maintain backwards compatibility and performance, by not imposing unexpected operations for a C programmer, as a zero-default behaviour would. However, it is possible for a user to define such constructors so that variables are safely zeroed by default, if desired. %%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%% \begin{cfacode} void ?{}(int * i) { *i = 0; } forall(dtype T) void ?{}(T ** p) { *p = 0; }  // any pointer type void f() { int x;    // initialized to 0 int * p;  // initialized to 0 } \end{cfacode} %%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%% There are several options for user-defined types: structures, unions, and enumerations. To aid in ease of use, the standard set of four functions is automatically generated for a user-defined type after its definition is completed. By auto-generating these functions, it is ensured that legacy C code continues to work correctly in every context where \CFA expects these functions to exist, since they are generated for every complete type. As well, these functions are always generated, since they may be needed by polymorphic functions. With that said, the generated functions are not called implicitly unless they are non-trivial, and are never exported, making it simple for the optimizer to strip them away when they are not used. The generated functions for enumerations are the simplest. } \end{cfacode} In the future, \CFA will introduce strongly-typed enumerations, like those in \CC. In the future, \CFA will introduce strongly-typed enumerations, like those in \CC, wherein enumerations create a new type distinct from @int@ so that integral values require an explicit cast to be stored in an enumeration variable. The existing generated routines are sufficient to express this restriction, since they are currently set up to take in values of that enumeration type. Changes related to this feature only need to affect the expression resolution phase, where more strict rules will be applied to prevent implicit conversions from integral types to enumeration types, but should continue to permit conversions from enumeration types to @int@. In addition to freedom, \ateq provides a simple path for migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually. It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants are established on an unmanaged object. It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary. It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary, such as memory-mapped devices, trigger devices, I/O controllers, etc. When a user declares any constructor or destructor, the corresponding intrinsic/generated function and all field constructors for that type are hidden, so that they are not found during expression resolution until the user-defined function goes out of scope. \end{cfacode} However, if the translator sees a sub-object used within the body of a constructor, but does not see a constructor call that uses the sub-object as the target of a constructor, then the translator assumes the object is to be implicitly constructed (copy constructed in a copy constructor and default constructed in any other constructor). To override this rule, \ateq can be used to force the translator to trust the programmer's discretion. This form of \ateq is not yet implemented. \begin{cfacode} void ?{}(A * a) { } void ?{}(A * a, int x) { // object forwarded to another constructor, // does not implicitly construct any members (&a){}; } void ^?{}(A * a) { ^(&a->x){}; // explicit destructor call } // z, y, w implicitly destructed, in this order \end{cfacode} If at any point, the @this@ parameter is passed directly as the target of another constructor, then it is assumed that constructor handles the initialization of all of the object's members and no implicit constructor calls are added. To override this rule, \ateq can be used to force the translator to trust the programmer's discretion. This form of \ateq is not yet implemented. If at any point, the @this@ parameter is passed directly as the target of another constructor, then it is assumed the other constructor handles the initialization of all of the object's members and no implicit constructor calls are added to the current constructor. Despite great effort, some forms of C syntax do not work well with constructors in \CFA. The body of @A@ has been omitted, since only the constructor interfaces are important. It should be noted that unmanaged objects can still make use of designations and nested initializers in \CFA. It should be noted that unmanaged objects, i.e. objects that have only trivial constructors, can still make use of designations and nested initializers in \CFA. It is simple to overcome this limitation for managed objects by making use of compound literals, so that the arguments to the constructor call are explicitly typed. %%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%% \begin{cfacode} struct B { int x; }; struct C { int y; }; struct A { B b; C c; }; void ?{}(A *, B); void ?{}(A *, C); A a = { (C){ 10 } // disambiguate with compound literal }; \end{cfacode} %%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Implicit Destructors} \end{cfacode} While \CFA supports the GCC computed-goto extension, the behaviour of managed objects in combination with computed-goto is undefined. \begin{cfacode} void f(int val) { void * l = val == 0 ? &&L1 : &&L2; { A x; L1: ; goto *l;  // branches differently depending on argument } L2: ; } \end{cfacode} Likewise, destructors are not executed at scope-exit due to a computed-goto in \CC, as of g++ version 6.2. \subsection{Implicit Copy Construction} \label{s:implicit_copy_construction} Exempt from these rules are intrinsic and built-in functions. It should be noted that unmanaged objects are subject to copy constructor calls when passed as arguments to a function or when returned from a function, since they are not the \emph{target} of the copy constructor call. That is, since the parameter is not marked as an unmanaged object using \ateq, it is be copy constructed if it is returned by value or passed as an argument to another function, so to guarantee consistent behaviour, unmanaged objects must be copy constructed when passed as arguments. That is, since the parameter is not marked as an unmanaged object using \ateq, it is copy constructed if it is returned by value or passed as an argument to another function, so to guarantee consistent behaviour, unmanaged objects must be copy constructed when passed as arguments. These semantics are important to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed. \begin{cfacode} struct A; struct A { ... }; void ?{}(A *); void ?{}(A *, A); It should be noted that reference types will allow specifying that a value does not need to be copied, however reference types do not provide a means of preventing implicit copy construction from uses of the reference, so the problem is still present when passing or returning the reference by value. Adding implicit copy construction imposes the additional runtime cost of the copy constructor for every argument and return value in a function call. This cost is necessary to maintain appropriate value semantics when calling a function. In the future, return-value-optimization (RVO) can be implemented for \CFA to elide unnecessary copy construction and destruction of temporary objects. This cost is not present for types with trivial copy constructors and destructors. A known issue with this implementation is that the argument and return value temporaries are not guaranteed to have the same address for their entire lifetimes. In the previous example, since @_retval_f@ is allocated and constructed in @f@, then returned by value, the internal data is bitwise copied into the caller's stack frame. \subsection{Array Initialization} Arrays are a special case in the C type-system. C arrays do not carry around their size, making it impossible to write a standalone \CFA function that constructs or destructs an array while maintaining the standard interface for constructors and destructors. Type checking largely ignores size information for C arrays, making it impossible to write a standalone \CFA function that constructs or destructs an array, while maintaining the standard interface for constructors and destructors. Instead, \CFA defines the initialization and destruction of an array recursively. That is, when an array is defined, each of its elements is constructed in order from element 0 up to element $n-1$. \end{cfacode} This implementation comes at the runtime cost of an additional branch for every @static@ local variable, each time the function is called. Since initializers are not required to be compile-time constant expressions, they can involve global variables, function arguments, function calls, etc. As a direct consequence, @static@ local variables cannot be initialized with an attribute-constructor routines like global variables can. However, in the case where the variable is unmanaged and has a compile-time constant initializer, a C-compliant initializer is generated and the additional cost is not present. \CC shares the same semantics for its @static@ local variables. \subsection{Polymorphism} As mentioned in section \ref{sub:polymorphism}, \CFA currently has 3 type-classes that are used to designate polymorphic data types: @otype@, @dtype@, and @ftype@. These additions allow @f@'s body to create and destroy objects of type @T@, and pass objects of type @T@ as arguments to other functions, following the normal \CFA rules. A point of note here is that objects can be missing default constructors (and eventually other functions through deleted functions), so it is important for \CFA programmers to think carefully about the operations needed by their function, as to not over-constrain the acceptable parameter types and prevent potential reuse. These additional assertion parameters impose a runtime cost on all managed temporary objects created in polymorphic code, even those with trivial constructors and destructors. This cost is necessary because polymorphic code does not know the actual type at compile-time, due to separate compilation. Since trivial constructors and destructors either do not perform operations or are simply bit-wise copy operations, the imposed cost is essentially the cost of the function calls. \section{Summary} When creating a new object of a managed type, it is guaranteed that a constructor is be called to initialize the object at its definition point, and is destructed when the object's lifetime ends. Destructors are called in the reverse order of construction. Every argument passed to a function is copy constructed into a temporary object that is passed by value to the functions and destructed at the end of the statement. Function return values are copy constructed inside the function at the return statement, passed by value to the call-site, and destructed at the call-site at the end of the statement. Every complete object type has a default constructor, copy constructor, assignment operator, and destructor. To accomplish this, these functions are generated as appropriate for new types. User-defined functions shadow built-in and automatically generated functions, so it is possible to specialize the behaviour of a type. Furthermore, default constructors and aggregate field constructors are hidden when \emph{any} constructor is defined. Objects dynamically allocated with @malloc@, \ateq objects, and objects with only trivial constructors and destructors are unmanaged. Unmanaged objects are never the target of an implicit constructor or destructor call.
• doc/rob_thesis/intro.tex

 rb3d70eba \end{cfacode} \begin{sloppypar} In addition to variables of tuple type, it is also possible to have pointers to tuples, and arrays of tuples. Tuple types can be composed of any types, except for array types, since arrays do not carry their size around, which makes tuple assignment difficult when a tuple contains an array. Tuple types can be composed of any types, except for array types, since array assignment is disallowed, which makes tuple assignment difficult when a tuple contains an array. \begin{cfacode} [double, int] di; \end{cfacode} This examples declares a variable of type @[double, int]@, a variable of type pointer to @[double, int]@, and an array of ten @[double, int]@. \end{sloppypar} \subsection{Tuple Indexing} The flexible structure of tuples permits a simple and expressive function-call syntax to work seamlessly with both single- and multiple-return-value functions, and with any number of arguments of arbitrarily complex structure. In \KWC \cite{Buhr94a,Till89}, a precursor to \CFA, there were 4 tuple coercions: opening, closing, flattening, and structuring. In \KWC \cite{Buhr94a,Till89}, there were 4 tuple coercions: opening, closing, flattening, and structuring. Opening coerces a tuple value into a tuple of values, while closing converts a tuple of values into a single tuple value. Flattening coerces a nested tuple into a flat tuple, \ie it takes a tuple with tuple components and expands it into a tuple with only non-tuple components. In \CFA, the design has been simplified to require only the two conversions previously described, which trigger only in function call and return situations. This simplification is a primary contribution of this thesis to the design of tuples in \CFA. Specifically, the expression resolution algorithm examines all of the possible alternatives for an expression to determine the best match. In resolving a function call expression, each combination of function value and list of argument alternatives is examined. Let $L_i$ for $i$ in $[0, n)$ represent each component of the flattened left side, $R_i$ represent each component of the flattened right side of a multiple assignment, and $R$ represent the right side of a mass assignment. For a multiple assignment to be valid, both tuples must have the same number of elements when flattened. Multiple assignment assigns $R_i$ to $L_i$ for each $i$. For a multiple assignment to be valid, both tuples must have the same number of elements when flattened. For example, the following is invalid because the number of components on the left does not match the number of components on the right. \begin{cfacode} [int, int] x, y, z; [x, y] = z;   // multiple assignment, invalid 4 != 2 \end{cfacode} Multiple assignment assigns $R_i$ to $L_i$ for each $i$. That is, @?=?(&$L_i$, $R_i$)@ must be a well-typed expression. In the previous example, @[x, y] = z@, @z@ is flattened into @z.0, z.1@, and the assignments @x = z.0@ and @y = z.1@ happen. Both kinds of tuple assignment have parallel semantics, such that each value on the left side and right side is evaluated \emph{before} any assignments occur. As a result, it is possible to swap the values in two variables without explicitly creating any temporary variables or calling a function, As a result, it is possible to swap the values in two variables without explicitly creating any temporary variables or calling a function. \begin{cfacode} int x = 10, y = 20; \subsection{Tuple Construction} Tuple construction and destruction follow the same rules and semantics as tuple assignment, except that in the case where there is no right side, the default constructor or destructor is called on each component of the tuple. As constructors and destructors did not exist in previous versions of \CFA or in \KWC, this is a primary contribution of this thesis to the design of tuples. \begin{cfacode} struct S; \section{Casting} In C, the cast operator is used to explicitly convert between types. In \CFA, the cast operator has a secondary use, which is type ascription, since it force the expression resolution algorithm to choose the lowest cost conversion to the target type. In \CFA, the cast operator has a secondary use, which is type ascription, since it forces the expression resolution algorithm to choose the lowest cost conversion to the target type. That is, a cast can be used to select the type of an expression when it is ambiguous, as in the call to an overloaded function. \begin{cfacode} \section{Polymorphism} Due to the implicit flattening and structuring conversions involved in argument passing, @otype@ and @dtype@ parameters are restricted to matching only with non-tuple types. The integration of polymorphism, type assertions, and monomorphic specialization of tuple-assertions are a primary contribution of this thesis to the design of tuples. \begin{cfacode} forall(otype T, dtype U) It is also important to note that these calls could be disambiguated if the function return types were different, as they likely would be for a reasonable implementation of @?+?@, since the return type is used in overload resolution. Still, these semantics are a deficiency of the current argument matching algorithm, and depending on the function, differing return values may not always be appropriate. These issues could be rectified by applying an appropriate cost to the structuring and flattening conversions, which are currently 0-cost conversions. These issues could be rectified by applying an appropriate conversion cost to the structuring and flattening conversions, which are currently 0-cost conversions in the expression resolver. Care would be needed in this case to ensure that exact matches do not incur such a cost. \begin{cfacode} \section{Implementation} Tuples are implemented in the \CFA translator via a transformation into generic types. Generic types are an independent contribution developed at the same time. The transformation into generic types and the generation of tuple-specific code are primary contributions of this thesis to tuples. The first time an $N$-tuple is seen for each $N$ in a scope, a generic type with $N$ type parameters is generated. For example,