Changeset 0eb18557 for doc/rob_thesis

Ignore:
Timestamp:
Apr 12, 2017, 3:54:28 PM (5 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc
Children:
e869e434
Parents:
eaa2f3a1
Message:

thesis updates based on Peter's feedback

Location:
doc/rob_thesis
Files:
6 edited

Unmodified
Removed
• doc/rob_thesis/cfa-format.tex

 reaa2f3a1 style=defaultStyle } \lstMakeShortInline[basewidth=0.5em,breaklines=true]@  % single-character for \lstinline \lstMakeShortInline[basewidth=0.5em,breaklines=true,basicstyle=\normalsize\ttfamily\color{basicCol}]@  % single-character for \lstinline \lstnewenvironment{cfacode}[1][]{
• doc/rob_thesis/conclusions.tex

 reaa2f3a1 A caveat of this approach is that the @cleanup@ attribute only permits a name that refers to a function that consumes a single argument of type @T *@ for a variable of type @T@. This means that any destructor that consumes multiple arguments (e.g., because it is polymorphic) or any destructor that is a function pointer (e.g., because it is an assertion parameter) must be called through a local thunk. This means that any destructor that consumes multiple arguments (\eg, because it is polymorphic) or any destructor that is a function pointer (\eg, because it is an assertion parameter) must be called through a local thunk. For example, \begin{cfacode}
• doc/rob_thesis/ctordtor.tex

 reaa2f3a1 Since \CFA is a true systems language, it does not provide a garbage collector. As well, \CFA is not an object-oriented programming language, i.e., structures cannot have routine members. As well, \CFA is not an object-oriented programming language, \ie, structures cannot have routine members. Nevertheless, one important goal is to reduce programming complexity and increase safety. To that end, \CFA provides support for implicit pre/post-execution of routines for objects, via constructors and destructors. Next, @x@ is assigned the value of @y@. In the last line, @z@ is implicitly initialized to 0 since it is marked @static@. The key difference between assignment and initialization being that assignment occurs on a live object (i.e., an object that contains data). The key difference between assignment and initialization being that assignment occurs on a live object (\ie, an object that contains data). It is important to note that this means @x@ could have been used uninitialized prior to being assigned, while @y@ could not be used uninitialized. Use of uninitialized variables yields undefined behaviour, which is a common source of errors in C programs. In \CFA, a constructor is a function with the name @?{}@. Like other operators in \CFA, the name represents the syntax used to call the constructor, e.g., @struct S = { ... };@. Like other operators in \CFA, the name represents the syntax used to call the constructor, \eg, @struct S = { ... };@. Every constructor must have a return type of @void@ and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming-languages (however, a programmer can give it an arbitrary name). The @this@ parameter must have a pointer type, whose base type is the type of object that the function constructs. In other words, a default constructor is a constructor that takes a single argument: the @this@ parameter. In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! and it take only one argument. A destructor for the @Array@ type can be defined as such. In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! and it takes only one argument. A destructor for the @Array@ type can be defined as: \begin{cfacode} void ^?{}(Array * arr) { } \end{cfacode} In \CFA, constructors are called implicitly in initialization contexts. \begin{cfacode} Array x, y = { 20, 0xdeadbeef }, z = y; \end{cfacode} In \CFA, constructor calls look just like C initializers, which allows them to be inserted into legacy C code with minimal code changes, and also provides a very simple syntax that veteran C programmers are familiar with. One downside of reusing C initialization syntax is that it isn't possible to determine whether an object is constructed just by looking at its declaration, since that requires knowledge of whether the type is managed at that point. Constructor calls look just like C initializers, which allows them to be inserted into legacy C code with minimal code changes, and also provides a very simple syntax that veteran C programmers are familiar with. One downside of reusing C initialization syntax is that it is not possible to determine whether an object is constructed just by looking at its declaration, since that requires knowledge of whether the type is managed at that point in the program. This example generates the following code \end{cfacode} Finally, constructors and destructors support \emph{operator syntax}. Like other operators in \CFA, the function name mirrors the use-case, in that the first $N$ arguments fill in the place of the question mark. Like other operators in \CFA, the function name mirrors the use-case, in that the question marks are placeholders for the first $N$ arguments. This syntactic form is similar to the new initialization syntax in \CCeleven, except that it is used in expression contexts, rather than declaration contexts. \begin{cfacode} Like other operators, the function name @?{}@ matches its operator syntax. For example, @(&x){}@ calls the default constructor on the variable @x@, and produces @&x@ as a result. A key example for this capability is the use of constructor expressions to initialize the result of a call to standard C routine @malloc@. A key example for this capability is the use of constructor expressions to initialize the result of a call to @malloc@. \begin{cfacode} struct X { ... }; void ?{}(X *, double); X * x = malloc(sizeof(X)){ 1.5 }; X * x = malloc(){ 1.5 }; \end{cfacode} In this example, @malloc@ dynamically allocates storage and initializes it using a constructor, all before assigning it into the variable @x@. If this extension is not present, constructing dynamically allocated objects is much more cumbersome, requiring separate initialization of the pointer and initialization of the pointed-to memory. \begin{cfacode} X * x = malloc(sizeof(X)); X * x = malloc(); x{ 1.5 }; \end{cfacode} struct X *_tmp_ctor; struct X *x = ?{}(  // construct result of malloc _tmp_ctor=malloc(sizeof(struct X)), // store result of malloc _tmp_ctor=malloc_T(sizeof(struct X), _Alignof(struct X)), // store result of malloc 1.5 ), _tmp_ctor; // produce constructed result of malloc It should be noted that this technique is not exclusive to @malloc@, and allows a user to write a custom allocator that can be idiomatically used in much the same way as a constructed @malloc@ call. It is also possible to use operator syntax with destructors. Unlike constructors, operator syntax with destructors is a statement and thus does not produce a value, since the destructed object is invalidated by the use of a destructor. For example, \lstinline!^(&x){}! calls the destructor on the variable @x@. It should be noted that while it is possible to use operator syntax with destructors, destructors invalidate their argument, thus operator syntax with destructors is a statement and does not produce a value. \subsection{Function Generation} The field constructors are constructors that consume a prefix of the structure's member-list. That is, $N$ constructors are built of the form @void ?{}(S *, T$_{\text{M}_0}$)@, @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)@, ..., @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)@, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise. The addition of field constructors allows structures in \CFA to be used naturally in the same ways as used in C (i.e., to initialize any prefix of the structure), e.g., @A a0 = { b }, a1 = { b, c }@. The addition of field constructors allows structures in \CFA to be used naturally in the same ways as used in C (\ie, to initialize any prefix of the structure), \eg, @A a0 = { b }, a1 = { b, c }@. Extending the previous example, the following constructors are implicitly generated for @A@. \begin{cfacode} \subsection{Using Constructors and Destructors} Implicitly generated constructor and destructor calls ignore the outermost type qualifiers, e.g. @const@ and @volatile@, on a type by way of a cast on the first argument to the function. Implicitly generated constructor and destructor calls ignore the outermost type qualifiers, \eg @const@ and @volatile@, on a type by way of a cast on the first argument to the function. For example, \begin{cfacode} Here, @&s@ and @&s2@ are cast to unqualified pointer types. This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects. This applies only to implicitly generated constructor calls. This rule applies only to implicitly generated constructor calls. Hence, explicitly re-initializing qualified objects with a constructor requires an explicit cast. Instead, @a2->x@ is initialized to @0@ as if it were a C object, because of the explicit initializer. In addition to freedom, \ateq provides a simple path to migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually. In addition to freedom, \ateq provides a simple path for migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually. It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants are established on an unmanaged object. It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary. { void ?{}(S * s, int i) { s->x = i*2; } // locally hide autogen constructors S s4;  // error S s5 = { 3 };  // okay S s6 = { 4, 5 };  // error S s4;  // error, no default constructor S s5 = { 3 };  // okay, local constructor S s6 = { 4, 5 };  // error, no field constructor S s7 = s5; // okay } In this example, the inner scope declares a constructor from @int@ to @S@, which hides the default constructor and field constructors until the end of the scope. When defining a constructor or destructor for a struct @S@, any members that are not explicitly constructed or destructed are implicitly constructed or destructed automatically. When defining a constructor or destructor for a structure @S@, any members that are not explicitly constructed or destructed are implicitly constructed or destructed automatically. If an explicit call is present, then that call is taken in preference to any implicitly generated call. A consequence of this rule is that it is possible, unlike \CC, to precisely control the order of construction and destruction of sub-objects on a per-constructor basis, whereas in \CC sub-object initialization and destruction is always performed based on the declaration order. In practice, however, there could be many objects that can be constructed from a given @int@ (or, indeed, any arbitrary parameter list), and thus a complete solution to this problem would require fully exploring all possibilities. More precisely, constructor calls cannot have a nesting depth greater than the number of array components in the type of the initialized object, plus one. More precisely, constructor calls cannot have a nesting depth greater than the number of array dimensions in the type of the initialized object, plus one. For example, \begin{cfacode} { {14 }, { 15 } }   // a2[1] }; A a3[4] = { { { 11 }, { 12 } },  // error A a3[4] = { // 1 dimension => max depth 2 { { 11 }, { 12 } },  // error, three levels deep { 80 }, { 90 }, { 100 } } \label{sub:implicit_dtor} Destructors are automatically called at the end of the block in which the object is declared. In addition to this, destructors are automatically called when statements manipulate control flow to leave a block in which the object is declared, e.g., with return, break, continue, and goto statements. In addition to this, destructors are automatically called when statements manipulate control flow to leave a block in which the object is declared, \eg, with return, break, continue, and goto statements. The example below demonstrates a simple routine with multiple return statements. \begin{cfacode} Exempt from these rules are intrinsic and built-in functions. It should be noted that unmanaged objects are subject to copy constructor calls when passed as arguments to a function or when returned from a function, since they are not the \emph{target} of the copy constructor call. That is, since the parameter is not marked as an unmanaged object using \ateq, it will be copy constructed if it is returned by value or passed as an argument to another function, so to guarantee consistent behaviour, unmanaged objects must be copy constructed when passed as arguments. This is an important detail to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed. That is, since the parameter is not marked as an unmanaged object using \ateq, it is be copy constructed if it is returned by value or passed as an argument to another function, so to guarantee consistent behaviour, unmanaged objects must be copy constructed when passed as arguments. These semantics are important to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed. \begin{cfacode} struct A; identity(z);  // copy construct z into x \end{cfacode} Note that @z@ is copy constructed into a temporary variable to be passed as an argument, which is also destructed after the call. Note that unmanaged argument @z@ is logically copy constructed into managed parameter @x@; however, the translator must copy construct into a temporary variable to be passed as an argument, which is also destructed after the call. A compiler could by-pass the argument temporaries since it is in control of the calling conventions and knows exactly where the called-function's parameters live. This generates the following This transformation provides @f@ with the address of the return variable so that it can be constructed into directly. It is worth pointing out that this kind of signature rewriting already occurs in polymorphic functions that return by value, as discussed in \cite{Bilson03}. A key difference in this case is that every function would need to be rewritten like this, since types can switch between managed and unmanaged at different scope levels, e.g. A key difference in this case is that every function would need to be rewritten like this, since types can switch between managed and unmanaged at different scope levels, \eg \begin{cfacode} struct A { int v; }; Furthermore, it is not possible to overload C functions, so using @extern "C"@ to declare functions is of limited use. It would be possible to regain some control by adding an attribute to structs that specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type. Ideally, structs should be manageable by default, since otherwise the default case becomes more verbose. It would be possible to regain some control by adding an attribute to structures that specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type. Ideally, structures should be manageable by default, since otherwise the default case becomes more verbose. This means that in general, function signatures would have to be rewritten, and in a select few cases the signatures would not be rewritten. \begin{cfacode} C h();  // rewritten void h(C *); \end{cfacode} An alternative is to instead make the attribute \emph{identifiable}, which states that objects of this type use the @this@ parameter as an identity. An alternative is to make the attribute \emph{identifiable}, which states that objects of this type use the @this@ parameter as an identity. This strikes more closely to the visible problem, in that only types marked as identifiable would need to have the return value moved into the parameter list, and every other type could remain the same. Furthermore, no restrictions would need to be placed on whether objects can be constructed. \subsection{Global Initialization} In standard C, global variables can only be initialized to compile-time constant expressions. This places strict limitations on the programmer's ability to control the default values of objects. In standard C, global variables can only be initialized to compile-time constant expressions, which places strict limitations on the programmer's ability to control the default values of objects. In \CFA, constructors and destructors are guaranteed to be run on global objects, allowing arbitrary code to be run before and after the execution of the main routine. By default, objects within a translation unit are constructed in declaration order, and destructed in the reverse order. The default order of construction of objects amongst translation units is unspecified. It is, however, guaranteed that any global objects in the standard library are initialized prior to the initialization of any object in the user program. It is, however, guaranteed that any global objects in the standard library are initialized prior to the initialization of any object in a user program. This feature is implemented in the \CFA translator by grouping every global constructor call into a function with the GCC attribute \emph{constructor}, which performs most of the heavy lifting \cite[6.31.1]{GCCExtensions}. %   https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html#C_002b_002b-Attributes % suggestion: implement this in CFA by picking objects with a specified priority and pulling them into their own init functions (could even group them by priority level -> map>) and pull init_priority forward into constructor and destructor attributes with the same priority level GCC provides an attribute @init_priority@, which allows specifying the relative priority for initialization of global objects on a per-object basis in \CC. GCC provides an attribute @init_priority@ in \CC, which allows specifying the relative priority for initialization of global objects on a per-object basis. A similar attribute can be implemented in \CFA by pulling marked objects into global constructor/destructor-attribute functions with the specified priority. For example, In standard C, it is possible to mark variables that are local to a function with the @static@ storage class. Unlike normal local variables, a @static@ local variable is defined to live for the entire duration of the program, so that each call to the function has access to the same variable with the same address and value as it had in the previous call to the function. Much like global variables, in C @static@ variables can only be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it at compile-time. Much like global variables, @static@ variables can only be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it at compile-time. Yet again, this rule is too restrictive for a language with constructors and destructors. Instead, \CFA modifies the definition of a @static@ local variable so that objects are guaranteed to be live from the time control flow reaches their declaration, until the end of the program, since the initializer expression is not necessarily a compile-time constant, but can depend on the current execution state of the function. Since standard C does not allow access to a @static@ local variable before the first time control flow reaches the declaration, this restriction does not preclude any valid C code. Since the initializer expression is not necessarily a compile-time constant and can depend on the current execution state of the function, \CFA modifies the definition of a @static@ local variable so that objects are guaranteed to be live from the time control flow reaches their declaration, until the end of the program. Since standard C does not allow access to a @static@ local variable before the first time control flow reaches the declaration, this change does not preclude any valid C code. Local objects with @static@ storage class are only implicitly constructed and destructed once for the duration of the program. The object is constructed when its declaration is reached for the first time. Since the parameter to @atexit@ is a parameter-less function, some additional tweaking is required. First, the @static@ variable must be hoisted up to global scope and uniquely renamed to prevent name clashes with other global objects. Second, a function is built which calls the destructor for the newly hoisted variable. If necessary, a local structure may need to be hoisted, as well. Second, a function is built that calls the destructor for the newly hoisted variable. Finally, the newly generated function is registered with @atexit@, instead of registering the destructor directly. Since @atexit@ calls functions in the reverse order in which they are registered, @static@ local variables are guaranteed to be destructed in the reverse order that they are constructed, which may differ between multiple executions of the same program. void f(T); \end{cfacode} This allows easily specifying constraints that are common to all complete object types very simply. Now that \CFA has constructors and destructors, more of a complete object's behaviour can be specified by than was previously possible. This allows easily specifying constraints that are common to all complete object-types very simply. Now that \CFA has constructors and destructors, more of a complete object's behaviour can be specified than was previously possible. As such, @otype@ has been augmented to include assertions for a default constructor, copy constructor, and destructor. That is, the previous example is now equivalent to \begin{cfacode} forall(dtype T | sized(T) | { T ?=?(T *, T); void ?{}(T *); void ?{}(T *, T); void ^?{}(T *); }) forall(dtype T | sized(T) | { T ?=?(T *, T); void ?{}(T *); void ?{}(T *, T); void ^?{}(T *); }) void f(T); \end{cfacode} This allows @f@'s body to create and destroy objects of type @T@, and pass objects of type @T@ as arguments to other functions, following the normal \CFA rules. A point of note here is that objects can be missing default constructors (and eventually other functions through deleted functions), so it is important for \CFA programmers to think carefully about the operations needed by their function, as to not over-constrain the acceptable parameter types. These additions allow @f@'s body to create and destroy objects of type @T@, and pass objects of type @T@ as arguments to other functions, following the normal \CFA rules. A point of note here is that objects can be missing default constructors (and eventually other functions through deleted functions), so it is important for \CFA programmers to think carefully about the operations needed by their function, as to not over-constrain the acceptable parameter types and prevent potential reuse.
• doc/rob_thesis/intro.tex

 reaa2f3a1 Therefore, these design principles must be kept in mind throughout the design and development of new language features. In order to appeal to existing C programmers, great care must be taken to ensure that new features naturally feel like C. These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used. Unfortunately, \CC is actively diverging from C, so incremental additions require significant effort and training, coupled with multiple legacy design-choices that cannot be updated. The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail. \end{cfacode} Compound literals create an unnamed object, and result in an lvalue, so it is legal to assign a value into a compound literal or to take its address \cite[p.~86]{C11}. Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and is never an lvalue. Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and coercions and is never an lvalue. \subsection{Overloading} Overloading is the ability to specify multiple entities with the same name. The most common form of overloading is function overloading, wherein multiple functions can be defined with the same name, but with different signatures. Like in \CC, \CFA allows overloading based both on the number of parameters and on the types of parameters. C provides a small amount of built-in overloading, \eg + is overloaded for the basic types. Like in \CC, \CFA allows user-defined overloading based both on the number of parameters and on the types of parameters. \begin{cfacode} void f(void);  // (1) There are times when a function should logically return multiple values. Since a function in standard C can only return a single value, a programmer must either take in additional return values by address, or the function's designer must create a wrapper structure to package multiple return-values. For example, the first approach: \begin{cfacode} int f(int * ret) {        // returns a value through parameter ret int res1 = g(&res2);      // explicitly pass storage \end{cfacode} The former solution is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all. The latter approach: is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all. The second approach: \begin{cfacode} struct A { ... res3.x ... res3.y ... // use result values \end{cfacode} requires the caller to either learn the field names of the structure or learn the names of helper routines to access the individual return values. Both solutions are syntactically unnatural. is awkward because the caller has to either learn the field names of the structure or learn the names of helper routines to access the individual return values. Both approaches are syntactically unnatural. In \CFA, it is possible to directly declare a function returning multiple values. \begin{cfacode} struct A { int i; }; int ?+?(A x, A y); int ?+?(A x, A y);    // '?'s represent operands bool ?}). Of notable exception are the logical operators (\eg @||@), the sequence operator (\ie @,@), and the member-access operators (\eg @.@ and \lstinline{->}). Finally, \CFA also permits overloading variable identifiers. template T sum(T *arr, int n) { T t; T t;  // default construct => 0 for (; n > 0; n--) t += arr[n-1]; return t; \end{cfacode} The first thing to note here is that immediately following the declaration of @otype T@ is a list of \emph{type assertions} that specify restrictions on acceptable choices of @T@. In particular, the assertions above specify that there must be a an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@. In particular, the assertions above specify that there must be an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@. The existence of an assignment operator from @T@ to @T@ and the ability to create an object of type @T@ are assumed implicitly by declaring @T@ with the @otype@ type-class. In addition to @otype@, there are currently two other type-classes. A major difference between the approaches of \CC and \CFA to polymorphism is that the set of assumed properties for a type is \emph{explicit} in \CFA. One of the major limiting factors of \CC's approach is that templates cannot be separately compiled. In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled. In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled, as the function prototype states all necessary requirements separate from the implementation. For example, the prototype for the previous sum function is \begin{cfacode} forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**) T sum(T *arr, int n); \end{cfacode} With this prototype, a caller in another translation unit knows all of the constraints on @T@, and thus knows all of the operations that need to be made available to @sum@. In \CFA, a set of assertions can be factored into a \emph{trait}. This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration. An interesting application of return-type resolution and polymorphism is with type-safe @malloc@. An interesting application of return-type resolution and polymorphism is a type-safe version of @malloc@. \begin{cfacode} forall(dtype T | sized(T)) In object-oriented programming languages, type invariants are typically established in a constructor and maintained throughout the object's lifetime. These assertions are typically achieved through a combination of access control modifiers and a restricted interface. These assertions are typically achieved through a combination of access-control modifiers and a restricted interface. Typically, data which requires the maintenance of an invariant is hidden from external sources using the \emph{private} modifier, which restricts reads and writes to a select set of trusted routines, including member functions. It is these trusted routines that perform all modifications to internal data in a way that is consistent with the invariant, by ensuring that the invariant holds true at the end of the routine call. In other languages, a hybrid situation exists where resources escape the allocation block, but ownership is precisely controlled by the language. This pattern requires a strict interface and protocol for a data structure, consisting of a pre-initialization and a post-termination call, and all intervening access is done via interface routines. This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it takes care of a significant portion of resource management cases. This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it takes care of a significant portion of resource-management cases. For example, \CC directly supports this pattern through class types and an idiom known as RAII \footnote{Resource Acquisition is Initialization} by means of constructors and destructors. In the context of \CFA, a non-trivial constructor is either a user defined constructor or an auto-generated constructor that calls a non-trivial constructor. For the remaining resource ownership cases, programmer must follow a brittle, explicit protocol for freeing resources or an implicit protocol implemented via the programming language. For the remaining resource ownership cases, a programmer must follow a brittle, explicit protocol for freeing resources or an implicit protocol enforced by the programming language. In garbage collected languages, such as Java, resources are largely managed by the garbage collector. Still, garbage collectors are typically focus only on memory management. Still, garbage collectors typically focus only on memory management. There are many kinds of resources that the garbage collector does not understand, such as sockets, open files, and database connections. In particular, Java supports \emph{finalizers}, which are similar to destructors. Sadly, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious. Unfortunately, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious. Due to operating-system resource-limits, this is unacceptable for many long running programs. Instead, the paradigm in Java requires programmers to manually keep track of all resources \emph{except} memory, leading many novices and experts alike to forget to close files, etc. \end{javacode} Variables declared as part of a try-with-resources statement must conform to the @AutoClosable@ interface, and the compiler implicitly calls @close@ on each of the variables at the end of the block. Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null, therefore, the cleanup for these variables at the end is appropriately guarded and conditionally executed to prevent null-pointer exceptions. Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null, therefore, the cleanup for these variables at the end is automatically guarded and conditionally executed to prevent null-pointer exceptions. While Rust \cite{Rust} does not enforce the use of a garbage collector, it does provide a manual memory management environment, with a strict ownership model that automatically frees allocated memory and prevents common memory management errors. There is no runtime cost imposed on these restrictions, since they are enforced at compile-time. Rust provides RAII through the @Drop@ trait, allowing arbitrary code to execute when the object goes out of scope, allowing Rust programs to automatically clean up auxiliary resources much like a \CC program. Rust provides RAII through the @Drop@ trait, allowing arbitrary code to execute when the object goes out of scope, providing automatic clean up of auxiliary resources, much like a \CC program. \begin{rustcode} struct S { impl Drop for S {  // RAII for S fn drop(&mut self) { fn drop(&mut self) {  // destructor println!("dropped {}", self.name); } tuple triple(10, 20, 30); auto & [t1, t2, t3] = triple; t2 = 0; // changes triple t2 = 0; // changes middle element of triple struct S { int x; double y; }; auto [x, y] = s; // unpack s \end{cppcode} Structured bindings allow unpacking any struct with all public non-static data members into fresh local variables. Structured bindings allow unpacking any structure with all public non-static data members into fresh local variables. The use of @&@ allows declaring new variables as references, which is something that cannot be done with @std::tie@, since \CC references do not support rebinding. This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism. Furthermore, structured bindings are not a full replacement for @std::tie@, as it always declares new variables. Like \CC, D provides tuples through a library variadic template struct. Like \CC, D provides tuples through a library variadic-template structure. In D, it is possible to name the fields of a tuple type, which creates a distinct type. % http://dlang.org/phobos/std_typecons.html \end{smlcode} Here, the function @binco@ appears to take 2 arguments, but it actually takes a single argument which is implicitly decomposed via pattern matching. Tuples are a foundational tool in SML, allowing the creation of arbitrarily complex structured data types. Tuples are a foundational tool in SML, allowing the creation of arbitrarily-complex structured data-types. Scala, like \CC, provides tuple types through the standard library \cite{Scala}. Since the variadic arguments are untyped, it is up to the function to interpret any data that is passed in. Additionally, the interface to manipulate @va_list@ objects is essentially limited to advancing to the next argument, without any built-in facility to determine when the last argument is read. This requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types. This limitation requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types. The format string in @printf@ is one such example of an argument descriptor. \begin{cfacode}
• doc/rob_thesis/tuples.tex

 reaa2f3a1 Furthermore, while many C routines that accept pointers are designed so that it is safe to pass @NULL@ as a parameter, there are many C routines that are not null-safe. On a related note, C does not provide a standard mechanism to state that a parameter is going to be used as an additional return value, which makes the job of ensuring that a value is returned more difficult for the compiler. There is a subtle bug in the previous example, in that @ret_ch@ is never assigned for a string that does not contain any letters, which can lead to undefined behaviour. Interestingly, there is a subtle bug in the previous example, in that @ret_ch@ is never assigned for a string that does not contain any letters, which can lead to undefined behaviour. In this particular case, it turns out that the frequency return value also doubles as an error code, where a frequency of 0 means the character return value should be ignored. Still, not every routine with multiple return values should be required to return an error code, and error codes are easily ignored, so this is not a satisfying solution. As with the previous approach, this technique can simulate multiple return values, but in practice it is verbose and error prone. char freqs [26] = { 0 }; int ret_freq = 0; char ret_ch = 'a'; char ret_ch = 'a';  // arbitrary default value for consistent results for (int i = 0; str[i] != '\0'; ++i) { if (isalpha(str[i])) {        // only count letters } \end{cfacode} This approach provides the benefits of compile-time checking for appropriate return statements as in aggregation, but without the required verbosity of declaring a new named type, which precludes the bug seen with out parameters. This approach provides the benefits of compile-time checking for appropriate return statements as in aggregation, but without the required verbosity of declaring a new named type, which precludes the bug seen with out-parameters. The addition of multiple-return-value functions necessitates a syntax for accepting multiple values at the call-site. For the call to @g@, the values @y@ and @10@ are structured into a single argument of type @[int, int]@ to match the type of the parameter of @g@. Finally, in the call to @h@, @y@ is flattened to yield an argument list of length 3, of which the first component of @x@ is passed as the first parameter of @h@, and the second component of @x@ and @y@ are structured into the second argument of type @[int, int]@. The flexible structure of tuples permits a simple and expressive function call syntax to work seamlessly with both single- and multiple-return-value functions, and with any number of arguments of arbitrarily complex structure. The flexible structure of tuples permits a simple and expressive function-call syntax to work seamlessly with both single- and multiple-return-value functions, and with any number of arguments of arbitrarily complex structure. In \KWC \cite{Buhr94a,Till89}, a precursor to \CFA, there were 4 tuple coercions: opening, closing, flattening, and structuring. Opening coerces a tuple value into a tuple of values, while closing converts a tuple of values into a single tuple value. Flattening coerces a nested tuple into a flat tuple, i.e. it takes a tuple with tuple components and expands it into a tuple with only non-tuple components. Structuring moves in the opposite direction, i.e. it takes a flat tuple value and provides structure by introducing nested tuple components. Flattening coerces a nested tuple into a flat tuple, \ie it takes a tuple with tuple components and expands it into a tuple with only non-tuple components. Structuring moves in the opposite direction, \ie it takes a flat tuple value and provides structure by introducing nested tuple components. In \CFA, the design has been simplified to require only the two conversions previously described, which trigger only in function call and return situations. A mass assignment assigns the value $R$ to each $L_i$. For a mass assignment to be valid, @?=?(&$L_i$, $R$)@ must be a well-typed expression. These semantics differ from C cascading assignment (e.g. @a=b=c@) in that conversions are applied to $R$ in each individual assignment, which prevents data loss from the chain of conversions that can happen during a cascading assignment. These semantics differ from C cascading assignment (\eg @a=b=c@) in that conversions are applied to $R$ in each individual assignment, which prevents data loss from the chain of conversions that can happen during a cascading assignment. For example, @[y, x] = 3.14@ performs the assignments @y = 3.14@ and @x = 3.14@, which results in the value @3.14@ in @y@ and the value @3@ in @x@. On the other hand, the C cascading assignment @y = x = 3.14@ performs the assignments @x = 3.14@ and @y = x@, which results in the value @3@ in @x@, and as a result the value @3@ in @y@ as well. These semantics allow cascading tuple assignment to work out naturally in any context where a tuple is permitted. These semantics are a change from the original tuple design in \KWC \cite{Till89}, wherein tuple assignment was a statement that allows cascading assignments as a special case. Restricting tuple assignment to statements was an attempt to to fix what was seen as a problem with assignment, wherein it can be used in many different locations, such as in function-call argument position. Restricting tuple assignment to statements was an attempt to to fix what was seen as a problem with side-effects, wherein assignment can be used in many different locations, such as in function-call argument position. While permitting assignment as an expression does introduce the potential for subtle complexities, it is impossible to remove assignment expressions from \CFA without affecting backwards compatibility. Furthermore, there are situations where permitting assignment as an expression improves readability by keeping code succinct and reducing repetition, and complicating the definition of tuple assignment puts a greater cognitive burden on the user. \end{cfacode} The tuple expression begins with a mass assignment of @1.5@ into @[b, d]@, which assigns @1.5@ into @b@, which is truncated to @1@, and @1.5@ into @d@, producing the tuple @[1, 1.5]@ as a result. That tuple is used as the right side of the multiple assignment (i.e., @[c, a] = [1, 1.5]@) that assigns @1@ into @c@ and @1.5@ into @a@, which is truncated to @1@, producing the result @[1, 1]@. That tuple is used as the right side of the multiple assignment (\ie, @[c, a] = [1, 1.5]@) that assigns @1@ into @c@ and @1.5@ into @a@, which is truncated to @1@, producing the result @[1, 1]@. Finally, the tuple @[1, 1]@ is used as an expression in the call to @f@. In this example, @x@ is initialized by the multiple constructor calls @?{}(&x.0, 3)@ and @?{}(&x.1, 6.28)@, while @y@ is initialized by two default constructor calls @?{}(&y.0)@ and @?{}(&y.1)@. @z@ is initialized by mass copy constructor calls @?{}(&z.0, x.0)@ and @?{}(&z.1, x.0)@. Finally, @x@, @y@, and @z@ are destructed, i.e. the calls @^?{}(&x.0)@, @^?{}(&x.1)@, @^?{}(&y.0)@, @^?{}(&y.1)@, @^?{}(&z.0)@, and @^?{}(&z.1)@. Finally, @x@, @y@, and @z@ are destructed, \ie the calls @^?{}(&x.0)@, @^?{}(&x.1)@, @^?{}(&y.0)@, @^?{}(&y.1)@, @^?{}(&z.0)@, and @^?{}(&z.1)@. It is possible to define constructors and assignment functions for tuple types that provide new semantics, if the existing semantics do not fit the needs of an application. Then the type of @a.[x, y, z]@ is @[T_x, T_y, T_z]@. Since tuple index expressions are a form of member-access expression, it is possible to use tuple-index expressions in conjunction with member tuple expressions to manually restructure a tuple (e.g., rearrange components, drop components, duplicate components, etc.). Since tuple index expressions are a form of member-access expression, it is possible to use tuple-index expressions in conjunction with member tuple expressions to manually restructure a tuple (\eg, rearrange components, drop components, duplicate components, etc.). \begin{cfacode} [int, int, long, double] x; As for @z.y@, one interpretation is to extend the meaning of member tuple expressions. That is, currently the tuple must occur as the member, i.e. to the right of the dot. That is, currently the tuple must occur as the member, \ie to the right of the dot. Allowing tuples to the left of the dot could distribute the member across the elements of the tuple, in much the same way that member tuple expressions distribute the aggregate across the member tuple. In this example, @z.y@ expands to @[z.0.y, z.1.y]@, allowing what is effectively a very limited compile-time field-sections map operation, where the argument must be a tuple containing only aggregates having a member named @y@. struct A { int x; }; (struct A)f();  // invalid (struct A)f();  // invalid, int cannot be converted to A \end{cfacode} In C, line 4 is a valid cast, which calls @f@ and discards its result. [int, [int, int], int] g(); ([int, double])f();           // (1) ([int, int, int])g();         // (2) ([void, [int, int]])g();      // (3) ([int, int, int, int])g();    // (4) ([int, [int, int, int]])g();  // (5) ([int, double])f();           // (1) valid ([int, int, int])g();         // (2) valid ([void, [int, int]])g();      // (3) valid ([int, int, int, int])g();    // (4) invalid ([int, [int, int, int]])g();  // (5) invalid \end{cfacode} If @g@ is free of side effects, this is equivalent to @[(int)(g().0), (int)(g().1.0), (int)(g().2)]@. Since @void@ is effectively a 0-element tuple, (3) discards the first and third return values, which is effectively equivalent to @[(int)(g().1.0), (int)(g().1.1)]@). % will this always hold true? probably, as constructors should give all of the conversion power we need. if casts become function calls, what would they look like? would need a way to specify the target type, which seems awkward. Also, C++ basically only has this because classes are closed to extension, while we don't have that problem (can have floating constructors for any type). Note that a cast is not a function call in \CFA, so flattening and structuring conversions do not occur for cast expressions. \end{cfacode} Until this point, it has been assumed that assertion arguments must match the parameter type exactly, modulo polymorphic specialization (i.e., no implicit conversions are applied to assertion arguments). Until this point, it has been assumed that assertion arguments must match the parameter type exactly, modulo polymorphic specialization (\ie, no implicit conversions are applied to assertion arguments). This decision presents a conflict with the flexibility of tuples. \subsection{Assertion Inference} } \end{cfacode} Is transformed into is transformed into \begin{cfacode} forall(dtype T0, dtype T1 | sized(T0) | sized(T1)) struct _tuple2 {  // generated before the first 2-tuple struct _tuple2_ {  // generated before the first 2-tuple T0 field_0; T1 field_1; _tuple2_(double, double) x; forall(dtype T0, dtype T1, dtype T2 | sized(T0) | sized(T1) | sized(T2)) struct _tuple3 {  // generated before the first 3-tuple struct _tuple3_ {  // generated before the first 3-tuple T0 field_0; T1 field_1; [5, 'x', 1.24]; \end{cfacode} Becomes becomes \begin{cfacode} (_tuple3_(int, char, double)){ 5, 'x', 1.24 }; f(x, 'z'); \end{cfacode} Is transformed into is transformed into \begin{cfacode} void f(int, _tuple2_(double, char)); It is possible that lazy evaluation could be exposed to the user through a lazy keyword with little additional effort. Tuple member expressions are recursively expanded into a list of member access expressions. Tuple-member expressions are recursively expanded into a list of member-access expressions. \begin{cfacode} [int, [double, int, double], int]] x; x.[0, 1.[0, 2]]; \end{cfacode} which becomes becomes \begin{cfacode} [x.0, [x.1.0, x.1.2]]; [x, y, z] = 1.5;            // mass assignment \end{cfacode} Generates the following generates the following \begin{cfacode} // [x, y, z] = 1.5; }); \end{cfacode} A variable is generated to store the value produced by a statement expression, since its fields may need to be constructed with a non-trivial constructor and it may need to be referred to multiple time, e.g., in a unique expression. A variable is generated to store the value produced by a statement expression, since its fields may need to be constructed with a non-trivial constructor and it may need to be referred to multiple time, \eg, in a unique expression. $N$ LHS variables are generated and constructed using the address of the tuple components, and a single RHS variable is generated to store the value of the RHS without any loss of precision. A nested statement expression is generated that performs the individual assignments and constructs the return value using the results of the individual assignments. [x, y, z] = [f(), 3];       // multiple assignment \end{cfacode} Generates generates the following \begin{cfacode} // [x, y, z] = [f(), 3];
 reaa2f3a1 In addition, C requires the programmer to hard code all of the possible expected types. As a result, it is cumbersome to write a function that is open to extension. For example, a simple function which sums $N$ @int@s, For example, a simple function to sum $N$ @int@s, \begin{cfacode} int sum(int N, ...) { sum(3, 10, 20, 30);  // need to keep counter in sync \end{cfacode} The @va_list@ type is a special C data type that abstracts variadic argument manipulation. The @va_list@ type is a special C data type that abstracts variadic-argument manipulation. The @va_start@ macro initializes a @va_list@, given the last named parameter. Each use of the @va_arg@ macro allows access to the next variadic argument, given a type. In the case where the provided type is not compatible with the argument's actual type after default argument promotions, or if too many arguments are accessed, the behaviour is undefined \cite[p.~81]{C11}. Furthermore, there is no way to perform the necessary error checks in the @sum@ function at run-time, since type information is not carried into the function body. Since they rely on programmer convention rather than compile-time checks, variadic functions are generally unsafe. Since they rely on programmer convention rather than compile-time checks, variadic functions are unsafe. In practice, compilers can provide warnings to help mitigate some of the problems. For example, GCC provides the @format@ attribute to specify that a function uses a format string, which allows the compiler to perform some checks related to the standard format specifiers. Unfortunately, this approach does not permit extensions to the format string syntax, so a programmer cannot extend the attribute to warn for mismatches with custom types. For example, GCC provides the @format@ attribute to specify that a function uses a format string, which allows the compiler to perform some checks related to the standard format-specifiers. Unfortunately, this approach does not permit extensions to the format-string syntax, so a programmer cannot extend the attribute to warn for mismatches with custom types. As a result, C's variadic functions are a deficient language feature. Similarly, in order to pass 0 variadic arguments, an explicit empty tuple must be passed into the argument list, otherwise the exact matching rule would not have an argument to bind against. It should be otherwise noted that the addition of an exact matching rule only affects the outcome for polymorphic type binding when tuples are involved. It should be otherwise noted that the addition of an exact matching rule only affects the outcome for polymorphic type-binding when tuples are involved. For non-tuple arguments, exact matching and flattening and structuring are equivalent. For tuple arguments to a function without polymorphic formal parameters, flattening and structuring work whenever an exact match would have worked, since the tuple is flattened and implicitly restructured to its original structure. For tuple arguments to a function without polymorphic formal-parameters, flattening and structuring work whenever an exact match would have worked, since the tuple is flattened and implicitly restructured to its original structure. Thus there is nothing to be gained from permitting the exact matching rule to take effect when a function does not contain polymorphism and none of the arguments are tuples. return x+y; } forall(otype T1, otype T2, otype T3, ttype Params, otype R forall(otype T1, otype T2, otype T3, otype R, ttype Params | summable(T1, T2, T3) | { R sum(T3, Params); }) \CFA does not need an ellipsis in either case, since the type class @ttype@ is only used for variadics. An alternative design is to use an ellipsis combined with an existing type class. This approach was not taken because the largest benefit of the ellipsis token in \CC is the ability to expand a parameter pack within an expression, e.g., in fold expressions, which requires compile-time knowledge of the structure of the parameter pack, which is not available in \CFA. This approach was not taken because the largest benefit of the ellipsis token in \CC is the ability to expand a parameter pack within an expression, \eg, in fold expressions, which requires compile-time knowledge of the structure of the parameter pack, which is not available in \CFA. \begin{cppcode} template Array * x = new(1, 2, 3); \end{cfacode} The @new@ function provides the combination of type-safe @malloc@ with a constructor call, so that it becomes impossible to forget to construct dynamically allocated objects. In the call to @new@, @Array@ is selected to match @T@, and @Params@ is expanded to match @[int, int, int, int]@. To satisfy the assertions, a constructor with an interface compatible with @void ?{}(Array *, int, int, int)@ must exist in the current scope. The @new@ function provides the combination of type-safe @malloc@ with a constructor call, so that it becomes impossible to forget to construct dynamically-allocated objects. This approach provides the type-safety of @new@ in \CC, without the need to specify the allocated type, thanks to return-type inference. In the call to @new@, @Array@ is selected to match @T@, and @Params@ is expanded to match @[int, int, int, int]@. To satisfy the assertions, a constructor with an interface compatible with @void ?{}(Array *, int, int, int)@ must exist in the current scope. \section{Implementation} } \end{cfacode} Generates the following generates the following \begin{cfacode} void *malloc(long unsigned int _sizeof_T, long unsigned int _alignof_T); \end{cfacode} The constructor for @T@ is called indirectly through the adapter function on the result of @malloc@ and the parameter pack. The variable that was allocated and constructed is then returned from @new@. The variable that is allocated and constructed is then returned from @new@. A call to @new@ } \end{cfacode} Generates generates the following \begin{cfacode} void print_variadic( print("x = ", 123, ".\n"); \end{cfacode} Generates the following generates the following \begin{cfacode} void print_string(const char *x){