source: doc/theses/aaron_moss_PhD/phd/background.tex @ 093a5d7

ADTarm-ehast-experimentalenumforall-pointer-decayjacob/cs343-translationjenkins-sandboxnew-astnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since 093a5d7 was f845e80, checked in by Aaron Moss <a3moss@…>, 6 years ago

thesis: apply round 2 revisions and strip change bars

  • Property mode set to 100644
File size: 33.7 KB
RevLine 
[7ff01ff]1\chapter{\CFA{}}
[0e6a0beb]2\label{cfa-chap}
[2a9d12d]3
[0cf9ffd]4\CFA{} adds a number of features to C, some of them providing significant increases to the expressive power of the language, but all designed to maintain the existing procedural programming paradigm of C and to be as orthogonal as possible to each other.
[ce00317]5To provide background for the contributions in subsequent chapters, this chapter provides a summary of the features of \CFA{} at the time this work was conducted.
6
[397848f5]7Glen Ditchfield laid out the core design of \CFA{} in his 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
[a2971cc]8Building on Ditchfield's design for contextual polymorphism as well as KW-C \cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson \cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
[1b1a8da]9This early \CFACC{} provided basic functionality, but incorporated a number of algorithmic choices that have failed to scale as \CFA{} has developed, lacking the runtime performance for practical use; this thesis is substantially concerned with rectifying those deficits.
[ce00317]10
[1b1a8da]11The \CFA{} project was revived in 2015 with the intention of building a production-ready language and compiler; at the time of this writing, both \CFA{} and \CFACC{} remain under active development.
12As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal~\cite{Moss18} provides a reasonable summary of the current design.
[a2971cc]13Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors \cite{Schluntz17}, improved support for tuples \cite{Schluntz17}, reference types \cite{Moss18}, first-class concurrent and parallel programming support \cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library \cite{Moss18}.
[ce00317]14
[c1f3d1a8]15This thesis is primarily concerned with the \emph{expression resolution} portion of \CFA{} type-checking; resolution is discussed in more detail in Chapter~\ref{resolution-chap}, but is essentially determining which declarations the identifiers in each expression correspond to.
[f845e80]16In C, no simultaneously-visible declarations share identifiers, hence expression resolution in C is not difficult.
17In \CFA{}, multiple added features make the resolution process significantly more complex.
18Due to this complexity, the expression-resolution pass in \CFACC{} requires 95\% of compiler runtime on some source files, making a new, more efficient procedure for expression resolution a requirement for a performant \CFA{} compiler.
[c1f3d1a8]19
[cf01d0b]20The features presented in this chapter are chosen to elucidate the design constraints of the work presented in this thesis.
[1b1a8da]21In some cases the interactions of multiple features make this design a significantly more complex problem than any individual feature; in other cases a feature that does not by itself add any complexity to expression resolution triggers previously rare edge cases more frequently.
[0cf9ffd]22
[1b1a8da]23\section{Procedural Paradigm}
[0cf9ffd]24
25It is important to note that \CFA{} is not an object-oriented language.
[1b1a8da]26This is a deliberate choice intended to maintain the applicability of the programming model and language idioms already possessed by C programmers.
27This choice is in marked contrast to \CC{}, which is a much larger and more complex language, and requires extensive developer re-training to write idiomatic, efficient code in \CC{}'s object-oriented paradigm.
[0cf9ffd]28
29Particularly, \CFA{} has no concept of \emph{subclass}, and thus no need to integrate an inheritance-based form of polymorphism with its parametric and overloading-based polymorphism.
[1b1a8da]30While \CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} and these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
[a2971cc]31The graph structure of the \CFA{} type conversions (discussed in Section~\ref{conv-cost-sec}) is also markedly different than an inheritance hierarchy; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance hierarchies.
[0cf9ffd]32
33Another aspect of \CFA{}'s procedural paradigm is that it retains C's translation-unit-based encapsulation model, rather than class-based encapsulation such as in \CC{}.
[397848f5]34As such, any language feature that requires code to be exposed in header files (\eg{} \CC{} templates) also eliminates encapsulation in \CFA{}.
35Given this constraint, \CFA{} is carefully designed to allow separate compilation for its added language features under the existing C usage patterns.
[0cf9ffd]36
[1b1a8da]37\section{Name Overloading} \label{overloading-sec}
[0cf9ffd]38
[a2971cc]39In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding \lstinline{struct}, \lstinline{union}, and \lstinline{enum} tags, one holding labels, one holding \lstinline{typedef} names, variable, function, and enumerator identifiers, and one for each \lstinline{struct} and \lstinline{union} type holding the field names \cite[\S{}6.2.3]{C11}.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
[1b1a8da]40This restriction makes finding the proper declaration to match to a variable expression or function application a simple matter of lexically-scoped name lookup, which can be easily and efficiently implemented.
[0cf9ffd]41\CFA{}, on the other hand, allows overloading of variable and function names so long as the overloaded declarations do not have the same type, avoiding the multiplication of variable and function names for different types common in the C standard library, as in the following example:
42
43\begin{cfa}
44#include <limits.h>
45
46const int max = INT_MAX;  $\C[1.75in]{// (1)}$
47const double max = DBL_MAX;  $\C[1.75in]{// (2)}$
48int max(int a, int b) { return a < b ? b : a; }  $\C[1.75in]{// (3)}$
49double max(double a, double b) { return a < b ? b : a; }  $\C[1.75in]{// (4)}$
50
51max( 7, -max );  $\C[3.75in]{// uses (3) and (1), by matching int from 7}$
52max( max, 3.14 );  $\C[3.75in]{// uses (4) and (2), by matching double from 3.14}$
53max( max, -max );  $\C[3.75in]{// ERROR, ambiguous}$
54int m = max( max, -max );  $\C[3.75in]{// uses (3) and (1) twice, by matching return type}$
55\end{cfa}
56
[1b1a8da]57The final expression in the preceding example includes a feature of \CFA{} name overloading not shared by \CC{}, the ability to disambiguate expressions based on their return type. This provides greater flexibility and power than the parameter-based overload selection of \CC{}, though at the cost of greater complexity in the resolution algorithm.
58
59While the wisdom of giving both the maximum value of a type and the function to take the maximum of two values the same name in the example above is debatable, \eg{} some developers may prefer !MAX! for the former, the guiding philosophy of \CFA{} is ``describe, don't prescribe'' --- we prefer to trust programmers with powerful tools, and there is no technical reason to restrict overloading between variables and functions.
[a2971cc]60However, the expressivity of \CFA{}'s name overloading does have the consequence that simple table lookup is insufficient to match identifiers to declarations, and a type-matching algorithm must be part of expression resolution.
[0cf9ffd]61
[1b1a8da]62\subsection{Operator Overloading}
[0cf9ffd]63
64C does allow name overloading in one context: operator overloading.
65From the perspective of the type system, there is nothing special about operators as opposed to other functions, nor is it desirable to restrict the clear and readable syntax of operators to only the built-in types.
[1b1a8da]66For these reasons, \CFA{}, like \CC{} and many other programming languages, allows overloading of operators by writing specially-named functions where !?! stands in for the operands.
67This syntax is more natural than the operator overloading syntax of \CC{}, which requires ``dummy'' parameters to disambiguate overloads of similarly-named pre- and postfix operators\footnote{This example uses \CFA{}'s reference types, described in Section~\ref{type-features-sec}}:
[0cf9ffd]68
69\begin{cfa}
70struct counter { int x; };
71
[f9c7d27]72counter& `++?`(counter& c) { ++c.x; return c; }  $\C[2in]{// pre-increment}$
73counter `?++`(counter& c) {  $\C[2in]{// post-increment}$
[0cf9ffd]74        counter tmp = c; ++c; return tmp;
75}
[f9c7d27]76bool `?<?`(const counter& a, const counter& b) {  $\C[2in]{// comparison}$
[0cf9ffd]77        return a.x < b.x;
78}
79\end{cfa}
80
[a2971cc]81Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11}, so as to present user programmers with only a single set of overloading rules.
[0cf9ffd]82
[1b1a8da]83\subsection{Special Literal Types}
[91a950c]84
85Literal !0! is also used polymorphically in C; it may be either integer zero or the null value of any pointer type.
[1b1a8da]86\CFA{} provides a special type for the !0! literal, !zero_t!, so that users can define a zero value for their own types without being forced to create a conversion from an integer or pointer type; \CFA{} also includes implicit conversions from !zero_t! to the !int! and pointer type constructors\footnote{See Section~\ref{type-features-sec}} from !zero_t! for backward compatibility.
[91a950c]87
[a2971cc]88According to the C standard \cite[\S{}6.8.4.1]{C11}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
89By this rule, Boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
[1b1a8da]90\CFACC{} applies this rewriting in all Boolean contexts, so any type !T! can be made ``truthy'' (that is, given a Boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}.
91\CC{} takes a different approach to user-defined truthy types, allowing definition of an implicit conversion operator to !bool!; prior to the introduction of the !explicit! keyword for conversion operators in \CCeleven{} this approach also allowed undesired implicit conversions to all other arithmetic types, a shortcoming not shared by the \CFA{} design.
[91a950c]92
93\CFA{} also includes a special type for !1!, !one_t!; like !zero_t!, !one_t! has built-in implicit conversions to the various integral types so that !1! maintains its expected semantics in legacy code.
94The addition of !one_t! allows generic algorithms to handle the unit value uniformly for types where it is meaningful; a simple example of this is that polymorphic functions\footnote{discussed in Section~\ref{poly-func-sec}} in the \CFA{} prelude define !++x! and !x++! in terms of !x += 1!, allowing users to idiomatically define all forms of increment for a type !T! by defining the single function !T& ?+=?(T&, one_t)!; analogous overloads for the decrement operators are also present, and programmers can override any of these functions for a particular type if desired.
95
[cf01d0b]96\CFA{} previously allowed !0! and !1! to be the names of polymorphic variables, with separate overloads for !int 0!, !int 1!, and the polymorphic variable !forall(dtype T) T* 0!.
[9d9a451]97While designing \CFA{} generic types (see Chapter~\ref{generic-chap}), it was discovered that the parametric polymorphic zero variable is not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
[cf01d0b]98As such, polymorphic variables, and in particular variables for !0! and !1!, were phased out in favour of functions that could generate those values for a given type as appropriate.
[91a950c]99
[1b1a8da]100\section{Polymorphic Functions} \label{poly-func-sec}
[0cf9ffd]101
[1b1a8da]102The most significant type-system feature \CFA{} adds is parametric-polymorphic functions.
[0cf9ffd]103Such functions are written using a !forall! clause (which gives the language its name):
104
105\begin{cfa}
106`forall(otype T)`
107T identity(T x) { return x; }
108\end{cfa}
109
110The !identity! function above can be applied to any complete object type (or ``!otype!'').
111The type variable !T! is transformed into a set of additional implicit parameters to !identity!, which encode sufficient information about !T! to create and return a variable of that type.
112\CFA{} passes the size and alignment of the type represented by an !otype! parameter, as well as a default constructor, copy constructor, assignment operator, and destructor.
[69c37cc]113Types that do not have one or more of these operators visible cannot be bound to !otype! parameters, but may be bound to un-constrained !dtype! (``data type'') type variables.
[a2971cc]114In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; the experiments in Chapter~\ref{generic-chap} indicate that this overhead is comparable to that of \CC{} virtual function calls.
115% \TODO{rerun experiments, possibly look at vtable variant}
[0cf9ffd]116
117One benefit of this design is that it allows polymorphic functions to be separately compiled.
118The forward declaration !forall(otype T) T identity(T);! uniquely defines a single callable function, which may be implemented in a different file.
[f9c7d27]119The fact that there is only one implementation of each polymorphic function also reduces compile times relative to the template-expansion approach taken by \CC{}, as well as reducing binary sizes and runtime pressure on instruction cache by re-using a single version of each function.
[0cf9ffd]120
[1b1a8da]121\subsection{Type Assertions}
[0cf9ffd]122
[397848f5]123Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{This example introduces a convention used throughout this thesis of disambiguating overloaded names with subscripts; the subscripts do not appear in \CFA{} source code.}:
[0cf9ffd]124
125\begin{cfa}
126forall(otype T `| { T twice(T); }`)
127T four_times(T x) { return twice( twice(x) ); }
[1b1a8da]128double twice$\(_1\)$(double d) { return d * 2.0; }
[0cf9ffd]129
130double ans = four_times( 10.5 )$\C[2.75in]{// T bound to double, ans == 42.0}$
131\end{cfa}
132
133These type assertions may be either variable or function declarations that depend on a polymorphic type variable.
134!four_times! may only be called with an argument for which there exists a function named !twice! that can take that argument and return another value of the same type; a pointer to the appropriate function is passed as an additional implicit parameter of the call to !four_times!.
135
136Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions.
[397848f5]137For instance, !twice! could have been defined like this:
[0cf9ffd]138
139\begin{cfa}
140forall(otype S | { S ?+?(S, S); })
[1b1a8da]141S twice$\(_2\)$(S x) { return x + x; }
[0cf9ffd]142\end{cfa}
143
[69c37cc]144Specializing this polymorphic function with !S = double! produces a monomorphic function which can be used to satisfy the type assertion on !four_times!.
[1b1a8da]145\CFACC{} accomplishes this by creating a wrapper function calling !twice!$_2$ with !S! bound to !double!, then providing this wrapper function to !four_times!\footnote{\lstinline{twice}$_2$ could also have had a type parameter named \lstinline{T}; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline{T} of \lstinline{four_times}}.
146However, !twice!$_2$ also works for any type !S! that has an addition operator defined for it.
[0cf9ffd]147
[c1f3d1a8]148Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope\footnote{\CFACC{} actually performs a type-unification computation for assertion satisfaction rather than an expression resolution computation; see Section~\ref{assn-sat-sec} for details.}.
[f9c7d27]149If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that that function is examined as a candidate for its own assertion unboundedly repeatedly.
[1b1a8da]150To avoid such infinite loops, \CFACC{} imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC{} compilers for template expansion; this restriction means that there are some otherwise semantically well-typed expressions that cannot be resolved by \CFACC{}.
[0cf9ffd]151
[1b1a8da]152\subsection{Traits}
[0cf9ffd]153
[a2971cc]154\CFA{} provides \emph{traits} as a means to name a group of type assertions, as in the example below\footnote{This example uses \CFA{}'s reference types and constructors, described in Section~\ref{type-features-sec}.}:
[0cf9ffd]155
156\begin{cfa}
157`trait has_magnitude(otype T)` {
158        bool ?<?(T, T)$\C[4in]{// comparison operator}$
159        T -?(T)$\C[4in]{// negation operator}$
160        void ?{}(T&, zero_t)$\C[4in]{// constructor from 0}$
161};
162
163forall(otype M | `has_magnitude(M)`)
164M abs(M m) { return m < (M){0} ? -m : m; }
165
166forall(otype M | `has_magnitude(M)`)
167M max_magnitude(M a, M b) { return abs(a) < abs(b) ? b : a; }
168\end{cfa}
169
170Semantically, traits are simply a named list of type assertions, but they may be used for many of the same purposes that interfaces in Java or abstract base classes in \CC{} are used for.
[1b1a8da]171Unlike Java interfaces or \CC{} base classes, \CFA{} types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form of structural inheritance, similar to interface implementation in Go, as opposed to the nominal inheritance model of Java and \CC{}.
[0cf9ffd]172Nominal inheritance can be simulated in \CFA{} using marker variables in traits:
173
174\begin{cfa}
175trait nominal(otype T) {
176        `T is_nominal;`
177};
178
179int is_nominal;  $\C{// int now satisfies nominal}$
180{
181        char is_nominal;  $\C{// char only satisfies nominal in this scope}$
182}
183\end{cfa}
184
185Traits, however, are significantly more powerful than nominal-inheritance interfaces; firstly, due to the scoping rules of the declarations that satisfy a trait's type assertions, a type may not satisfy a trait everywhere that that type is declared, as with !char! and the !nominal! trait above.
[3b40801b]186Secondly, because \CFA{} is not object-oriented and types do not have a closed set of methods, existing C library types can be extended to implement a trait simply by writing the requisite functions\footnote{\CC{} only allows partial extension of C types, because constructors, destructors, conversions, and the assignment, indexing, and function-call operators may only be defined in a class; \CFA{} does not have any of these restrictions.}.
187Finally, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems\footnote{This example uses \CFA{}'s reference types, described in Section~\ref{type-features-sec}.}:
[0cf9ffd]188
189\begin{cfa}
190trait pointer_like(`otype Ptr, otype El`) {
[f9c7d27]191        El& *?(Ptr)$\C{// Ptr can be dereferenced to El}$
[0cf9ffd]192};
193
194struct list {
195        int value;
[f9c7d27]196        list* next; $\C{// may omit struct on type names}$
[0cf9ffd]197};
198
199typedef list* list_iterator;
200
201int& *?(list_iterator it) {
202        return it->value;
203}
204\end{cfa}
205
[1b1a8da]206In this example above, !(list_iterator, int)! satisfies !pointer_like! by the user-defined dereference function, and !(list_iterator, list)! also satisfies !pointer_like! by the built-in dereference operator for pointers.
[0cf9ffd]207Given a declaration !list_iterator it!, !*it! can be either an !int! or a !list!, with the meaning disambiguated by context (\eg{} !int x = *it;! interprets !*it! as !int!, while !(*it).value = 42;! interprets !*it! as !list!).
[3b40801b]208While a nominal-inheritance system with associated types could model one of those two relationships by making !El! an associated type of !Ptr! in the !pointer_like! implementation,
[f845e80]209I am unaware of any nominal-inheritance system that can model both relationships simultaneously.
[cf01d0b]210Further comparison of \CFA{} polymorphism with other languages can be found in Section~\ref{generic-related-sec}.
[0cf9ffd]211
212The flexibility of \CFA{}'s implicit trait-satisfaction mechanism provides programmers with a great deal of power, but also blocks some optimization approaches for expression resolution.
[cf01d0b]213The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgments difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
214
215\subsection{Deleted Declarations}
216
[f845e80]217Particular type combinations can be exempted from matching a given polymorphic function through the use of a \emph{deleted function declaration}:
[cf01d0b]218
219\begin{cfa}
220int somefn(char) = void;
221\end{cfa}
222
223This feature is based on a \CCeleven{} feature typically used to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC{}, a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC{}-specific rules about private members and implicitly-generated functions; the deleted function form is both clearer and less verbose.} or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC{} versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers.}.
224Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
225If this deleted declaration is selected as the unique minimal-cost interpretation of an expression then an error is produced, allowing \CFA{} programmers to guide the expression resolver away from undesirable solutions.
[0cf9ffd]226
[1b1a8da]227\section{Implicit Conversions} \label{implicit-conv-sec}
[0cf9ffd]228
[1b1a8da]229In addition to the multiple interpretations of an expression produced by name overloading and polymorphic functions, \CFA{} must support all of the implicit conversions present in C for backward compatibility, producing further candidate interpretations for expressions.
230As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} define which of the built-in types are implicitly convertible to which other types, as well as which implicit conversions to apply for mixed arguments to binary operators.
[f9c7d27]231\CFA{} adds rules to the usual arithmetic conversions defining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg{} !int! to !char!, but more expensive than any \emph{safe} (widening) conversion, \eg{} !int! to !double!.
[a2971cc]232One contribution of this thesis, discussed in Section~\ref{conv-cost-sec}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
[0cf9ffd]233
[c1f3d1a8]234In the context of these implicit conversions, the expression resolution problem can be restated as finding the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
[1b1a8da]235While semantically valid \CFA{} code always has such a unique minimal-cost interpretation, \CFACC{} must also be able to detect and report as errors expressions that have either no interpretation or multiple ambiguous minimal-cost interpretations.
[0cf9ffd]236Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
237For instance, in the example in Section~\ref{overloading-sec}, !max(max, -max)! cannot be unambiguously resolved, but !int m = max(max, -max)! has a single minimal-cost resolution.
[1b1a8da]238While the interpretation !int m = (int)max((double)max, -(double)max)! is also a valid interpretation, it is not minimal-cost due to the unsafe cast from the !double! result of !max! to !int!\footnote{The two \lstinline{double} casts function as type ascriptions selecting \lstinline{double max} rather than casts from \lstinline{int max} to \lstinline{double}, and as such are zero-cost. The \lstinline{int} to \lstinline{double} conversion could be forced if desired with two casts: \lstinline{(double)(int)max}}.
[f9c7d27]239These contextual effects make the expression resolution problem for \CFA{} both theoretically and practically difficult, but the observation driving the work in Chapter~\ref{resolution-chap} is that of the many top-level expressions in a given program, most are straightforward and idiomatic so that programmers writing and maintaining the code can easily understand them; it follows that effective heuristics for common cases can bring down compiler runtime enough that a small proportion of harder-to-resolve expressions does not inordinately increase overall compiler runtime or memory usage.
[0cf9ffd]240
[1b1a8da]241\section{Type Features} \label{type-features-sec}
[0cf9ffd]242
[1b1a8da]243The name overloading and polymorphism features of \CFA{} have the greatest effect on language design and compiler runtime, but there are a number of other features in the type system that have a smaller effect but are useful for code examples.
[f9c7d27]244These features are described here.
245
[1b1a8da]246\subsection{Reference Types}
[0cf9ffd]247
[a2971cc]248One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz \cite{Schluntz17}.
[f9c7d27]249Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer.
[1b1a8da]250These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy aliasing of nested values with a similarly convenient syntax.
[397848f5]251The addition of reference types also eliminated two syntactic special-cases present in previous versions of \CFA{}.
[f845e80]252Consider the a call !a += b! to a compound assignment operator.
253The previous declaration for that operator is !lvalue int ?+=?(int*, int)!.
254To mutate the left argument, the built-in operators were special-cased to implicitly take the address of that argument, while the special !lvalue! syntax was used to mark the return type of a function as a mutable reference.
255With references, this declaration is re-written as !int& ?+=?(int&, int)!.
256The reference semantics generalize the implicit address-of on the left argument and allow it to be used in user-declared functions, while also subsuming the (now removed) !lvalue! syntax for function return types.
[f9c7d27]257
[1b1a8da]258The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named in the standard.
[a2971cc]259In \CFA{}, the distinction between lvalue and rvalue can be re-framed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
[cf01d0b]260\CFA{} references preserve the existing qualifier-dropping implicit lvalue-to-rvalue conversion from C (\eg{} a !const volatile int&! can be implicitly copied to a bare !int!).
[1b1a8da]261To make reference types more easily usable in legacy pass-by-value code, \CFA{} also adds an implicit rvalue-to-lvalue conversion, implemented by storing the value in a compiler-generated temporary variable and passing a reference to that temporary.
262To mitigate the ``!const! hell'' problem present in \CC{}, there is also a qualifier-dropping lvalue-to-lvalue conversion implemented by copying into a temporary:
[f9c7d27]263
264\begin{cfa}
265const int magic = 42;
266void inc_print( int& x ) { printf("%d\n", ++x); }
267
[cf01d0b]268inc_print( magic ); $\C{// legal; implicitly generated code in red below:}$
[f9c7d27]269
270`int tmp = magic;` $\C{// to safely strip const-qualifier}$
[cf01d0b]271`inc_print( tmp );` $\C{// tmp is incremented, magic is unchanged}$
[f9c7d27]272\end{cfa}
273
274Despite the similar syntax, \CFA{} references are significantly more flexible than \CC{} references.
275The primary issue with \CC{} references is that it is impossible to extract the address of the reference variable rather than the address of the referred-to variable.
276This breaks a number of the usual compositional properties of the \CC{} type system, \eg{} a reference cannot be re-bound to another variable, nor is it possible to take a pointer to, array of, or reference to a reference.
[a2971cc]277\CFA{} supports all of these use cases without further added syntax.
[f9c7d27]278The key to this syntax-free feature support is an observation made by the author that the address of a reference is a lvalue.
[f845e80]279In C, the address-of operator !&x! can only be applied to lvalue expressions, and always produces an immutable rvalue; \CFA{} supports reference re-binding by assignment to the address of a reference\footnote{The syntactic difference between reference initialization and reference assignment is unfortunate, but preserves the ability to pass function arguments by reference (a reference initialization context) without added syntax.}, and pointers to references by repeating the address-of operator:
[f9c7d27]280
281\begin{cfa}
282int x = 2, y = 3;
283int& r = x;  $\C{// r aliases x}$
284&r = &y; $\C{// r now aliases y}$
285int** p = &&r; $\C{// p points to r}$
286\end{cfa}
287
[cf01d0b]288For better compatibility with C, the \CFA{} team has chosen not to differentiate function overloads based on top-level reference types, and as such their contribution to the difficulty of \CFA{} expression resolution is largely restricted to the implementation details of matching reference to non-reference types during type-checking.
[f9c7d27]289
[397848f5]290\subsection{Resource Management} \label{ctor-sec}
[f9c7d27]291
[a2971cc]292\CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz \cite{Schluntz17}.
[f9c7d27]293This idiom allows a safer and more principled approach to resource management by tying acquisition of a resource to object initialization, with the corresponding resource release executed automatically at object finalization.
294A wide variety of conceptual resources may be conveniently managed by this scheme, including heap memory, file handles, and software locks.
295
296\CFA{}'s implementation of RAII is based on special constructor and destructor operators, available via the !x{ ... }! constructor syntax and !^x{ ... }! destructor syntax.
297Each type has an overridable compiler-generated zero-argument constructor, copy constructor, assignment operator, and destructor, as well as a field-wise constructor for each appropriate prefix of the member fields of !struct! types.
298For !struct! types the default versions of these operators call their equivalents on each field of the !struct!.
299The main implication of these object lifetime functions for expression resolution is that they are all included as implicit type assertions for !otype! type variables, with a secondary effect being an increase in code size due to the compiler-generated operators.
300Due to these implicit type assertions, assertion resolution is pervasive in \CFA{} polymorphic functions, even those without explicit type assertions.
301Implicitly-generated code is shown in red in the following example:
302
303\begin{cfa}
304struct kv {
305        int key;
306        char* value;
307};
308
309`void ?{} (kv& this) {` $\C[3in]{// default constructor}$
310`       this.key{};` $\C[3in]{// call recursively on members}$
311`       this.value{};
312}
313
314void ?{} (kv& this, int key) {` $\C[3in]{// partial field constructor}$
315`       this.key{ key };
316        this.value{};` $\C[3in]{// default-construct missing fields}$
317`}
318
319void ?{} (kv& this, int key, char* value) {` $\C[3in]{// complete field constructor}$
320`       this.key{ key };
321        this.value{ value };
322}
323
324void ?{} (kv& this, kv that) {` $\C[3in]{// copy constructor}$
325`       this.key{ that.key };
326        this.value{ that.value };
327}
328
329kv ?=? (kv& this, kv that) {` $\C[3in]{// assignment operator}$
330`       this.key = that.key;
331        this.value = that.value;
332}
333
334void ^?{} (kv& this) {` $\C[3in]{// destructor}$
335`       ^this.key{};
336        ^this.value{};
337}`
338
339forall(otype T `| { void ?{}(T&); void ?{}(T&, T); T ?=?(T&, T); void ^?{}(T&); }`)
340void foo(T);
341\end{cfa}
342
[1b1a8da]343\subsection{Tuple Types}
[f9c7d27]344
[91a950c]345\CFA{} adds \emph{tuple types} to C, a syntactic facility for referring to lists of values anonymously or with a single identifier.
346An identifier may name a tuple, a function may return one, and a tuple may be implicitly \emph{destructured} into its component values.
347The implementation of tuples in \CFACC{}'s code generation is based on the generic types introduced in Chapter~\ref{generic-chap}, with one compiler-generated generic type for each tuple arity.
[1b1a8da]348This reuse allows tuples to take advantage of the same runtime optimizations available to generic types, while reducing code bloat.
[a2971cc]349An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example demonstrates the basic features:
[f9c7d27]350
[91a950c]351\begin{cfa}
[1b1a8da]352[char, char] x$\(_1\)$ = ['!', '?']; $\C{// tuple type and expression syntax}$
353int x$\(_2\)$ = 2;
[0cf9ffd]354
[91a950c]355forall(otype T)
[1b1a8da]356[T, T] swap$\(_1\)$( T a, T b ) {
[91a950c]357        return [b, a]; $\C{// one-line swap syntax}$
358}
359
[1b1a8da]360x = swap( x ); $\C{// destructure x\(_1\) into two elements}$
361$\C{// cannot use x\(_2\), not enough arguments}$
[91a950c]362
[1b1a8da]363void swap$\(_2\)$( int, char, char );
[91a950c]364
[1b1a8da]365swap( x, x ); $\C{// swap\(_2\)( x\(_2\), x\(_1\) )}$
366$\C{// not swap\(_1\)( x\(_2\), x\(_2\) ) due to polymorphism cost}$
[91a950c]367\end{cfa}
[0cf9ffd]368
[91a950c]369Tuple destructuring breaks the one-to-one relationship between identifiers and values.
[1b1a8da]370Hence, some argument-parameter matching strategies for expression resolution are precluded, as well as cheap interpretation filters based on comparing number of parameters and arguments.
371As an example, in the call to !swap( x, x )! above, the second !x! can be resolved starting at the second or third parameter of !swap!$_2$, depending which interpretation of !x! is chosen for the first argument.
[a2971cc]372
373\section{Conclusion}
374
375\CFA{} adds a significant number of features to standard C, increasing the expressivity and re-usability of \CFA{} code while maintaining backwards compatibility for both code and larger language paradigms.
[39de1c5]376This flexibility does incur significant added compilation costs, however, the mitigation of which are the primary concern of this thesis.
Note: See TracBrowser for help on using the repository browser.