Changeset 752dc70 for doc/aaron_comp_II


Ignore:
Timestamp:
Aug 6, 2016, 6:35:17 PM (8 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, ctor, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, memory, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
3bb195cb
Parents:
d1ef1b0
Message:

Incorporate Peter's suggestions for Comp II section 2

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/aaron_comp_II/comp_II.tex

    rd1ef1b0 r752dc70  
    116116The ©identity© function above can be applied to any complete object type (or ``©otype©'').
    117117The type variable ©T© is transformed into a set of additional implicit parameters to ©identity© which encode sufficient information about ©T© to create and return a variable of that type.
    118 The current \CFA implementation passes the size and alignment of the type represented by an ©otype© parameter, as well as an assignment operator, constructor, copy constructor and destructor.
     118The current \CFA implementation passes the size and alignment of the type represented by an ©otype© parameter, as well as an assignment operator, constructor, copy constructor and destructor.
     119Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls.
     120Determining if packaging all polymorphic arguments to a function into a virtual function table would reduce the runtime overhead of polymorphic calls is an open research question.
    119121
    120122Since bare polymorphic types do not provide a great range of available operations, \CFA also provides a \emph{type assertion} mechanism to provide further information about a type:
     
    129131double magic = four_times(10.5); // T is bound to double, uses (1) to satisfy type assertion
    130132\end{lstlisting}
    131 These type assertions may be either variable or function declarations which depend on a polymorphic type variable.
    132 ©four_times© can only be called with an argument for which there exists a function named ©twice© that can take that argument and return another value of the same type; a pointer to the appropriate ©twice© function will be passed as an additional implicit parameter to the call to ©four_times©.
     133These type assertions may be either variable or function declarations that depend on a polymorphic type variable.
     134©four_times© can only be called with an argument for which there exists a function named ©twice© that can take that argument and return another value of the same type; a pointer to the appropriate ©twice© function is passed as an additional implicit parameter to the call to ©four_times©.
    133135
    134136Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions.
    135 For instance, ©twice© could have been defined as below, using the \CFA syntax for operator overloading:
     137For instance, ©twice© could have been defined using the \CFA syntax for operator overloading as:
    136138\begin{lstlisting}
    137139forall(otype S | { ®S ?+?(S, S);® })
    138140S twice(S x) { return x + x; }  // (2)
    139141\end{lstlisting}
    140 This version of ©twice© will work for any type ©S© that has an addition operator defined for it, and it could have been used to satisfy the type assertion on ©four_times©.
     142This version of ©twice© works for any type ©S© that has an addition operator defined for it, and it could have been used to satisfy the type assertion on ©four_times©.
    141143The compiler accomplishes this by creating a wrapper function calling ©twice // (2)© with ©S© bound to ©double©, then providing this wrapper function to ©four_times©\footnote{©twice // (2)© could also have had a type parameter named ©T©; \CFA specifies renaming of the type parameters, which would avoid the name conflict with the type variable ©T© of ©four_times©.}.
    142144
    143145Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope.
    144 If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that function will be examined as a candidate for its own type assertion unboundedly repeatedly.
    145 To avoid infinite loops, the current CFA compiler imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC compilers for template expansion; this restriction means that there are some semantically well-typed expressions which cannot be resolved by CFA.
    146 One area of potential improvement this project proposes to investigate is the possibility of using the compiler's knowledge of the current set of declarations to more precicely determine when further type assertion satisfaction recursion will not produce a well-typed expression.
     146If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that function is examined as a candidate for its own type assertion unboundedly repeatedly.
     147To avoid infinite loops, the current CFA compiler imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC compilers for template expansion; this restriction means that there are some semantically well-typed expressions that cannot be resolved by CFA.
     148One area of potential improvement this project proposes to investigate is the possibility of using the compiler's knowledge of the current set of declarations to more precicely determine when further type assertion satisfaction recursion does not produce a well-typed expression.
    147149
    148150\subsubsection{Traits}
     
    168170\end{lstlisting}
    169171
    170 Semantically, a trait is simply a named list of type assertions, but one can be used for many of the same purposes that an interface in Java or an abstract base class in \CC would be used for.
     172Semantically, traits are simply a named lists of type assertions, but they may be used for many of the same purposes that interfaces in Java or abstract base classes in \CC are used for.
    171173Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form of structural inheritance, similar to interface implementation in Go, as opposed to the nominal inheritance model of Java and \CC.
    172174Nominal inheritance can be simulated with traits using marker variables or functions:
     
    192194struct list {
    193195    int value;
    194     list *next;  // may omit "struct" on type names in \CFA
     196    list *next;  // may omit "struct" on type names
    195197};
    196198
     
    202204\end{lstlisting}
    203205
    204 In the example above, ©list_iterator, int© satisfies ©pointer_like© by the given function, and ©list_iterator, list© also satisfies ©pointer_like© by the default pointer dereference operator.
     206In the example above, ©(list_iterator, int)© satisfies ©pointer_like© by the given function, and ©(list_iterator, list)© also satisfies ©pointer_like© by the built-in pointer dereference operator.
    205207While a nominal-inheritance system with associated types could model one of those two relationships by making ©El© an associated type of ©Ptr© in the ©pointer_like© implementation, few such systems could model both relationships simultaneously.
    206208
     
    210212
    211213\subsection{Name Overloading}
    212 In C, no more than one function or variable in the same scope may share the same name, and function or variable declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration. 
    213 This makes finding the proper declaration to match to a function application or variable expression a simple matter of symbol table lookup, which can be easily and efficiently implemented.
    214 \CFA, on the other hand, allows overloading of variable and function names, so long as the overloaded declarations do not have the same type, avoiding the multiplication of function names for different types common in the C standard library, as in the following example:
    215 \begin{lstlisting}
    216 int three = 3;
    217 double three = 3.0;
    218 
    219 int thrice(int i) { return i * three; } // uses int three
    220 double thrice(double d) { return d * three; } // uses double three
    221 
    222 // thrice(three); // ERROR: ambiguous
    223 int nine = thrice(three);    // uses int thrice and three, based on return type
    224 double nine = thrice(three); // uses double thrice and three, based on return type
    225 \end{lstlisting}
    226 
    227 The presence of name overloading in \CFA means that simple table lookup is not sufficient to match identifiers to declarations, and a type matching algorithm must be part of expression resolution.
     214In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding ©struct©, ©union©, and ©enum© tags, one holding labels, one holding typedef names, variable, function, and enumerator identifiers, and one for each ©struct© or ©union© type holding the field names.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
     215This makes finding the proper declaration to match to a variable expression or function application a simple matter of symbol table lookup, which can be easily and efficiently implemented.
     216\CFA, on the other hand, allows overloading of variable and function names, so long as the overloaded declarations do not have the same type, avoiding the multiplication of variable and function names for different types common in the C standard library, as in the following example:
     217\begin{lstlisting}
     218#include <limits.h>
     219
     220int max(int a, int b) { return a < b ? b : a; }  // (1)
     221double max(double a, double b) { return a < b ? b : a; }  // (2)
     222
     223int max = INT_MAX;     // (3)
     224double max = DBL_MAX;  // (4)
     225
     226max(7, -max);   // uses (1) and (3), by matching int type of 7
     227max(max, 3.14); // uses (2) and (4), by matching double type of 3.14
     228
     229max(max, -max);  // ERROR: ambiguous
     230int m = max(max, -max); // uses (1) and (3) twice, by return type
     231\end{lstlisting}
     232
     233The presence of name overloading in \CFA means that simple table lookup is insufficient to match identifiers to declarations, and a type matching algorithm must be part of expression resolution.
    228234
    229235\subsection{Implicit Conversions}
     
    231237C does not have a traditionally-defined inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'' define which of the built-in types are implicitly convertable to which other types, and the relative cost of any pair of such conversions from a single source type.
    232238\CFA adds to the usual arithmetic conversions rules for determining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg ©int© to ©char©, but more expensive than any \emph{safe} (widening) conversion, \eg ©int© to ©double©.
     239
    233240The expression resolution problem, then, is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
    234 Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
     241Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
     242For instance, in the example in the previous subsection, ©max(max, -max)© cannot be unambiguously resolved, but ©int m = max(max, -max);© has a single minimal-cost resolution.
     243©int m = (int)max((double)max, -(double)max)© is also be a valid interpretation, but is not minimal-cost due to the unsafe cast from the ©double© result of ©max© to ©int© (the two ©double© casts function as type ascriptions selecting ©double max© rather than casts from ©int max© to ©double©, and as such are zero-cost).
    235244
    236245\subsubsection{User-generated Implicit Conversions}
     
    238247Such a conversion system should be simple for user programmers to utilize, and fit naturally with the existing design of implicit conversions in C; ideally it would also be sufficiently powerful to encode C's usual arithmetic conversions itself, so that \CFA only has one set of rules for conversions.
    239248
    240 Ditchfield\cite{Ditchfield:conversions} has laid out a framework for using polymorphic conversion constructor functions to create a directed acyclic graph (DAG) of conversions.
     249Ditchfield~\cite{Ditchfield:conversions} has laid out a framework for using polymorphic-conversion-constructor functions to create a directed acyclic graph (DAG) of conversions.
    241250A monomorphic variant of these functions can be used to mark a conversion arc in the DAG as only usable as the final step in a conversion.
    242251With these two types of conversion arcs, separate DAGs can be created for the safe and the unsafe conversions, and conversion cost can be represented as path length through the DAG.
    243 Open research questions on this topic include whether a conversion graph can be generated that represents each allowable conversion in C with a unique minimal-length path, such that the path lengths accurately represent the relative costs of the conversions, whether such a graph representation can be usefully augmented to include user-defined types as well as built-in types, and whether the graph can be efficiently represented and included in the expression resolver.
     252\begin{figure}[h]
     253\centering
     254\includegraphics{conversion_dag}
     255\caption{A portion of the implicit conversion DAG for built-in types.}
     256\end{figure}
     257As can be seen in the example DAG above, there are either safe or unsafe paths between each of the arithmetic types listed; the ``final'' arcs are important both to avoid creating cycles in the signed-unsigned conversions, and to disambiguate potential diamond conversions (\eg, if the ©int© to ©unsigned int© conversion was not marked final there would be two length-two paths from ©int© to ©unsigned long©, and it would be impossible to choose which one; however, since the ©unsigned int© to ©unsigned long© arc can not be traversed after the final ©int© to ©unsigned int© arc, there is a single unambiguous conversion path from ©int© to ©unsigned long©).
     258
     259Open research questions on this topic include:
     260\begin{itemize}
     261\item Can a conversion graph be generated that represents each allowable conversion in C with a unique minimal-length path such that the path lengths accurately represent the relative costs of the conversions?
     262\item Can such a graph representation can be usefully augmented to include user-defined types as well as built-in types?
     263\item Can the graph can be efficiently represented and used in the expression resolver?
     264\end{itemize}
    244265
    245266\subsection{Constructors and Destructors}
    246267Rob Shluntz, a current member of the \CFA research team, has added constructors and destructors to \CFA.
    247 Each type has an overridable default-generated zero-argument constructor, copy constructor, assignment operator, and destructor; for struct types these functions each call their equivalents on each field of the struct.
     268Each type has an overridable default-generated zero-argument constructor, copy constructor, assignment operator, and destructor; for ©struct© types these functions each call their equivalents on each field of the ©struct©.
    248269This affects expression resolution because an ©otype© type variable ©T© implicitly adds four type assertions, one for each of these four functions, so assertion resolution is pervasive in \CFA polymorphic functions, even those without any explicit type assertions.
     270The following example shows the implicitly-generated code in green:
     271\begin{lstlisting}
     272struct kv {
     273    int key;
     274    char *value;
     275};
     276
     277¢void ?{}(kv *this) {
     278    ?{}(&this->key);
     279    ?{}(&this->value);
     280}
     281void ?{}(kv *this, kv that) {
     282    ?{}(&this->key, that.key);
     283    ?{}(&this->value, that.value);
     284}
     285kv ?=?(kv *this, kv that) {
     286    ?=?(&this->key, that.key);
     287    ?=?(&this->value, that.value);
     288    return *this;
     289}
     290void ^?{}(kv *this) {
     291    ^?{}(&this->key);
     292    ^?{}(&this->value);
     293
     294
     295forall(otype T ¢| { void ?{}(T*); void ?{}(T*, T); T ?=?(T*, T); void ^?{}(T*); }¢)
     296void foo(T);
     297\end{lstlisting}
    249298
    250299\subsection{Generic Types}
    251300I have already added a generic type capability to \CFA, designed to efficiently and naturally integrate with \CFA's existing polymorphic functions.
    252 A generic type can be declared by placing a ©forall© specifier on a struct or union declaration, and instantiated using a parenthesized list of types after the type name:
     301A generic type can be declared by placing a ©forall© specifier on a ©struct© or ©union© declaration, and instantiated using a parenthesized list of types after the type name:
    253302\begin{lstlisting}
    254303forall(otype R, otype S) struct pair {
     
    266315The default-generated constructors, destructor and assignment operator for a generic type are polymorphic functions with the same list of type parameters as the generic type definition.
    267316
    268 Aside from giving users the ability to create more parameterized types than just the built-in pointer, array and function types, the combination of generic types with polymorphic functions and implicit conversions makes the edge case where a polymorphic function can match its own assertions much more common, as follows:
     317Aside from giving users the ability to create more parameterized types than just the built-in pointer, array and function types, the combination of generic types with polymorphic functions and implicit conversions makes the edge case where the resolver may enter an infinite loop much more common, as in the following code example:
     318\begin{lstlisting}
     319forall(otype T) struct box { T x; };
     320
     321void f(void*); // (1)
     322
     323forall(otype S)
     324void f(box(S)* b) { // (2)
     325        f(®(void*)0®);
     326}
     327\end{lstlisting}
     328
     329The loop in the resolver happens as follows:
    269330\begin{itemize}
    270 \item Given an expression in an untyped context, such as a top-level function call with no assignment of return values, apply a polymorphic implicit conversion to the expression that can produce multiple types (the built-in conversion from ©void*© to any other pointer type is one, but not the only).
    271 \item When attempting to use a generic type with ©otype© parameters (such as ©box© above) for the result type of the expression, the resolver will also need to decide what type to use for the ©otype© parameters on the constructors and related functions, and will have no constraints on what they may be.
    272 \item Attempting to match some yet-to-be-determined specialization of the generic type to this ©otype© parameter will create a recursive case of the default constructor, \etc matching their own type assertions, creating an unboundedly deep nesting of the generic type inside itself.
     331\item Since there is an implicit conversion from ©void*© to any pointer type, the highlighted expression can be interpreted as either a ©void*©, matching ©f // (1)©, or a ©box(S)*© for some type ©S©, matching ©f // (2)©.
     332\item To determine the cost of the ©box(S)© interpretation, a type must be found for ©S© which satisfies the ©otype© implicit type assertions (assignment operator, default and copy constructors, and destructor); one option is ©box(S2)© for some type ©S2©.
     333\item The assignment operator, default and copy constructors, and destructor of ©box(T)© are also polymorphic functions, each of which require the type parameter ©T© to have an assignment operator, default and copy constructors, and destructor. When choosing an interpretation for ©S2©, one option is ©box(S3)©, for some type ©S3©.
     334\item The previous step repeats until stopped, with four times as much work performed at each step.
    273335\end{itemize}
    274 As discussed above, any \CFA expression resolver must handle this possible infinite recursion somehow, but the combination of generic types with other language features makes this particular edge case occur somewhat frequently in user code.
     336This problem can occur in any resolution context where a polymorphic function that can satisfy its own type assertions is required for a possible interpretation of an expression with no constraints on its type, and is thus not limited to combinations of generic types with ©void*© conversions, though constructors for generic types often satisfy their own assertions and a polymorphic conversion such as the ©void*© conversion to a polymorphic variable can create an expression with no constraints on its type.
     337As discussed above, the \CFA expression resolver must handle this possible infinite recursion somehow, and it occurs fairly naturally in code like the above that uses generic types.
    275338
    276339\subsection{Tuple Types}
    277 \CFA adds \emph{tuple types} to C, a facility for referring to multiple values with a single identifier.
    278 A variable may name a tuple, and a function may return one.
     340\CFA adds \emph{tuple types} to C, a syntactic facility for referring to lists of values anonymously or with a single identifier.
     341An identifier may name a tuple, and a function may return one.
    279342Particularly relevantly for resolution, a tuple may be implicitly \emph{destructured} into a list of values, as in the call to ©swap© below:
    280343\begin{lstlisting}
     
    285348
    286349x = swap( x ); // destructure [char, char] x into two elements of parameter list
    287 // can't use int x for parameter, not enough arguments to swap
     350// cannot use int x for parameter, not enough arguments to swap
    288351\end{lstlisting}
    289352Tuple destructuring means that the mapping from the position of a subexpression in the argument list to the position of a paramter in the function declaration is not straightforward, as some arguments may be expandable to different numbers of parameters, like ©x© above.
     
    293356Given some type ©T©, a ©T&© (``reference to ©T©'') is essentially an automatically dereferenced pointer; with these semantics most of the C standard's discussions of lvalues can be expressed in terms of references instead, with the benefit of being able to express the difference between the reference and non-reference version of a type in user code.
    294357References preserve C's existing qualifier-dropping lvalue-to-rvalue conversion (\eg a ©const volatile int&© can be implicitly converted to a bare ©int©); the reference proposal also adds a rvalue-to-lvalue conversion to \CFA, implemented by storing the value in a new compiler-generated temporary and passing a reference to the temporary.
    295 These two conversions can chain, producing a qualifier-dropping conversion for references, for instance converting a reference to a ©const int© into a reference to a non-©const int© by copying the originally refered to value into a fresh temporary and taking a reference to this temporary.
     358These two conversions can chain, producing a qualifier-dropping conversion for references, for instance converting a reference to a ©const int© into a reference to a non-©const int© by copying the originally refered to value into a fresh temporary and taking a reference to this temporary, as below:
     359\begin{lstlisting}
     360const int magic = 42;
     361
     362void inc_print( int& x ) { printf("%d\n", ++x); }
     363
     364print_inc( magic ); // legal; implicitly generated code in green below:
     365
     366¢int tmp = magic;¢ // copies to safely strip const-qualifier
     367¢print_inc( tmp );¢ // tmp is incremented, magic is unchanged
     368\end{lstlisting}
    296369These reference conversions may also chain with the other implicit type conversions.
    297370The main implication of this for expression resolution is the multiplication of available implicit conversions, though in a restricted context that may be able to be treated efficiently as a special case.
    298371
    299 \subsection{Literal Types}
    300 Another proposal currently under consideration for the \CFA type-system is assigning special types to the literal values ©0© and ©1©.%, say ©zero_t© and ©one_t©.
    301 Implicit conversions from these types would allow ©0© and ©1© to be considered as values of many different types, depending on context, allowing expression desugarings like ©if ( x ) {}© $\Rightarrow$ ©if ( x != 0 ) {}© to be implemented efficiently and precicely.
    302 This is a generalization of C's existing behaviour of treating ©0© as either an integer zero or a null pointer constant, and treating either of those values as boolean false.
    303 The main implication for expression resolution is that the frequently encountered expressions ©0© and ©1© may have a significant number of valid interpretations.
     372\subsection{Special Literal Types}
     373Another proposal currently under consideration for the \CFA type-system is assigning special types to the literal values ©0© and ©1©.
     374Implicit conversions from these types allow ©0© and ©1© to be considered as values of many different types, depending on context, allowing expression desugarings like ©if ( x ) {}© $\Rightarrow$ ©if ( x != 0 ) {}© to be implemented efficiently and precicely.
     375This approach is a generalization of C's existing behaviour of treating ©0© as either an integer zero or a null pointer constant, and treating either of those values as boolean false.
     376The main implication for expression resolution is that the frequently encountered expressions ©0© and ©1© may have a large number of valid interpretations.
    304377
    305378\subsection{Deleted Function Declarations}
     
    308381int somefn(char) = delete;
    309382\end{lstlisting}
     383This feature is typically used in \CCeleven to make a type non-copyable by deleting its copy constructor and assignment operator, or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads.
    310384To add a similar feature to \CFA would involve including the deleted function declarations in expression resolution along with the normal declarations, but producing a compiler error if the deleted function was the best resolution.
    311385How conflicts should be handled between resolution of an expression to both a deleted and a non-deleted function is a small but open research question.
Note: See TracChangeset for help on using the changeset viewer.