Changeset 9fbc40e


Ignore:
Timestamp:
Apr 8, 2025, 2:14:36 PM (6 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
597ddfeb
Parents:
7d405eb
Message:

proofread future chapter

Location:
doc/theses/fangren_yu_MMath
Files:
1 deleted
2 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/fangren_yu_MMath/future.tex

    r7d405eb r9fbc40e  
    11\chapter{Future Work}
    22
    3 These are some feature requests related to type system enhancement that come up during the development of \CFA language and library, but have not been implemented yet. A lot of the current implementations have to work around the limits of the language features and sometimes lead to inefficiency.
     3\vspace*{-18pt}
     4The following are feature requests related to type-system enhancements that have surfaced during the development of the \CFA language and library, but have not been implemented yet.
     5Currently, developers must work around these missing features, sometimes resulting in inefficiency.
     6
    47
    58\section{Closed trait types}
    69
    7 \CFA as it currently is does not have any closed types, as new functions can be added at any time. It is also possible to locally declare a function,\footnote{Local functions are not a standard feature in C but supported by mainstream C compilers such as gcc, and allowed in \CFA too.} or a function pointer variable to make a type satisfy a certain trait temporarily and be used as such in this limited scope. However, the lack of closed types in such a "duck typing" scheme proposes two problems. For library implementors, it is common to not want the defined set of operations to be overwritten and cause the behaviour of polymorphic invocations to change. For the compiler, it means caching and reusing the result of resolution is not reliable as newly introduced declarations can participate in assertion resolution, making a previously invalid expression valid, or the other way around by introducing ambiguity in assertions. Sometimes those interfaces are fairly complicated, for example the I/O library traits \textbf{istream} and \textbf{ostream} each has over 20 operations. Without the ability to store and reuse assertion resolution results, each time the compiler encounters an I/O operation in the source code (mainly the pipe operator \textbf{?|?} used to represent stream operations in \CFA) it has to resolve the same set of assertions again, causing a lot of repetitive work. Previous experiments have shown that the I/O assertions often account for over half of the number of assertions resolved in a \CFA translation unit. Introducing a way to eliminate the need of doing such repetitive assertion resolutions that are very unlikely to change by new overloads can therefore provide significant improvement to the performance of the compiler.
     10Currently, \CFA does not have any closed types, as open type are the basis of its unique type-system, allowing new functions to be added at any time to override existing ones for trait satisfaction.
     11Locally-declared nested-functions,\footnote{
     12Nested functions are not a feature in C but supported by \lstinline{gcc} for multiple decades and are used heavily in \CFA.}
     13or a function-pointer variable can be used to make a type satisfy a certain trait temporarily in a limited lexical scope.
     14However, the lack of closed types in such a \emph{duck typing} scheme presents two problems.
     15\begin{enumerate}[leftmargin=*]
     16\item
     17Library implementers normally do not want users to override certain operations and cause the behaviour of polymorphic invocations to change.
     18\item
     19Caching and reusing resolution results in the compiler is effected, as newly introduced declarations can participate in assertion resolution;
     20as a result, previously invalid subexpressions suddenly become valid, or alternatively cause ambiguity in assertions.
     21\end{enumerate}
     22Sometimes those interfaces are fairly complicated, \eg the \CFA I/O library traits @istream@ and @ostream@ each have over 20 operations.
     23Without the ability to store and reuse assertion resolution results, each time the compiler encounters an I/O operation (denoted by the pipe operator @?|?@ like the @<<@, @>>@ stream operators in \CC) it has to resolve the same set of assertions again, causing a lot of repetitive work.
     24Previous experiments have shown that the I/O assertions often account for over half of the number of assertions resolved in a \CFA translation unit.
     25Hence, eliminating repetitive assertion resolution, which are very unlikely to change by new overloads, can provide a significant performance benefit.
    826
    9 The output stream trait in \CFA looks like this:
     27\VRef[Figure]{f:basicOstreamTrait} shows the basic output-stream trait in \CFA and the overloaded I/O operator is declared as:
     28\begin{cfa}
     29forall( ostype & | basic_ostream( ostype ) ) ostype & ?|?( ostype &, int ); // one for each basic type
     30\end{cfa}
     31The \CFA polymorphic calling convention means all of these trait functions are repeatedly resolved at each output-operator call-site and then become implicit arguments of the call.
     32This creates both compile-time and run-time overheads.
     33As stated, resolving assertions takes the most amount of time during compilation, and it is common to see in the current codebase and test suite that the majority of compilation time is spent in I/O statements.
     34With closed types, the I/O assertions can be resolved once and this information reused everywhere.
     35Furthermore, closed types can have a different calling convention of pushing a virtual-table pointer rather than individual arguments onto the stack.
     36Although for I/O, this benefit might not be significant, since these operations involve system calls that are inherently slow.
    1037
     38\begin{figure}
     39\begin{cfa}[escapechar={\#}]
     40forall( ostype & ) @trait@ basic_ostream {
     41        // private
     42        bool sepPrt$( ostype & );                                       #\C[3.35in]{// get separator state (on/off)}#
     43        void sepReset$( ostype & );                                     #\C{// set separator state to default state}#
     44        void sepReset$( ostype &, bool );                       #\C{// set separator and default state}#
     45        const char * sepGetCur$( ostype & );            #\C{// get current separator string}#
     46        void sepSetCur$( ostype &, const char [] );     #\C{// set current separator string}#
     47        bool getNL$( ostype & );                                        #\C{// get newline}#
     48        bool setNL$( ostype &, bool );                          #\C{// set newline}#
     49        bool getANL$( ostype & );                                       #\C{// get auto newline (on/off)}#
     50        bool setANL$( ostype &, bool );                         #\C{// set auto newline (on/off)}#
     51        bool getPrt$( ostype & );                                       #\C{// get fmt called in output cascade}#
     52        bool setPrt$( ostype &, bool );                         #\C{// set fmt called in output cascade}#
     53        // public
     54        void nlOn( ostype & );                                          #\C{// turn auto-newline state on}#
     55        void nlOff( ostype & );                                         #\C{// turn auto-newline state off}#
     56        void sep( ostype & );                                           #\C{// turn separator state on}#
     57        void nosep( ostype & );                                         #\C{// turn separator state off}#
     58        bool sepOn( ostype & );                                         #\C{// set default state to on}#
     59        bool sepOff( ostype & );                                        #\C{// set default state to off}#
     60        const char * sepGet( ostype & );                        #\C{// get separator string}#
     61        void sepSet( ostype &, const char [] );         #\C{// set separator to string (15 maximum)}#
     62        const char * sepGetTuple( ostype & );           #\C{// get tuple separator string}#
     63        void sepSetTuple( ostype &, const char [] );#\C{// set tuple separator to string (15 maximum)}#
     64        void ends( ostype & );                                          #\C{// end of output statement}\CRT#
     65        int fmt( ostype &, const char format[], ... ) __attribute__(( format(printf, 2, 3) ));
     66};
     67\end{cfa}
     68\caption{\lstinline{basic_ostream} trait}
     69\label{f:basicOstreamTrait}
     70\end{figure}
     71
     72A \CFA closed trait type is similar to a Haskell type class requiring an explicit instance declaration.
     73The syntax for the closed trait might look like:
    1174\begin{cfa}
    12 forall( ostype & )
    13 trait basic_ostream {
    14         // private
    15         bool sepPrt$\$$( ostype & );                                                    // get separator state (on/off)
    16    
    17         void sepReset$\$$( ostype & );                                                  // set separator state to default state
    18         void sepReset$\$$( ostype &, bool );                                    // set separator and default state
    19         const char * sepGetCur$\$$( ostype & );                         // get current separator string
    20         void sepSetCur$\$$( ostype &, const char [] );                  // set current separator string
    21         bool getNL$\$$( ostype & );                                                     // get newline
    22         bool setNL$\$$( ostype &, bool );                                               // set newline
    23         bool getANL$\$$( ostype & );                                                    // get auto newline (on/off)
    24         bool setANL$\$$( ostype &, bool );                                              // set auto newline (on/off), and return previous state
    25         bool getPrt$\$$( ostype & );                                                    // get fmt called in output cascade
    26         bool setPrt$\$$( ostype &, bool );                                              // set fmt called in output cascade
    27         // public
    28         void nlOn( ostype & );                                                          // turn auto-newline state on
    29         void nlOff( ostype & );                                                         // turn auto-newline state off
    30 
    31         void sep( ostype & );                                                           // turn separator state on
    32         void nosep( ostype & );                                                         // turn separator state off
    33         bool sepOn( ostype & );                                                         // set default state to on, and return previous state
    34         bool sepOff( ostype & );                                                        // set default state to off, and return previous state
    35         const char * sepGet( ostype & );                                        // get separator string
    36         void sepSet( ostype &, const char [] );                         // set separator to string (15 character maximum)
    37         const char * sepGetTuple( ostype & );                           // get tuple separator string
    38         void sepSetTuple( ostype &, const char [] );            // set tuple separator to string (15 character maximum)
    39 
    40         void ends( ostype & );                                                          // end of output statement
    41         int fmt( ostype &, const char format[], ... ) __attribute__(( format(printf, 2, 3) ));
    42 }; // basic_ostream
    43 
     75forall( ostype & ) @closed@ trait basic_ostream { /* above declarations */ };
     76@trait@ basic_ostream struct ofstream { ... }; // bind trait to type, implement basic_ostream
     77\end{cfa}
     78At the trait declaration, all the required trait functions must be inferred and specialized in the usual way.
     79The list of inferred and specialized functions is then saved as an array of function pointers (vtable), and polymorphic functions using the closed type are passed the vtable, instead of a list of implicit parameters.
     80\begin{cfa}
     81ostype ?|?( ostype &, int, @basic_ostream_vtable *@ ); // parametric vtable
     82sout | 42;   /* becomes */   ?|?( sout, 42, @ofstream_trait@ );  // generated for closed type ofstream
    4483\end{cfa}
    4584
    46 and the overloaded output operator is declared as
    47 
    48 \begin{cfa}
    49 forall( ostype & | basic_ostream( ostype ) ) ostype & ?|?( ostype &, int ); // also defined for all other basic types
    50 \end{cfa}
    51 
    52 The \CFA polymorphic calling convention makes the output operator take all the trait functions above as implicit arguments, and they are repeatedly resolved every time an output operator is called. This creates both compile-time and run-time overheads. \CFA compilation takes the most amount of time in resolving assertions, and it is not uncommon to see in the current codebase and test suite that the majority of time spent in compiling a program is due to a series of I/O statements, while resolving those assertions for I/O operators could have been a once and done job. Repeatedly pushing tens of arguments onto the stack to call these operators at run-time can also have some impact on performance, although in the I/O case it is not as significant, since these operations involve system calls which are already slow.
    53 
    54 An addition of closed trait type will make a \CFA trait behave similarly to a Haskell type class and require an explicit instance declaration as well. The syntax for closed trait is not settled yet, but it may look something like the following:
    55 
    56 \begin{cfa}
    57     forall(ostype &)
    58     closed trait basic_ostream{
    59         // all the above declarations
    60     };
    61 
    62     struct ofstream {
    63         // ...
    64     };
    65 
    66     // implementation of trait functions for ofstream
    67 
    68     __cfa_trait_instance(ofstream, basic_ostream);
    69 \end{cfa}
    70 
    71 At the point of instance declaration, all the required trait functions must be declared in scope, and they are inferred and specialized in the usual way. The list of inferred and specialized functions will then be saved into an array of function pointers, and polymorphic functions using the closed trait type can simply take the instance pointer, instead of a list of implicit parameters.
    72 
    73 \begin{cfa}
    74     // original declaration
    75     forall( ostype & | basic_ostream( ostype ) ) ostype & ?|?( ostype & os, int i);
    76 
    77     // translated code with closed trait
    78     void* ?|?(void* os, int i, void* _instance_basic_ostream);
    79     // polymorphic pointer and reference types are erased
    80 
    81     // call site
    82     sout | 42;
    83    
    84     // generated code
    85     ?|?(sout, 42, __cfa_instance_ofstream__basic_ostream /* generated by __cfa_trait_instance decl */);
    86 \end{cfa}
    87 
    88 Introducing closed trait types also comes with the additional benefit of the ability to achieve better encapsulation. In the above @basic_ostream@ example, the functions ending with dollar sign \$ are meant to be private (internal to the library) and the client code should not be able to see or call those functions directly. This is impossible with the current trait and assertion system, because the satisfying declarations must be visible at call site for the polymorphic call to resolve. With closed traits and instance types, it will be possible to forward declare a closed trait (which would not even make sense otherwise) and the instance pointer of a type implementing the trait without exposing any details to the client, since all the information necessary to call the trait functions are now captured in a single pointer to a table of functions built at compile time, and the type of that pointer is made opaque to the client.
    89 
    90 
     85Closed types can provide additional encapsulation.
     86In \VRef[Figure]{f:basicOstreamTrait}, the functions ending with dollar sign @$@ are meant to be private (internal to the library) so client code should not be able to see or call these functions directly.
     87This kind of information hiding is impossible with the current trait and assertion system, because the trait information must be visible at the call site for the polymorphic call to resolve.
     88With closed traits and types, it is possible to forward declare a closed trait (which does not make sense otherwise) and its vtable without exposing any details to the client.
     89That is, all the information necessary to call trait functions is now captured in a single vtable pointer built at compile time and the type of the vtable pointer is opaque.
    9190
    9291
    9392\section{Associated Types}
    9493
    95 The analysis presented in section 4.3 shows that if we mandate that all type parameters have to be bound before assertion resolution, the complexity of resolving assertions become much lower as every assertion parameter can be resolved independently. By fully utilizing information from higher up the expression tree for return value overloading, most of the type bindings can be resolved. However there are certain scenarios where we would like to have some intermediate types to be involved in certain operations, that are neither input nor output types.
     94The analysis presented in \VRef{s:AssertionSatisfaction} shows if all type parameters have to be bound before assertion resolution, the complexity of resolving assertions become much lower as every assertion parameter can be resolved independently.
     95That is, by utilizing information from higher up the expression tree for return value overloading, most of the type bindings can be resolved.
     96However, there are scenarios where some intermediate types need to be involved in certain operations, which are neither input nor output types.
    9697
    97 \CFA standard library provides a few polymorphic container types similar to those found in \CC standard template library. Naturally, we would like to have a harmonized iterator interface for different container types. The feature is still under development and has not been finalized, but for the purpose of this discussion, we will be using a similar approach to the \CC standard iterator contract. The equivalent type signatures can be given in \CFA trait as:
     98\CFA standard library provides a few polymorphic container types similar to those found in the \CC standard library.
     99Like \CC, \CFA would like to have a harmonized iterator interface for different container types.
     100While this feature is still under development, the following trait outline is used to present the problem.
     101\begin{cfa}
     102forall( Container, Elem, Iter ) trait iterable {
     103        @Iter@ begin( Container );
     104        @Iter@ end( Container );
     105        @Elem@ & *?( Iter );
     106        Iter ++?( Iter );
     107        bool ?==?( Iter, Iter );
     108};
     109\end{cfa}
     110These are the \CC operators required for for-loop comprehension, \eg @for (Elem & e : container)@.
     111Note, the iterable trait is meant to describe a property for a container type, but involves two additional types, one for the element type and one for the iterator type.
     112However, @Iter@ is intended to be deduced from the @begin@ and @end@ operations on the container type;
     113the same can be said for the element type @Elem@.
     114The alternative is to make @Elem@ and @Iter@ unbound type-parameters by removing them from the @forall@.
     115As mentioned, unbound type-parameters cause problems, but that is the only way to describe the iterable property.
    98116
     117To solve this problem, I propose adding associate-type declarations in addition to the existing unbounded @forall@ type parameter.
     118In the body of a trait declaration, new types can be introduced as the return type of an expression involving type parameters.
    99119\begin{cfa}
    100     forall (Container, Iter, Elem) trait iterable {
    101         Iter begin(Container);
    102         Iter end(Container);
    103         Elem& *?(Iter);
    104         Iter ++?(Iter);
    105         bool ?==?(Iter,Iter);
    106     }
     120forall( Container ) trait iterable {
     121        @type@ Iter begin( Container );
     122        Iter end( Container );
     123        @type@ Elem & *?( Iter );
     124        Iter ++?( Iter );
     125        bool ?==?( Iter, Iter );
     126};
    107127\end{cfa}
     128This matches conceptually with the iterable trait having only the container type, rather than three types.
     129The iterator type and element type are now viewed as properties of the container type not independent type parameters.
    108130
    109 These are the exact operators required in \CC for the for-loop comprehension
    110 @for (Elem & e: container)@. The problem with this trait declaration is that the iterator type @Iter@ has to be explicitly given in the trait parameter list, but is intended to be deduced from the @begin@ and @end@ operations on the container type; and the same can be said for the element type. The iterable trait is meant to describe a property for the container type but involves two additional types, one for the iterator type and one for the element type. If we were to disallow unbound type parameters in assertions without adding any features to the type system, we will not be able to describe this iterable property in \CFA.
     131There remains one design decision to be made: whether the expression that defines an associated type needs to be uniquely resolved.
     132If the associated type does not appear in the parameter or return types, this requirement seems reasonable.
     133The reason is that the expression cost does not consider any conversions that occur in assertion parameter matching, which essentially means having multiple ways to resolve an associated type always results in ambiguity.
     134The interesting case is when the associated type also appears in the return type, where both resolving the return type overload based on context and resolving the assertion that defines the associate type can help;
     135for the whole expression to resolve, they must agree on a common result.
    111136
    112 To solve this problem, I propose adding associate type declarations in addition to the existing (free) @forall@ type parameters. In the body of a trait declaration, new types can be introduced as the return type of an expression involving the type parameters. Following the above example, the iterator contract could be rewritten to
     137Moss gave the following example to illustrate \CFA assertion system's expressiveness.
    113138\begin{cfa}
    114     forall (Container) trait iterable {
    115         type Iter = begin(Container);
    116         type Elem& = *?(Iter);
    117         Iter end(Container);
    118         Iter ++?(Iter);
    119         bool ?==?(Iter,Iter);
    120     }
     139forall( Ptr, Elem ) trait pointer_like {
     140        Elem & *?( Ptr ); // Ptr can be dereferenced to Elem
     141};
     142struct list {
     143        int value;
     144        list * next;
     145};
     146typedef list * list_iterator;
     147int & *?( list_iterator it ) {
     148        return it->value;
     149}
    121150\end{cfa}
     151This can be viewed as having an associated type, @Elem@, as return, which can potentially vary based on context.
     152Note that the type @list *@ satisfies both @pointer_like( list *, int )@ and @pointer_like( list *,@ @list )@ (the latter by the built-in pointer dereference operator) and the expression @*it@ can be either a @struct list@ or an @int@.
     153Requiring associated types to be unique makes the @pointer_like@ trait not applicable to @list *@, which is undesirable.
     154I have not attempted to implement associated types in \CFA compiler, but based on the above discussions, one option is to make associated type resolution and return type overloading coexist:
     155when the associated type appears in returns, it is deduced from the context and then verify the trait with ordinary assertion resolution;
     156when it does not appear in the returns, the type is required to be uniquely determined by the expression that defines the associated type.
    122157
    123 This matches conceptually better that the iterable trait is about one type (the container) rather than about three types. The iterator type and element type can be viewed as properties of the container types, not independent type parameters.
    124 
    125 There remains one design decision to be made, that is whether the expressions that define associated types need to be uniquely resolved. If the associated type does not appear in parameter or return types, having such requirement seems to be reasonable, because the expression cost does not consider any conversions that occur in assertion parameter matching, which essentially means having multiple ways to resolve an associated type always results in ambiguity. A more interesting case would be that the associated type also appears in the return type, where both resolving the return type overload based on context and resolving the assertion that defines the associate type can help, and for the whole expression to resolve, they must agree on a common result. Moss gave the following example to illustrate \CFA assertion system's expressiveness that could be viewed as having an associated type as return, and can potentially vary based on the context:
    126 
    127 \begin{cfa}
    128     forall(Ptr, Elem) trait pointer_like {
    129         Elem& *?(Ptr); // Ptr can be dereferenced to Elem
    130     };
    131     struct list {
    132         int value;
    133         list* next; // may omit struct on type names
    134     };
    135     typedef list* list_iterator;
    136     int& *?(list_iterator it) {
    137         return it->value;
    138     }
    139 \end{cfa}
    140 
    141 Note that the type @list*@ satisfies both @pointer_like(list*, int)@ and @pointer_like(list*, list)@ (the latter by the built-in pointer dereference operator) and the expression @*it@ could be either a @struct list@ or an @int@. Requiring associated types to be unique would make the @pointer_like@ trait inapplicable to @list*@ here, which is not desirable. I have not attempted to implement associated types in \CFA compiler, but based on the above discussions, one option could be to make associated type resolution and return type overloading coexist: when the associated type appears in the returns, we deduce it from the context and then verify the trait with ordinary assertion resolution; when it does not appear in the returns, we instead require the type to be uniquely determined by the expression that defines the associated type.
    142158
    143159\section{User-defined conversions}
     160
     161Missing type-system feature is a scheme for user-defined conversions.
     162Conversion means one type goes through an arbitrary complex process of changing its value to some meaningful value in another type.
     163Because the conversion process can be arbitrarily complex, it requires the power of a function.
     164Arbitrary conversions are already possible through standalone named functions.
     165\begin{cfa}
     166S convert( T );  $\C[2in]{// change T to S}$
     167S s;   T t;
     168s = convert( t );
     169\end{cfa}
     170C has implicit/explicit conversions for the builtin types.
     171\begin{cfa}
     172int i = 3.5;  $\C{// implicit, change the floating-point bits to integral bits}$
     173int i = (int)3.5 + 17;  $\C{// explicit, change the floating-point bits to integral bits}$
     174\end{cfa}
     175When the conversion occurs in expression evaluation can significantly affect the result, where function call has higher precedence than cast.
     176Overloading the cast operator would allow user-defined conversions to look and feel, like regular C casts, as is provided in \CC.
     177\begin{cfa}[escapechar={}]
     178S ?@$@( T ) { /* change T to S */ }
     179s = t;
     180\end{cfa}
     181% $ stop formatting
     182Similarly, constructors can be used for implicit conversion, as in \CC.
     183\begin{cfa}
     184void ?{}( S s, T t ) { ... } $\C{// convert T to S}$
     185s = t;  $\C{// rewritten as s = ?{}( s, t )}\CRT$
     186\end{cfa}
     187To integrate properly with the \CFA conversion model, there would need to distinguish between safe and unsafe conversion, and possibly a way to denote conversions as explicit-only or non-chainable, otherwise conversion cycles are possible.
  • doc/theses/fangren_yu_MMath/resolution.tex

    r7d405eb r9fbc40e  
    478478
    479479\section{Assertion Satisfaction}
     480\label{s:AssertionSatisfaction}
    480481
    481482The assertion-satisfaction problem greatly increases the complexity of \CFA expression resolution.
Note: See TracChangeset for help on using the changeset viewer.