Ignore:
Timestamp:
Dec 8, 2024, 9:32:45 AM (10 days ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
50be6444
Parents:
fbb5bdd
Message:

found some initial material for section Polymorphism

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/fangren_yu_MMath/intro.tex

    rfbb5bdd r18a7dcf1  
    291291\section{Polymorphism}
    292292
     293\CFA provides polymorphic functions and types, where the polymorphic function can be the constraints types using assertions based on traits.
     294
     295\subsection{\texorpdfstring{\protect\lstinline{forall} functions}{forall functions}}
     296\label{sec:poly-fns}
     297
     298The signature feature of \CFA is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96} with functions generalized using a @forall@ clause (giving the language its name).
     299\begin{cfa}
     300@forall( T )@ T identity( T val ) { return val; }
     301int forty_two = identity( 42 );         $\C{// T is bound to int, forty\_two == 42}$
     302\end{cfa}
     303This @identity@ function can be applied to any complete \newterm{object type} (or @otype@).
     304The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type.
     305The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor, and destructor.
     306If this extra information is not needed, for instance, for a pointer, the type parameter can be declared as a \newterm{data type} (or @dtype@).
     307
     308In \CFA, the polymorphic runtime cost is spread over each polymorphic call, because more arguments are passed to polymorphic functions;
     309the experiments in Section~\ref{sec:eval} show this overhead is similar to \CC virtual function calls.
     310A design advantage is that, unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat.
     311
     312Since bare polymorphic types provide a restricted set of available operations, \CFA provides a \newterm{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
     313For example, the function @twice@ can be defined using the \CFA syntax for operator overloading.
     314\begin{cfa}
     315forall( T @| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }  $\C{// ? denotes operands}$
     316int val = twice( twice( 3.7 ) );  $\C{// val == 14}$
     317\end{cfa}
     318This works for any type @T@ with a matching addition operator.
     319The polymorphism is achieved by creating a wrapper function for calling @+@ with the @T@ bound to @double@ and then passing this function to the first call of @twice@.
     320There is now the option of using the same @twice@ and converting the result into @int@ on assignment or creating another @twice@ with the type parameter @T@ bound to @int@ because \CFA uses the return type~\cite{Cormack81,Baker82,Ada} in its type analysis.
     321The first approach has a late conversion from @double@ to @int@ on the final assignment, whereas the second has an early conversion to @int@.
     322\CFA minimizes the number of conversions and their potential to lose information;
     323hence, it selects the first approach, which corresponds with C programmer intuition.
     324
     325Crucial to the design of a new programming language are the libraries to access thousands of external software features.
     326Like \CC, \CFA inherits a massive compatible library base, where other programming languages must rewrite or provide fragile interlanguage communication with C.
     327A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted float array.
     328\begin{cfa}
     329void * bsearch( const void * key, const void * base, size_t nmemb, size_t size,
     330                                int (* compar)( const void *, const void * ));
     331int comp( const void * t1, const void * t2 ) {
     332         return *(double *)t1 < *(double *)t2 ? -1 : *(double *)t2 < *(double *)t1 ? 1 : 0;
     333}
     334double key = 5.0, vals[10] = { /* 10 sorted float values */ };
     335double * val = (double *)bsearch( &key, vals, 10, sizeof(vals[0]), comp ); $\C{// search sorted array}$
     336\end{cfa}
     337This can be augmented simply with generalized, type-safe, \CFA-overloaded wrappers.
     338\begin{cfa}
     339forall( T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) {
     340        int comp( const void * t1, const void * t2 ) { /* as above with double changed to T */ }
     341        return (T *)bsearch( &key, arr, size, sizeof(T), comp );
     342}
     343forall( T | { int ?<?( T, T ); } ) unsigned int bsearch( T key, const T * arr, size_t size ) {
     344        T * result = bsearch( key, arr, size ); $\C{// call first version}$
     345        return result ? result - arr : size; $\C{// pointer subtraction includes sizeof(T)}$
     346}
     347double * val = bsearch( 5.0, vals, 10 ); $\C{// selection based on return type}$
     348int posn = bsearch( 5.0, vals, 10 );
     349\end{cfa}
     350The nested function @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result.
     351% FIX
     352Providing a hidden @comp@ function in \CC is awkward as lambdas do not use C calling conventions and template declarations cannot appear in block scope.
     353In addition, an alternate kind of return is made available: position versus pointer to found element.
     354\CC's type system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a template @bsearch@.
     355
     356\CFA has replacement libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions, all without rewriting the actual computations (see Section~\ref{sec:libraries}).
     357For example, it is possible to write a type-safe \CFA wrapper @malloc@ based on the C @malloc@, where the return type supplies the type/size of the allocation, which is impossible in most type systems.
     358\begin{cfa}
     359forall( T & | sized(T) ) T * malloc( void ) { return (T *)malloc( sizeof(T) ); }
     360// select type and size from left-hand side
     361int * ip = malloc();  double * dp = malloc();  struct S {...} * sp = malloc();
     362\end{cfa}
     363
     364Call site inferencing and nested functions provide a localized form of inheritance.
     365For example, the \CFA @qsort@ only sorts in ascending order using @<@.
     366However, it is trivial to locally change this behavior.
     367\begin{cfa}
     368forall( T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ }
     369int main() {
     370        int ?<?( double x, double y ) { return x @>@ y; } $\C{// locally override behavior}$
     371        qsort( vals, 10 );                                                      $\C{// descending sort}$
     372}
     373\end{cfa}
     374The local version of @?<?@ performs @?>?@ overriding the built-in @?<?@ so it is passed to @qsort@.
     375Therefore, programmers can easily form local environments, adding and modifying appropriate functions, to maximize the reuse of other existing functions and types.
     376
     377To reduce duplication, it is possible to distribute a group of @forall@ (and storage-class qualifiers) over functions/types, such that each block declaration is prefixed by the group (see the example in Appendix~\ref{s:CforallStack}).
     378\begin{cfa}
     379forall( @T@ ) {                                                 $\C{// distribution block, add forall qualifier to declarations}$
     380        struct stack { stack_node(@T@) * head; };       $\C{// generic type}$
     381        inline {                                                                        $\C{// nested distribution block, add forall/inline to declarations}$
     382                void push( stack(@T@) & s, @T@ value ) ...      $\C{// generic operations}$
     383        }
     384}
     385\end{cfa}
     386
     387
     388\subsection{Traits}
     389
     390\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
     391\begin{cquote}
     392\lstDeleteShortInline@%
     393\begin{tabular}{@{}l@{\hspace{\parindentlnth}}|@{\hspace{\parindentlnth}}l@{}}
     394\begin{cfa}
     395trait @sumable@( T ) {
     396        void @?{}@( T &, zero_t ); // 0 literal constructor
     397        T ?+?( T, T );                   // assortment of additions
     398        T @?+=?@( T &, T );
     399        T ++?( T & );
     400        T ?++( T & );
     401};
     402\end{cfa}
     403&
     404\begin{cfa}
     405forall( T @| sumable( T )@ ) // use trait
     406T sum( T a[$\,$], size_t size ) {
     407        @T@ total = { @0@ };  // initialize by 0 constructor
     408        for ( size_t i = 0; i < size; i += 1 )
     409                total @+=@ a[i]; // select appropriate +
     410        return total;
     411}
     412\end{cfa}
     413\end{tabular}
     414\lstMakeShortInline@%
     415\end{cquote}
     416
     417Note that the @sumable@ trait does not include a copy constructor needed for the right side of @?+=?@ and return;
     418it is provided by @otype@, which is syntactic sugar for the following trait.
     419\begin{cfa}
     420trait otype( T & | sized(T) ) {  // sized is a pseudo-trait for types with known size and alignment
     421        void ?{}( T & );                                                $\C{// default constructor}$
     422        void ?{}( T &, T );                                             $\C{// copy constructor}$
     423        void ?=?( T &, T );                                             $\C{// assignment operator}$
     424        void ^?{}( T & );                                               $\C{// destructor}$
     425};
     426\end{cfa}
     427Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack allocatable, default or copy initialized, assigned, and deleted.
     428
     429In summation, the \CFA type system uses \newterm{nominal typing} for concrete types, matching with the C type system, and \newterm{structural typing} for polymorphic types.
     430Hence, trait names play no part in type equivalence;
     431the names are simply macros for a list of polymorphic assertions, which are expanded at usage sites.
     432Nevertheless, trait names form a logical subtype hierarchy with @dtype@ at the top, where traits often contain overlapping assertions, \eg operator @+@.
     433Traits are used like interfaces in Java or abstract base classes in \CC, but without the nominal inheritance relationships.
     434Instead, each polymorphic function (or generic type) defines the structural type needed for its execution (polymorphic type key), and this key is fulfilled at each call site from the lexical environment, which is similar to the Go~\cite{Go} interfaces.
     435Hence, new lexical scopes and nested functions are used extensively to create local subtypes, as in the @qsort@ example, without having to manage a nominal inheritance hierarchy.
     436% (Nominal inheritance can be approximated with traits using marker variables or functions, as is done in Go.)
     437
     438% Nominal inheritance can be simulated with traits using marker variables or functions:
     439% \begin{cfa}
     440% trait nominal(T) {
     441%     T is_nominal;
     442% };
     443% int is_nominal;                                                               $\C{// int now satisfies the nominal trait}$
     444% \end{cfa}
     445%
     446% Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
     447% \begin{cfa}
     448% trait pointer_like(Ptr, El) {
     449%     lvalue El *?(Ptr);                                                $\C{// Ptr can be dereferenced into a modifiable value of type El}$
     450% }
     451% struct list {
     452%     int value;
     453%     list * next;                                                              $\C{// may omit "struct" on type names as in \CC}$
     454% };
     455% typedef list * list_iterator;
     456%
     457% lvalue int *?( list_iterator it ) { return it->value; }
     458% \end{cfa}
     459% In the example above, @(list_iterator, int)@ satisfies @pointer_like@ by the user-defined dereference function, and @(list_iterator, list)@ also satisfies @pointer_like@ by the built-in dereference operator for pointers. Given a declaration @list_iterator it@, @*it@ can be either an @int@ or a @list@, with the meaning disambiguated by context (\eg @int x = *it;@ interprets @*it@ as an @int@, while @(*it).value = 42;@ interprets @*it@ as a @list@).
     460% While a nominal-inheritance system with associated types could model one of those two relationships by making @El@ an associated type of @Ptr@ in the @pointer_like@ implementation, few such systems could model both relationships simultaneously.
     461
     462
     463\subsection{Generic Types}
     464
     465A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
     466Broadly speaking, there are three approaches to implement abstract data structures in C.
     467One approach is to write bespoke data structures for each context in which they are needed.
     468While this approach is flexible and supports integration with the C type checker and tooling, it is also tedious and error prone, especially for more complex data structures.
     469A second approach is to use @void *@-based polymorphism, \eg the C standard library functions @bsearch@ and @qsort@, which allow for the reuse of code with common functionality.
     470However, basing all polymorphism on @void *@ eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise not needed.
     471A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type checked, but errors may be difficult to interpret.
     472Furthermore, writing and using preprocessor macros is unnatural and inflexible.
     473
     474\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data types.
     475\CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
     476However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates.
     477
     478A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
     479\begin{cquote}
     480\lstDeleteShortInline@%
     481\begin{tabular}{@{}l|@{\hspace{\parindentlnth}}l@{}}
     482\begin{cfa}
     483@forall( R, S )@ struct pair {
     484        R first;        S second;
     485};
     486@forall( T )@ // dynamic
     487T value( pair(const char *, T) p ) { return p.second; }
     488@forall( dtype F, T )@ // dtype-static (concrete)
     489T value( pair(F *, T * ) p) { return *p.second; }
     490\end{cfa}
     491&
     492\begin{cfa}
     493pair(const char *, int) p = {"magic", 42}; // concrete
     494int i = value( p );
     495pair(void *, int *) q = { 0, &p.second }; // concrete
     496i = value( q );
     497double d = 1.0;
     498pair(double *, double *) r = { &d, &d }; // concrete
     499d = value( r );
     500\end{cfa}
     501\end{tabular}
     502\lstMakeShortInline@%
     503\end{cquote}
     504
     505\CFA classifies generic types as either \newterm{concrete} or \newterm{dynamic}.
     506Concrete types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on their type parameters.
     507A \newterm{dtype-static} type has polymorphic parameters but is still concrete.
     508Polymorphic pointers are an example of dtype-static types;
     509given some type variable @T@, @T@ is a polymorphic type, as is @T *@, but @T *@ has a fixed size and can, therefore, be represented by @void *@ in code generation.
     510
     511\CFA generic types also allow checked argument constraints.
     512For example, the following declaration of a sorted set type ensures the set key supports equality and relational comparison.
     513\begin{cfa}
     514forall( Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); } ) struct sorted_set;
     515\end{cfa}
     516
     517
     518\subsection{Concrete generic types}
     519
     520The \CFA translator template expands concrete generic types into new structure types, affording maximal inlining.
     521To enable interoperation among equivalent instantiations of a generic type, the translator saves the set of instantiations currently in scope and reuses the generated structure declarations where appropriate.
     522A function declaration that accepts or returns a concrete generic type produces a declaration for the instantiated structure in the same scope, which all callers may reuse.
     523For example, the concrete instantiation for @pair( const char *, int )@ is
     524\begin{cfa}
     525struct _pair_conc0 {
     526        const char * first;  int second;
     527};
     528\end{cfa}
     529
     530A concrete generic type with dtype-static parameters is also expanded to a structure type, but this type is used for all matching instantiations.
     531In the above example, the @pair( F *, T * )@ parameter to @value@ is such a type; its expansion is below, and it is used as the type of the variables @q@ and @r@ as well, with casts for member access where appropriate.
     532\begin{cfa}
     533struct _pair_conc1 {
     534        void * first, * second;
     535};
     536\end{cfa}
     537
     538
     539\subsection{Dynamic generic types}
     540
     541Though \CFA implements concrete generic types efficiently, it also has a fully general system for dynamic generic types.
     542As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact, all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller.
     543Dynamic generic types also have an \newterm{offset array} containing structure-member offsets.
     544A dynamic generic @union@ needs no such offset array, as all members are at offset 0, but size and alignment are still necessary.
     545Access to members of a dynamic structure is provided at runtime via base displacement addressing
     546% FIX
     547using the structure pointer and the member offset (similar to the @offsetof@ macro), moving a compile-time offset calculation to runtime.
     548
     549The offset arrays are statically generated where possible.
     550If a dynamic generic type is declared to be passed or returned by value from a polymorphic function, the translator can safely assume that the generic type is complete (\ie has a known layout) at any call site, and the offset array is passed from the caller;
     551if the generic type is concrete at the call site, the elements of this offset array can even be statically generated using the C @offsetof@ macro.
     552As an example, the body of the second @value@ function is implemented as
     553\begin{cfa}
     554_assign_T( _retval, p + _offsetof_pair[1] ); $\C{// return *p.second}$
     555\end{cfa}
     556\newpage
     557\noindent
     558Here, @_assign_T@ is passed in as an implicit parameter from @T@, and takes two @T *@ (@void *@ in the generated code), a destination and a source, and @_retval@ is the pointer to a caller-allocated buffer for the return value, the usual \CFA method to handle dynamically sized return types.
     559@_offsetof_pair@ is the offset array passed into @value@;
     560this array is generated at the call site as
     561\begin{cfa}
     562size_t _offsetof_pair[] = { offsetof( _pair_conc0, first ), offsetof( _pair_conc0, second ) }
     563\end{cfa}
     564
     565In some cases, the offset arrays cannot be statically generated.
     566For instance, modularity is generally provided in C by including an opaque forward declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately compiled @.c@ file.
     567\CFA supports this pattern for generic types, but the caller does not know the actual layout or size of the dynamic generic type and only holds it by a pointer.
     568The \CFA translator automatically generates \newterm{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
     569These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all @sized@ parameters to the generic structure (un@sized@ parameters are forbidden from being used in a context that affects layout).
     570Results of these layout functions are cached so that they are only computed once per type per function. %, as in the example below for @pair@.
     571Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature.
     572For instance, a function that strips duplicate values from an unsorted @vector(T)@ likely has a pointer to the vector as its only explicit parameter, but uses some sort of @set(T)@ internally to test for duplicate values.
     573This function could acquire the layout for @set(T)@ by calling its layout function with the layout of @T@ implicitly passed into the function.
     574
     575Whether a type is concrete, dtype-static, or dynamic is decided solely on the @forall@'s type parameters.
     576This design allows opaque forward declarations of generic types, \eg @forall(T)@ @struct Box@ -- like in C, all uses of @Box(T)@ can be separately compiled, and callers from other translation units know the proper calling conventions to use.
     577If the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static (\eg @forall(T)@ @struct unique_ptr { T * p }@ does not depend on @T@ for its layout, but the existence of an @otype@ parameter means that it \emph{could}.);
     578however, preserving separate compilation (and the associated C compatibility) in the existing design is judged to be an appropriate trade-off.
    293579
    294580
Note: See TracChangeset for help on using the changeset viewer.