Changeset a03ed29 for doc


Ignore:
Timestamp:
Jul 24, 2024, 7:11:32 PM (6 months ago)
Author:
JiadaL <j82liang@…>
Branches:
master
Children:
5aeb1a9
Parents:
e561551 (diff), b6923b17 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

conclude merge

Location:
doc
Files:
9 edited

Legend:

Unmodified
Added
Removed
  • doc/LaTeXmacros/lstlang.sty

    re561551 ra03ed29  
    88%% Created On       : Sat May 13 16:34:42 2017
    99%% Last Modified By : Peter A. Buhr
    10 %% Last Modified On : Fri May 31 14:36:02 2024
    11 %% Update Count     : 44
     10%% Last Modified On : Wed Jul 24 07:40:11 2024
     11%% Update Count     : 45
    1212%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    1313
     
    115115        morekeywords={
    116116                alignas, _Alignas, alignof, _Alignof, __alignof, __alignof__, and, asm, __asm, __asm__, _Atomic, __attribute, __attribute__,
    117                 __auto_type, basetypeof, _Bool, bool, catch, catchResume, choose, coerce, corun, cofor, _Complex, __complex, __complex__,
     117                __auto_type, basetypeof, _Bool, bool, catch, catchResume, choose, coerce, cofor, corun, countof, _Complex, __complex, __complex__,
    118118                __const, __const__, continue, coroutine, _Decimal32, _Decimal64, _Decimal128, disable, dtype, enable, exception, __extension__,
    119119                fallthrough, fallthru, finally, fixup, __float80, float80, __float128, float128, _Float16, _Float32, _Float32x, _Float64,
  • doc/proposals/iterators.md

    re561551 ra03ed29  
    1010There are two groups of types that interact with this proposal.
    1111
     12Iterable
     13
     14An iterable is a container of some type that has a default iterator that goes
     15over its own contents. It may be a logical container, like a mathematical range,
     16that does not store elements in memory, but it should have logical elements.
     17A trait is defined to standardize how to get that iterator.
     18
     19        forall(C, I)
     20        trait is_iterable {
     21                I get_iterator(C&);
     22        }
     23
     24This is not particularly useful on its own, but is used as standardized glue
     25between some other features below.
     26
    1227Iterator
    1328
    14 An iterator has a very simple interface with a single operation.
    15 The operation is "get the next value in the sequence", but this actually has
    16 several parts, in that it has to check if there are move values, return the
    17 next one if there is one, and update any internal information in the iterator.
    18 For example: `Maybe(Item) next(Iter &);`.
    19 
    20 Now, iterators can have other operations. Notably, they are often also
    21 iterables that return themselves. They can also have a verity of iterator
    22 transformers built in.
    23 
    24 Iterable
    25 
    26 Anything that you can get an iterator from is called an iterable. There
    27 is an operation to get an iterator from an iterable.
    28 
    29 Range For Loop
    30 --------------
    31 One part of the language that could be reworked to make good use of this is
    32 for loops. In short, remove most of the special rules that can be done inside
    33 the identifier and make it a generic range for loop:
    34 
    35     ```
    36     for ( IDENTIFIER ; EXPRESSION ) STATEMENT
    37     ```
    38 
    39 The common way to implement this is that expression produces an iterable.
    40 The for loop gets an iterator from the iterable (which is why iterators are
    41 often iterables, so they can be passed in with the same interface) and stores
    42 it. Then, for each value in the iterator, the loop binds the value to the
    43 identifier and then executes the statement. The loop exits after every value
    44 has been used and the iterator is exhausted.
    45 
    46 For the chained for loop (`for (i; _: j; _)`) can still have its existing
    47 behaviour, advancing through each range in parallel and stopping as soon
    48 as the first one is exhausted.
     29An iterator is an object that can be advanced through a sequence of values.
     30An iterator can be checked to see if it is at the end of the sequence, if not
     31the next value can be retrieved from it and the iterator advanced. This can
     32be done in as little as a single operation:
     33
     34        forall(Iter, Elem)
     35        trait is_iterator {
     36                maybe(Elem) next(Iter&);
     37
     38                // is_iterable(I, I&)
     39                Iter& get_iterator(Iter&);
     40        }
     41
     42This interface is not fixed, the function could change or be split into
     43multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural,
     44but also efficient to implement. This version will be used for examples in
     45the proposal.
     46
     47When next is called, if there are elements left in the iterators, the next
     48element is returned in a maybe and the iterator is advanced. If the iterator
     49is exhausted nothing is changed and an empty maybe is returned.
     50
     51Iterators are also iterables. They must return themselves, this allows
     52iterators to be used directly where iterables are expected.
     53
     54For-Each Loop
     55-------------
     56The main part of this proposal which is part of the language definition,
     57the rest is part of the standard library. And this is the new for-each loop:
     58
     59        for ( IDENTIFIER ; EXPRESSION ) STATEMENT
     60
     61While this uses similar syntax to the existing special for loops, but takes
     62a standard expression. This expression is evaluated, and then the iterator is
     63retrieved from it (this should be the identity function iterators). Each
     64iteration of the loop then checks the iterator to see if it is exhausted, if
     65not it uses the value to execute the statement. If the iterator is exhausted,
     66the loop is completed, and control continues after the loop (or the else
     67clause if provided).
     68
     69The loop should show the same behaviour as the following code:
     70(`typeof(?)` should be decided by the resolver.)
     71
     72        {
     73                typeof(?) __iterable = EXPRESSION;
     74                typeof(?) __iterator = get_iterator(__iterable);
     75                typeof(?) __next = next(__iterator);
     76                while (has_value(&__next)) {
     77                        typeof(?) IDENTIFIER = get(&__next)
     78                        STATEMENT
     79                        __next = next(__iterator);
     80                }
     81        }
     82
     83This new loop can (but may not need to) also support the existing variants,
     84where the name is omitted or multiple iterators are advanced in lock-step:
     85
     86        for ( EXPRESSION ) STATEMENT
     87        for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT
     88
     89These may not be needed. The unnamed variant is a nice short hand.
     90The multiple iterator method could be replaced with an iterator transformer
     91that has the same behaviour:
     92
     93        for (i, j, k; zip(iter, range, container)) { ... }
     94
     95This would require some further extensions, but would also allow single
     96iterators to iterate over sets of related values. It also shows how library
     97features can be used to extend the for-each loop.
     98
     99Containers as Iterables
     100-----------------------
     101The primary iterator application is going over the contents of various
     102collections. Most of the collections should support the iterable interface.
     103Each of these iterators should go over each element in the container.
     104
     105They could support multiple iterator interfaces. For example, a map could go
     106over key-value-pairs, or just keys or just values.
     107
     108There isn't a lot to say here, except that as the implementation progresses,
     109various rules about iterator invalidation should be hammered out.
    49110
    50111Ranges
    51112------
    52 Ranges, which may be a data type or a trait, are containers that contain
    53 a sequence of values. Unlike an array or vector, these values are stored
    54 logically instead of by copy.
    55 
    56 The purpose of this container is to bridge the new iterator interfaces with
    57 the existing range syntax. The range syntax would become an operator that
    58 returns a range object, which can be used as any other type.
    59 
    60 It might not cover every single case with the same syntax (the `@` syntax may
    61 not translate to operators very well), but should be able to maintain every
    62 option with some library range.
    63 
    64 Library Enhancements
    65 --------------------
    66 There are various other tools in the library that should be improved.
    67 The simplest is to make sure most containers are iterables.
    68 
    69 Also, new utilities for manipulating iterators should be created. The exact
    70 list would have to wait but here are some examples.
    71 
    72 Transformers take in an iterator and produce another iterator.
    73 Examples include map, which modifies each element in turn, and filter,
    74 which checks each element and removes the ones that fail.
    75 
    76 Producers create new iterators from other information.
    77 Most practical iterators tend to be iterable containers, which produce all
    78 the elements in the container, this includes ranges. Others include infinite
    79 series of one element.
    80 
    81 Consumers take an iterator and convert it into something else.
    82 They might be converted into a container or used in a for loop. Dedicated
    83 consumers will be some form of folding function.
     113The existing range syntax is incompatible with the for-each loop. But this
     114can be recovered by making at least some of the range syntax into operators.
     115These operators create and return new range types.
     116
     117The ranges are types that map a random numeric range. For iteration, a range
     118is also considered an iterator over the numbers in the range.
     119
     120The there could be up to 6 new operators to create ranges. These are:
     121    `~` `+~` `~=` `+~=` `-~` `-~=`
     122
     123Parsing issues might rule some of these out. Others may be rendered
     124irrelevant by iterator transformers. For example, adding a transformer that
     125reverses the iterator could replace having downwards range operators.
     126
     127The one range that does not involve the operator is the lone number. This is
     128a shorthand for a half-open/exclusive range from zero to the given number.
     129That is the same behaviour as one (or two ~ and +~) of the existing range
     130operators. So it only saves 2-3 characters.
     131
     132But if we want to save those characters, there may be a special form that
     133could be used (for example, in the unnamed case) or you could treat integers
     134as iterables that create iterators that count from zero up to the limit.
     135
     136        forall(N) struct Upto { N current; N limit; };
     137        Upto(int) get_iterator(const int&);
     138
     139Enumerations as Iterators
     140-------------------------
     141To convert enumerations to the new form, a new syntax will have to be created
     142to treat an enumeration type as an expression returning a range.
     143
     144Using the unmodified type name seems problematic, it could be done if we
     145convert the type expression to some type object that can be passed to the
     146get_iterator function. But there are other alternatives.
     147
     148An operator could be created to convert the type into an iterable expression.
     149
     150        range_over(EnumType)
     151
     152It is also possible to create a library type that iterates over the range.
     153Then we use a compound literal of a nullary type as a flag to get it.
     154
     155        (ERange(EnumType)){}
     156
     157Both of these are placed in the EXPRESSION part of the for-each loop.
     158
     159Iterator Transformers
     160---------------------
     161New library features could be created to work with iterators. One of the
     162biggest groups are transformers, which allow for more complicated compound
     163operations on iterators.
     164
     165For a few examples:
     166
     167        // Create a new iterator that transforms elements from a source iterator.
     168        forall(I, T, U | is_iterator(I, T))
     169        /* Iterator over U */ map(U (*func)(T), I iter);
     170
     171        // Create an iterator that keeps only elements that satisfy a predicate.
     172        forall(I, T | is_iterator(I, T))
     173        /* Iterator over T */ filter(bool (*pred)(const T&), I iter);
     174
     175        forall(Is..., Ts... | is_iterator(Is, Ts)...)
     176        /* Iterator over [Ts] */ zip(Is iters);
     177
     178        forall(Index, I, T | is_serial(Index) | is_iterator(I, T))
     179        /* Iterator over [Index, T]) */ enumerate(Index, I);
     180
     181The return type is omitted in each case, because usually these functions end
     182up being thin wrappers around a constructor for a specialized type that is
     183the iterator, in each case, this is the return type.
     184
     185Also, these should be over iterables (with the conversion from iterable to
     186iterator part of the thin wrapper), not just iterators. But the examples are
     187easier to write out this way.
     188
     189Here is one example expanded to cover those details:
     190
     191        forall(Iter, Elem)
     192        struct Filter {
     193                bool (*pred)(const Elem&);
     194                Iter iterator;
     195        }
     196
     197        forall(Iter, Elem | is_iterator(Iter, Elem))
     198        maybe(Elem) next(Filter(Iter, Elem) & this) {
     199                maybe(Elem) ret;
     200                do {
     201                        ret = next(this.iterator);
     202                } while (has_value(ret) && this.pred(get(&ret)));
     203                return ret;
     204        }
     205
     206        forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem))
     207        Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) {
     208                return (Filter(Iter, Elem)){ pred, get_iterator(iterable) };
     209        }
     210
     211Iterator Consumers
     212------------------
     213Eventually we will have to turn the iterators into something else. Iterator
     214consumers take an iterator and turn it into something else. The biggest of
     215these is the for-each loop, as described above. In addition various library
     216functions can be added.
     217
     218One group of functions are container constructors which take everything from
     219the iterable and build a new container out of it. There would be various
     220instances of this because you will need one for each container type.
     221
     222Higher order folding functions (for example, the functional
     223`foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does
     224not work with anonymous functions quite as well. However, some of the
     225specialized versions may still be useful. For example, getting the sum of an
     226iterable's elements.
     227
     228        forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) |
     229                        { void ?{}(N&, zero_t); N ?+=?(N&, N); })
     230        N sum(T & iterable) {
     231                N acc = 0;
     232                for ( i ; iterable ) acc += i;
     233                return acc;
     234        }
     235
     236Iterator Producers
     237------------------
     238Another group of library features are iterator producers, that take some
     239data and create an iterator out of it. Many of already discussed cases are
     240actually examples of this (containers, ranges and enumerations).
     241
     242There are a few more cases that might be useful. Such as "repeat", which
     243takes a single element and iterates over it infinitely. This is usually
     244combined with other iterators and not used directly.
     245
     246Other Improvements to Cforall
     247----------------------------
     248All code snippets should be considered pseudo-code because they require some
     249features that do not currently exist. There are also some that just run into
     250current bugs. For instance, my proof of concept for ERange encountered a few
     251warnings even after I got it working.
     252
     253This proposal might also benefit from associated types. Both is_iterable and
     254is_iterator may not need to be generic over all their type parameters.
     255
     256Tuple enhancements might also help with iterators that want to return more
     257than one value.
    84258
    85259Related Work
    86260------------
    87 Python has a robust iterator tool set. It also has a `range` built-in which
    88 does many of the same things as the special for loops (the finite and
    89 half-open ranges).
     261Python has a robust iterator tool set. It uses the same iterator / iterable
     262divide as described above and has the interfaces (note, Python uses duck
     263typing instead of defined interfaces, so these are "imaginary"):
     264
     265        class Iterable:
     266
     267                def __iter__(self):
     268                        # Return an iterator over the iterable's contents.
     269
     270        class Iterator(Iterable):
     271
     272                def __iter__(self):
     273                        # Iterators are iterable over themselves.
     274                        return self
     275
     276                def __next__(self):
     277                        # Return next value if present and advance internal state.
     278                        # Otherwise raise StopIteration exception.
     279
     280It uses this in for loops, the for loop takes an iterable gets the iterable,
     281and iterates until the iterable is exhausted. Iterators are iterables so they
     282can be passed in directly as well.
     283
     284It also has a `range` built-in which handles integer loops, although only
     285increasing half-open ranges.
    90286
    91287In addition, it has many dedicated iterator constructors and transformers,
    92288and many containers can both produce and be constructed from iterators.
    93 
     289For example, the chain function goes though a series of iterators in sequence
     290an can be used by:
     291
     292        import itertools
     293        itertools.chain(iter_a, iter_b)
     294
     295Python enumerations are implemented in the standard library and are iterables
     296over all their possible values. There is no special construct for this, in
     297Python types are object, and the type of an enum type object is EnumType
     298which supports the iterable interface.
     299
     300        for elem in SomeEnum:
     301                print(elem)
     302
     303Python References
    94304+   https://docs.python.org/3/reference/datamodel.html#object.__iter__
    95305+   https://docs.python.org/3/library/functions.html#func-range
     306+   https://docs.python.org/3/library/enum.html
    96307
    97308C++ has many iterator tools at well, except for the fact it's "iterators" are
     
    104315They do appear to be a wrapper around the "pointer" iterators.
    105316
     317        template< class T >
     318        concept range = requires( T& t ) {
     319                ranges::begin(t);
     320                ranges::end(t);
     321        };
     322
     323C++ also has "structured bindings" which can be used to unpack an object and
     324bind its components to different names. This can be used to iterate over
     325multiple values simultaneously. In this example from the Cforall compiler
     326using a custom utility function written without special support.
     327
     328        for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) {
     329                ...
     330        }
     331
     332C++ References
    106333+   https://en.cppreference.com/w/cpp/ranges
     334+   https://en.cppreference.com/w/cpp/language/structured_binding
    107335
    108336Rust also has a imperative implementation of a functional style of iterators,
     337with iterator tools implemented as "Provided Methods". Provided methods
    109338including a great number of standard transformers. Otherwise, it is very
    110339similar to Python.
    111340
    112 +   https://doc.rust-lang.org/std/iter/index.html
     341        pub trait Iterator {
     342                type Item;
     343
     344                fn next(&mut self) -> Option<Self::Item>;
     345
     346                // Many provided methods ...
     347        }
     348
     349Not that Rust has many of its iterator transformers stored as methods on the
     350iterator. These methods are "provided" so you only have to define next, but
     351you can redefine the other methods to provide optimizations.
     352
     353Rust also has range expressions. Based around `start .. end` and
     354`start ..= end`, with the sub-expressions giving the start and end of the
     355range being optional, except for end after `..=`. If start is omitted, the
     356range has no lower bound, otherwise the range uses start as its inclusive
     357lower bound. It is the same case for end and the upper bound, except that the
     358`..=` will have an inclusive upper bound instead of an exclusive one.
     359
     360Rust achieves ordering and step control by iterator transformers. `.rev()`
     361allows you to reverse a range and `.step_by(step)` provides a positive step
     362value.
     363
     364Rust Reference
     365+   https://doc.rust-lang.org/std/iter/trait.Iterator.html
     366+   https://doc.rust-lang.org/reference/expressions/range-expr.html
     367+   https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev
  • doc/theses/fangren_yu_MMath/content1.tex

    re561551 ra03ed29  
    1 \chapter{Content1}
     1\chapter{Recent Features Introduced to \CFA}
    22\label{c:content1}
    33
    4 This chapter ...
    5 
    6 \section{Section 1}
    7 
    8 \section{Section 2}
     4This chapter discusses some recent additions to the \CFA language and their interactions with the type system.
     5
     6\section{Reference Types}
     7
     8Reference types are added to \CFA by Robert Schluntz in conjunction to his work on resource management. \CFA reference type is similar to \CC reference type but with its own added features.
     9
     10The main difference between \CFA and \CC references is that \CC references are immutable, while \CFA supports reference rebinding operations. In \CC, references are mostly used in function parameters for pass-by-reference semantics, and in class members, which must be initialized during construction. Merely declaring a variable of reference type has little use as it only creates an alias of an existing variable. In contrast, \CFA reference variables can be reassigned (rebinded) and reference to reference is also allowed.
     11
     12This example is taken from the feature demonstration page of \CFA: \footnote{Currently there are no plans of introducing \CC rvalue references to \CFA. Readers should not confuse the multi-reference declarations with \CC rvalue reference syntax.}
     13
     14\begin{cfa}
     15int x = 1, y = 2, z = 3;
     16int * p1 = &x, ** p2 = &p1,  *** p3 = &p2, // pointers to x
     17        & r1 = x,  && r2 = r1,   &&& r3 = r2;  // references to x
     18int * p4 = &z, & r4 = z;
     19
     20*p1 = 3; **p2 = 3; ***p3 = 3;                   // change x
     21 r1 =  3;       r2 = 3;         r3 = 3;                 // change x: implicit dereference *r1, **r2, ***r3
     22**p3 = &y;      *p3 = &p4;                                      // change p1, p2
     23// cancel implicit dereferences (&*)**r3, (&(&*)*)*r3, &(&*)r4
     24&r3 = &y; &&r3 = &&r4;                                  // change r1, r2
     25\end{cfa}
     26
     27A different syntax is required for reassigning to a reference variable itself, since auto-deferencing is always performed and the expression \texttt{r1} would mean the \texttt{int} variable that \texttt{r1} referenced to instead. Using \CFA's reference types (including multi-references) we can actually describe the "lvalue" rules in C by types only, and the concept of lvalue could have been completely dropped off. However, since the cfa-cc program is not a full compiler but a transpiler from \CFA to C, lvalueness is still used in some places for code generation purposes, while the type checker now works on just types without needing to consider lvalueness of an expression.
     28
     29The current typing rules used in \CFA can be summarized as follows:
     30
     31\begin{enumerate}
     32    \item For a variable x with declared type T, the variable-expression x has type reference to T, even if T itself is a reference type.
     33    \item For an expression e with type $T\&_1...\&_n$ i.e. T followed by n references, where T is not a reference type, the expression \&T (address of T) has type T* followed by n-1 references.
     34    \item For an expression e with type T* followed by n references, *T has type T followed by n+1 references. This is the reverse of previous rule, such that address-of and dereference operators are perfect inverses.
     35    \item When matching parameter and argument types in a function call context, the number of references on the argument type is always stripped off to match the number of references on the parameter type \footnote{\CFA allows rvalue expressions be converted to reference values by implicitly creating a temporary variable, with some restrictions.}. In an assignment context, the left-hand side operand type is always reduced to a single reference.
     36\end{enumerate}
     37
     38Under this ruleset, in a function call context, a type parameter will never be bound to a reference type. For example given the declarations
     39
     40\begin{cfa}
     41    forall (T) void f (T&);
     42
     43    int &x;
     44\end{cfa}
     45
     46The call f(x); will apply implicit dereference once to x so the call is typed f(int\&) with T=int, rather than with T=int\&.
     47
     48While initially the design of reference types in \CFA seeks to make it more like a "real" data type than reference in \CC, which mostly only serves the purpose of choosing argument-passing methods (by-value or by-reference) in function calls, the inherent ambiguity of auto-dereferencing still limits the behavior of reference types in \CFA polymorphic functions. Moreover, there is also some discrepancy of how the reference types are treated in initialization and assignment expressions. In the former case no implicit dereference is applied at all (see line 3 of example code) and in the latter case there is actually no assignment operators defined for reference types; the reassignment of reference variables uses the assignment operators for pointer types instead. There is also an annoying issue (although purely syntactic) that to pass in a null value for reference initialization one has to write \texttt{int \& r1 = *0p;} which looks like dereferencing a null pointer, but the dereferencing operation does not actually happen and the expression is eventually translated into initializing the underlying pointer value to null.
     49
     50This second point of difference would prevent the type system to treat reference types the same way as other types in many cases even if we allow type variables to be bound to reference types. This is because \CFA uses the common "object" trait (constructor, destructor and assignment operators) to handle passing dynamic concrete type arguments into polymorphic functions, and the reference types are handled differently in these contexts so they do not satisfy this common interface.
     51
     52When generic types are introduced to \CFA, some thoughts had been put into allowing reference types as type arguments. While it is possible to write a declaration such as \texttt{vector(int\&)} for a container of reference variables, it will be disallowed in assertion checking if the generic type in question requires the object trait for the type argument (a fairly common use case) and even if the object trait can be made as non-required, the current type system often misbehaves by adding undesirable auto-dereference and operate on the referenced-to value rather than the reference variable itself as intended. Some tweaks would have to be made to accommodate reference types in polymorphic contexts and we are still not sure of what can or cannot be achieved. Currently we have to reside on using pointer types and giving up the benefits of auto-dereference operations on reference types.
     53
     54
     55
     56\section{Tuple Types}
     57
     58The addition of tuple types to \CFA can be traced back to the original design by David Till in K-W C, a predecessor project of \CFA. The introduction of tuples was aiming to eliminate the need of having output parameters or defining an aggregate type in order to return multiple values from a function. In the K-W C design, tuples can be thought of as merely a syntactic sugar as it is not allowed to define a variable with tuple type. The usage of tuples are restricted to argument passing and assignments, and the translator implementation converts tuple assignments by expanding the tuple assignment expressions to assignments of each component, creating temporary variables to avoid unexpected side effects when necessary. As in the case of a function returning multiple values (thereafter called MVR functions), a struct type is created for the returning tuple and the values are extracted by field access operations.
     59
     60In an early implementation of tuples in \CFA made by Rodolfo Gabriel Esteves, a different strategy is taken to handle MVR functions. The return values are converted to output parameters passed in by pointers. When the return values of a MVR function are directly used in an assignment expression, the addresses of the left-hand operands can be directly passed in to the function; composition of MVR functions is handled by creating temporaries for the returns.
     61
     62Suppose we have a function returning two values as follows:
     63
     64\begin{cfa}
     65    [int, int] gives_two();
     66
     67    int x,y;
     68    [x,y] = gives_two();
     69\end{cfa}
     70
     71The K-W C implementation translates the program to
     72
     73\begin{cfa}
     74    struct _tuple2 { int _0; int _1; }
     75    struct _tuple2 gives_two();
     76    int x,y;
     77    struct _tuple2 _tmp = gives_two();
     78    x = _tmp._0; y = _tmp._1;
     79\end{cfa}
     80
     81While the Rodolfo implementation translates it to
     82
     83\begin{cfa}
     84    void gives_two(int *, int *);
     85    int x,y;
     86    gives_two(&x, &y);
     87\end{cfa}
     88
     89and inside the body of the function \text{gives\_two}, the return statement is rewritten to assignments into the passed-in addresses.
     90
     91The latter implementation looks much more concise, and in the case of returning values having nontrivial types (e.g. structs), this implementation can also save some unnecessary copying.
     92
     93Interestingly, in Robert Schluntz's rework of the tuple type, the implementation got reverted back to struct-based, and it remained in the current version of cfa-cc translator. During the same time of his work, generic types were being added into \CFA independently as another feature, and the tuple type was changed to use the same implementation techniques of generic types. Consequently, it made tuples become first-class values in \CFA.
     94
     95However, upon further investigation, making tuple types first-class has very little benefits in \CFA, mainly because that function call semantics with tuples are designed to be unstructured, and that since operator overloading in \CFA are implemented by treating every overloadable operator as functions, tuple types are very rarely used in a structured way. When a tuple-type expression appears in a function call (except assignment expressions, which are handled differently by mass- or multiple-assignment expansions), it is always flattened, and the tuple structure of function parameter is not considered a part of the function signature, for example
     96
     97\begin{cfa}
     98    void f(int, int);
     99    void f([int, int]);
     100\end{cfa}
     101
     102are considered to have the same signature (a function taking two ints and returning nothing), and therefore not valid overloads. Furthermore, ordinary polymorphic type parameters are not allowed to have tuple types in order to restrict the expression resolution algorithm to not create too many argument-parameter matching options, such that the type-checking problem remains tractable and does not take too long to compute. Therefore tuple types are never present in any fixed-argument function calls.
     103
     104A type-safe variadic argument signature was proposed using \CFA's \texttt{forall} clause and a new tuple parameter type, denoted previously by the \texttt{ttype} keyword and now by the ellipsis syntax similar to \CC's template parameter pack.
     105
     106The C \texttt{printf} function, for example, can be rewritten using the new variadic argument, in favor of the C untyped \texttt{va\_list} interface as
     107
     108\begin{cfa}
     109    forall (TT...) int printf(char *fmt, TT args);
     110\end{cfa}
     111
     112Note that this is just for illustration purposes, as type assertions are needed to actually achieve type safety, and \CFA's I/O library does not use a format string since argument types are inferred by the type system.
     113
     114There are two common patterns for using the variadic function in \CFA: one is to forward the arguments to another function
     115
     116\begin{cfa}
     117    forall(T*, TT... | {void ?{}(T &, TT);})
     118    T* new (T, TT) { return ((T*)malloc()){TT}; }
     119\end{cfa}
     120
     121and the other is structural recursion which extracts arguments one at a time
     122
     123\begin{cfa}
     124    forall( ostype &, T, Params... | { ostype & ?|?( ostype &, T); ostype & ?|?( ostype &, Params ); } )
     125        ostype & ?|?( ostype & os, T arg, Params rest );
     126\end{cfa}
     127
     128The above is the actual implementation of variadic print function in \CFA. \texttt{ostype} represents the output stream, similar to \CC's \texttt{ostream} interface. Note that recursion must be used in order to extract type information of the first argument in the list, as opposed to C \texttt{va\_list} variadics which uses a loop to extract each argument, and generally requires some companion data that provides type information, such as the format string in \texttt{printf}.
     129
     130Variadic polymorphic functions are somehow currently the only place tuple types are used in functions. And just like the case for user-defined generic types, many wrapper functions need to be generated to implement polymorphism with variadics. However, note that the only permitted operations on polymorphic function parameters are given by the assertion (trait) functions, and those eventually need to be supplied flattened tuple arguments, packing the variadic arguments into a rigid struct type and generating all the required wrapper functions become mostly wasted work. Interested readers can refer to pages 77-80 of Robert Schluntz's thesis to see how verbose the translator output needs to be to implement a simple variadic call with 3 arguments, and it will quickly become even worse if the number of arguments is increased: for a call with 5 arguments the translator would have to generate concrete struct types for a 4-tuple and a 3-tuple along with all the polymorphic type data for them! Instead, a much simpler approach of putting all variadic arguments into a data array and providing an offset array to retrieve each individual argument can be utilized. This method is very similar to how the C \texttt{va\_list} object is used, with \CFA type resolver validating and generating the required type information to guarantee type safety.
     131
     132The print function example
     133
     134\begin{cfa}
     135    forall(T, Params... | { void print(T); void print(Params ); })
     136    void print(T arg , Params rest) {
     137        print(arg);
     138        print(rest);
     139    }
     140\end{cfa}
     141
     142using the offset array approach, can be converted to pseudo-\CFA code (ignoring type assertions for T) as
     143
     144\begin{cfa}
     145    void print(T arg, char* _data_rest, size_t* _offset_rest) {
     146        print(arg);
     147        print(*((typeof rest.0)*) _data_rest, // first element of rest
     148            _data_rest + _offset_rest[0],  // remainder of data
     149            _offset_rest + 1);  // remainder of offset array
     150    }
     151\end{cfa}
     152
     153where the fixed-arg polymorphism for T can be handled by the standard \texttt{void*}-based \CFA polymorphic calling conventions, and the type information can all be deduced at the call site.
     154
     155Turning tuples into first-class values in \CFA does have a few benefits, namely allowing pointers to tuples and arrays of tuples to exist. However it seems very unlikely that these types can have realistic use cases that are hard to achieve without them. And indeed having the pointer-to-tuple type to exist at all will potentially forbid the simple offset-array implementation of variadic polymorphism (in case that a type assertion requests the pointer type \texttt{Params*} in the above example, forcing the tuple type to be materialized as a struct), and thus incurring a high cost. Perhaps it is of best interest to keep tuples as non-first-class, as Rodolfo originally describes them to "[does] not enforce a particular memory layout, and in particular, [does] not guarantee that the components of a tuple occupy a contiguous region of memory," and therefore to be able to use the simplified implementations for MVR and variadic functions.
     156
     157One final topic worth discussing is that the strategy of converting return values to output parameters can be utilized to implement copy elision, which is relevant for \CFA since constructors are introduced to the language. However, the first example given in this section
     158
     159\begin{cfa}
     160    int x,y;
     161    [x,y] = gives_two();
     162\end{cfa}
     163
     164actually \textit{cannot} have copy elision applied, since the call to \texttt{gives\_two} appears in an \textit{assignment} context rather than an initialization context, as the variables x and y may be already initialized. Unfortunately \CFA currently does not support declaring variables in tuple form:
     165
     166\begin{cfa}
     167    [int x, int y] = gives_two(); // NOT ALLOWED
     168\end{cfa}
     169
     170It is possible to declare a tuple-typed variable and call MVR functions in initialization context
     171
     172\begin{cfa}
     173    [int, int] ret = gives_two(); // OK
     174\end{cfa}
     175
     176but using the values is more awkward as we cannot give them separate names and have to use \texttt{ret.0} or \texttt{ret.1} to extract the values. If copy elision semantics were to be added to \CFA it would be preferable to allow declaring variables in tuple form to have the benefit of eliding copy construction while giving each variable a unique name.
     177
     178\section{Plan-9 Struct Inheritance}
     179
     180Plan-9 Inheritance is a non-standard C feature originally introduced by the C dialect used in Bell Labs' Plan-9 research operating system, and is supported by mainstream C compilers such as GCC and Clang. This feature allows an aggregate type (struct or union) to be embedded into another one with implicit access semantics similar to anonymous substructures.
     181
     182In standard C, it is possible to define a field with an anonymous struct or union type within another. This is often utilized to implement a tagged union:
     183
     184\begin{cfa}
     185
     186struct T {
     187  unsigned int tag;
     188  union {
     189    int x;
     190    double y;
     191    char z;
     192  };
     193} t;
     194\end{cfa}
     195
     196The union fields can be directly accessed using their names, such as \texttt{T.x}. With Plan-9 extensions enabled, the same can be applied to a struct or union type defined elsewhere. \CFA uses the inline specifier to denote the anonymously embedded field.
     197
     198In GCC it is possible to simply use\texttt{struct B \{struct A;\}}
     199for the Plan-9 feature; since \CFA no longer requires the struct and union keywords in variable declarations, having a keyword to denote Plan-9 inheritance is preferable.
     200
     201\begin{cfa}
     202  struct A {int x;};
     203
     204  struct B {
     205    inline A;
     206    int y;
     207  };
     208
     209  B b;
     210  b.x; // accesses the embedded struct A's field
     211\end{cfa}
     212
     213As the \CFA translator simply just reduce the source code to C, usually the non-standard C features do not need any special treatment, and are directly passed down to the C compiler. However, the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the type checking algorithm must take that information into account. Therefore, the \CFA translator must implement the Plan-9 features and insert the type conversions into the translated code output. In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C conversions.
     214
     215Since variable overloading is possible, \CFA's implementation of Plan-9 inheritance allows duplicate field names. When an outer field and an embedded field have the same name and type, the inner field is shadowed and cannot be accessed directly by name. While such definitions are allowed, using duplicate names is not good practice in general and should be avoided if possible.
     216
     217Plan-9 fields can be nested, and a struct definition can contain multiple Plan-9 embedded fields. In particular, the "diamond pattern" can occur and result in a nested field to be embedded twice.
     218
     219\begin{cfa}
     220  struct A {int x;};
     221  struct B {inline A;};
     222  struct C {inline A;};
     223
     224  struct D {
     225    inline B;
     226    inline C;
     227  };
     228
     229  D d;
     230\end{cfa}
     231
     232In the above example, the expression \texttt{d.x} becomes ambiguous, since it can refer to the indirectly embedded field either from B or from C. It is still possible to disambiguate the expression by first casting the outer struct to one of the directly embedded type, such as \texttt{((B)d).x}
     233
     234
     235
     236
  • doc/theses/fangren_yu_MMath/content2.tex

    re561551 ra03ed29  
    1 \chapter{Content1}
     1\chapter{Resolution Algorithms}
    22\label{c:content1}
    33
    4 This chapter ...
    5 
    6 \section{Section 1}
    7 
    8 \section{Section 2}
     4\CFA's type system is fairly complicated. The compiler needs to analyze each expression with many possible forms of overloading. Variables can be overloaded in \CFA, and functions can be overloaded by the argument types as well as the return types. Combined with the polymorphism introduced by forall clauses and generic types, the complexity of expression analysis can go up very quickly. Designing a rule set that behaves mostly as expected, and implementing it as an efficient algorithm for actual use, are both very challenging tasks. As the \CFA translator's performance improves to a level that can compile a mid-sized program in a reasonable amount of time, the development of \CFA's standard library also speeds up and many new features utilizing the expressiveness of \CFA's type system has been implemented, such as generic container types similar to those in \CC's standard template library. During the process of development, many weaknesses and design flaws of \CFA type system have been discovered. Some of those problems arise from the newly introduced language features described in the previous chapter, and fixing those unexpected interactions with the type system is especially difficult. This chapter describes the type resolution rules currently in use and some major problems that have been identified. Not all of those problems have got solutions yet, because fixing them may require redesigning parts of the \CFA language at a larger scale.
     5
     6\section{The Expression Cost Model}
     7
     8\CFA has been using an expression cost model to resolve ambiguity of overloaded expressions from the very beginning. Since most operators can be overloaded in \CFA (excluding a few operators that have special semantics, such as the comma operator, and the short-circuit logical operators \&\& and ||, which require the operands to be evaluated in order), they are treated the same way as other function calls and the same rules for overload resolution must apply to them as well.
     9
     10In \CFA, candidates of an overloaded expression are ranked by numerical cost elements, that accounts for the type conversions needed from the argument type to the corresponding declared function parameter type, as well as the polymorphic types and restrictions introduces by the forall clause. Currently the expression cost used in \CFA has the following components, ranked from higher to lower by importance:
     11
     12\begin{enumerate}
     13    \item \textbf{Unsafe} cost representing a narrowing conversion of arithmetic types, e.g. \texttt{int} to \texttt{short}, and qualifier-dropping conversions for pointer and reference types;
     14    \item \textbf{Polymorphic} cost where the function parameter type is or contains a polymorphic type variable;
     15    \item \textbf{Safe} cost representing a widening conversion e.g. \texttt{short} to \texttt{int}, qualifier-adding conversions for pointer and reference types, and value conversion for enumeration constants.
     16    \item \textbf{Variable} cost that counts the number of polymorphic variables, if any, introduced by the forall clause in the function declaration;
     17    \item \textbf{Specialization} cost that counts the number of restrictions introduced by the type assertions.
     18\end{enumerate}
     19
     20The comparison of cost tuple is by lexicographical order, starting from the highest importance term (unsafe cost) and the lower one has lower cost, with ties going to the second term (polymorphic cost) and so on. At a sub-expression level, the lowest cost candidate for each result type is included as a possible interpretation of the expression; at the top level all possible interpretations of different types are considered and the overall lowest cost is selected as the final interpretation of the expression.
     21
     22In many languages that support function and operator overloading, such as \CC and Java, a partial ordering system decides which interpretation of the expression gets selected, which means that sometimes the candidates are incomparable (none of those are considered a better match) and only when one candidate is considered better than all others (maximal with respect to the partial order) is the expression unambiguous.
     23
     24In \CFA trying to use such a system is problematic because of the presence of return type overloading of functions, and overloading of variables. The resolution algorithms used in \CC and Java are greedy, as they select the best match for each sub-expression without considering the higher level ones, and therefore at each step of resolution, the arguments are already given unique interpretations, so the ordering only needs to consider comparing different sets of conversion targets (function parameter types) on the same set of input. However, in \CFA expression resolution considers multiple interpretations of argument sub-expressions with different types, so it is possible that both the selected function and the set of arguments are different, and cannot be compared if we choose to use some kind of partial ordering system. Since this situation can arise quite often in \CFA, even in the simplest form such as an expression \textbf{f(a)} where both the function name \textbf{f} and variable name \textbf{a} are overloaded. We do not want the resolution algorithm to report too many expressions as ambiguous (which would therefore be compilation errors) and restrict the flexibility of \CFA by too much. The previous documentations and papers on \CFA expression resolution never explained why such a cost system is used; this could be a plausible guess of the original motivation of introducing the cost system to \CFA.
     25
     26On the contrary, using such a cost-based model can sometimes make \CFA expression resolution too permissive; the system will always attempt to select the lowest cost option, and only when there are multiple options tied at the lowest cost it reports the expression as ambiguous. With so many elements in the cost tuple, ties are expected to be uncommon. Other than the case of multiple exact matches which would all have cost of zero, incomparable candidates under a partial ordering of being more specific can often have different expression costs since different kinds of implicit conversions are involved, resulting in seemingly arbitrary overload selections.
     27
     28Ada is another programming language that has overloading based on return type. Although Ada also allows implicit type conversions of function arguments, it is fairly conservative on resolving ambiguities. There are only two "preference" rules in Ada overload resolution, one for primitive arithmetic operators and one for universal access types (analogous to void* in C); any other cases where an expression have multiple legal interpretations are considered ambiguous. The current overload resolution system for \CFA is on the other end of the spectrum, as it tries to order every legal interpretations of an expression and chooses the best one according to cost, occasionally giving unexpected results.
     29
     30There are currently at least three different situations where the polymorphic cost element of the cost model does not yield a candidate selection that is clearly justifiable, and one of them is straight up wrong. Here are those three cases:
     31
     32\begin{enumerate}
     33    \item Polymorphic exact match versus non-polymorphic inexact match: consider the following declarations
     34
     35    \begin{cfa}
     36        forall (T) void f (T); // 1
     37        void f (long); // 2
     38
     39        f (42); // currently selects 2
     40    \end{cfa}
     41
     42    Under the current cost model, option 1 incurs a polymorphic cost from matching the argument type \textbf{int} to type variable \textbf{T}, and option 2 incurs a safe cost from integer promotion of type \textbf{int} to \textbf{long}. Since polymorphic cost is ranked above safe conversion cost, option 2 is considered to have lower cost and gets selected.
     43   
     44    In contrast, the template deduction and overload resolution rules in \CC selects option 1 instead (converting forall to the equivalent function template declaration). \CC performs template argument deduction and overload candidate ranking in two separate steps: in the first step the type parameters are deduced for each primary function template, and if the corresponding template instantiation succeeds, the resulting function prototype is added to the resolution candidate set. In the second step the implicit conversions (if any) applied to argument types are compared after taking away top-level qualifiers and references, and it prefers an exact match, followed by basic type promotions (roughly corresponds to safe conversion in \CFA), and then other kinds of conversions (roughly corresponds to unsafe conversion in \CFA). Only when the type conversions are the same does it prioritize a non-template candidate. In this example, option 1 produces the prototype void f(int) which gives an exact match and therefore takes priority. The \CC resolution rules effectively makes option 2 a specialization that only applies to type \textbf{long} exactly,\footnote{\CC does have explicit template specializations, however they do not participate directly in overload resolution and can sometimes lead to unintuitive results.} while the current \CFA rules make option 2 apply for all integral types below \textbf{long}. Such a discrepancy could be explained as a design decision that since \CFA polymorphic functions are real entities and are separately compiled, calling them would require passing type information and thus have an actual runtime cost.
     45
     46    \item Having a lower total polymorphic cost does not always mean a function is more specialized. The following example is taken from Aaron Moss's thesis, which discusses some improvements to the \CFA expression cost model, where he claims the following function prototypes are increasingly more constrained:
     47
     48    \begin{cfa}
     49        forall(T, U) void f(T, U); //1, polymorphic
     50        forall(T) void f(T, T); //2, less polymorphic
     51        forall(T) void f(T, int); //3, even less polymorphic
     52        forall(T) void f(T*, int); //4, least polymorphic
     53    \end{cfa}
     54
     55    This argument is not entirely correct. Although it is true that both the sequence 1,2 and 1,3,4 are increasingly more constrained on the argument  types, the option 2 is not comparable to either of option 3 or 4; they actually describe independent constraints on the two arguments. In natural language, option 3 says that the second argument must have type \textbf{int}, while option 2 says that the two arguments must have the same type. These two constraints can independently be satisfied, therefore neither should be considered a better match when trying to resolve a call to f with argument types (int, int), and reporting such an expression as ambiguous is the most appropriate action. This is a limitation of using a numerical cost value as it cannot these logically complicated cases.
     56
     57    \item Finally, the introduction of generic types means that it may require more type variables to describe a more specific type and that means simply counting the number of polymorphic type variables is no longer correct in general to order the function candidates as being more constrained. Suppose we have a generic pair type defined and writing a function that takes an arbitrary pair would require using two type variables
     58    \begin{cfa}
     59        forall (T,U) void f (pair(T,U)); // 1
     60    \end{cfa}
     61    and compare that with a function that takes any type at all:
     62    \begin{cfa}
     63        forall (T) void f (T); // 2
     64    \end{cfa}
     65
     66    Passing a pair variable to f gives a cost of 1 poly, 2 variable for the pair overload, and a cost of 1 poly, 1 variable for the unconstrained overload. Clearly we would like the former to be chosen but the cost model cannot handle that correctly.
     67\end{enumerate}
     68
     69These inconsistencies do not seem to be easily solvable and currently the \CFA codebase has to work around with these known defects. One potential path that could possibly be taken is a mix of the conversion cost and \CC-like partial ordering of specializations. Observe that in the \CFA cost tuple, the first three elements (unsafe, polymorphic and safe conversions) are related to the argument types, while the other elements (polymorphic variable and assertion counts) are properties of the function declarations independent of the arguments. This means it may be reasonable to have an ordering that compares the argument conversion costs first and uses the partial ordering of specializations as a tiebreaker. The algorithm used by \CC template specialization ordering can be applied for \CFA with some slight modifications.
     70
     71At the meantime, some other improvements have been made to the expression cost system. Most notably, the expression resolution algorithm now consistently uses the globally minimal cost interpretation, as discussed in a previous technical report. While implementing the change, there are also two detailed issues that need to be addressed for the new rules to fully work.
     72
     73The first one deals with an idiom commonly used in \CFA that would cause a lot of overloads to have equal costs. These kinds of expressions are so ubiquitous in \CFA code that we do not want them to be deemed ambiguous in the language. Many C library functions have multiple versions for different argument types, for example there are absolute value functions defined for basic arithmetic types with different names, since C does not support any kind of overloading:
     74
     75\begin{cfa}
     76    int abs (int);
     77    long labs (long);
     78    double fabs (double);
     79    float fabsf (float);
     80\end{cfa}
     81
     82It is cumbersome for the programmers to remember all these different function names and select the correct ones, and even worse, if the incorrect version is picked, the program still compiles but with undesired conversions, which can sometimes even change the result, such as using the int version for floating point argument. In \CFA all of these functions are renamed to simply @abs@. This causes multiple overloads to have the same total cost when some conversion is needed. For example @long x = abs(42);@ could be either calling @long abs(long)@ with the argument 42 converted to @long@ or calling @int abs(int)@ and converting the result to @long@. In this example the choice could be arbitrary because both yields identical results. In some other cases, the choice can have an actual impact on the final result. While testing the effects of using the updated cost rule we found this piece of code in \CFA standard library:
     83
     84\begin{cfa}
     85    static inline __readyQ_avg_t __to_readyQ_avg(unsigned long long intsc) {
     86        if(unlikely(0 == intsc)) return 0.0;
     87        else return log2(intsc); // implicit conversion happens here
     88    } // __readyQ_avg_t is defined to be double
     89\end{cfa}
     90
     91This is a helper function for performance logging that calculate the geometric mean of a counter value, and it does so by summing up the logarithm value of the counter. The function @log2@ is similarly overloaded in \CFA for both integer and floating point types, however in this case, the integer overload returns an integer, so the fractional part of logarithm is truncated. With the previous cost rules the integer version of @log2@ is selected, and when experimenting the updated cost rules this got picked up as an ambiguous expression at first. I reported this issue to the author of library code and got the reply that the expectation was that \CFA would choose the floating point overload, by the return type overloading selection. This mistake went unnoticed since it is only inside a performance logging function and does not serve any critical purposes, and the only effect it has caused is that the performance data becomes inaccurate as the fractional parts got truncated before the sum. Investigating this example leads to the decision that matching the return type higher up in the expression tree is prioritized, in case the total expression cost is equal.
     92
     93Another change addresses the issue that C arithmetic expressions have unique meanings governed by the arithmetic promotion rules, however in \CFA they are all modelled as function calls for overload resolution purposes. The previous, partially greedy resolution rules will pick the locally optimal match and it matches the C rules naturally. Care needs to be taken to maintain the C semantics when switching to the total expression cost approach.
     94
     95This problem is already partially recognized, when Aaron Moss suggested overload resolution by total cost, in the form of handling cast expressions. To quote directly the example:
     96
     97If a cast argument has an unambiguous interpretation as a conversion argument then it must be interpreted as such, even if the ascription interpretation would have a lower overall cost. This is demonstrated in the following example, adapted from the C standard library:
     98
     99\begin{cfa}
     100    unsigned long long x;
     101    (unsigned)(x >> 32);
     102\end{cfa}
     103
     104In C semantics, this example is unambiguously upcasting 32 to @unsigned long long@, performing the shift, then downcasting the result to @unsigned@, at cost (1, 0, 3, 1, 0, 0, 0). If ascription were allowed to be a first-class interpretation of a cast expression, it would be cheaper to select the @unsigned@ interpretation of @?>>?@ by downcasting x to @unsigned@ and upcasting 32 to @unsigned@, at a total cost of (1, 0, 1, 1, 0, 0, 0). However, this break from C semantics is not backwards compatible, so to maintain C compatibility, the \CFA resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to @unsigned long long@ in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.
     105
     106However, a cast expression is not necessary to have such inconsistency to C semantics. With any implicit argument-parameter type conversion in function calls we can replicate this issue without an explicit cast. For example
     107
     108\begin{cfa}
     109    unsigned long long x;
     110    void f (unsigned);
     111    f (x >> 32);
     112\end{cfa}
     113
     114This has the same effect as using an explicit cast to coerce the type of expression @x >> 32@ to @unsigned@. This shows that fundamentally the problem is not coming from the cast expressions, but modelling the C built-in operators as overloaded functions. A different rule is enforced in selecting the built-in function candidates to fix this problem. If an expression has any legal interpretations as a C built-in operation, only the lowest cost one is kept, regardless of the result types.
     115
     116
     117\section{Type Unification}
     118
     119Type unification is the algorithm that assigns values to each (free) type parameters such that the types of the provided arguments and function parameters match.
     120
     121\CFA does not attempt to do any type \textit{inference}: it has no anonymous functions (i.e. lambdas, commonly found in functional programming and also used in \CC and Java), and the variable types must all be explicitly defined (no auto typing). This makes the unification problem more tractable in \CFA as the argument types at each call site are usually all specified. There is a single exception case, which happens when the function return type contains a free type variable that does not occur in any of the argument types, and subsequently passed into the parent expression. A top level expression whose type still contains an unbounded type variable is considered ill-formed as such expression is inherently ambiguous.
     122
     123The unification algorithm in \CFA is originally presented in Richard Bilson's thesis and it has remained as the basis of the algorithm currently in use. Some additions have been made in order to accommodate for the newly added type features to the language. To summarize, the \CFA type unification has two minor variants: an \textit{exact} mode and an \textit{inexact} mode. The inexact mode is applied at top level argument-parameter matching, and attempts to find an assignment to the type variables such that the argument types can be converted to parameter types with minimal cost as defined in the previous section. The exact mode is required since the type matching algorithm operates recursively and the inner types often have to match exactly, for example there is no common type for the pointer types \textbf{int*} and \textbf{long*} while there is for \textbf{int} and \textbf{long}. With the introduction of generic record types, the parameters must match exactly as well; currently there are no covariance or contravariance supported for the generics.
     124
     125One simplification was made to the \CFA language that makes modelling the type system easier: polymorphic function pointer types are no longer allowed in declarations. The polymorphic function declarations themselves are still treated as function pointer types internally, however the change means that formal parameter types can no longer be polymorphic. Previously it is possible to write function prototypes such as
     126
     127\begin{cfa}
     128    void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
     129\end{cfa}
     130
     131Notably, the unification algorithm implemented in the \CFA compiler has never managed to trace the assertion parameters on the formal types at all, and the problem of determining if two assertion sets are compatible may very likely be undecidable in general, given the ability of synthesizing more complicated types by the nesting of generics. Eventually, the reason of not allowing such constructs is that they mostly do not provide useful type features for actual programming tasks. A subroutine of a program operates on the arguments provided at the call site together with (if any) local and global variables, and even though the subroutine can be polymorphic, the types will be supported at each call site. On each invocation the types to be operate on can be determined from the arguments provided, and therefore there should not be a need to pass a polymorphic function pointer, which can take any type in principle.
     132
     133For example, consider a polymorphic function that takes one argument of type \textbf{T} and another pointer to a subroutine
     134
     135\begin{cfa}
     136    forall (T) void f (T x, forall (U) void (*g) (U));
     137\end{cfa}
     138
     139Making \textbf{g} polymorphic in this context would almost certainly be unnecessary, since it can only be called inside the body of \textbf{f} and the types of the argument would have been known anyways, although it can potentially depend on \textbf{T}. Moreover, requesting a function parameter to be able to potentially work on any input type at all would always impose too much constraint on the arguments, as it only needs to make each calls inside the body of \textbf{f} valid.
     140
     141Rewriting the prototype to
     142
     143\begin{cfa}
     144    forall (T) void f (T x, void (*g) (T));
     145\end{cfa}
     146
     147will be sufficient (or potentially, some compound type synthesized from \textbf{T}), in which case \textbf{g} is no longer a polymorphic type on itself. The "monomorphization" conversion is readily supported in \CFA, either by explicitly assigning a polymorphic function name to a compatible function pointer type, or implicitly done in deducing assertion parameters (which will be discussed in the next section). Such technique can be directly applied to argument passing, which is essentially just assignment to function parameter variables. There might be some edge cases where the supplied subroutine \textbf{g} is called on arguments of different types inside the body of \textbf{f} and so declared as polymorphic, but such use case is rare and the benefit of allowing such constructs seems to be minimal in practice.
     148
     149The result of this change is that the unification algorithm no longer needs to distinguish "open" and "closed" type variables, as the latter is not allowed to exist. The only type variables that need to be handled are those introduced by the \textbf{forall} clause from the function prototype. The subtype relationship between function types is now also rendered redundant since none of the function parameter or return types can be polymorphic, and no basic types or non-polymorphic function types are subtypes of any other type. Therefore the goal of (exact) type unification now simply becomes finding a substitution that produces identical types. The assertion set need to be resolved is also always just the declarations on the function prototype, which also simplifies the assertion satisfaction algorithm by a bit, as will be discussed further in the next section.
     150
     151The type unification results are stored in a type environment data structure, which represents all the type variables currently in scope as equivalent classes, together with their bound types and some other extra information, such as whether the bound type is allowed to be opaque (i.e. a forward declaration without definition in scope), and whether the bounds are allowed to be widened. In the more general approach commonly used in functional languages, the unification variables are given a lower bound and an upper bound to account for covariance and contravariance of types. \CFA currently does not implement any variance with its generic types, and does not allow polymorphic function types, therefore no explicit upper bound is needed and one simple binding value for each equivalence class suffices. However, since type conversions are allowed in \CFA, the type environment needs to keep track on which type variables are allowed conversions. This behavior is notably different from \CC template argument deduction which enforces an exact match everywhere unless the template argument types are explicitly given. For example, a polymorphic maximum function in \CFA can be called with arguments of different arithmetic types and the result follows the usual arithmetic conversion rules, while such expression is not allowed by \CC:
     152
     153\begin{cfa}
     154    forall (T | {int ?<? (T, T); }) T max (T, T);
     155
     156    max (42, 3.14); // OK, T=double; requires explicit type annotation in C++ such as max<double>(42, 3.14);
     157\end{cfa}
     158
     159The current \CFA documentation does not include a formal set of rules for type unification. In practice, the algorithm implemented in the \CFA translator can be summarized as follows, given a function signature forall$(T_1,..., T_n) f(p_1, ..., p_m)$ and argument types $(a_1, ..., a_m)$, the unification algorithm performs the following steps: \footnote{This assumes argument tuples are already expanded to the individual components.}
     160
     161\begin{enumerate}
     162    \item The type environment is initialized as the union of all type environments of the arguments, plus $(T_1,...,T_n)$ as free variables. The inclusion of argument environments serves the purpose of resolving polymorphic return types that needs to be deduced.
     163    \item Initially, all type variables
     164
     165\end{enumerate}
     166
     167
     168
     169
     170
     171
     172
     173\section{Satisfaction of Assertions}
     174
     175The assertion satisfaction problem greatly increases the complexity of \CFA expression resolution. Past experiments have shown that the majority of time is spent in resolving the assertions for those expressions that takes the longest time to resolve. Even though a few heuristics-based optimizations are introduced to the compiler now, this remains to be the most costly part of compiling a \CFA program. The major difficulty of resolving assertions is that the problem can become recursive, since the expression used to satisfy an outstanding assertion can have its own assertions, and in theory this can go on indefinitely. Detecting infinite recursion cases in general is not algorithmically possible and it is not attempted in the compiler. Instead, a fixed maximum depth of recursive assertions is imposed. This approach is also taken by \CC compilers as template argument deduction is also similarly undecidable in general.
     176
     177In previous versions of \CFA this number was set at 4; as the compiler becomes more optimized and capable of handling more complex expressions in a reasonable amount of time, I have increased the limit to 8 and in most occasions it does not lead to trouble. Very rarely there will be a case where the infinite recursion produces an exponentially growing assertion set, causing minutes of time wasted before the limit is reached. Fortunately it is very hard to run into this situation with realistic \CFA code, and the ones that were found all have some clear characteristics, which can be prevented by some clever tricks. In fact, some of the performance optimizations come from analyzing these problematic cases. One example of such will be presented later in this section.
     178
     179While the assertion satisfaction problem in isolation looks like just another expression to resolve, the recursive nature makes some techniques applied to expression resolution without assertions no longer possible. The most significant impact is that the type unification has a side effect, namely editing the type environment (equivalence classes and bindings), which means that if one expression has multiple associated assertions, they are not independent as the changes to the type environment must be compatible for all the assertions to be resolved. Particularly, if one assertion parameter can be resolved in multiple different ways, all of the results need to be checked to make sure the change to type variable bindings are compatible with other assertions to be resolved. A naive algorithm that simply picks any pending assertion to resolve and continue in a depth-first search could be very inefficient and especially prone of falling into an infinite loop, while in many cases it can be avoided by examining other assertions that can provide insight on the desired type binding: if one assertion parameter can only be matched by a unique option, we can then update the type bindings confidently without the need of backtracking.
     180
     181The algorithm currently used in \CFA compiler is designed by Aaron Moss through a simplified prototype experiment that captures most of \CFA type system features and ported back to the actual language. It can be described as a mix of breadth- and depth-first search in a staged approach.
     182
     183To resolve a set of assertions, the algorithm first attempts to resolve each assertion item individually. There are three possible outcomes on resolving each assertion:
     184
     185\begin{enumerate}
     186    \item If no matches are found, the algorithm terminates with a failure.
     187    \item If exactly one match is found, the type environment is updated immediately, and used in resolving any remaining assertions.
     188    \item If multiple matches are found, the assertion candidates with their updated type environments are stored in a list that will be checked for compatibility at the end.
     189\end{enumerate}
     190
     191When all assertion items are resolved successfully, the algorithm attempts to combine the ambiguously resolved assertions to produce mutually compatible assignments. If any new assertions are introduced by the selected candidates, the algorithm is applied recursively, until there are none pending resolution or the recursion limit is reached which results in a failure.
     192
     193It has been discovered in practice that the efficiency of such algorithm can sometimes be very sensitive to the order of resolving assertions. Suppose an unbound type variable @T@ appears in two assertions, one can be uniquely resolved and allow the type @T@ to be inferred immediately, and another has many ways to be resolved, each results in @T@ being bound to a different concrete type. If the first assertion is examined first by the algorithm, the deducted type can then be utilized in resolving the second assertion and eliminate many incorrect options without producing the list of candidates pending further checks. In practice, this have a significant impact when an unbound type @T@ is declared to satisfy the basic "object assertions"\footnote{The term is borrowed from object-oriented languages although \CFA is not object-oriented in principle.} of having a default constructor, destructor, and copy assignment operations. Since they are defined for every type currently in scope, there are often hundreds or even thousands of matches to these functions with an unspecified operand type, and in most of the cases the value of @T@ can be deduced by resolving another assertion first, which then allows specific object lifetime functions to be looked up since they are sorted internally by the operand type, and greatly reduces the number of wasted resolution attempts.
     194
     195Currently this issue also causes the capability of the assertion resolution algorithm to be limited. Assertion matching is implemented to be more restricted than expression resolution in general, in that the parameter types must match exactly, rather than just merely callable. If one function declaration includes an assertion of @void f(T)@ and only a @f(long)@ is currently in scope, trying to resolve the assertion with @T=int@ would not work. Loosening the assertion matching requirement causes many assertion variables to have multiple matches and makes the delayed combination step too costly.
     196
     197Given all the issues caused by assertion resolution potentially creating new type variable bindings, a natural progression is to put some restrictions on free type variables such that all the type variables will be bound when the expression reaches assertion resolution stage. A type variable introduced by the @forall@ clause of function declaration can appear in parameter types, return types and assertion variables. If it appears in the parameter types, it will be bound when matching the arguments to parameters at the call site. If it only appears in the return type, it can be eventually figured out by the context in principle. The current implementation in \CFA compiler does not do enough return type deduction as it performs eager assertion resolution, and the return type information cannot be known in general before the parent expression is resolved, unless the expression is in an initialization context, in which the type of variable to be initialized is certainly known. By delaying the assertion resolution until the return type becomes known, this problem can be circumvented. The truly problematic case occurs if a type variable does not appear in either of the parameter or return types and only appears in some assertion variables. Such case is very rare and it is not evident that forcing every type variable to appear at least once in parameter or return types limits the expressiveness of \CFA type system to a significant extent. In the next chapter I will discuss about a proposal of including type declarations in traits rather than having all type variables appear in the trait parameter list, which could be helpful for providing equivalent functionality of having an unbound type parameter in assertion variables, and also addressing some of the variable cost issue discussed in section 4.1.
     198
     199\subsection*{Caching Assertion Resolution Results}
     200
     201In Aaron Moss's prototype design and experiments, a potential optimization of caching the result of already resolved assertions is discussed. Based on the experiment results, this approach can improve the performance of expression resolution in general, and sometimes allow hard instances of assertion resolution problems to be solved that are otherwise infeasible, for example when the resolution would encounter infinite loops.
     202
     203The problem that makes this approach tricky to be implemented correctly is that the resolution algorithm has side effects, namely modifying the type bindings in the environment. If we were to cache those results that cause the type bindings to be modified, it would be necessary to store the changes to type bindings too, and in case where multiple candidates can be used to satisfy one assertion parameter, all of them needs to be cached including those that are not eventually selected, since the side effect can produce different results depending on the context.
     204
     205In the original design of \CFA that includes unrestricted polymorphic formal parameters that can have assertions on themselves, the problem is even more complicated as new declarations can be introduced in scope during expression resolution. Here is one such example taken from Bilson:
     206
     207\begin{cfa}
     208    void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
     209    forall( U, V | { U -?( U ); V -?( V ); } ) U g( U, V ) );
     210    f( g );
     211\end{cfa}
     212
     213The inner assertion parameter on the \textit{closed} type variable @T@ is used to satisfy the assertions on @U@ and @V@ in this example.
     214
     215However, as per the previous discussions on this topic, polymorphic function pointer types have been removed from \CFA, since correctly implementing assertion matching is not possible in general. Without closed parameters (and therefore no have-set for assertions) the set of declarations in scope remains unchanged while resolving any expression. The current \CFA implementation also does not attempt to widen any already bound type parameters to satisfy an assertion. Note that such restriction does mean that certain kinds of expressions cannot be resolved, for example:
     216
     217\begin{cfa}
     218    forall (T | {void f(T);}) void g(T);
     219    void f (long);
     220    g(42);
     221\end{cfa}
     222
     223The call @g(42)@ is rejected since no attempt is made to widen the parameter type @T@ from @int@ to @long@. Such problem could be mitigated if we allow inexact matches of assertions, but cannot be eliminated completely, if @T@ is matched in a parameterized type, including pointers and references:
     224
     225\begin{cfa}
     226    forall (T | {void f(T*);}) void g(T);
     227    void f (long *);
     228    g(42);
     229\end{cfa}
     230
     231Here the only way to resolve the call @g(42)@ is to allow assertion resolution to widen the parameter @T@, since even with inexact matching, @int*@ cannot be converted to @long*@.
     232
     233
     234
     235\section{Compiler Implementation Considerations}
     236
     237
  • doc/theses/fangren_yu_MMath/performance.tex

    re561551 ra03ed29  
    1 \chapter{Performance}
     1\chapter{Future Work}
    22
    3 If there are any performance experiments.
     3These are some feature requests related to type system enhancement that come up during the development of \CFA language and library, but have not been implemented yet. A lot of the current implementations have to work around the limits of the language features and sometimes lead to inefficiency.
     4
     5\section{Closed trait types}
     6
     7\CFA as it currently is does not have any closed types, as new functions can be added at any time. It is also possible to locally declare a function\footnote{Local functions are not a standard feature in C but supported by mainstream C compilers such as gcc, and allowed in \CFA too.} or a function pointer to make a type satisfy a certain trait temporarily and be used as such in this limited scope. However, the lack of closed types in such a "duck typing" scheme proposes two problems. For library implementors, it is common to not want the defined set of operations to be overwritten and cause the behavior of polymorphic invocations to change. For the compiler, it means caching and reusing the result of resolution is not reliable as newly introduced declarations can participate in assertion resolution, making a previously invalid expression valid, or the other way around by introducing ambiguity in assertions. Sometimes those interfaces are fairly complicated, for example the I/O library traits \textbf{istream} and \textbf{ostream} each has over 20 operations. Without the ability to store and reuse assertion resolution results, each time the compiler encounters an I/O operation in the source code (mainly the pipe operator \textbf{?|?} used to represent stream operations in \CFA) it has to resolve the same set of assertions again, causing a lot of repetitive work. Previous experiments have shown that the I/O assertions often account for over half of the number of assertions resolved in a \CFA translation unit. Introducing a way to eliminate the need of doing such repetitive assertion resolutions that are very unlikely to change by new overloads can therefore provide significant improvement to the performance of the compiler.
     8
     9\section{Associated Types}
     10
     11The analysis presented in section 4.3 shows that if we mandate that all type parameters have to be bound before assertion resolution, the complexity of resolving assertions become much lower as every assertion parameter can be resolved independently. By fully utilizing information from higher up the expression tree for return value overloading, most of the type bindings can be resolved. However there are certain scenarios where we would like to have some intermediate types to be involved in certain operations, that are neither input nor output types.
     12
     13\CFA standard library provides a few polymorphic container types similar to those found in \CC standard template library. Naturally, we would like to have a harmonized iterator interface for different container types. The feature is still under development and has not been finalized, but for the purpose of this discussion, we will be using a similar approach to the \CC standard iterator contract. The equivalent type signatures can be given in \CFA trait as:
     14
     15\begin{cfa}
     16    forall (Container, Iter, Elem) trait iterable {
     17        Iter begin(Container);
     18        Iter end(Container);
     19        Elem& *?(Iter);
     20        Iter ++?(Iter);
     21        bool ?==?(Iter,Iter);
     22    }
     23\end{cfa}
     24
     25These are the exact operators required in \CC for the for-loop comprehension
     26@for (Elem & e: container)@. The problem with this trait declaration is that the iterator type @Iter@ has to be explicitly given in the trait parameter list, but is intended to be deduced from the @begin@ and @end@ operations on the container type; and the same can be said for the element type. The iterable trait is meant to describe a property for the container type but involves two additional types, one for the iterator type and one for the element type. If we were to disallow unbound type parameters in assertions without adding any features to the type system, we will not be able to describe this iterable property in \CFA.
     27
     28To solve this problem, I propose adding associate type declarations in addition to the existing (free) @forall@ type parameters. In the body of a trait declaration, new types can be introduced as the return type of an expression involving the type parameters. Following the above example, the iterator contract could be rewritten to
     29\begin{cfa}
     30    forall (Container) trait iterable {
     31        type Iter = begin(Container);
     32        type Elem& = *?(Iter);
     33        Iter end(Container);
     34        Iter ++?(Iter);
     35        bool ?==?(Iter,Iter);
     36    }
     37\end{cfa}
     38
     39This matches conceptually better that the iterable trait is about one type (the container) rather than about three types. The iterator type and element type can be viewed as properties of the container types, not independent type parameters.
     40
     41There remains one design decision to be made, that is whether the expressions that define associated types need to be uniquely resolved. If the associated type does not appear in parameter or return types, having such requirement seems to be reasonable, because the expression cost does not consider any conversions that occur in assertion parameter matching, which essentially means having multiple ways to resolve an associated type always results in ambiguity. A more interesting case would be that the associated type also appears in the return type, where both resolving the return type overload based on context and resolving the assertion that defines the associate type can help, and for the whole expression to resolve, they must agree on a common result. Moss gave the following example to illustrate \CFA assertion system's expressiveness that could be viewed as having an associated type as return, and can potentially vary based on the context:
     42
     43\begin{cfa}
     44    forall(Ptr, Elem) trait pointer_like {
     45        Elem& *?(Ptr); // Ptr can be dereferenced to Elem
     46    };
     47    struct list {
     48        int value;
     49        list* next; // may omit struct on type names
     50    };
     51    typedef list* list_iterator;
     52    int& *?(list_iterator it) {
     53        return it->value;
     54    }
     55\end{cfa}
     56
     57Note that the type @list*@ satisfies both @pointer_like(list*, int)@ and @pointer_like(list*, list)@ (the latter by the built-in pointer dereference operator) and the expression @*it@ could be either a @struct list@ or an @int@. Requiring associated types to be unique would make the @pointer_like@ trait inapplicable to @list*@ here, which is not desirable. I have not attempted to implement associated types in \CFA compiler, but based on the above discussions, one option could be to make associated type resolution and return type overloading coexist: when the associated type appears in the returns, we deduce it from the context and then verify the trait with ordinary assertion resolution; when it does not appear in the returns, we instead require the type to be uniquely determined by the expression that defines the associated type.
     58
     59\section{User-defined conversions}
  • doc/theses/jiada_liang_MMath/CFAenum.tex

    re561551 ra03ed29  
    11\chapter{\CFA Enumeration}
    2 
    32
    43% \CFA supports C enumeration using the same syntax and semantics for backwards compatibility.
     
    98\begin{clang}[identifierstyle=\linespread{0.9}\it]
    109$\it enum$-specifier:
    11         enum @(type-specifier$\(_{opt}\)$)@ identifier$\(_{opt}\)$ { enumerator-list-noinit }
    12         enum @(type-specifier$\(_{opt}\)$)@ identifier$\(_{opt}\)$ { enumerator-list-noinit , }
     10        enum @(type-specifier$\(_{opt}\)$)@ identifier$\(_{opt}\)$ { cfa-enumerator-list }
     11        enum @(type-specifier$\(_{opt}\)$)@ identifier$\(_{opt}\)$ { cfa-enumerator-list , }
    1312        enum @(type-specifier$\(_{opt}\)$)@ identifier
    14 enumerator-list-noinit:
     13cfa-enumerator-list:
     14        cfa-enumerator
     15        cfa-enumerator, cfa-enumerator-list
     16cfa-enumerator:
    1517        enumeration-constant
    16         enumerator-list-noinit , enumeration-constant
     18        $\it inline$ identifier
     19        enumeration-constant = expression
    1720\end{clang}
    18 \CFA enumerations, or \CFA enums, have optional type declaration in a bracket next to the enum keyword.
     21A \newterm{\CFA enumeration}, or \newterm{\CFA enum}, has an optional type declaration in the bracket next to the @enum@ keyword.
    1922Without optional type declarations, the syntax defines "opaque enums".
    2023Otherwise, \CFA enum with type declaration are "typed enums".
     
    284287Note, the validity of calls is the same for call-by-reference as for call-by-value, and @const@ restrictions are the same as for other types.
    285288
    286 
    287 
    288 \section{Enumeration Traits}
    289 
    290 \CFA defines the set of traits containing operators and helper functions for @enum@.
    291 A \CFA enumeration satisfies all of these traits allowing it to interact with runtime features in \CFA.
    292 Each trait is discussed in detail.
    293 
    294 The trait @CfaEnum@:
    295 \begin{cfa}
    296 forall( E ) trait CfaEnum {
    297         char * label( E e );
    298         unsigned int posn( E e );
    299 };
    300 \end{cfa}
    301 
    302 describes an enumeration as a named constant with position. And @TypeEnum@
    303 \begin{cfa}
    304 forall( E, V ) trait TypeEnum {
    305         V value( E e );
    306 };     
    307 \end{cfa}
    308 asserts two types @E@ and @T@, with @T@ being the base type for the enumeration @E@.
    309 
    310 The declarative syntax
    311 \begin{cfa}
    312 enum(T) E { A = ..., B = ..., C = ... };
    313 \end{cfa}
    314 creates an enumerated type E with @label@, @posn@ and @value@ implemented automatically.
    315 
    316 \begin{cfa}
    317 void foo( T t ) { ... }
    318 void bar(E e) {
    319         choose (e) {
    320                 case A: printf("\%d", posn(e));
    321                 case B: printf("\%s", label(e));
    322                 case C: foo(value(e));
    323         }
    324 }
    325 \end{cfa}
    326 
    327 Implementing general functions across all enumeration types is possible by asserting @CfaEnum( E, T )@, \eg:
    328 \begin{cfa}
    329 #include <string.hfa>
    330 forall( E, T | CfaEnum( E, T ) | {unsigned int toUnsigned(T)} )
    331 string formatEnum( E e ) {
    332         unsigned int v = toUnsigned(value(e));
    333         string out = label(e) + '(' + v +')';
    334         return out;
    335 }
    336 printEunm( Week.Mon );
    337 printEnum( RGB.Green );
    338 \end{cfa}
    339 
    340 \CFA does not define attribute functions for C style enumeration. But it is possilbe for users to explicitly implement
    341 enumeration traits for C enum and any other types.
    342 
    343 \begin{cfa}
    344 enum Fruit { Apple, Bear, Cherry };                     $\C{// C enum}$
    345 char * label(Fruit f) {
    346         switch(f) {
    347                 case Apple: "A"; break;
    348                 case Bear: "B"; break;
    349                 case Cherry: "C"; break;
    350         }
    351 }
    352 unsigned posn(Fruit f) { return f; }
    353 char* value(Fruit f) { return ""; }             $\C{// value can return any non void type}$
    354 formatEnum( Apple );                                                    $\C{// Fruit is now a Cfa enum}$
    355 \end{cfa}
    356 
    357 A type that implements trait @CfaEnum@, \ie, a type has no @value@, is called an opaque enum.
    358 
    359 % \section{Enumerator Opaque Type}
    360 
    361 % \CFA provides a special opaque enumeration type, where the internal representation is chosen by the compiler and only equality operations are available.
    362 \begin{cfa}
    363 enum@()@ Planets { MERCURY, VENUS, EARTH, MARS, JUPITER, SATURN, URANUS, NEPTUNE };
    364 \end{cfa}
    365 
    366 
    367 In addition, \CFA implements @Bound@ and @Serial@ for \CFA Enums.
    368 \begin{cfa}
    369 forall( E ) trait Bounded {
    370         E first();
    371         E last();
    372 };
    373 \end{cfa}
    374 The function @first()@ and @last()@ of enumerated type E return the first and the last enumerator declared in E, respectively. \eg:
    375 \begin{cfa}
    376 Workday day = first();                                  $\C{// Mon}$
    377 Planet outermost = last();                              $\C{// NEPTUNE}$
    378 \end{cfa}
    379 @first()@ and @last()@ are overloaded with return types only, so in the example, the enumeration type is found on the left-hand side of the assignment.
    380 Calling either functions without a context results in a type ambiguity, except in the rare case where the type environment has only one enumeration.
    381 \begin{cfa}
    382 @first();@                                                              $\C{// ambiguous because both Workday and Planet implement Bounded}$
    383 sout | @last()@;
    384 Workday day = first();                                  $\C{// day provides type Workday}$
    385 void foo( Planet p );
    386 foo( last() );                                                  $\C{// parameter provides type Planet}$
    387 \end{cfa}
    388 
    389 The trait @Serial@:
    390 \begin{cfa}
    391 forall( E | Bounded( E ) ) trait Serial {
    392         unsigned fromInstance( E e );
    393         E fromInt( unsigned int posn );
    394         E succ( E e );
    395         E pred( E e );
    396 };
    397 \end{cfa}
    398 is a @Bounded@ trait, where elements can be mapped to an integer sequence.
    399 A type @T@ matching @Serial@ can project to an unsigned @int@ type, \ie an instance of type T has a corresponding integer value.
    400 %However, the inverse may not be possible, and possible requires a bound check.
    401 The mapping from a serial type to integer is defined by @fromInstance@, which returns the enumerator's position.
    402 The inverse operation is @fromInt@, which performs a bound check using @first()@ and @last()@ before casting the integer into an enumerator.
    403 Specifically, for enumerator @E@ declaring $N$ enumerators, @fromInt( i )@ returns the $i-1_{th}$ enumerator, if $0 \leq i < N$, or raises the exception @enumBound@.
    404 
    405 The @succ( E e )@ and @pred( E e )@ imply the enumeration positions are consecutive and ordinal.
    406 Specifically, if @e@ is the $i_{th}$ enumerator, @succ( e )@ returns the $i+1_{th}$ enumerator when $e \ne last()$, and @pred( e )@ returns the $i-1_{th}$ enumerator when $e \ne first()$.
    407 The exception @enumRange@ is raised if the result of either operation is outside the range of type @E@.
    408 
    409 Finally, there is an associated trait defining comparison operators among enumerators.
    410 \begin{cfa}
    411 forall( E, T | CfaEnum( E, T ) ) {
    412         // comparison
    413         int ?==?( E l, E r );           $\C{// true if l and r are same enumerators}$
    414         int ?!=?( E l, E r );           $\C{// true if l and r are different enumerators}$
    415         int ?!=?( E l, zero_t );        $\C{// true if l is not the first enumerator}$
    416         int ?<?( E l, E r );            $\C{// true if l is an enumerator before r}$
    417         int ?<=?( E l, E r );           $\C{// true if l before or the same as r}$
    418         int ?>?( E l, E r );            $\C{// true if l is an enumerator after r}$
    419         int ?>=?( E l, E r );           $\C{// true if l after or the same as r}$         
    420 }
    421 \end{cfa}
    422 
    423 
    424 
    425 
    426 
    427289\section{Enumerator Control Structures}
    428290
     
    433295
    434296For example, an intuitive use of enumerations is with the \CFA @switch@/@choose@ statement, where @choose@ performs an implicit @break@ rather than a fall-through at the end of a @case@ clause.
    435 \begin{cquote}
    436 \begin{cfa}
     297(For this discussion, ignore the fact that @case@ requires a compile-time constant.)
     298\begin{cfa}[belowskip=0pt]
    437299enum Count { First, Second, Third, Fourth };
    438300Count e;
    439301\end{cfa}
    440 \begin{tabular}{ll}
    441 \begin{cfa}
     302\begin{cquote}
     303\setlength{\tabcolsep}{15pt}
     304\noindent
     305\begin{tabular}{@{}ll@{}}
     306\begin{cfa}[aboveskip=0pt]
    442307
    443308choose( e ) {
     
    449314\end{cfa}
    450315&
    451 \begin{cfa}
     316\begin{cfa}[aboveskip=0pt]
    452317// rewrite
    453318choose( @value@( e ) ) {
     
    465330enum Count { First, Second, Third @= First@, Fourth };
    466331\end{cfa}
    467 which make @Third == First@ and @Fourth == Second@, causing a compilation error because of duplicate @case@ clauses.
     332making @Third == First@ and @Fourth == Second@, causing a compilation error because of duplicate @case@ clauses.
    468333To better match with programmer intuition, \CFA toggles between value and position semantics depending on the language context.
    469334For conditional clauses and switch statements, \CFA uses the robust position implementation.
    470335\begin{cfa}
    471 choose( @position@( e ) ) {
    472         case @position@( First ): ...;
    473         case @position@( Second ): ...;
    474         case @position@( Third ): ...;
    475         case @position@( Fourth ): ...;
    476 }
    477 \end{cfa}
    478 
    479 \begin{cfa}
    480 Count variable_a = First, variable_b = Second, variable_c = Third, variable_d = Fourth;
    481 p(variable_a); // 0
    482 p(variable_b); // 1
    483 p(variable_c); // "Third"
    484 p(variable_d); // 3
    485 \end{cfa}
    486 
    487 \begin{cfa}
    488 for (d; Workday) { sout | d; }
    489 for (p; +~=Planet) { sout | p; }
    490 for (c: -~=Alphabet ) { sout | c; }
    491 \end{cfa}
    492 The @range loop@ for enumeration is a syntax sugar that loops over all enumerators and assigns each enumeration to a variable in every iteration.
    493 The loop control of the range loop consists of two parts: a variable declaration and a @range expression@, with the type of the variable
    494 can be inferred from the range expression.
    495 
    496 The range expression is an enumeration type, optionally prefixed by @+~=@ or @-~=@. Without a prefix, or prefixed with @+~=@, the control
    497 loop over all enumerators from the first to the last. With a @-~=@ prefix, the control loops backward.
    498 
    499 On a side note, the loop syntax
    500 \begin{cfa}
    501 for ( typeof(Workday) d; d <= last(); d = succ(d) );
    502 \end{cfa}
    503 does not work. When d == last(), the loop control will still attempt to assign succ(d) to d, which causes an @enumBound@ exception.
    504 
    505 \CFA reduces conditionals to its "if case" if the predicate is not equal to ( @!=@ ) zero, and the "else case" otherwise.
    506 Overloading the @!=@ operator with an enumeration type against the zero defines a conceptual conversion from
    507 enum to boolean, which can be used as predicates.
    508 
     336if ( @posn@( e ) < posn( Third ) ) ...
     337choose( @posn@( e ) ) {
     338        case @posn@( First ): ...;
     339        case @posn@( Second ): ...;
     340        case @posn@( Third ): ...;
     341        case @posn@( Fourth ): ...;
     342}
     343\end{cfa}
     344
     345\CFA provides a special form of for-control for enumerating through an enumeration, where the range is a type.
     346\begin{cfa}
     347for ( cx; @Count@ ) { sout | cx | nonl; } sout | nl;
     348for ( cx; +~= Count ) { sout | cx | nonl; } sout | nl;
     349for ( cx; -~= Count ) { sout | cx | nonl; } sout | nl;
     350First Second Third Fourth
     351First Second Third Fourth
     352Fourth Third Second First
     353\end{cfa}
     354The enumeration type is syntax sugar for looping over all enumerators and assigning each enumerator to the loop index, whose type is inferred from the range type.
     355The prefix @+~=@ or @-~=@ iterate forward or backwards through the inclusive enumeration range, where no prefix defaults to @+~=@.
     356
     357C has an idiom for @if@ and loop predicates of comparing the predicate result ``not equal to 0''.
     358\begin{cfa}
     359if ( x + y /* != 0 */ ) ...
     360while ( p /* != 0 */ ) ...
     361\end{cfa}
     362This idiom extends to enumerations because there is a boolean conversion in terms of the enumeration value, if and only if such a conversion is available.
     363For example, such a conversion exists for all numerical types (integral and floating-point).
     364It is possible to explicitly extend this idiom to any typed enumeration by overloading the @!=@ operator.
     365\begin{cfa}
     366bool ?!=?( Name n, zero_t ) { return n != Fred; }
     367Name n = Mary;
     368if ( n ) ... // result is true
     369\end{cfa}
     370Specialize meanings are also possible.
    509371\begin{cfa}
    510372enum(int) ErrorCode { Normal = 0, Slow = 1, Overheat = 1000, OutOfResource = 1001 };
    511 bool ?!=?(ErrorCode lhs, zero_t) { return value(lhs) >= 1000; }
    512 ErrorCode code = /.../
    513 if (code) { scream(); }
    514 \end{cfa}
    515 
    516 Incidentally, \CFA does not define boolean conversion for enumeration. If no
    517 @?!=?(ErrorCode, zero_t)@
    518 overloading defined,
    519 \CFA looks for the boolean conversion in terms of its value and gives a compiler error if no such conversion is available.
    520 
    521 \begin{cfa}
    522 enum(int) Weekday { Mon, Tues, Wed, Thurs, Fri, Sat, Sun, };
    523 enum() Colour { Red, Green, Blue };
    524 enum(S) Fruit { Apple, Banana, Cherry }
    525 Weekday w = ...; Colour c = ...; Fruit f = ...;
    526 if (w) { ... } // w is true if and only if w != Mon, because value(Mon) == 0 (auto initialized)
    527 if (c) { ... } // error
    528 if (s) { ... } // depends on ?!=?(S lhs, zero_t ), and error if no such overloading available
    529 \end{cfa}
    530 
    531 As an alternative, users can define the boolean conversion for CfaEnum:
    532 
    533 \begin{cfa}
    534 forall(E | CfaEnum(E))
    535 bool ?!=?(E lhs, zero_t) {
    536         return posn(lhs) != 0;
    537 }
    538 \end{cfa}
    539 which effectively turns the first enumeration as a logical zero and non-zero for others.
    540 
    541 \section{Enumerated Arrays}
    542 Enumerated arrays use an \CFA array as their index.
    543 \begin{cfa}
    544 enum() Colour {
    545         Red, Orange, Yellow, Green, Blue, Indigo, Violet
    546 };
    547 
    548 string colourCode[Colour] = { "#e81416", "#ffa500", "#ffa500", "#ffa500", "#487de7", "#4b369d", "#70369d" };
    549 sout | "Colour Code of Orange is " | colourCode[Orange];
    550 \end{cfa}
     373bool ?!=?( ErrorCode ec, zero_t ) { return ec >= Overheat; }
     374ErrorCode code = ...;
     375if ( code ) { problem(); }
     376\end{cfa}
     377
     378
     379\section{Enumeration Dimension}
     380
     381\VRef{s:EnumeratorTyping} introduced the harmonizing problem between an enumeration and secondary information.
     382When possible, using a typed enumeration for the secondary information is the best approach.
     383However, there are times when combining these two types is not possible.
     384For example, the secondary information might precede the enumeration and/or its type is needed directly to declare parameters of functions.
     385In these cases, having secondary arrays of the enumeration size are necessary.
     386
     387To support some level of harmonizing in these cases, an array dimension can be defined using an enumerator type, and the enumerators used as subscripts.
     388\begin{cfa}
     389enum E { A, B, C, N }; // possibly predefined
     390float H1[N] = { [A] : 3.4, [B] : 7.1, [C] : 0.01 }; // C
     391float H2[@E@] = { [A] : 3.4, [B] : 7.1, [C] : 0.01 }; // CFA
     392\end{cfa}
     393(Note, C uses the symbol, @'='@ for designator initialization, but \CFA had to change to @':'@ because of problems with tuple syntax.)
     394This approach is also necessary for a predefined typed enumeration (unchangeable), when additional secondary-information need to be added.
    551395
    552396
     
    568412\small
    569413\begin{cfa}
    570 struct MR { double mass, radius; };
     414struct MR { double mass, radius; };                     $\C{// planet definition}$
    571415enum( @MR@ ) Planet {                                           $\C{// typed enumeration}$
    572416        //                      mass (kg)   radius (km)
  • doc/theses/jiada_liang_MMath/background.tex

    re561551 ra03ed29  
    66The following discussion covers C enumerations.
    77
    8 As discussed in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
     8As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
    99\begin{clang}
    1010#define Mon 0
     
    3333C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
    3434\begin{cquote}
    35 \begin{tabular}{@{}l@{}l@{}}
    36 \multicolumn{1}{@{}c@{}}{\textbf{static initialization}} &  \multicolumn{1}{c@{}}{\textbf{dynamic intialization}} \\
     35\begin{tabular}{@{}ll@{}}
     36\multicolumn{1}{@{}c}{\textbf{static initialization}} &  \multicolumn{1}{c@{}}{\textbf{dynamic intialization}} \\
    3737\begin{clang}
    3838static const int one = 0 + 1;
     
    5151        int va[r];
    5252}
    53 
    54 
    5553\end{clang}
    5654\end{tabular}
    5755\end{cquote}
    58 However, statically initialized identifiers can not appear in constant-expression contexts, \eg @case@.
     56However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
    5957Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
    6058Again, this form of aliasing is not an enumeration.
    61 
    6259
    6360\section{C Enumeration}
     
    129126
    130127
    131 \subsection{Representation}
    132 
    133 C standard specifies enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
    134 In practice, C uses @int@ as the underlying type for enumeration variables, because of the restriction to integral constants, which have type @int@ (unless qualified with a size suffix).
     128\subsection{Implementation}
     129\label{s:CenumImplementation}
     130
     131In theory, a C enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
     132In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the underlying type for enumeration variables, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
     133However, type @int@ is defined as:
     134\begin{quote}
     135A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
     136\end{quote}
     137Howeveer, @int@ means a 4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
     138Whereas, @long int@ means 4 bytes on a 32-bit and 8 bytes on 64-bit architectures, and @long long int@ means 8 bytes on both 32/64-bit architectures, where 64-bit operations are simulated on 32-bit architectures.
     139In reality, both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based its initialization.
     140\begin{cfa}
     141enum E { IMin = INT_MIN, IMax = INT_MAX,
     142                         ILMin = LONG_MIN, ILMax = LONG_MAX,
     143                         ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
     144int main() {
     145        printf( "%zd %d %d\n%zd %ld %ld\n%zd %ld %ld\n",
     146                         sizeof(IMin), IMin, IMax,
     147                         sizeof(ILMin), ILMin, ILMax,
     148                         sizeof(ILLMin), ILLMin, ILLMax );
     149}
     1504 -2147483648 2147483647
     1518 -9223372036854775808 9223372036854775807
     1528 -9223372036854775808 9223372036854775807
     153\end{cfa}
     154Hence, initialization in the range @INT_MIN@..@INT_MAX@ is 4 bytes, and outside this range is 8 bytes.
    135155
    136156\subsection{Usage}
     
    150170enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
    151171switch ( week ) {
    152         case Mon: case Tue: case Wed: case Thu: case Fri:
     172        case Mon ... Fri:                               $\C{// gcc case range}$
    153173                printf( "weekday\n" );
    154174        case Sat: case Sun:
    155175                printf( "weekend\n" );
    156176}
    157 for ( enum Week day = Mon; day <= Sun; day += 1 ) { // step of 1
     177for ( enum Week day = Mon; day <= Sun; day += 1 ) { $\C{// step of 1}$
    158178        printf( "day %d\n", day ); // 0-6
    159179}
     
    177197}
    178198\end{cfa}
    179 However, for typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.
    180 
    181 This idiom leads to another C idiom using an enumeration with matching companion information.
     199However, for non-integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.
     200
     201This idiom is used in another C idiom for matching companion information.
    182202For example, an enumeration is linked with a companion array of printable strings.
    183203\begin{cfa}
     
    196216
    197217\bigskip
    198 While C provides a true enumeration, it is restricted, has unsafe semantics, and does provide useful enumeration features in other programming languages.
     218While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features in other programming languages.
    199219
    200220\section{\CFA Polymorphism}
     
    238258\subsection{Constructor and Destructor}
    239259In \CFA, all objects are initialized by @constructors@ during its allocation, including basic types,
    240 which are initialized by constructors default-generated by a compiler.
     260which are initialized by auto-generated basic type constructors.
    241261
    242262Constructors are overloadable functions with name @?{}@, return @void@, and have at least one parameter, which is a reference
  • doc/theses/jiada_liang_MMath/implementation.tex

    re561551 ra03ed29  
    11\chapter{Enumeration Implementation}
     2
     3\section{Enumeration Traits}
     4
     5\CFA defines a set of traits containing operators and helper functions for @enum@.
     6A \CFA enumeration satisfies all of these traits allowing it to interact with runtime features in \CFA.
     7Each trait is discussed in detail.
     8
     9The trait @CfaEnum@:
     10\begin{cfa}
     11forall( E ) trait CfaEnum {
     12        const char * @label@( E e );
     13        unsigned int @posn@( E e );
     14};
     15\end{cfa}
     16asserts an enumeration type @E@ has named enumerator constants (@label@) with positions (@posn@).
     17
     18The trait @TypedEnum@ extends @CfaEnum@:
     19\begin{cfa}
     20forall( E, V | CfaEnum( E ) ) trait TypedEnum {
     21        V @value@( E e );
     22};
     23\end{cfa}
     24asserting an enumeration type @E@ can have homogeneous enumerator values of type @V@.
     25
     26The declarative syntax
     27\begin{cfa}
     28enum(T) E { A = ..., B = ..., C = ... };
     29\end{cfa}
     30creates an enumerated type E with @label@, @posn@ and @value@ implemented automatically.
     31\begin{cfa}
     32void foo( T t ) { ... }
     33void bar(E e) {
     34        choose ( e ) {
     35                case A: printf( "\%d", posn( e) );
     36                case B: printf( "\%s", label( e ) );
     37                case C: foo( value( e ) );
     38        }
     39}
     40\end{cfa}
     41
     42Implementing general functions across all enumeration types is possible by asserting @CfaEnum( E, T )@, \eg:
     43\begin{cfa}
     44#include <string.hfa>
     45forall( E, T | CfaEnum( E, T ) | {unsigned int toUnsigned(T)} )
     46string formatEnum( E e ) {
     47        unsigned int v = toUnsigned( value( e ) );
     48        string out = label(e) + '(' + v +')';
     49        return out;
     50}
     51formatEnum( Week.Mon );
     52formatEnum( RGB.Green );
     53\end{cfa}
     54
     55\CFA does not define attribute functions for C-style enumeration.
     56But it is possible for users to explicitly implement enumeration traits for C enum and any other types.
     57\begin{cfa}
     58enum Fruit { Apple, Pear, Cherry };                     $\C{// C enum}$
     59const char * label( Fruit f ) {
     60        choose ( f ) {
     61                case Apple: return "Apple";
     62                case Bear: return "Pear";
     63                case Cherry: return "Cherry";
     64        }
     65}
     66unsigned posn( Fruit f ) { return f; }
     67const char * value( Fruit f ) { return ""; } $\C{// value can return any non void type}$
     68formatEnum( Apple );                                            $\C{// Fruit is now a \CFA enum}$
     69\end{cfa}
     70
     71A type that implements trait @CfaEnum@, \ie, a type has no @value@, is called an opaque enum.
     72
     73% \section{Enumerator Opaque Type}
     74
     75% \CFA provides a special opaque enumeration type, where the internal representation is chosen by the compiler and only equality operations are available.
     76\begin{cfa}
     77enum@()@ Planets { MERCURY, VENUS, EARTH, MARS, JUPITER, SATURN, URANUS, NEPTUNE };
     78\end{cfa}
     79
     80
     81In addition, \CFA implements @Bound@ and @Serial@ for \CFA enumerations.
     82\begin{cfa}
     83forall( E ) trait Bounded {
     84        E first();
     85        E last();
     86};
     87\end{cfa}
     88The function @first()@ and @last()@ of enumerated type E return the first and the last enumerator declared in E, respectively. \eg:
     89\begin{cfa}
     90Workday day = first();                                  $\C{// Mon}$
     91Planet outermost = last();                              $\C{// NEPTUNE}$
     92\end{cfa}
     93@first()@ and @last()@ are overloaded with return types only, so in the example, the enumeration type is found on the left-hand side of the assignment.
     94Calling either functions without a context results in a type ambiguity, except in the rare case where the type environment has only one enumeration.
     95\begin{cfa}
     96@first();@                                                              $\C{// ambiguous because both Workday and Planet implement Bounded}$
     97sout | @last()@;
     98Workday day = first();                                  $\C{// day provides type Workday}$
     99void foo( Planet p );
     100foo( last() );                                                  $\C{// argument provides type Planet}$
     101\end{cfa}
     102
     103The trait @Serial@:
     104\begin{cfa}
     105forall( E | Bounded( E ) ) trait Serial {
     106        unsigned fromInstance( E e );
     107        E fromInt( unsigned int posn );
     108        E succ( E e );
     109        E pred( E e );
     110};
     111\end{cfa}
     112is a @Bounded@ trait, where elements can be mapped to an integer sequence.
     113A type @T@ matching @Serial@ can project to an unsigned @int@ type, \ie an instance of type T has a corresponding integer value.
     114%However, the inverse may not be possible, and possible requires a bound check.
     115The mapping from a serial type to integer is defined by @fromInstance@, which returns the enumerator's position.
     116The inverse operation is @fromInt@, which performs a bound check using @first()@ and @last()@ before casting the integer into an enumerator.
     117Specifically, for enumerator @E@ declaring $N$ enumerators, @fromInt( i )@ returns the $i-1_{th}$ enumerator, if $0 \leq i < N$, or raises the exception @enumBound@.
     118
     119The @succ( E e )@ and @pred( E e )@ imply the enumeration positions are consecutive and ordinal.
     120Specifically, if @e@ is the $i_{th}$ enumerator, @succ( e )@ returns the $i+1_{th}$ enumerator when $e \ne last()$, and @pred( e )@ returns the $i-1_{th}$ enumerator when $e \ne first()$.
     121The exception @enumRange@ is raised if the result of either operation is outside the range of type @E@.
     122
     123Finally, there is an associated trait defining comparison operators among enumerators.
     124\begin{cfa}
     125forall( E, T | CfaEnum( E, T ) ) {
     126        // comparison
     127        int ?==?( E l, E r );           $\C{// true if l and r are same enumerators}$
     128        int ?!=?( E l, E r );           $\C{// true if l and r are different enumerators}$
     129        int ?!=?( E l, zero_t );        $\C{// true if l is not the first enumerator}$
     130        int ?<?( E l, E r );            $\C{// true if l is an enumerator before r}$
     131        int ?<=?( E l, E r );           $\C{// true if l before or the same as r}$
     132        int ?>?( E l, E r );            $\C{// true if l is an enumerator after r}$
     133        int ?>=?( E l, E r );           $\C{// true if l after or the same as r}$         
     134}
     135\end{cfa}
     136
     137As an alternative, users can define the boolean conversion for CfaEnum:
     138
     139\begin{cfa}
     140forall(E | CfaEnum(E))
     141bool ?!=?(E lhs, zero_t) {
     142        return posn(lhs) != 0;
     143}
     144\end{cfa}
     145which effectively turns the first enumeration as a logical zero and non-zero for others.
     146
     147\begin{cfa}
     148Count variable_a = First, variable_b = Second, variable_c = Third, variable_d = Fourth;
     149p(variable_a); // 0
     150p(variable_b); // 1
     151p(variable_c); // "Third"
     152p(variable_d); // 3
     153\end{cfa}
     154
     155
     156\section{Enumeration Variable}
     157
     158Although \CFA enumeration captures three different attributes, an enumeration instance does not store all this information.
     159The @sizeof@ a \CFA enumeration instance is always 4 bytes, the same size as a C enumeration instance (@sizeof( int )@).
     160It comes from the fact that:
     161\begin{enumerate}
     162\item
     163a \CFA enumeration is always statically typed;
     164\item
     165it is always resolved as one of its attributes regarding real usage.
     166\end{enumerate}
     167When creating an enumeration instance @colour@ and assigning it with the enumerator @Color.Green@, the compiler allocates an integer variable and stores the position 1.
     168The invocations of $positions()$, $value()$, and $label()$ turn into calls to special functions defined in the prelude:
     169\begin{cfa}
     170position( green );
     171>>> position( Colour, 1 ) -> int
     172value( green );
     173>>> value( Colour, 1 ) -> T
     174label( green );
     175>>> label( Colour, 1) -> char *
     176\end{cfa}
     177@T@ represents the type declared in the \CFA enumeration defined and @char *@ in the example.
     178These generated functions are $Companion Functions$, they take an $companion$ object and the position as parameters.
     179
    2180
    3181\section{Enumeration Data}
  • doc/theses/jiada_liang_MMath/intro.tex

    re561551 ra03ed29  
    44Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integer and floating-point types.
    55(In \CFA, the constants @0@ and @1@ can be overloaded for any type.)
    6 A constant's symbolic name is dictated by language syntax related to types.
     6A constant's symbolic name is dictated by language syntax related to types, \eg @5.@ (double), @5.0f@ (float), @5l@ (long double).
    77In general, the representation of a constant's value is \newterm{opaque}, so the internal representation can be chosen arbitrarily.
    88In theory, there are an infinite set of constant names per type representing an infinite set of values.
     
    1313
    1414Many programming languages capture this important software-engineering capability through a mechanism called \newterm{constant} or \newterm{literal} naming, where a new constant is aliased to an existing constant.
    15 Its purpose is for readability, replacing a constant name that directly represents a value with a name that is more symbolic and meaningful in the context of the program.
     15Its purpose is for readability: replacing a constant name that directly represents a value with a name that is more symbolic and meaningful in the context of the program.
    1616Thereafter, changing the aliasing of the new constant to another constant automatically distributes the rebinding, preventing errors.
    1717% and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
     
    6767\label{s:Terminology}
    6868
    69 The term \newterm{enumeration} defines a type with a set of new constants, and the term \newterm{enumerator} represents an arbitrary alias name \see{\VRef{s:CEnumeration} for the name derivation}.
    70 As well, an enumerated type can have three fundamental properties, \newterm{label}, \newterm{order}, and \newterm{value}.
     69The term \newterm{enumeration} defines a type with a set of new constants, and the term \newterm{enumerator} represents an arbitrary alias name \see{\VRef{s:CEnumeration} for the name derivations}.
     70An enumerated type can have three fundamental properties, \newterm{label} (name), \newterm{order} (position), and \newterm{value} (payload).
    7171\begin{cquote}
    7272\sf\setlength{\tabcolsep}{3pt}
     
    111111\end{cfa}
    112112The alias name is logically replaced in the program text by its matching constant.
    113 It is possible to compare aliases, if the constants allow it, \eg @Size < Pi@;
    114 whereas \eg @Pi < Name@ might be disallowed depending on the language.
     113It is possible to compare aliases, if the constants allow it, \eg @Size < Pi@, whereas @Pi < Name@ might be disallowed depending on the language.
    115114
    116115Aliasing is not macro substitution, \eg @#define Size 20@, where a name is replaced by its value \emph{before} compilation, so the name is invisible to the programming language.
     
    133132Any reordering of the enumerators requires manual renumbering.
    134133\begin{cfa}
    135 const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
     134const @Sun = 1@, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
    136135\end{cfa}
    137136For these reasons, aliasing is sometimes called an enumeration.
Note: See TracChangeset for help on using the changeset viewer.