Changeset 1f922f4 for doc/proposals


Ignore:
Timestamp:
Jul 23, 2024, 3:23:04 PM (5 months ago)
Author:
Andrew Beach <ajbeach@…>
Branches:
master
Children:
35c792f
Parents:
719fdbc
Message:

Updated iterator proposal. It was supposed to just go into a bit more detail about ranges, but ended up being a complete rewrite.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/iterators.md

    r719fdbc r1f922f4  
    1010There are two groups of types that interact with this proposal.
    1111
     12Iterable
     13
     14An iterable is a container of some type that has a default iterator that goes
     15over its own contents. It may be a logical container, like a mathematical range,
     16that does not store elements in memory, but it should have logical elements.
     17A trait is defined to standardize how to get that iterator.
     18
     19        forall(C, I)
     20        trait is_iterable {
     21                I get_iterator(C&);
     22        }
     23
     24This is not particularly useful on its own, but is used as standardized glue
     25between some other features below.
     26
    1227Iterator
    1328
    14 An iterator has a very simple interface with a single operation.
    15 The operation is "get the next value in the sequence", but this actually has
    16 several parts, in that it has to check if there are move values, return the
    17 next one if there is one, and update any internal information in the iterator.
    18 For example: `Maybe(Item) next(Iter &);`.
    19 
    20 Now, iterators can have other operations. Notably, they are often also
    21 iterables that return themselves. They can also have a verity of iterator
    22 transformers built in.
    23 
    24 Iterable
    25 
    26 Anything that you can get an iterator from is called an iterable. There
    27 is an operation to get an iterator from an iterable.
    28 
    29 Range For Loop
    30 --------------
    31 One part of the language that could be reworked to make good use of this is
    32 for loops. In short, remove most of the special rules that can be done inside
    33 the identifier and make it a generic range for loop:
    34 
    35     ```
    36     for ( IDENTIFIER ; EXPRESSION ) STATEMENT
    37     ```
    38 
    39 The common way to implement this is that expression produces an iterable.
    40 The for loop gets an iterator from the iterable (which is why iterators are
    41 often iterables, so they can be passed in with the same interface) and stores
    42 it. Then, for each value in the iterator, the loop binds the value to the
    43 identifier and then executes the statement. The loop exits after every value
    44 has been used and the iterator is exhausted.
    45 
    46 For the chained for loop (`for (i; _: j; _)`) can still have its existing
    47 behaviour, advancing through each range in parallel and stopping as soon
    48 as the first one is exhausted.
     29An iterator is an object that can be advanced through a sequence of values.
     30An iterator can be checked to see if it is at the end of the sequence, if not
     31the next value can be retrieved from it and the iterator advanced. This can
     32be done in as little as a single operation:
     33
     34        forall(Iter, Elem)
     35        trait is_iterator {
     36                maybe(Elem) next(Iter&);
     37
     38                // is_iterable(I, I&)
     39                Iter& get_iterator(Iter&);
     40        }
     41
     42This interface is not fixed, the function could change or be split into
     43multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural,
     44but also efficient to implement. This version will be used for examples in
     45the proposal.
     46
     47When next is called, if there are elements left in the iterators, the next
     48element is returned in a maybe and the iterator is advanced. If the iterator
     49is exhausted nothing is changed and an empty maybe is returned.
     50
     51Iterators are also iterables. They must return themselves, this allows
     52iterators to be used directly where iterables are expected.
     53
     54For-Each Loop
     55-------------
     56The main part of this proposal which is part of the language definition,
     57the rest is part of the standard library. And this is the new for-each loop:
     58
     59        for ( IDENTIFIER ; EXPRESSION ) STATEMENT
     60
     61While this uses similar syntax to the existing special for loops, but takes
     62a standard expression. This expression is evaluated, and then the iterator is
     63retrieved from it (this should be the identity function iterators). Each
     64iteration of the loop then checks the iterator to see if it is exhausted, if
     65not it uses the value to execute the statement. If the iterator is exhausted,
     66the loop is completed, and control continues after the loop (or the else
     67clause if provided).
     68
     69The loop should show the same behaviour as the following code:
     70(`typeof(?)` should be decided by the resolver.)
     71
     72        {
     73                typeof(?) __iterable = EXPRESSION;
     74                typeof(?) __iterator = get_iterator(__iterable);
     75                typeof(?) __next = next(__iterator);
     76                while (has_value(&__next)) {
     77                        typeof(?) IDENTIFIER = get(&__next)
     78                        STATEMENT
     79                        __next = next(__iterator);
     80                }
     81        }
     82
     83This new loop can (but may not need to) also support the existing variants,
     84where the name is omitted or multiple iterators are advanced in lock-step:
     85
     86        for ( EXPRESSION ) STATEMENT
     87        for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT
     88
     89These may not be needed. The unnamed variant is a nice short hand.
     90The multiple iterator method could be replaced with an iterator transformer
     91that has the same behaviour:
     92
     93        for (i, j, k; zip(iter, range, container)) { ... }
     94
     95This would require some further extensions, but would also allow single
     96iterators to iterate over sets of related values. It also shows how library
     97features can be used to extend the for-each loop.
     98
     99Containers as Iterables
     100-----------------------
     101The primary iterator application is going over the contents of various
     102collections. Most of the collections should support the iterable interface.
     103Each of these iterators should go over each element in the container.
     104
     105They could support multiple iterator interfaces. For example, a map could go
     106over key-value-pairs, or just keys or just values.
     107
     108There isn't a lot to say here, except that as the implementation progresses,
     109various rules about iterator invalidation should be hammered out.
    49110
    50111Ranges
    51112------
    52 Ranges, which may be a data type or a trait, are containers that contain
    53 a sequence of values. Unlike an array or vector, these values are stored
    54 logically instead of by copy.
    55 
    56 The purpose of this container is to bridge the new iterator interfaces with
    57 the existing range syntax. The range syntax would become an operator that
    58 returns a range object, which can be used as any other type.
    59 
    60 It might not cover every single case with the same syntax (the `@` syntax may
    61 not translate to operators very well), but should be able to maintain every
    62 option with some library range.
    63 
    64 Library Enhancements
    65 --------------------
    66 There are various other tools in the library that should be improved.
    67 The simplest is to make sure most containers are iterables.
    68 
    69 Also, new utilities for manipulating iterators should be created. The exact
    70 list would have to wait but here are some examples.
    71 
    72 Transformers take in an iterator and produce another iterator.
    73 Examples include map, which modifies each element in turn, and filter,
    74 which checks each element and removes the ones that fail.
    75 
    76 Producers create new iterators from other information.
    77 Most practical iterators tend to be iterable containers, which produce all
    78 the elements in the container, this includes ranges. Others include infinite
    79 series of one element.
    80 
    81 Consumers take an iterator and convert it into something else.
    82 They might be converted into a container or used in a for loop. Dedicated
    83 consumers will be some form of folding function.
     113The existing range syntax is incompatible with the for-each loop. But this
     114can be recovered by making at least some of the range syntax into operators.
     115These operators create and return new range types.
     116
     117The ranges are types that map a random numeric range. For iteration, a range
     118is also considered an iterator over the numbers in the range.
     119
     120The there could be up to 6 new operators to create ranges. These are:
     121    `~` `+~` `~=` `+~=` `-~` `-~=`
     122
     123Parsing issues might rule some of these out. Others may be rendered
     124irrelevant by iterator transformers. For example, adding a transformer that
     125reverses the iterator could replace having downwards range operators.
     126
     127The one range that does not involve the operator is the lone number. This is
     128a shorthand for a half-open/exclusive range from zero to the given number.
     129That is the same behaviour as one (or two ~ and +~) of the existing range
     130operators. So it only saves 2-3 characters.
     131
     132But if we want to save those characters, there may be a special form that
     133could be used (for example, in the unnamed case) or you could treat integers
     134as iterables that create iterators that count from zero up to the limit.
     135
     136        forall(N) struct Upto { N current; N limit; };
     137        Upto(int) get_iterator(const int&);
     138
     139Enumerations as Iterators
     140-------------------------
     141To convert enumerations to the new form, a new syntax will have to be created
     142to treat an enumeration type as an expression returning a range.
     143
     144Using the unmodified type name seems problematic, it could be done if we
     145convert the type expression to some type object that can be passed to the
     146get_iterator function. But there are other alternatives.
     147
     148An operator could be created to convert the type into an iterable expression.
     149
     150        range_over(EnumType)
     151
     152It is also possible to create a library type that iterates over the range.
     153Then we use a compound literal of a nullary type as a flag to get it.
     154
     155        (ERange(EnumType)){}
     156
     157Both of these are placed in the EXPRESSION part of the for-each loop.
     158
     159Iterator Transformers
     160---------------------
     161New library features could be created to work with iterators. One of the
     162biggest groups are transformers, which allow for more complicated compound
     163operations on iterators.
     164
     165For a few examples:
     166
     167        // Create a new iterator that transforms elements from a source iterator.
     168        forall(I, T, U | is_iterator(I, T))
     169        /* Iterator over U */ map(U (*func)(T), I iter);
     170
     171        // Create an iterator that keeps only elements that satisfy a predicate.
     172        forall(I, T | is_iterator(I, T))
     173        /* Iterator over T */ filter(bool (*pred)(const T&), I iter);
     174
     175        forall(Is..., Ts... | is_iterator(Is, Ts)...)
     176        /* Iterator over [Ts] */ zip(Is iters);
     177
     178        forall(Index, I, T | is_serial(Index) | is_iterator(I, T))
     179        /* Iterator over [Index, T]) */ enumerate(Index, I);
     180
     181The return type is omitted in each case, because usually these functions end
     182up being thin wrappers around a constructor for a specialized type that is
     183the iterator, in each case, this is the return type.
     184
     185Also, these should be over iterables (with the conversion from iterable to
     186iterator part of the thin wrapper), not just iterators. But the examples are
     187easier to write out this way.
     188
     189Here is one example expanded to cover those details:
     190
     191        forall(Iter, Elem)
     192        struct Filter {
     193                bool (*pred)(const Elem&);
     194                Iter iterator;
     195        }
     196
     197        forall(Iter, Elem | is_iterator(Iter, Elem))
     198        maybe(Elem) next(Filter(Iter, Elem) & this) {
     199                maybe(Elem) ret;
     200                do {
     201                        ret = next(this.iterator);
     202                } while (has_value(ret) && this.pred(get(&ret)));
     203                return ret;
     204        }
     205
     206        forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem))
     207        Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) {
     208                return (Filter(Iter, Elem)){ pred, get_iterator(iterable) };
     209        }
     210
     211Iterator Consumers
     212------------------
     213Eventually we will have to turn the iterators into something else. Iterator
     214consumers take an iterator and turn it into something else. The biggest of
     215these is the for-each loop, as described above. In addition various library
     216functions can be added.
     217
     218One group of functions are container constructors which take everything from
     219the iterable and build a new container out of it. There would be various
     220instances of this because you will need one for each container type.
     221
     222Higher order folding functions (for example, the functional
     223`foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does
     224not work with anonymous functions quite as well. However, some of the
     225specialized versions may still be useful. For example, getting the sum of an
     226iterable's elements.
     227
     228        forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) |
     229                        { void ?{}(N&, zero_t); N ?+=?(N&, N); })
     230        N sum(T & iterable) {
     231                N acc = 0;
     232                for ( i ; iterable ) acc += i;
     233                return acc;
     234        }
     235
     236Iterator Producers
     237------------------
     238Another group of library features are iterator producers, that take some
     239data and create an iterator out of it. Many of already discussed cases are
     240actually examples of this (containers, ranges and enumerations).
     241
     242There are a few more cases that might be useful. Such as "repeat", which
     243takes a single element and iterates over it infinitely. This is usually
     244combined with other iterators and not used directly.
     245
     246Other Improvements to Cforall
     247----------------------------
     248All code snippets should be considered pseudo-code because they require some
     249features that do not currently exist. There are also some that just run into
     250current bugs. For instance, my proof of concept for ERange encountered a few
     251warnings even after I got it working.
     252
     253This proposal might also benefit from associated types. Both is_iterable and
     254is_iterator may not need to be generic over all their type parameters.
     255
     256Tuple enhancements might also help with iterators that want to return more
     257than one value.
    84258
    85259Related Work
    86260------------
    87 Python has a robust iterator tool set. It also has a `range` built-in which
    88 does many of the same things as the special for loops (the finite and
    89 half-open ranges).
     261Python has a robust iterator tool set. It uses the same iterator / iterable
     262divide as described above and has the interfaces (note, Python uses duck
     263typing instead of defined interfaces, so these are "imaginary"):
     264
     265        class Iterable:
     266
     267                def __iter__(self):
     268                        # Return an iterator over the iterable's contents.
     269
     270        class Iterator(Iterable):
     271
     272                def __iter__(self):
     273                        # Iterators are iterable over themselves.
     274                        return self
     275
     276                def __next__(self):
     277                        # Return next value if present and advance internal state.
     278                        # Otherwise raise StopIteration exception.
     279
     280It uses this in for loops, the for loop takes an iterable gets the iterable,
     281and iterates until the iterable is exhausted. Iterators are iterables so they
     282can be passed in directly as well.
     283
     284It also has a `range` built-in which handles integer loops, although only
     285increasing half-open ranges.
    90286
    91287In addition, it has many dedicated iterator constructors and transformers,
    92288and many containers can both produce and be constructed from iterators.
    93 
     289For example, the chain function goes though a series of iterators in sequence
     290an can be used by:
     291
     292        import itertools
     293        itertools.chain(iter_a, iter_b)
     294
     295Python enumerations are implemented in the standard library and are iterables
     296over all their possible values. There is no special construct for this, in
     297Python types are object, and the type of an enum type object is EnumType
     298which supports the iterable interface.
     299
     300        for elem in SomeEnum:
     301                print(elem)
     302
     303Python References
    94304+   https://docs.python.org/3/reference/datamodel.html#object.__iter__
    95305+   https://docs.python.org/3/library/functions.html#func-range
     306+   https://docs.python.org/3/library/enum.html
    96307
    97308C++ has many iterator tools at well, except for the fact it's "iterators" are
     
    104315They do appear to be a wrapper around the "pointer" iterators.
    105316
     317        template< class T >
     318        concept range = requires( T& t ) {
     319                ranges::begin(t);
     320                ranges::end(t);
     321        };
     322
     323C++ also has "structured bindings" which can be used to unpack an object and
     324bind its components to different names. This can be used to iterate over
     325multiple values simultaneously. In this example from the Cforall compiler
     326using a custom utility function written without special support.
     327
     328        for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) {
     329                ...
     330        }
     331
     332C++ References
    106333+   https://en.cppreference.com/w/cpp/ranges
     334+   https://en.cppreference.com/w/cpp/language/structured_binding
    107335
    108336Rust also has a imperative implementation of a functional style of iterators,
     337with iterator tools implemented as "Provided Methods". Provided methods
    109338including a great number of standard transformers. Otherwise, it is very
    110339similar to Python.
    111340
    112 +   https://doc.rust-lang.org/std/iter/index.html
     341        pub trait Iterator {
     342                type Item;
     343
     344                fn next(&mut self) -> Option<Self::Item>;
     345
     346                // Many provided methods ...
     347        }
     348
     349Not that Rust has many of its iterator transformers stored as methods on the
     350iterator. These methods are "provided" so you only have to define next, but
     351you can redefine the other methods to provide optimizations.
     352
     353Rust also has range expressions. Based around `start .. end` and
     354`start ..= end`, with the sub-expressions giving the start and end of the
     355range being optional, except for end after `..=`. If start is omitted, the
     356range has no lower bound, otherwise the range uses start as its inclusive
     357lower bound. It is the same case for end and the upper bound, except that the
     358`..=` will have an inclusive upper bound instead of an exclusive one.
     359
     360Rust achieves ordering and step control by iterator transformers. `.rev()`
     361allows you to reverse a range and `.step_by(step)` provides a positive step
     362value.
     363
     364Rust Reference
     365+   https://doc.rust-lang.org/std/iter/trait.Iterator.html
     366+   https://doc.rust-lang.org/reference/expressions/range-expr.html
     367+   https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev
Note: See TracChangeset for help on using the changeset viewer.