Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/iterators.md

    r1f922f4 r0e3f80d  
    1010There are two groups of types that interact with this proposal.
    1111
     12Iterator
     13
     14An iterator has a very simple interface with a single operation.
     15The operation is "get the next value in the sequence", but this actually has
     16several parts, in that it has to check if there are move values, return the
     17next one if there is one, and update any internal information in the iterator.
     18For example: `Maybe(Item) next(Iter &);`.
     19
     20Now, iterators can have other operations. Notably, they are often also
     21iterables that return themselves. They can also have a verity of iterator
     22transformers built in.
     23
    1224Iterable
    1325
    14 An iterable is a container of some type that has a default iterator that goes
    15 over its own contents. It may be a logical container, like a mathematical range,
    16 that does not store elements in memory, but it should have logical elements.
    17 A trait is defined to standardize how to get that iterator.
     26Anything that you can get an iterator from is called an iterable. There
     27is an operation to get an iterator from an iterable.
    1828
    19         forall(C, I)
    20         trait is_iterable {
    21                 I get_iterator(C&);
    22         }
     29Range For Loop
     30--------------
     31One part of the language that could be reworked to make good use of this is
     32for loops. In short, remove most of the special rules that can be done inside
     33the identifier and make it a generic range for loop:
    2334
    24 This is not particularly useful on its own, but is used as standardized glue
    25 between some other features below.
     35    ```
     36    for ( IDENTIFIER ; EXPRESSION ) STATEMENT
     37    ```
    2638
    27 Iterator
     39The common way to implement this is that expression produces an iterable.
     40The for loop gets an iterator from the iterable (which is why iterators are
     41often iterables, so they can be passed in with the same interface) and stores
     42it. Then, for each value in the iterator, the loop binds the value to the
     43identifier and then executes the statement. The loop exits after every value
     44has been used and the iterator is exhausted.
    2845
    29 An iterator is an object that can be advanced through a sequence of values.
    30 An iterator can be checked to see if it is at the end of the sequence, if not
    31 the next value can be retrieved from it and the iterator advanced. This can
    32 be done in as little as a single operation:
    33 
    34         forall(Iter, Elem)
    35         trait is_iterator {
    36                 maybe(Elem) next(Iter&);
    37 
    38                 // is_iterable(I, I&)
    39                 Iter& get_iterator(Iter&);
    40         }
    41 
    42 This interface is not fixed, the function could change or be split into
    43 multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural,
    44 but also efficient to implement. This version will be used for examples in
    45 the proposal.
    46 
    47 When next is called, if there are elements left in the iterators, the next
    48 element is returned in a maybe and the iterator is advanced. If the iterator
    49 is exhausted nothing is changed and an empty maybe is returned.
    50 
    51 Iterators are also iterables. They must return themselves, this allows
    52 iterators to be used directly where iterables are expected.
    53 
    54 For-Each Loop
    55 -------------
    56 The main part of this proposal which is part of the language definition,
    57 the rest is part of the standard library. And this is the new for-each loop:
    58 
    59         for ( IDENTIFIER ; EXPRESSION ) STATEMENT
    60 
    61 While this uses similar syntax to the existing special for loops, but takes
    62 a standard expression. This expression is evaluated, and then the iterator is
    63 retrieved from it (this should be the identity function iterators). Each
    64 iteration of the loop then checks the iterator to see if it is exhausted, if
    65 not it uses the value to execute the statement. If the iterator is exhausted,
    66 the loop is completed, and control continues after the loop (or the else
    67 clause if provided).
    68 
    69 The loop should show the same behaviour as the following code:
    70 (`typeof(?)` should be decided by the resolver.)
    71 
    72         {
    73                 typeof(?) __iterable = EXPRESSION;
    74                 typeof(?) __iterator = get_iterator(__iterable);
    75                 typeof(?) __next = next(__iterator);
    76                 while (has_value(&__next)) {
    77                         typeof(?) IDENTIFIER = get(&__next)
    78                         STATEMENT
    79                         __next = next(__iterator);
    80                 }
    81         }
    82 
    83 This new loop can (but may not need to) also support the existing variants,
    84 where the name is omitted or multiple iterators are advanced in lock-step:
    85 
    86         for ( EXPRESSION ) STATEMENT
    87         for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT
    88 
    89 These may not be needed. The unnamed variant is a nice short hand.
    90 The multiple iterator method could be replaced with an iterator transformer
    91 that has the same behaviour:
    92 
    93         for (i, j, k; zip(iter, range, container)) { ... }
    94 
    95 This would require some further extensions, but would also allow single
    96 iterators to iterate over sets of related values. It also shows how library
    97 features can be used to extend the for-each loop.
    98 
    99 Containers as Iterables
    100 -----------------------
    101 The primary iterator application is going over the contents of various
    102 collections. Most of the collections should support the iterable interface.
    103 Each of these iterators should go over each element in the container.
    104 
    105 They could support multiple iterator interfaces. For example, a map could go
    106 over key-value-pairs, or just keys or just values.
    107 
    108 There isn't a lot to say here, except that as the implementation progresses,
    109 various rules about iterator invalidation should be hammered out.
     46For the chained for loop (`for (i; _: j; _)`) can still have its existing
     47behaviour, advancing through each range in parallel and stopping as soon
     48as the first one is exhausted.
    11049
    11150Ranges
    11251------
    113 The existing range syntax is incompatible with the for-each loop. But this
    114 can be recovered by making at least some of the range syntax into operators.
    115 These operators create and return new range types.
     52Ranges, which may be a data type or a trait, are containers that contain
     53a sequence of values. Unlike an array or vector, these values are stored
     54logically instead of by copy.
    11655
    117 The ranges are types that map a random numeric range. For iteration, a range
    118 is also considered an iterator over the numbers in the range.
     56The purpose of this container is to bridge the new iterator interfaces with
     57the existing range syntax. The range syntax would become an operator that
     58returns a range object, which can be used as any other type.
    11959
    120 The there could be up to 6 new operators to create ranges. These are:
    121     `~` `+~` `~=` `+~=` `-~` `-~=`
     60It might not cover every single case with the same syntax (the `@` syntax may
     61not translate to operators very well), but should be able to maintain every
     62option with some library range.
    12263
    123 Parsing issues might rule some of these out. Others may be rendered
    124 irrelevant by iterator transformers. For example, adding a transformer that
    125 reverses the iterator could replace having downwards range operators.
     64Library Enhancements
     65--------------------
     66There are various other tools in the library that should be improved.
     67The simplest is to make sure most containers are iterables.
    12668
    127 The one range that does not involve the operator is the lone number. This is
    128 a shorthand for a half-open/exclusive range from zero to the given number.
    129 That is the same behaviour as one (or two ~ and +~) of the existing range
    130 operators. So it only saves 2-3 characters.
     69Also, new utilities for manipulating iterators should be created. The exact
     70list would have to wait but here are some examples.
    13171
    132 But if we want to save those characters, there may be a special form that
    133 could be used (for example, in the unnamed case) or you could treat integers
    134 as iterables that create iterators that count from zero up to the limit.
     72Transformers take in an iterator and produce another iterator.
     73Examples include map, which modifies each element in turn, and filter,
     74which checks each element and removes the ones that fail.
    13575
    136         forall(N) struct Upto { N current; N limit; };
    137         Upto(int) get_iterator(const int&);
     76Producers create new iterators from other information.
     77Most practical iterators tend to be iterable containers, which produce all
     78the elements in the container, this includes ranges. Others include infinite
     79series of one element.
    13880
    139 Enumerations as Iterators
    140 -------------------------
    141 To convert enumerations to the new form, a new syntax will have to be created
    142 to treat an enumeration type as an expression returning a range.
    143 
    144 Using the unmodified type name seems problematic, it could be done if we
    145 convert the type expression to some type object that can be passed to the
    146 get_iterator function. But there are other alternatives.
    147 
    148 An operator could be created to convert the type into an iterable expression.
    149 
    150         range_over(EnumType)
    151 
    152 It is also possible to create a library type that iterates over the range.
    153 Then we use a compound literal of a nullary type as a flag to get it.
    154 
    155         (ERange(EnumType)){}
    156 
    157 Both of these are placed in the EXPRESSION part of the for-each loop.
    158 
    159 Iterator Transformers
    160 ---------------------
    161 New library features could be created to work with iterators. One of the
    162 biggest groups are transformers, which allow for more complicated compound
    163 operations on iterators.
    164 
    165 For a few examples:
    166 
    167         // Create a new iterator that transforms elements from a source iterator.
    168         forall(I, T, U | is_iterator(I, T))
    169         /* Iterator over U */ map(U (*func)(T), I iter);
    170 
    171         // Create an iterator that keeps only elements that satisfy a predicate.
    172         forall(I, T | is_iterator(I, T))
    173         /* Iterator over T */ filter(bool (*pred)(const T&), I iter);
    174 
    175         forall(Is..., Ts... | is_iterator(Is, Ts)...)
    176         /* Iterator over [Ts] */ zip(Is iters);
    177 
    178         forall(Index, I, T | is_serial(Index) | is_iterator(I, T))
    179         /* Iterator over [Index, T]) */ enumerate(Index, I);
    180 
    181 The return type is omitted in each case, because usually these functions end
    182 up being thin wrappers around a constructor for a specialized type that is
    183 the iterator, in each case, this is the return type.
    184 
    185 Also, these should be over iterables (with the conversion from iterable to
    186 iterator part of the thin wrapper), not just iterators. But the examples are
    187 easier to write out this way.
    188 
    189 Here is one example expanded to cover those details:
    190 
    191         forall(Iter, Elem)
    192         struct Filter {
    193                 bool (*pred)(const Elem&);
    194                 Iter iterator;
    195         }
    196 
    197         forall(Iter, Elem | is_iterator(Iter, Elem))
    198         maybe(Elem) next(Filter(Iter, Elem) & this) {
    199                 maybe(Elem) ret;
    200                 do {
    201                         ret = next(this.iterator);
    202                 } while (has_value(ret) && this.pred(get(&ret)));
    203                 return ret;
    204         }
    205 
    206         forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem))
    207         Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) {
    208                 return (Filter(Iter, Elem)){ pred, get_iterator(iterable) };
    209         }
    210 
    211 Iterator Consumers
    212 ------------------
    213 Eventually we will have to turn the iterators into something else. Iterator
    214 consumers take an iterator and turn it into something else. The biggest of
    215 these is the for-each loop, as described above. In addition various library
    216 functions can be added.
    217 
    218 One group of functions are container constructors which take everything from
    219 the iterable and build a new container out of it. There would be various
    220 instances of this because you will need one for each container type.
    221 
    222 Higher order folding functions (for example, the functional
    223 `foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does
    224 not work with anonymous functions quite as well. However, some of the
    225 specialized versions may still be useful. For example, getting the sum of an
    226 iterable's elements.
    227 
    228         forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) |
    229                         { void ?{}(N&, zero_t); N ?+=?(N&, N); })
    230         N sum(T & iterable) {
    231                 N acc = 0;
    232                 for ( i ; iterable ) acc += i;
    233                 return acc;
    234         }
    235 
    236 Iterator Producers
    237 ------------------
    238 Another group of library features are iterator producers, that take some
    239 data and create an iterator out of it. Many of already discussed cases are
    240 actually examples of this (containers, ranges and enumerations).
    241 
    242 There are a few more cases that might be useful. Such as "repeat", which
    243 takes a single element and iterates over it infinitely. This is usually
    244 combined with other iterators and not used directly.
    245 
    246 Other Improvements to Cforall
    247 ----------------------------
    248 All code snippets should be considered pseudo-code because they require some
    249 features that do not currently exist. There are also some that just run into
    250 current bugs. For instance, my proof of concept for ERange encountered a few
    251 warnings even after I got it working.
    252 
    253 This proposal might also benefit from associated types. Both is_iterable and
    254 is_iterator may not need to be generic over all their type parameters.
    255 
    256 Tuple enhancements might also help with iterators that want to return more
    257 than one value.
     81Consumers take an iterator and convert it into something else.
     82They might be converted into a container or used in a for loop. Dedicated
     83consumers will be some form of folding function.
    25884
    25985Related Work
    26086------------
    261 Python has a robust iterator tool set. It uses the same iterator / iterable
    262 divide as described above and has the interfaces (note, Python uses duck
    263 typing instead of defined interfaces, so these are "imaginary"):
    264 
    265         class Iterable:
    266 
    267                 def __iter__(self):
    268                         # Return an iterator over the iterable's contents.
    269 
    270         class Iterator(Iterable):
    271 
    272                 def __iter__(self):
    273                         # Iterators are iterable over themselves.
    274                         return self
    275 
    276                 def __next__(self):
    277                         # Return next value if present and advance internal state.
    278                         # Otherwise raise StopIteration exception.
    279 
    280 It uses this in for loops, the for loop takes an iterable gets the iterable,
    281 and iterates until the iterable is exhausted. Iterators are iterables so they
    282 can be passed in directly as well.
    283 
    284 It also has a `range` built-in which handles integer loops, although only
    285 increasing half-open ranges.
     87Python has a robust iterator tool set. It also has a `range` built-in which
     88does many of the same things as the special for loops (the finite and
     89half-open ranges).
    28690
    28791In addition, it has many dedicated iterator constructors and transformers,
    28892and many containers can both produce and be constructed from iterators.
    289 For example, the chain function goes though a series of iterators in sequence
    290 an can be used by:
    29193
    292         import itertools
    293         itertools.chain(iter_a, iter_b)
    294 
    295 Python enumerations are implemented in the standard library and are iterables
    296 over all their possible values. There is no special construct for this, in
    297 Python types are object, and the type of an enum type object is EnumType
    298 which supports the iterable interface.
    299 
    300         for elem in SomeEnum:
    301                 print(elem)
    302 
    303 Python References
    30494+   https://docs.python.org/3/reference/datamodel.html#object.__iter__
    30595+   https://docs.python.org/3/library/functions.html#func-range
    306 +   https://docs.python.org/3/library/enum.html
    30796
    30897C++ has many iterator tools at well, except for the fact it's "iterators" are
     
    315104They do appear to be a wrapper around the "pointer" iterators.
    316105
    317         template< class T >
    318         concept range = requires( T& t ) {
    319                 ranges::begin(t);
    320                 ranges::end(t);
    321         };
    322 
    323 C++ also has "structured bindings" which can be used to unpack an object and
    324 bind its components to different names. This can be used to iterate over
    325 multiple values simultaneously. In this example from the Cforall compiler
    326 using a custom utility function written without special support.
    327 
    328         for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) {
    329                 ...
    330         }
    331 
    332 C++ References
    333106+   https://en.cppreference.com/w/cpp/ranges
    334 +   https://en.cppreference.com/w/cpp/language/structured_binding
    335107
    336108Rust also has a imperative implementation of a functional style of iterators,
    337 with iterator tools implemented as "Provided Methods". Provided methods
    338109including a great number of standard transformers. Otherwise, it is very
    339110similar to Python.
    340111
    341         pub trait Iterator {
    342                 type Item;
    343 
    344                 fn next(&mut self) -> Option<Self::Item>;
    345 
    346                 // Many provided methods ...
    347         }
    348 
    349 Not that Rust has many of its iterator transformers stored as methods on the
    350 iterator. These methods are "provided" so you only have to define next, but
    351 you can redefine the other methods to provide optimizations.
    352 
    353 Rust also has range expressions. Based around `start .. end` and
    354 `start ..= end`, with the sub-expressions giving the start and end of the
    355 range being optional, except for end after `..=`. If start is omitted, the
    356 range has no lower bound, otherwise the range uses start as its inclusive
    357 lower bound. It is the same case for end and the upper bound, except that the
    358 `..=` will have an inclusive upper bound instead of an exclusive one.
    359 
    360 Rust achieves ordering and step control by iterator transformers. `.rev()`
    361 allows you to reverse a range and `.step_by(step)` provides a positive step
    362 value.
    363 
    364 Rust Reference
    365 +   https://doc.rust-lang.org/std/iter/trait.Iterator.html
    366 +   https://doc.rust-lang.org/reference/expressions/range-expr.html
    367 +   https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev
     112+   https://doc.rust-lang.org/std/iter/index.html
Note: See TracChangeset for help on using the changeset viewer.