Iterators ========= This is the proposal for adding iterators to Cforall and the standard library. Iterators provide a common interface for sequences of values in the language. Many inputs and outputs can be described in terms of sequences, creating a common interface that can be used in many places. Related Traits -------------- There are two groups of types that interact with this proposal. Iterable An iterable is a container of some type that has a default iterator that goes over its own contents. It may be a logical container, like a mathematical range, that does not store elements in memory, but it should have logical elements. A trait is defined to standardize how to get that iterator. forall(C, I) trait is_iterable { I get_iterator(C&); } This is not particularly useful on its own, but is used as standardized glue between some other features below. Iterator An iterator is an object that can be advanced through a sequence of values. An iterator can be checked to see if it is at the end of the sequence, if not the next value can be retrieved from it and the iterator advanced. This can be done in as little as a single operation: forall(Iter, Elem) trait is_iterator { maybe(Elem) next(Iter&); // is_iterable(I, I&) Iter& get_iterator(Iter&); } This interface is not fixed, the function could change or be split into multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural, but also efficient to implement. This version will be used for examples in the proposal. When next is called, if there are elements left in the iterators, the next element is returned in a maybe and the iterator is advanced. If the iterator is exhausted nothing is changed and an empty maybe is returned. Iterators are also iterables. They must return themselves, this allows iterators to be used directly where iterables are expected. For-Each Loop ------------- The main part of this proposal which is part of the language definition, the rest is part of the standard library. And this is the new for-each loop: for ( IDENTIFIER ; EXPRESSION ) STATEMENT While this uses similar syntax to the existing special for loops, but takes a standard expression. This expression is evaluated, and then the iterator is retrieved from it (this should be the identity function iterators). Each iteration of the loop then checks the iterator to see if it is exhausted, if not it uses the value to execute the statement. If the iterator is exhausted, the loop is completed, and control continues after the loop (or the else clause if provided). The loop should show the same behaviour as the following code: (`typeof(?)` should be decided by the resolver.) { typeof(?) __iterable = EXPRESSION; typeof(?) __iterator = get_iterator(__iterable); typeof(?) __next = next(__iterator); while (has_value(&__next)) { typeof(?) IDENTIFIER = get(&__next) STATEMENT __next = next(__iterator); } } This new loop can (but may not need to) also support the existing variants, where the name is omitted or multiple iterators are advanced in lock-step: for ( EXPRESSION ) STATEMENT for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT These may not be needed. The unnamed variant is a nice short hand. The multiple iterator method could be replaced with an iterator transformer that has the same behaviour: for (i, j, k; zip(iter, range, container)) { ... } This would require some further extensions, but would also allow single iterators to iterate over sets of related values. It also shows how library features can be used to extend the for-each loop. Containers as Iterables ----------------------- The primary iterator application is going over the contents of various collections. Most of the collections should support the iterable interface. Each of these iterators should go over each element in the container. They could support multiple iterator interfaces. For example, a map could go over key-value-pairs, or just keys or just values. There isn't a lot to say here, except that as the implementation progresses, various rules about iterator invalidation should be hammered out. Ranges ------ The existing range syntax is incompatible with the for-each loop. But this can be recovered by making at least some of the range syntax into operators. These operators create and return new range types. The ranges are types that map a random numeric range. For iteration, a range is also considered an iterator over the numbers in the range. The there could be up to 6 new operators to create ranges. These are: `~` `+~` `~=` `+~=` `-~` `-~=` Parsing issues might rule some of these out. Others may be rendered irrelevant by iterator transformers. For example, adding a transformer that reverses the iterator could replace having downwards range operators. The one range that does not involve the operator is the lone number. This is a shorthand for a half-open/exclusive range from zero to the given number. That is the same behaviour as one (or two ~ and +~) of the existing range operators. So it only saves 2-3 characters. But if we want to save those characters, there may be a special form that could be used (for example, in the unnamed case) or you could treat integers as iterables that create iterators that count from zero up to the limit. forall(N) struct Upto { N current; N limit; }; Upto(int) get_iterator(const int&); Enumerations as Iterators ------------------------- To convert enumerations to the new form, a new syntax will have to be created to treat an enumeration type as an expression returning a range. Using the unmodified type name seems problematic, it could be done if we convert the type expression to some type object that can be passed to the get_iterator function. But there are other alternatives. An operator could be created to convert the type into an iterable expression. range_over(EnumType) It is also possible to create a library type that iterates over the range. Then we use a compound literal of a nullary type as a flag to get it. (ERange(EnumType)){} Both of these are placed in the EXPRESSION part of the for-each loop. Iterator Transformers --------------------- New library features could be created to work with iterators. One of the biggest groups are transformers, which allow for more complicated compound operations on iterators. For a few examples: // Create a new iterator that transforms elements from a source iterator. forall(I, T, U | is_iterator(I, T)) /* Iterator over U */ map(U (*func)(T), I iter); // Create an iterator that keeps only elements that satisfy a predicate. forall(I, T | is_iterator(I, T)) /* Iterator over T */ filter(bool (*pred)(const T&), I iter); forall(Is..., Ts... | is_iterator(Is, Ts)...) /* Iterator over [Ts] */ zip(Is iters); forall(Index, I, T | is_serial(Index) | is_iterator(I, T)) /* Iterator over [Index, T]) */ enumerate(Index, I); The return type is omitted in each case, because usually these functions end up being thin wrappers around a constructor for a specialized type that is the iterator, in each case, this is the return type. Also, these should be over iterables (with the conversion from iterable to iterator part of the thin wrapper), not just iterators. But the examples are easier to write out this way. Here is one example expanded to cover those details: forall(Iter, Elem) struct Filter { bool (*pred)(const Elem&); Iter iterator; } forall(Iter, Elem | is_iterator(Iter, Elem)) maybe(Elem) next(Filter(Iter, Elem) & this) { maybe(Elem) ret; do { ret = next(this.iterator); } while (has_value(ret) && this.pred(get(&ret))); return ret; } forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem)) Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) { return (Filter(Iter, Elem)){ pred, get_iterator(iterable) }; } Iterator Consumers ------------------ Eventually we will have to turn the iterators into something else. Iterator consumers take an iterator and turn it into something else. The biggest of these is the for-each loop, as described above. In addition various library functions can be added. One group of functions are container constructors which take everything from the iterable and build a new container out of it. There would be various instances of this because you will need one for each container type. Higher order folding functions (for example, the functional `foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does not work with anonymous functions quite as well. However, some of the specialized versions may still be useful. For example, getting the sum of an iterable's elements. forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) | { void ?{}(N&, zero_t); N ?+=?(N&, N); }) N sum(T & iterable) { N acc = 0; for ( i ; iterable ) acc += i; return acc; } Iterator Producers ------------------ Another group of library features are iterator producers, that take some data and create an iterator out of it. Many of already discussed cases are actually examples of this (containers, ranges and enumerations). There are a few more cases that might be useful. Such as "repeat", which takes a single element and iterates over it infinitely. This is usually combined with other iterators and not used directly. Other Improvements to Cforall ---------------------------- All code snippets should be considered pseudo-code because they require some features that do not currently exist. There are also some that just run into current bugs. For instance, my proof of concept for ERange encountered a few warnings even after I got it working. This proposal might also benefit from associated types. Both is_iterable and is_iterator may not need to be generic over all their type parameters. Tuple enhancements might also help with iterators that want to return more than one value. Related Work ------------ Python has a robust iterator tool set. It uses the same iterator / iterable divide as described above and has the interfaces (note, Python uses duck typing instead of defined interfaces, so these are "imaginary"): class Iterable: def __iter__(self): # Return an iterator over the iterable's contents. class Iterator(Iterable): def __iter__(self): # Iterators are iterable over themselves. return self def __next__(self): # Return next value if present and advance internal state. # Otherwise raise StopIteration exception. It uses this in for loops, the for loop takes an iterable gets the iterable, and iterates until the iterable is exhausted. Iterators are iterables so they can be passed in directly as well. It also has a `range` built-in which handles integer loops, although only increasing half-open ranges. In addition, it has many dedicated iterator constructors and transformers, and many containers can both produce and be constructed from iterators. For example, the chain function goes though a series of iterators in sequence an can be used by: import itertools itertools.chain(iter_a, iter_b) Python enumerations are implemented in the standard library and are iterables over all their possible values. There is no special construct for this, in Python types are object, and the type of an enum type object is EnumType which supports the iterable interface. for elem in SomeEnum: print(elem) Python References + https://docs.python.org/3/reference/datamodel.html#object.__iter__ + https://docs.python.org/3/library/functions.html#func-range + https://docs.python.org/3/library/enum.html C++ has many iterator tools at well, except for the fact it's "iterators" are not what are usually called iterators (as above) but rather an abstraction of pointers. The notable missing feature is that a single iterator has no concept of being empty or not, instead it must be compared to the end iterator. However, C++ ranges have an interface much more similar to iterators. They do appear to be a wrapper around the "pointer" iterators. template< class T > concept range = requires( T& t ) { ranges::begin(t); ranges::end(t); }; C++ also has "structured bindings" which can be used to unpack an object and bind its components to different names. This can be used to iterate over multiple values simultaneously. In this example from the Cforall compiler using a custom utility function written without special support. for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) { ... } C++ References + https://en.cppreference.com/w/cpp/ranges + https://en.cppreference.com/w/cpp/language/structured_binding Rust also has a imperative implementation of a functional style of iterators, with iterator tools implemented as "Provided Methods". Provided methods including a great number of standard transformers. Otherwise, it is very similar to Python. pub trait Iterator { type Item; fn next(&mut self) -> Option; // Many provided methods ... } Not that Rust has many of its iterator transformers stored as methods on the iterator. These methods are "provided" so you only have to define next, but you can redefine the other methods to provide optimizations. Rust also has range expressions. Based around `start .. end` and `start ..= end`, with the sub-expressions giving the start and end of the range being optional, except for end after `..=`. If start is omitted, the range has no lower bound, otherwise the range uses start as its inclusive lower bound. It is the same case for end and the upper bound, except that the `..=` will have an inclusive upper bound instead of an exclusive one. Rust achieves ordering and step control by iterator transformers. `.rev()` allows you to reverse a range and `.step_by(step)` provides a positive step value. Rust Reference + https://doc.rust-lang.org/std/iter/trait.Iterator.html + https://doc.rust-lang.org/reference/expressions/range-expr.html + https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev