source: doc/proposals/iterators.md @ 661e7b0

Last change on this file since 661e7b0 was 1f922f4, checked in by Andrew Beach <ajbeach@…>, 3 months ago

Updated iterator proposal. It was supposed to just go into a bit more detail about ranges, but ended up being a complete rewrite.

  • Property mode set to 100644
File size: 13.7 KB
RevLine 
[8baa40aa]1Iterators
2=========
3This is the proposal for adding iterators to Cforall and the standard
[4d3666d]4library. Iterators provide a common interface for sequences of values in
[8baa40aa]5the language. Many inputs and outputs can be described in terms of sequences,
6creating a common interface that can be used in many places.
7
8Related Traits
9--------------
10There are two groups of types that interact with this proposal.
11
[1f922f4]12Iterable
13
14An iterable is a container of some type that has a default iterator that goes
15over its own contents. It may be a logical container, like a mathematical range,
16that does not store elements in memory, but it should have logical elements.
17A trait is defined to standardize how to get that iterator.
18
19        forall(C, I)
20        trait is_iterable {
21                I get_iterator(C&);
22        }
23
24This is not particularly useful on its own, but is used as standardized glue
25between some other features below.
26
[8baa40aa]27Iterator
28
[1f922f4]29An iterator is an object that can be advanced through a sequence of values.
30An iterator can be checked to see if it is at the end of the sequence, if not
31the next value can be retrieved from it and the iterator advanced. This can
32be done in as little as a single operation:
[8baa40aa]33
[1f922f4]34        forall(Iter, Elem)
35        trait is_iterator {
36                maybe(Elem) next(Iter&);
[8baa40aa]37
[1f922f4]38                // is_iterable(I, I&)
39                Iter& get_iterator(Iter&);
40        }
[8baa40aa]41
[1f922f4]42This interface is not fixed, the function could change or be split into
43multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural,
44but also efficient to implement. This version will be used for examples in
45the proposal.
[8baa40aa]46
[1f922f4]47When next is called, if there are elements left in the iterators, the next
48element is returned in a maybe and the iterator is advanced. If the iterator
49is exhausted nothing is changed and an empty maybe is returned.
50
51Iterators are also iterables. They must return themselves, this allows
52iterators to be used directly where iterables are expected.
53
54For-Each Loop
55-------------
56The main part of this proposal which is part of the language definition,
57the rest is part of the standard library. And this is the new for-each loop:
58
59        for ( IDENTIFIER ; EXPRESSION ) STATEMENT
60
61While this uses similar syntax to the existing special for loops, but takes
62a standard expression. This expression is evaluated, and then the iterator is
63retrieved from it (this should be the identity function iterators). Each
64iteration of the loop then checks the iterator to see if it is exhausted, if
65not it uses the value to execute the statement. If the iterator is exhausted,
66the loop is completed, and control continues after the loop (or the else
67clause if provided).
68
69The loop should show the same behaviour as the following code:
70(`typeof(?)` should be decided by the resolver.)
71
72        {
73                typeof(?) __iterable = EXPRESSION;
74                typeof(?) __iterator = get_iterator(__iterable);
75                typeof(?) __next = next(__iterator);
76                while (has_value(&__next)) {
77                        typeof(?) IDENTIFIER = get(&__next)
78                        STATEMENT
79                        __next = next(__iterator);
80                }
81        }
82
83This new loop can (but may not need to) also support the existing variants,
84where the name is omitted or multiple iterators are advanced in lock-step:
85
86        for ( EXPRESSION ) STATEMENT
87        for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT
88
89These may not be needed. The unnamed variant is a nice short hand.
90The multiple iterator method could be replaced with an iterator transformer
91that has the same behaviour:
[8baa40aa]92
[1f922f4]93        for (i, j, k; zip(iter, range, container)) { ... }
[8baa40aa]94
[1f922f4]95This would require some further extensions, but would also allow single
96iterators to iterate over sets of related values. It also shows how library
97features can be used to extend the for-each loop.
[8baa40aa]98
[1f922f4]99Containers as Iterables
100-----------------------
101The primary iterator application is going over the contents of various
102collections. Most of the collections should support the iterable interface.
103Each of these iterators should go over each element in the container.
104
105They could support multiple iterator interfaces. For example, a map could go
106over key-value-pairs, or just keys or just values.
107
108There isn't a lot to say here, except that as the implementation progresses,
109various rules about iterator invalidation should be hammered out.
[8baa40aa]110
111Ranges
112------
[1f922f4]113The existing range syntax is incompatible with the for-each loop. But this
114can be recovered by making at least some of the range syntax into operators.
115These operators create and return new range types.
116
117The ranges are types that map a random numeric range. For iteration, a range
118is also considered an iterator over the numbers in the range.
119
120The there could be up to 6 new operators to create ranges. These are:
121    `~` `+~` `~=` `+~=` `-~` `-~=`
122
123Parsing issues might rule some of these out. Others may be rendered
124irrelevant by iterator transformers. For example, adding a transformer that
125reverses the iterator could replace having downwards range operators.
126
127The one range that does not involve the operator is the lone number. This is
128a shorthand for a half-open/exclusive range from zero to the given number.
129That is the same behaviour as one (or two ~ and +~) of the existing range
130operators. So it only saves 2-3 characters.
131
132But if we want to save those characters, there may be a special form that
133could be used (for example, in the unnamed case) or you could treat integers
134as iterables that create iterators that count from zero up to the limit.
135
136        forall(N) struct Upto { N current; N limit; };
137        Upto(int) get_iterator(const int&);
138
139Enumerations as Iterators
140-------------------------
141To convert enumerations to the new form, a new syntax will have to be created
142to treat an enumeration type as an expression returning a range.
143
144Using the unmodified type name seems problematic, it could be done if we
145convert the type expression to some type object that can be passed to the
146get_iterator function. But there are other alternatives.
147
148An operator could be created to convert the type into an iterable expression.
149
150        range_over(EnumType)
151
152It is also possible to create a library type that iterates over the range.
153Then we use a compound literal of a nullary type as a flag to get it.
154
155        (ERange(EnumType)){}
156
157Both of these are placed in the EXPRESSION part of the for-each loop.
158
159Iterator Transformers
160---------------------
161New library features could be created to work with iterators. One of the
162biggest groups are transformers, which allow for more complicated compound
163operations on iterators.
[8baa40aa]164
[1f922f4]165For a few examples:
[8baa40aa]166
[1f922f4]167        // Create a new iterator that transforms elements from a source iterator.
168        forall(I, T, U | is_iterator(I, T))
169        /* Iterator over U */ map(U (*func)(T), I iter);
[0e3f80d]170
[1f922f4]171        // Create an iterator that keeps only elements that satisfy a predicate.
172        forall(I, T | is_iterator(I, T))
173        /* Iterator over T */ filter(bool (*pred)(const T&), I iter);
[8baa40aa]174
[1f922f4]175        forall(Is..., Ts... | is_iterator(Is, Ts)...)
176        /* Iterator over [Ts] */ zip(Is iters);
[8baa40aa]177
[1f922f4]178        forall(Index, I, T | is_serial(Index) | is_iterator(I, T))
179        /* Iterator over [Index, T]) */ enumerate(Index, I);
[8baa40aa]180
[1f922f4]181The return type is omitted in each case, because usually these functions end
182up being thin wrappers around a constructor for a specialized type that is
183the iterator, in each case, this is the return type.
[8baa40aa]184
[1f922f4]185Also, these should be over iterables (with the conversion from iterable to
186iterator part of the thin wrapper), not just iterators. But the examples are
187easier to write out this way.
188
189Here is one example expanded to cover those details:
190
191        forall(Iter, Elem)
192        struct Filter {
193                bool (*pred)(const Elem&);
194                Iter iterator;
195        }
196
197        forall(Iter, Elem | is_iterator(Iter, Elem))
198        maybe(Elem) next(Filter(Iter, Elem) & this) {
199                maybe(Elem) ret;
200                do {
201                        ret = next(this.iterator);
202                } while (has_value(ret) && this.pred(get(&ret)));
203                return ret;
204        }
205
206        forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem))
207        Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) {
208                return (Filter(Iter, Elem)){ pred, get_iterator(iterable) };
209        }
210
211Iterator Consumers
212------------------
213Eventually we will have to turn the iterators into something else. Iterator
214consumers take an iterator and turn it into something else. The biggest of
215these is the for-each loop, as described above. In addition various library
216functions can be added.
217
218One group of functions are container constructors which take everything from
219the iterable and build a new container out of it. There would be various
220instances of this because you will need one for each container type.
221
222Higher order folding functions (for example, the functional
223`foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does
224not work with anonymous functions quite as well. However, some of the
225specialized versions may still be useful. For example, getting the sum of an
226iterable's elements.
227
228        forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) |
229                        { void ?{}(N&, zero_t); N ?+=?(N&, N); })
230        N sum(T & iterable) {
231                N acc = 0;
232                for ( i ; iterable ) acc += i;
233                return acc;
234        }
235
236Iterator Producers
237------------------
238Another group of library features are iterator producers, that take some
239data and create an iterator out of it. Many of already discussed cases are
240actually examples of this (containers, ranges and enumerations).
241
242There are a few more cases that might be useful. Such as "repeat", which
243takes a single element and iterates over it infinitely. This is usually
244combined with other iterators and not used directly.
245
246Other Improvements to Cforall
247----------------------------
248All code snippets should be considered pseudo-code because they require some
249features that do not currently exist. There are also some that just run into
250current bugs. For instance, my proof of concept for ERange encountered a few
251warnings even after I got it working.
252
253This proposal might also benefit from associated types. Both is_iterable and
254is_iterator may not need to be generic over all their type parameters.
255
256Tuple enhancements might also help with iterators that want to return more
257than one value.
[8baa40aa]258
259Related Work
260------------
[1f922f4]261Python has a robust iterator tool set. It uses the same iterator / iterable
262divide as described above and has the interfaces (note, Python uses duck
263typing instead of defined interfaces, so these are "imaginary"):
264
265        class Iterable:
266
267                def __iter__(self):
268                        # Return an iterator over the iterable's contents.
269
270        class Iterator(Iterable):
271
272                def __iter__(self):
273                        # Iterators are iterable over themselves.
274                        return self
275
276                def __next__(self):
277                        # Return next value if present and advance internal state.
278                        # Otherwise raise StopIteration exception.
279
280It uses this in for loops, the for loop takes an iterable gets the iterable,
281and iterates until the iterable is exhausted. Iterators are iterables so they
282can be passed in directly as well.
283
284It also has a `range` built-in which handles integer loops, although only
285increasing half-open ranges.
[0e3f80d]286
287In addition, it has many dedicated iterator constructors and transformers,
288and many containers can both produce and be constructed from iterators.
[1f922f4]289For example, the chain function goes though a series of iterators in sequence
290an can be used by:
[8baa40aa]291
[1f922f4]292        import itertools
293        itertools.chain(iter_a, iter_b)
294
295Python enumerations are implemented in the standard library and are iterables
296over all their possible values. There is no special construct for this, in
297Python types are object, and the type of an enum type object is EnumType
298which supports the iterable interface.
299
300        for elem in SomeEnum:
301                print(elem)
302
303Python References
[8baa40aa]304+   https://docs.python.org/3/reference/datamodel.html#object.__iter__
305+   https://docs.python.org/3/library/functions.html#func-range
[1f922f4]306+   https://docs.python.org/3/library/enum.html
[8baa40aa]307
[0e3f80d]308C++ has many iterator tools at well, except for the fact it's "iterators" are
[8baa40aa]309not what are usually called iterators (as above) but rather an abstraction of
[0e3f80d]310pointers. The notable missing feature is that a single iterator has no
311concept of being empty or not, instead it must be compared to the end
312iterator.
313
314However, C++ ranges have an interface much more similar to iterators.
315They do appear to be a wrapper around the "pointer" iterators.
316
[1f922f4]317        template< class T >
318        concept range = requires( T& t ) {
319                ranges::begin(t);
320                ranges::end(t);
321        };
322
323C++ also has "structured bindings" which can be used to unpack an object and
324bind its components to different names. This can be used to iterate over
325multiple values simultaneously. In this example from the Cforall compiler
326using a custom utility function written without special support.
327
328        for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) {
329                ...
330        }
331
332C++ References
[0e3f80d]333+   https://en.cppreference.com/w/cpp/ranges
[1f922f4]334+   https://en.cppreference.com/w/cpp/language/structured_binding
[8baa40aa]335
[4d3666d]336Rust also has a imperative implementation of a functional style of iterators,
[1f922f4]337with iterator tools implemented as "Provided Methods". Provided methods
[8baa40aa]338including a great number of standard transformers. Otherwise, it is very
339similar to Python.
340
[1f922f4]341        pub trait Iterator {
342                type Item;
343
344                fn next(&mut self) -> Option<Self::Item>;
345
346                // Many provided methods ...
347        }
348
349Not that Rust has many of its iterator transformers stored as methods on the
350iterator. These methods are "provided" so you only have to define next, but
351you can redefine the other methods to provide optimizations.
352
353Rust also has range expressions. Based around `start .. end` and
354`start ..= end`, with the sub-expressions giving the start and end of the
355range being optional, except for end after `..=`. If start is omitted, the
356range has no lower bound, otherwise the range uses start as its inclusive
357lower bound. It is the same case for end and the upper bound, except that the
358`..=` will have an inclusive upper bound instead of an exclusive one.
359
360Rust achieves ordering and step control by iterator transformers. `.rev()`
361allows you to reverse a range and `.step_by(step)` provides a positive step
362value.
363
364Rust Reference
365+   https://doc.rust-lang.org/std/iter/trait.Iterator.html
366+   https://doc.rust-lang.org/reference/expressions/range-expr.html
367+   https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev
Note: See TracBrowser for help on using the repository browser.