1 | Iterators |
---|
2 | ========= |
---|
3 | This is the proposal for adding iterators to Cforall and the standard |
---|
4 | library. Iterators provide a common interface for sequences of values in |
---|
5 | the language. Many inputs and outputs can be described in terms of sequences, |
---|
6 | creating a common interface that can be used in many places. |
---|
7 | |
---|
8 | Related Traits |
---|
9 | -------------- |
---|
10 | There are two groups of types that interact with this proposal. |
---|
11 | |
---|
12 | Iterable |
---|
13 | |
---|
14 | An iterable is a container of some type that has a default iterator that goes |
---|
15 | over its own contents. It may be a logical container, like a mathematical range, |
---|
16 | that does not store elements in memory, but it should have logical elements. |
---|
17 | A trait is defined to standardize how to get that iterator. |
---|
18 | |
---|
19 | forall(C, I) |
---|
20 | trait is_iterable { |
---|
21 | I get_iterator(C&); |
---|
22 | } |
---|
23 | |
---|
24 | This is not particularly useful on its own, but is used as standardized glue |
---|
25 | between some other features below. |
---|
26 | |
---|
27 | Iterator |
---|
28 | |
---|
29 | An iterator is an object that can be advanced through a sequence of values. |
---|
30 | An iterator can be checked to see if it is at the end of the sequence, if not |
---|
31 | the next value can be retrieved from it and the iterator advanced. This can |
---|
32 | be done in as little as a single operation: |
---|
33 | |
---|
34 | forall(Iter, Elem) |
---|
35 | trait is_iterator { |
---|
36 | maybe(Elem) next(Iter&); |
---|
37 | |
---|
38 | // is_iterable(I, I&) |
---|
39 | Iter& get_iterator(Iter&); |
---|
40 | } |
---|
41 | |
---|
42 | This interface is not fixed, the function could change or be split into |
---|
43 | multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural, |
---|
44 | but also efficient to implement. This version will be used for examples in |
---|
45 | the proposal. |
---|
46 | |
---|
47 | When next is called, if there are elements left in the iterators, the next |
---|
48 | element is returned in a maybe and the iterator is advanced. If the iterator |
---|
49 | is exhausted nothing is changed and an empty maybe is returned. |
---|
50 | |
---|
51 | Iterators are also iterables. They must return themselves, this allows |
---|
52 | iterators to be used directly where iterables are expected. |
---|
53 | |
---|
54 | For-Each Loop |
---|
55 | ------------- |
---|
56 | The main part of this proposal which is part of the language definition, |
---|
57 | the rest is part of the standard library. And this is the new for-each loop: |
---|
58 | |
---|
59 | for ( IDENTIFIER ; EXPRESSION ) STATEMENT |
---|
60 | |
---|
61 | While this uses similar syntax to the existing special for loops, but takes |
---|
62 | a standard expression. This expression is evaluated, and then the iterator is |
---|
63 | retrieved from it (this should be the identity function iterators). Each |
---|
64 | iteration of the loop then checks the iterator to see if it is exhausted, if |
---|
65 | not it uses the value to execute the statement. If the iterator is exhausted, |
---|
66 | the loop is completed, and control continues after the loop (or the else |
---|
67 | clause if provided). |
---|
68 | |
---|
69 | The loop should show the same behaviour as the following code: |
---|
70 | (`typeof(?)` should be decided by the resolver.) |
---|
71 | |
---|
72 | { |
---|
73 | typeof(?) __iterable = EXPRESSION; |
---|
74 | typeof(?) __iterator = get_iterator(__iterable); |
---|
75 | typeof(?) __next = next(__iterator); |
---|
76 | while (has_value(&__next)) { |
---|
77 | typeof(?) IDENTIFIER = get(&__next) |
---|
78 | STATEMENT |
---|
79 | __next = next(__iterator); |
---|
80 | } |
---|
81 | } |
---|
82 | |
---|
83 | This new loop can (but may not need to) also support the existing variants, |
---|
84 | where the name is omitted or multiple iterators are advanced in lock-step: |
---|
85 | |
---|
86 | for ( EXPRESSION ) STATEMENT |
---|
87 | for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT |
---|
88 | |
---|
89 | These may not be needed. The unnamed variant is a nice short hand. |
---|
90 | The multiple iterator method could be replaced with an iterator transformer |
---|
91 | that has the same behaviour: |
---|
92 | |
---|
93 | for (i, j, k; zip(iter, range, container)) { ... } |
---|
94 | |
---|
95 | This would require some further extensions, but would also allow single |
---|
96 | iterators to iterate over sets of related values. It also shows how library |
---|
97 | features can be used to extend the for-each loop. |
---|
98 | |
---|
99 | Containers as Iterables |
---|
100 | ----------------------- |
---|
101 | The primary iterator application is going over the contents of various |
---|
102 | collections. Most of the collections should support the iterable interface. |
---|
103 | Each of these iterators should go over each element in the container. |
---|
104 | |
---|
105 | They could support multiple iterator interfaces. For example, a map could go |
---|
106 | over key-value-pairs, or just keys or just values. |
---|
107 | |
---|
108 | There isn't a lot to say here, except that as the implementation progresses, |
---|
109 | various rules about iterator invalidation should be hammered out. |
---|
110 | |
---|
111 | Ranges |
---|
112 | ------ |
---|
113 | The existing range syntax is incompatible with the for-each loop. But this |
---|
114 | can be recovered by making at least some of the range syntax into operators. |
---|
115 | These operators create and return new range types. |
---|
116 | |
---|
117 | The ranges are types that map a random numeric range. For iteration, a range |
---|
118 | is also considered an iterator over the numbers in the range. |
---|
119 | |
---|
120 | The there could be up to 6 new operators to create ranges. These are: |
---|
121 | `~` `+~` `~=` `+~=` `-~` `-~=` |
---|
122 | |
---|
123 | Parsing issues might rule some of these out. Others may be rendered |
---|
124 | irrelevant by iterator transformers. For example, adding a transformer that |
---|
125 | reverses the iterator could replace having downwards range operators. |
---|
126 | |
---|
127 | The one range that does not involve the operator is the lone number. This is |
---|
128 | a shorthand for a half-open/exclusive range from zero to the given number. |
---|
129 | That is the same behaviour as one (or two ~ and +~) of the existing range |
---|
130 | operators. So it only saves 2-3 characters. |
---|
131 | |
---|
132 | But if we want to save those characters, there may be a special form that |
---|
133 | could be used (for example, in the unnamed case) or you could treat integers |
---|
134 | as iterables that create iterators that count from zero up to the limit. |
---|
135 | |
---|
136 | forall(N) struct Upto { N current; N limit; }; |
---|
137 | Upto(int) get_iterator(const int&); |
---|
138 | |
---|
139 | Enumerations as Iterators |
---|
140 | ------------------------- |
---|
141 | To convert enumerations to the new form, a new syntax will have to be created |
---|
142 | to treat an enumeration type as an expression returning a range. |
---|
143 | |
---|
144 | Using the unmodified type name seems problematic, it could be done if we |
---|
145 | convert the type expression to some type object that can be passed to the |
---|
146 | get_iterator function. But there are other alternatives. |
---|
147 | |
---|
148 | An operator could be created to convert the type into an iterable expression. |
---|
149 | |
---|
150 | range_over(EnumType) |
---|
151 | |
---|
152 | It is also possible to create a library type that iterates over the range. |
---|
153 | Then we use a compound literal of a nullary type as a flag to get it. |
---|
154 | |
---|
155 | (ERange(EnumType)){} |
---|
156 | |
---|
157 | Both of these are placed in the EXPRESSION part of the for-each loop. |
---|
158 | |
---|
159 | Iterator Transformers |
---|
160 | --------------------- |
---|
161 | New library features could be created to work with iterators. One of the |
---|
162 | biggest groups are transformers, which allow for more complicated compound |
---|
163 | operations on iterators. |
---|
164 | |
---|
165 | For a few examples: |
---|
166 | |
---|
167 | // Create a new iterator that transforms elements from a source iterator. |
---|
168 | forall(I, T, U | is_iterator(I, T)) |
---|
169 | /* Iterator over U */ map(U (*func)(T), I iter); |
---|
170 | |
---|
171 | // Create an iterator that keeps only elements that satisfy a predicate. |
---|
172 | forall(I, T | is_iterator(I, T)) |
---|
173 | /* Iterator over T */ filter(bool (*pred)(const T&), I iter); |
---|
174 | |
---|
175 | forall(Is..., Ts... | is_iterator(Is, Ts)...) |
---|
176 | /* Iterator over [Ts] */ zip(Is iters); |
---|
177 | |
---|
178 | forall(Index, I, T | is_serial(Index) | is_iterator(I, T)) |
---|
179 | /* Iterator over [Index, T]) */ enumerate(Index, I); |
---|
180 | |
---|
181 | The return type is omitted in each case, because usually these functions end |
---|
182 | up being thin wrappers around a constructor for a specialized type that is |
---|
183 | the iterator, in each case, this is the return type. |
---|
184 | |
---|
185 | Also, these should be over iterables (with the conversion from iterable to |
---|
186 | iterator part of the thin wrapper), not just iterators. But the examples are |
---|
187 | easier to write out this way. |
---|
188 | |
---|
189 | Here is one example expanded to cover those details: |
---|
190 | |
---|
191 | forall(Iter, Elem) |
---|
192 | struct Filter { |
---|
193 | bool (*pred)(const Elem&); |
---|
194 | Iter iterator; |
---|
195 | } |
---|
196 | |
---|
197 | forall(Iter, Elem | is_iterator(Iter, Elem)) |
---|
198 | maybe(Elem) next(Filter(Iter, Elem) & this) { |
---|
199 | maybe(Elem) ret; |
---|
200 | do { |
---|
201 | ret = next(this.iterator); |
---|
202 | } while (has_value(ret) && this.pred(get(&ret))); |
---|
203 | return ret; |
---|
204 | } |
---|
205 | |
---|
206 | forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem)) |
---|
207 | Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) { |
---|
208 | return (Filter(Iter, Elem)){ pred, get_iterator(iterable) }; |
---|
209 | } |
---|
210 | |
---|
211 | Iterator Consumers |
---|
212 | ------------------ |
---|
213 | Eventually we will have to turn the iterators into something else. Iterator |
---|
214 | consumers take an iterator and turn it into something else. The biggest of |
---|
215 | these is the for-each loop, as described above. In addition various library |
---|
216 | functions can be added. |
---|
217 | |
---|
218 | One group of functions are container constructors which take everything from |
---|
219 | the iterable and build a new container out of it. There would be various |
---|
220 | instances of this because you will need one for each container type. |
---|
221 | |
---|
222 | Higher order folding functions (for example, the functional |
---|
223 | `foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does |
---|
224 | not work with anonymous functions quite as well. However, some of the |
---|
225 | specialized versions may still be useful. For example, getting the sum of an |
---|
226 | iterable's elements. |
---|
227 | |
---|
228 | forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) | |
---|
229 | { void ?{}(N&, zero_t); N ?+=?(N&, N); }) |
---|
230 | N sum(T & iterable) { |
---|
231 | N acc = 0; |
---|
232 | for ( i ; iterable ) acc += i; |
---|
233 | return acc; |
---|
234 | } |
---|
235 | |
---|
236 | Iterator Producers |
---|
237 | ------------------ |
---|
238 | Another group of library features are iterator producers, that take some |
---|
239 | data and create an iterator out of it. Many of already discussed cases are |
---|
240 | actually examples of this (containers, ranges and enumerations). |
---|
241 | |
---|
242 | There are a few more cases that might be useful. Such as "repeat", which |
---|
243 | takes a single element and iterates over it infinitely. This is usually |
---|
244 | combined with other iterators and not used directly. |
---|
245 | |
---|
246 | Other Improvements to Cforall |
---|
247 | ---------------------------- |
---|
248 | All code snippets should be considered pseudo-code because they require some |
---|
249 | features that do not currently exist. There are also some that just run into |
---|
250 | current bugs. For instance, my proof of concept for ERange encountered a few |
---|
251 | warnings even after I got it working. |
---|
252 | |
---|
253 | This proposal might also benefit from associated types. Both is_iterable and |
---|
254 | is_iterator may not need to be generic over all their type parameters. |
---|
255 | |
---|
256 | Tuple enhancements might also help with iterators that want to return more |
---|
257 | than one value. |
---|
258 | |
---|
259 | Related Work |
---|
260 | ------------ |
---|
261 | Python has a robust iterator tool set. It uses the same iterator / iterable |
---|
262 | divide as described above and has the interfaces (note, Python uses duck |
---|
263 | typing instead of defined interfaces, so these are "imaginary"): |
---|
264 | |
---|
265 | class Iterable: |
---|
266 | |
---|
267 | def __iter__(self): |
---|
268 | # Return an iterator over the iterable's contents. |
---|
269 | |
---|
270 | class Iterator(Iterable): |
---|
271 | |
---|
272 | def __iter__(self): |
---|
273 | # Iterators are iterable over themselves. |
---|
274 | return self |
---|
275 | |
---|
276 | def __next__(self): |
---|
277 | # Return next value if present and advance internal state. |
---|
278 | # Otherwise raise StopIteration exception. |
---|
279 | |
---|
280 | It uses this in for loops, the for loop takes an iterable gets the iterable, |
---|
281 | and iterates until the iterable is exhausted. Iterators are iterables so they |
---|
282 | can be passed in directly as well. |
---|
283 | |
---|
284 | It also has a `range` built-in which handles integer loops, although only |
---|
285 | increasing half-open ranges. |
---|
286 | |
---|
287 | In addition, it has many dedicated iterator constructors and transformers, |
---|
288 | and many containers can both produce and be constructed from iterators. |
---|
289 | For example, the chain function goes though a series of iterators in sequence |
---|
290 | an can be used by: |
---|
291 | |
---|
292 | import itertools |
---|
293 | itertools.chain(iter_a, iter_b) |
---|
294 | |
---|
295 | Python enumerations are implemented in the standard library and are iterables |
---|
296 | over all their possible values. There is no special construct for this, in |
---|
297 | Python types are object, and the type of an enum type object is EnumType |
---|
298 | which supports the iterable interface. |
---|
299 | |
---|
300 | for elem in SomeEnum: |
---|
301 | print(elem) |
---|
302 | |
---|
303 | Python References |
---|
304 | + https://docs.python.org/3/reference/datamodel.html#object.__iter__ |
---|
305 | + https://docs.python.org/3/library/functions.html#func-range |
---|
306 | + https://docs.python.org/3/library/enum.html |
---|
307 | |
---|
308 | C++ has many iterator tools at well, except for the fact it's "iterators" are |
---|
309 | not what are usually called iterators (as above) but rather an abstraction of |
---|
310 | pointers. The notable missing feature is that a single iterator has no |
---|
311 | concept of being empty or not, instead it must be compared to the end |
---|
312 | iterator. |
---|
313 | |
---|
314 | However, C++ ranges have an interface much more similar to iterators. |
---|
315 | They do appear to be a wrapper around the "pointer" iterators. |
---|
316 | |
---|
317 | template< class T > |
---|
318 | concept range = requires( T& t ) { |
---|
319 | ranges::begin(t); |
---|
320 | ranges::end(t); |
---|
321 | }; |
---|
322 | |
---|
323 | C++ also has "structured bindings" which can be used to unpack an object and |
---|
324 | bind its components to different names. This can be used to iterate over |
---|
325 | multiple values simultaneously. In this example from the Cforall compiler |
---|
326 | using a custom utility function written without special support. |
---|
327 | |
---|
328 | for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) { |
---|
329 | ... |
---|
330 | } |
---|
331 | |
---|
332 | C++ References |
---|
333 | + https://en.cppreference.com/w/cpp/ranges |
---|
334 | + https://en.cppreference.com/w/cpp/language/structured_binding |
---|
335 | |
---|
336 | Rust also has a imperative implementation of a functional style of iterators, |
---|
337 | with iterator tools implemented as "Provided Methods". Provided methods |
---|
338 | including a great number of standard transformers. Otherwise, it is very |
---|
339 | similar to Python. |
---|
340 | |
---|
341 | pub trait Iterator { |
---|
342 | type Item; |
---|
343 | |
---|
344 | fn next(&mut self) -> Option<Self::Item>; |
---|
345 | |
---|
346 | // Many provided methods ... |
---|
347 | } |
---|
348 | |
---|
349 | Not that Rust has many of its iterator transformers stored as methods on the |
---|
350 | iterator. These methods are "provided" so you only have to define next, but |
---|
351 | you can redefine the other methods to provide optimizations. |
---|
352 | |
---|
353 | Rust also has range expressions. Based around `start .. end` and |
---|
354 | `start ..= end`, with the sub-expressions giving the start and end of the |
---|
355 | range being optional, except for end after `..=`. If start is omitted, the |
---|
356 | range has no lower bound, otherwise the range uses start as its inclusive |
---|
357 | lower bound. It is the same case for end and the upper bound, except that the |
---|
358 | `..=` will have an inclusive upper bound instead of an exclusive one. |
---|
359 | |
---|
360 | Rust achieves ordering and step control by iterator transformers. `.rev()` |
---|
361 | allows you to reverse a range and `.step_by(step)` provides a positive step |
---|
362 | value. |
---|
363 | |
---|
364 | Rust Reference |
---|
365 | + https://doc.rust-lang.org/std/iter/trait.Iterator.html |
---|
366 | + https://doc.rust-lang.org/reference/expressions/range-expr.html |
---|
367 | + https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev |
---|