| 1 | Iterators
 | 
|---|
| 2 | =========
 | 
|---|
| 3 | This is the proposal for adding iterators to Cforall and the standard
 | 
|---|
| 4 | library. Iterators provide a common interface for sequences of values in
 | 
|---|
| 5 | the language. Many inputs and outputs can be described in terms of sequences,
 | 
|---|
| 6 | creating a common interface that can be used in many places.
 | 
|---|
| 7 | 
 | 
|---|
| 8 | Related Traits
 | 
|---|
| 9 | --------------
 | 
|---|
| 10 | There are two groups of types that interact with this proposal.
 | 
|---|
| 11 | 
 | 
|---|
| 12 | Iterable
 | 
|---|
| 13 | 
 | 
|---|
| 14 | An iterable is a container of some type that has a default iterator that goes
 | 
|---|
| 15 | over its own contents. It may be a logical container, like a mathematical range,
 | 
|---|
| 16 | that does not store elements in memory, but it should have logical elements.
 | 
|---|
| 17 | A trait is defined to standardize how to get that iterator.
 | 
|---|
| 18 | 
 | 
|---|
| 19 |         forall(C, I)
 | 
|---|
| 20 |         trait is_iterable {
 | 
|---|
| 21 |                 I get_iterator(C&);
 | 
|---|
| 22 |         }
 | 
|---|
| 23 | 
 | 
|---|
| 24 | This is not particularly useful on its own, but is used as standardized glue
 | 
|---|
| 25 | between some other features below.
 | 
|---|
| 26 | 
 | 
|---|
| 27 | Iterator
 | 
|---|
| 28 | 
 | 
|---|
| 29 | An iterator is an object that can be advanced through a sequence of values.
 | 
|---|
| 30 | An iterator can be checked to see if it is at the end of the sequence, if not
 | 
|---|
| 31 | the next value can be retrieved from it and the iterator advanced. This can
 | 
|---|
| 32 | be done in as little as a single operation:
 | 
|---|
| 33 | 
 | 
|---|
| 34 |         forall(Iter, Elem)
 | 
|---|
| 35 |         trait is_iterator {
 | 
|---|
| 36 |                 maybe(Elem) next(Iter&);
 | 
|---|
| 37 | 
 | 
|---|
| 38 |                 // is_iterable(I, I&)
 | 
|---|
| 39 |                 Iter& get_iterator(Iter&);
 | 
|---|
| 40 |         }
 | 
|---|
| 41 | 
 | 
|---|
| 42 | This interface is not fixed, the function could change or be split into
 | 
|---|
| 43 | multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural,
 | 
|---|
| 44 | but also efficient to implement. This version will be used for examples in
 | 
|---|
| 45 | the proposal.
 | 
|---|
| 46 | 
 | 
|---|
| 47 | When next is called, if there are elements left in the iterators, the next
 | 
|---|
| 48 | element is returned in a maybe and the iterator is advanced. If the iterator
 | 
|---|
| 49 | is exhausted nothing is changed and an empty maybe is returned.
 | 
|---|
| 50 | 
 | 
|---|
| 51 | Iterators are also iterables. They must return themselves, this allows
 | 
|---|
| 52 | iterators to be used directly where iterables are expected.
 | 
|---|
| 53 | 
 | 
|---|
| 54 | For-Each Loop
 | 
|---|
| 55 | -------------
 | 
|---|
| 56 | The main part of this proposal which is part of the language definition,
 | 
|---|
| 57 | the rest is part of the standard library. And this is the new for-each loop:
 | 
|---|
| 58 | 
 | 
|---|
| 59 |         for ( IDENTIFIER ; EXPRESSION ) STATEMENT
 | 
|---|
| 60 | 
 | 
|---|
| 61 | While this uses similar syntax to the existing special for loops, but takes
 | 
|---|
| 62 | a standard expression. This expression is evaluated, and then the iterator is
 | 
|---|
| 63 | retrieved from it (this should be the identity function iterators). Each
 | 
|---|
| 64 | iteration of the loop then checks the iterator to see if it is exhausted, if
 | 
|---|
| 65 | not it uses the value to execute the statement. If the iterator is exhausted,
 | 
|---|
| 66 | the loop is completed, and control continues after the loop (or the else
 | 
|---|
| 67 | clause if provided).
 | 
|---|
| 68 | 
 | 
|---|
| 69 | The loop should show the same behaviour as the following code:
 | 
|---|
| 70 | (`typeof(?)` should be decided by the resolver.)
 | 
|---|
| 71 | 
 | 
|---|
| 72 |         {
 | 
|---|
| 73 |                 typeof(?) __iterable = EXPRESSION;
 | 
|---|
| 74 |                 typeof(?) __iterator = get_iterator(__iterable);
 | 
|---|
| 75 |                 typeof(?) __next = next(__iterator);
 | 
|---|
| 76 |                 while (has_value(&__next)) {
 | 
|---|
| 77 |                         typeof(?) IDENTIFIER = get(&__next)
 | 
|---|
| 78 |                         STATEMENT
 | 
|---|
| 79 |                         __next = next(__iterator);
 | 
|---|
| 80 |                 }
 | 
|---|
| 81 |         }
 | 
|---|
| 82 | 
 | 
|---|
| 83 | This new loop can (but may not need to) also support the existing variants,
 | 
|---|
| 84 | where the name is omitted or multiple iterators are advanced in lock-step:
 | 
|---|
| 85 | 
 | 
|---|
| 86 |         for ( EXPRESSION ) STATEMENT
 | 
|---|
| 87 |         for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT
 | 
|---|
| 88 | 
 | 
|---|
| 89 | These may not be needed. The unnamed variant is a nice short hand.
 | 
|---|
| 90 | The multiple iterator method could be replaced with an iterator transformer
 | 
|---|
| 91 | that has the same behaviour:
 | 
|---|
| 92 | 
 | 
|---|
| 93 |         for (i, j, k; zip(iter, range, container)) { ... }
 | 
|---|
| 94 | 
 | 
|---|
| 95 | This would require some further extensions, but would also allow single
 | 
|---|
| 96 | iterators to iterate over sets of related values. It also shows how library
 | 
|---|
| 97 | features can be used to extend the for-each loop.
 | 
|---|
| 98 | 
 | 
|---|
| 99 | Containers as Iterables
 | 
|---|
| 100 | -----------------------
 | 
|---|
| 101 | The primary iterator application is going over the contents of various
 | 
|---|
| 102 | collections. Most of the collections should support the iterable interface.
 | 
|---|
| 103 | Each of these iterators should go over each element in the container.
 | 
|---|
| 104 | 
 | 
|---|
| 105 | They could support multiple iterator interfaces. For example, a map could go
 | 
|---|
| 106 | over key-value-pairs, or just keys or just values.
 | 
|---|
| 107 | 
 | 
|---|
| 108 | There isn't a lot to say here, except that as the implementation progresses,
 | 
|---|
| 109 | various rules about iterator invalidation should be hammered out.
 | 
|---|
| 110 | 
 | 
|---|
| 111 | Ranges
 | 
|---|
| 112 | ------
 | 
|---|
| 113 | The existing range syntax is incompatible with the for-each loop. But this
 | 
|---|
| 114 | can be recovered by making at least some of the range syntax into operators.
 | 
|---|
| 115 | These operators create and return new range types.
 | 
|---|
| 116 | 
 | 
|---|
| 117 | The ranges are types that map a random numeric range. For iteration, a range
 | 
|---|
| 118 | is also considered an iterator over the numbers in the range.
 | 
|---|
| 119 | 
 | 
|---|
| 120 | The there could be up to 6 new operators to create ranges. These are:
 | 
|---|
| 121 |     `~` `+~` `~=` `+~=` `-~` `-~=`
 | 
|---|
| 122 | 
 | 
|---|
| 123 | Parsing issues might rule some of these out. Others may be rendered
 | 
|---|
| 124 | irrelevant by iterator transformers. For example, adding a transformer that
 | 
|---|
| 125 | reverses the iterator could replace having downwards range operators.
 | 
|---|
| 126 | 
 | 
|---|
| 127 | The one range that does not involve the operator is the lone number. This is
 | 
|---|
| 128 | a shorthand for a half-open/exclusive range from zero to the given number.
 | 
|---|
| 129 | That is the same behaviour as one (or two ~ and +~) of the existing range
 | 
|---|
| 130 | operators. So it only saves 2-3 characters.
 | 
|---|
| 131 | 
 | 
|---|
| 132 | But if we want to save those characters, there may be a special form that
 | 
|---|
| 133 | could be used (for example, in the unnamed case) or you could treat integers
 | 
|---|
| 134 | as iterables that create iterators that count from zero up to the limit.
 | 
|---|
| 135 | 
 | 
|---|
| 136 |         forall(N) struct Upto { N current; N limit; };
 | 
|---|
| 137 |         Upto(int) get_iterator(const int&);
 | 
|---|
| 138 | 
 | 
|---|
| 139 | Enumerations as Iterators
 | 
|---|
| 140 | -------------------------
 | 
|---|
| 141 | To convert enumerations to the new form, a new syntax will have to be created
 | 
|---|
| 142 | to treat an enumeration type as an expression returning a range.
 | 
|---|
| 143 | 
 | 
|---|
| 144 | Using the unmodified type name seems problematic, it could be done if we
 | 
|---|
| 145 | convert the type expression to some type object that can be passed to the
 | 
|---|
| 146 | get_iterator function. But there are other alternatives.
 | 
|---|
| 147 | 
 | 
|---|
| 148 | An operator could be created to convert the type into an iterable expression.
 | 
|---|
| 149 | 
 | 
|---|
| 150 |         range_over(EnumType)
 | 
|---|
| 151 | 
 | 
|---|
| 152 | It is also possible to create a library type that iterates over the range.
 | 
|---|
| 153 | Then we use a compound literal of a nullary type as a flag to get it.
 | 
|---|
| 154 | 
 | 
|---|
| 155 |         (ERange(EnumType)){}
 | 
|---|
| 156 | 
 | 
|---|
| 157 | Both of these are placed in the EXPRESSION part of the for-each loop.
 | 
|---|
| 158 | 
 | 
|---|
| 159 | Iterator Transformers
 | 
|---|
| 160 | ---------------------
 | 
|---|
| 161 | New library features could be created to work with iterators. One of the
 | 
|---|
| 162 | biggest groups are transformers, which allow for more complicated compound
 | 
|---|
| 163 | operations on iterators.
 | 
|---|
| 164 | 
 | 
|---|
| 165 | For a few examples:
 | 
|---|
| 166 | 
 | 
|---|
| 167 |         // Create a new iterator that transforms elements from a source iterator.
 | 
|---|
| 168 |         forall(I, T, U | is_iterator(I, T))
 | 
|---|
| 169 |         /* Iterator over U */ map(U (*func)(T), I iter);
 | 
|---|
| 170 | 
 | 
|---|
| 171 |         // Create an iterator that keeps only elements that satisfy a predicate.
 | 
|---|
| 172 |         forall(I, T | is_iterator(I, T))
 | 
|---|
| 173 |         /* Iterator over T */ filter(bool (*pred)(const T&), I iter);
 | 
|---|
| 174 | 
 | 
|---|
| 175 |         forall(Is..., Ts... | is_iterator(Is, Ts)...)
 | 
|---|
| 176 |         /* Iterator over [Ts] */ zip(Is iters);
 | 
|---|
| 177 | 
 | 
|---|
| 178 |         forall(Index, I, T | is_serial(Index) | is_iterator(I, T))
 | 
|---|
| 179 |         /* Iterator over [Index, T]) */ enumerate(Index, I);
 | 
|---|
| 180 | 
 | 
|---|
| 181 | The return type is omitted in each case, because usually these functions end
 | 
|---|
| 182 | up being thin wrappers around a constructor for a specialized type that is
 | 
|---|
| 183 | the iterator, in each case, this is the return type.
 | 
|---|
| 184 | 
 | 
|---|
| 185 | Also, these should be over iterables (with the conversion from iterable to
 | 
|---|
| 186 | iterator part of the thin wrapper), not just iterators. But the examples are
 | 
|---|
| 187 | easier to write out this way.
 | 
|---|
| 188 | 
 | 
|---|
| 189 | Here is one example expanded to cover those details:
 | 
|---|
| 190 | 
 | 
|---|
| 191 |         forall(Iter, Elem)
 | 
|---|
| 192 |         struct Filter {
 | 
|---|
| 193 |                 bool (*pred)(const Elem&);
 | 
|---|
| 194 |                 Iter iterator;
 | 
|---|
| 195 |         }
 | 
|---|
| 196 | 
 | 
|---|
| 197 |         forall(Iter, Elem | is_iterator(Iter, Elem))
 | 
|---|
| 198 |         maybe(Elem) next(Filter(Iter, Elem) & this) {
 | 
|---|
| 199 |                 maybe(Elem) ret;
 | 
|---|
| 200 |                 do {
 | 
|---|
| 201 |                         ret = next(this.iterator);
 | 
|---|
| 202 |                 } while (has_value(ret) && this.pred(get(&ret)));
 | 
|---|
| 203 |                 return ret;
 | 
|---|
| 204 |         }
 | 
|---|
| 205 | 
 | 
|---|
| 206 |         forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem))
 | 
|---|
| 207 |         Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) {
 | 
|---|
| 208 |                 return (Filter(Iter, Elem)){ pred, get_iterator(iterable) };
 | 
|---|
| 209 |         }
 | 
|---|
| 210 | 
 | 
|---|
| 211 | Iterator Consumers
 | 
|---|
| 212 | ------------------
 | 
|---|
| 213 | Eventually we will have to turn the iterators into something else. Iterator
 | 
|---|
| 214 | consumers take an iterator and turn it into something else. The biggest of
 | 
|---|
| 215 | these is the for-each loop, as described above. In addition various library
 | 
|---|
| 216 | functions can be added.
 | 
|---|
| 217 | 
 | 
|---|
| 218 | One group of functions are container constructors which take everything from
 | 
|---|
| 219 | the iterable and build a new container out of it. There would be various
 | 
|---|
| 220 | instances of this because you will need one for each container type.
 | 
|---|
| 221 | 
 | 
|---|
| 222 | Higher order folding functions (for example, the functional
 | 
|---|
| 223 | `foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does
 | 
|---|
| 224 | not work with anonymous functions quite as well. However, some of the
 | 
|---|
| 225 | specialized versions may still be useful. For example, getting the sum of an
 | 
|---|
| 226 | iterable's elements.
 | 
|---|
| 227 | 
 | 
|---|
| 228 |         forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) |
 | 
|---|
| 229 |                         { void ?{}(N&, zero_t); N ?+=?(N&, N); })
 | 
|---|
| 230 |         N sum(T & iterable) {
 | 
|---|
| 231 |                 N acc = 0;
 | 
|---|
| 232 |                 for ( i ; iterable ) acc += i;
 | 
|---|
| 233 |                 return acc;
 | 
|---|
| 234 |         }
 | 
|---|
| 235 | 
 | 
|---|
| 236 | Iterator Producers
 | 
|---|
| 237 | ------------------
 | 
|---|
| 238 | Another group of library features are iterator producers, that take some
 | 
|---|
| 239 | data and create an iterator out of it. Many of already discussed cases are
 | 
|---|
| 240 | actually examples of this (containers, ranges and enumerations).
 | 
|---|
| 241 | 
 | 
|---|
| 242 | There are a few more cases that might be useful. Such as "repeat", which
 | 
|---|
| 243 | takes a single element and iterates over it infinitely. This is usually
 | 
|---|
| 244 | combined with other iterators and not used directly.
 | 
|---|
| 245 | 
 | 
|---|
| 246 | Other Improvements to Cforall
 | 
|---|
| 247 | ----------------------------
 | 
|---|
| 248 | All code snippets should be considered pseudo-code because they require some
 | 
|---|
| 249 | features that do not currently exist. There are also some that just run into
 | 
|---|
| 250 | current bugs. For instance, my proof of concept for ERange encountered a few
 | 
|---|
| 251 | warnings even after I got it working.
 | 
|---|
| 252 | 
 | 
|---|
| 253 | This proposal might also benefit from associated types. Both is_iterable and
 | 
|---|
| 254 | is_iterator may not need to be generic over all their type parameters.
 | 
|---|
| 255 | 
 | 
|---|
| 256 | Tuple enhancements might also help with iterators that want to return more
 | 
|---|
| 257 | than one value.
 | 
|---|
| 258 | 
 | 
|---|
| 259 | Related Work
 | 
|---|
| 260 | ------------
 | 
|---|
| 261 | Python has a robust iterator tool set. It uses the same iterator / iterable
 | 
|---|
| 262 | divide as described above and has the interfaces (note, Python uses duck
 | 
|---|
| 263 | typing instead of defined interfaces, so these are "imaginary"):
 | 
|---|
| 264 | 
 | 
|---|
| 265 |         class Iterable:
 | 
|---|
| 266 | 
 | 
|---|
| 267 |                 def __iter__(self):
 | 
|---|
| 268 |                         # Return an iterator over the iterable's contents.
 | 
|---|
| 269 | 
 | 
|---|
| 270 |         class Iterator(Iterable):
 | 
|---|
| 271 | 
 | 
|---|
| 272 |                 def __iter__(self):
 | 
|---|
| 273 |                         # Iterators are iterable over themselves.
 | 
|---|
| 274 |                         return self
 | 
|---|
| 275 | 
 | 
|---|
| 276 |                 def __next__(self):
 | 
|---|
| 277 |                         # Return next value if present and advance internal state.
 | 
|---|
| 278 |                         # Otherwise raise StopIteration exception.
 | 
|---|
| 279 | 
 | 
|---|
| 280 | It uses this in for loops, the for loop takes an iterable gets the iterable,
 | 
|---|
| 281 | and iterates until the iterable is exhausted. Iterators are iterables so they
 | 
|---|
| 282 | can be passed in directly as well.
 | 
|---|
| 283 | 
 | 
|---|
| 284 | It also has a `range` built-in which handles integer loops, although only
 | 
|---|
| 285 | increasing half-open ranges.
 | 
|---|
| 286 | 
 | 
|---|
| 287 | In addition, it has many dedicated iterator constructors and transformers,
 | 
|---|
| 288 | and many containers can both produce and be constructed from iterators.
 | 
|---|
| 289 | For example, the chain function goes though a series of iterators in sequence
 | 
|---|
| 290 | an can be used by:
 | 
|---|
| 291 | 
 | 
|---|
| 292 |         import itertools
 | 
|---|
| 293 |         itertools.chain(iter_a, iter_b)
 | 
|---|
| 294 | 
 | 
|---|
| 295 | Python enumerations are implemented in the standard library and are iterables
 | 
|---|
| 296 | over all their possible values. There is no special construct for this, in
 | 
|---|
| 297 | Python types are object, and the type of an enum type object is EnumType
 | 
|---|
| 298 | which supports the iterable interface.
 | 
|---|
| 299 | 
 | 
|---|
| 300 |         for elem in SomeEnum:
 | 
|---|
| 301 |                 print(elem)
 | 
|---|
| 302 | 
 | 
|---|
| 303 | Python References
 | 
|---|
| 304 | +   https://docs.python.org/3/reference/datamodel.html#object.__iter__
 | 
|---|
| 305 | +   https://docs.python.org/3/library/functions.html#func-range
 | 
|---|
| 306 | +   https://docs.python.org/3/library/enum.html
 | 
|---|
| 307 | 
 | 
|---|
| 308 | C++ has many iterator tools at well, except for the fact it's "iterators" are
 | 
|---|
| 309 | not what are usually called iterators (as above) but rather an abstraction of
 | 
|---|
| 310 | pointers. The notable missing feature is that a single iterator has no
 | 
|---|
| 311 | concept of being empty or not, instead it must be compared to the end
 | 
|---|
| 312 | iterator.
 | 
|---|
| 313 | 
 | 
|---|
| 314 | However, C++ ranges have an interface much more similar to iterators.
 | 
|---|
| 315 | They do appear to be a wrapper around the "pointer" iterators.
 | 
|---|
| 316 | 
 | 
|---|
| 317 |         template< class T >
 | 
|---|
| 318 |         concept range = requires( T& t ) {
 | 
|---|
| 319 |                 ranges::begin(t);
 | 
|---|
| 320 |                 ranges::end(t);
 | 
|---|
| 321 |         };
 | 
|---|
| 322 | 
 | 
|---|
| 323 | C++ also has "structured bindings" which can be used to unpack an object and
 | 
|---|
| 324 | bind its components to different names. This can be used to iterate over
 | 
|---|
| 325 | multiple values simultaneously. In this example from the Cforall compiler
 | 
|---|
| 326 | using a custom utility function written without special support.
 | 
|---|
| 327 | 
 | 
|---|
| 328 |         for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) {
 | 
|---|
| 329 |                 ...
 | 
|---|
| 330 |         }
 | 
|---|
| 331 | 
 | 
|---|
| 332 | C++ References
 | 
|---|
| 333 | +   https://en.cppreference.com/w/cpp/ranges
 | 
|---|
| 334 | +   https://en.cppreference.com/w/cpp/language/structured_binding
 | 
|---|
| 335 | 
 | 
|---|
| 336 | Rust also has a imperative implementation of a functional style of iterators,
 | 
|---|
| 337 | with iterator tools implemented as "Provided Methods". Provided methods
 | 
|---|
| 338 | including a great number of standard transformers. Otherwise, it is very
 | 
|---|
| 339 | similar to Python.
 | 
|---|
| 340 | 
 | 
|---|
| 341 |         pub trait Iterator {
 | 
|---|
| 342 |                 type Item;
 | 
|---|
| 343 | 
 | 
|---|
| 344 |                 fn next(&mut self) -> Option<Self::Item>;
 | 
|---|
| 345 | 
 | 
|---|
| 346 |                 // Many provided methods ...
 | 
|---|
| 347 |         }
 | 
|---|
| 348 | 
 | 
|---|
| 349 | Not that Rust has many of its iterator transformers stored as methods on the
 | 
|---|
| 350 | iterator. These methods are "provided" so you only have to define next, but
 | 
|---|
| 351 | you can redefine the other methods to provide optimizations.
 | 
|---|
| 352 | 
 | 
|---|
| 353 | Rust also has range expressions. Based around `start .. end` and
 | 
|---|
| 354 | `start ..= end`, with the sub-expressions giving the start and end of the
 | 
|---|
| 355 | range being optional, except for end after `..=`. If start is omitted, the
 | 
|---|
| 356 | range has no lower bound, otherwise the range uses start as its inclusive
 | 
|---|
| 357 | lower bound. It is the same case for end and the upper bound, except that the
 | 
|---|
| 358 | `..=` will have an inclusive upper bound instead of an exclusive one.
 | 
|---|
| 359 | 
 | 
|---|
| 360 | Rust achieves ordering and step control by iterator transformers. `.rev()`
 | 
|---|
| 361 | allows you to reverse a range and `.step_by(step)` provides a positive step
 | 
|---|
| 362 | value.
 | 
|---|
| 363 | 
 | 
|---|
| 364 | Rust Reference
 | 
|---|
| 365 | +   https://doc.rust-lang.org/std/iter/trait.Iterator.html
 | 
|---|
| 366 | +   https://doc.rust-lang.org/reference/expressions/range-expr.html
 | 
|---|
| 367 | +   https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev
 | 
|---|