Changeset 1f922f4 for doc/proposals/iterators.md
- Timestamp:
- Jul 23, 2024, 3:23:04 PM (8 weeks ago)
- Branches:
- master
- Children:
- 35c792f
- Parents:
- 719fdbc
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/proposals/iterators.md
r719fdbc r1f922f4 10 10 There are two groups of types that interact with this proposal. 11 11 12 Iterable 13 14 An iterable is a container of some type that has a default iterator that goes 15 over its own contents. It may be a logical container, like a mathematical range, 16 that does not store elements in memory, but it should have logical elements. 17 A trait is defined to standardize how to get that iterator. 18 19 forall(C, I) 20 trait is_iterable { 21 I get_iterator(C&); 22 } 23 24 This is not particularly useful on its own, but is used as standardized glue 25 between some other features below. 26 12 27 Iterator 13 28 14 An iterator has a very simple interface with a single operation. 15 The operation is "get the next value in the sequence", but this actually has 16 several parts, in that it has to check if there are move values, return the 17 next one if there is one, and update any internal information in the iterator. 18 For example: `Maybe(Item) next(Iter &);`. 19 20 Now, iterators can have other operations. Notably, they are often also 21 iterables that return themselves. They can also have a verity of iterator 22 transformers built in. 23 24 Iterable 25 26 Anything that you can get an iterator from is called an iterable. There 27 is an operation to get an iterator from an iterable. 28 29 Range For Loop 30 -------------- 31 One part of the language that could be reworked to make good use of this is 32 for loops. In short, remove most of the special rules that can be done inside 33 the identifier and make it a generic range for loop: 34 35 ``` 36 for ( IDENTIFIER ; EXPRESSION ) STATEMENT 37 ``` 38 39 The common way to implement this is that expression produces an iterable. 40 The for loop gets an iterator from the iterable (which is why iterators are 41 often iterables, so they can be passed in with the same interface) and stores 42 it. Then, for each value in the iterator, the loop binds the value to the 43 identifier and then executes the statement. The loop exits after every value 44 has been used and the iterator is exhausted. 45 46 For the chained for loop (`for (i; _: j; _)`) can still have its existing 47 behaviour, advancing through each range in parallel and stopping as soon 48 as the first one is exhausted. 29 An iterator is an object that can be advanced through a sequence of values. 30 An iterator can be checked to see if it is at the end of the sequence, if not 31 the next value can be retrieved from it and the iterator advanced. This can 32 be done in as little as a single operation: 33 34 forall(Iter, Elem) 35 trait is_iterator { 36 maybe(Elem) next(Iter&); 37 38 // is_iterable(I, I&) 39 Iter& get_iterator(Iter&); 40 } 41 42 This interface is not fixed, the function could change or be split into 43 multiple functions (ex: `bool next(Elem&, Iter&)`). It should be natural, 44 but also efficient to implement. This version will be used for examples in 45 the proposal. 46 47 When next is called, if there are elements left in the iterators, the next 48 element is returned in a maybe and the iterator is advanced. If the iterator 49 is exhausted nothing is changed and an empty maybe is returned. 50 51 Iterators are also iterables. They must return themselves, this allows 52 iterators to be used directly where iterables are expected. 53 54 For-Each Loop 55 ------------- 56 The main part of this proposal which is part of the language definition, 57 the rest is part of the standard library. And this is the new for-each loop: 58 59 for ( IDENTIFIER ; EXPRESSION ) STATEMENT 60 61 While this uses similar syntax to the existing special for loops, but takes 62 a standard expression. This expression is evaluated, and then the iterator is 63 retrieved from it (this should be the identity function iterators). Each 64 iteration of the loop then checks the iterator to see if it is exhausted, if 65 not it uses the value to execute the statement. If the iterator is exhausted, 66 the loop is completed, and control continues after the loop (or the else 67 clause if provided). 68 69 The loop should show the same behaviour as the following code: 70 (`typeof(?)` should be decided by the resolver.) 71 72 { 73 typeof(?) __iterable = EXPRESSION; 74 typeof(?) __iterator = get_iterator(__iterable); 75 typeof(?) __next = next(__iterator); 76 while (has_value(&__next)) { 77 typeof(?) IDENTIFIER = get(&__next) 78 STATEMENT 79 __next = next(__iterator); 80 } 81 } 82 83 This new loop can (but may not need to) also support the existing variants, 84 where the name is omitted or multiple iterators are advanced in lock-step: 85 86 for ( EXPRESSION ) STATEMENT 87 for ( IDENTIFIER0 ; EXPRESSION0 : IDENTIFIER1 ; EXPRESSION1 ) STATEMENT 88 89 These may not be needed. The unnamed variant is a nice short hand. 90 The multiple iterator method could be replaced with an iterator transformer 91 that has the same behaviour: 92 93 for (i, j, k; zip(iter, range, container)) { ... } 94 95 This would require some further extensions, but would also allow single 96 iterators to iterate over sets of related values. It also shows how library 97 features can be used to extend the for-each loop. 98 99 Containers as Iterables 100 ----------------------- 101 The primary iterator application is going over the contents of various 102 collections. Most of the collections should support the iterable interface. 103 Each of these iterators should go over each element in the container. 104 105 They could support multiple iterator interfaces. For example, a map could go 106 over key-value-pairs, or just keys or just values. 107 108 There isn't a lot to say here, except that as the implementation progresses, 109 various rules about iterator invalidation should be hammered out. 49 110 50 111 Ranges 51 112 ------ 52 Ranges, which may be a data type or a trait, are containers that contain 53 a sequence of values. Unlike an array or vector, these values are stored 54 logically instead of by copy. 55 56 The purpose of this container is to bridge the new iterator interfaces with 57 the existing range syntax. The range syntax would become an operator that 58 returns a range object, which can be used as any other type. 59 60 It might not cover every single case with the same syntax (the `@` syntax may 61 not translate to operators very well), but should be able to maintain every 62 option with some library range. 63 64 Library Enhancements 65 -------------------- 66 There are various other tools in the library that should be improved. 67 The simplest is to make sure most containers are iterables. 68 69 Also, new utilities for manipulating iterators should be created. The exact 70 list would have to wait but here are some examples. 71 72 Transformers take in an iterator and produce another iterator. 73 Examples include map, which modifies each element in turn, and filter, 74 which checks each element and removes the ones that fail. 75 76 Producers create new iterators from other information. 77 Most practical iterators tend to be iterable containers, which produce all 78 the elements in the container, this includes ranges. Others include infinite 79 series of one element. 80 81 Consumers take an iterator and convert it into something else. 82 They might be converted into a container or used in a for loop. Dedicated 83 consumers will be some form of folding function. 113 The existing range syntax is incompatible with the for-each loop. But this 114 can be recovered by making at least some of the range syntax into operators. 115 These operators create and return new range types. 116 117 The ranges are types that map a random numeric range. For iteration, a range 118 is also considered an iterator over the numbers in the range. 119 120 The there could be up to 6 new operators to create ranges. These are: 121 `~` `+~` `~=` `+~=` `-~` `-~=` 122 123 Parsing issues might rule some of these out. Others may be rendered 124 irrelevant by iterator transformers. For example, adding a transformer that 125 reverses the iterator could replace having downwards range operators. 126 127 The one range that does not involve the operator is the lone number. This is 128 a shorthand for a half-open/exclusive range from zero to the given number. 129 That is the same behaviour as one (or two ~ and +~) of the existing range 130 operators. So it only saves 2-3 characters. 131 132 But if we want to save those characters, there may be a special form that 133 could be used (for example, in the unnamed case) or you could treat integers 134 as iterables that create iterators that count from zero up to the limit. 135 136 forall(N) struct Upto { N current; N limit; }; 137 Upto(int) get_iterator(const int&); 138 139 Enumerations as Iterators 140 ------------------------- 141 To convert enumerations to the new form, a new syntax will have to be created 142 to treat an enumeration type as an expression returning a range. 143 144 Using the unmodified type name seems problematic, it could be done if we 145 convert the type expression to some type object that can be passed to the 146 get_iterator function. But there are other alternatives. 147 148 An operator could be created to convert the type into an iterable expression. 149 150 range_over(EnumType) 151 152 It is also possible to create a library type that iterates over the range. 153 Then we use a compound literal of a nullary type as a flag to get it. 154 155 (ERange(EnumType)){} 156 157 Both of these are placed in the EXPRESSION part of the for-each loop. 158 159 Iterator Transformers 160 --------------------- 161 New library features could be created to work with iterators. One of the 162 biggest groups are transformers, which allow for more complicated compound 163 operations on iterators. 164 165 For a few examples: 166 167 // Create a new iterator that transforms elements from a source iterator. 168 forall(I, T, U | is_iterator(I, T)) 169 /* Iterator over U */ map(U (*func)(T), I iter); 170 171 // Create an iterator that keeps only elements that satisfy a predicate. 172 forall(I, T | is_iterator(I, T)) 173 /* Iterator over T */ filter(bool (*pred)(const T&), I iter); 174 175 forall(Is..., Ts... | is_iterator(Is, Ts)...) 176 /* Iterator over [Ts] */ zip(Is iters); 177 178 forall(Index, I, T | is_serial(Index) | is_iterator(I, T)) 179 /* Iterator over [Index, T]) */ enumerate(Index, I); 180 181 The return type is omitted in each case, because usually these functions end 182 up being thin wrappers around a constructor for a specialized type that is 183 the iterator, in each case, this is the return type. 184 185 Also, these should be over iterables (with the conversion from iterable to 186 iterator part of the thin wrapper), not just iterators. But the examples are 187 easier to write out this way. 188 189 Here is one example expanded to cover those details: 190 191 forall(Iter, Elem) 192 struct Filter { 193 bool (*pred)(const Elem&); 194 Iter iterator; 195 } 196 197 forall(Iter, Elem | is_iterator(Iter, Elem)) 198 maybe(Elem) next(Filter(Iter, Elem) & this) { 199 maybe(Elem) ret; 200 do { 201 ret = next(this.iterator); 202 } while (has_value(ret) && this.pred(get(&ret))); 203 return ret; 204 } 205 206 forall(T, Iter, Elem | is_iterable(T, Iter) | is_iterator(Iter, Elem)) 207 Filter(Iter, Elem) filter(bool (*pred)(const Elem&), T& iterable) { 208 return (Filter(Iter, Elem)){ pred, get_iterator(iterable) }; 209 } 210 211 Iterator Consumers 212 ------------------ 213 Eventually we will have to turn the iterators into something else. Iterator 214 consumers take an iterator and turn it into something else. The biggest of 215 these is the for-each loop, as described above. In addition various library 216 functions can be added. 217 218 One group of functions are container constructors which take everything from 219 the iterable and build a new container out of it. There would be various 220 instances of this because you will need one for each container type. 221 222 Higher order folding functions (for example, the functional 223 `foldr : (a -> b -> b) -> b -> [a]`) are a bit less useful since Cforall does 224 not work with anonymous functions quite as well. However, some of the 225 specialized versions may still be useful. For example, getting the sum of an 226 iterable's elements. 227 228 forall(T, I, N | is_iterable(T, I) | is_iterator(I, N) | 229 { void ?{}(N&, zero_t); N ?+=?(N&, N); }) 230 N sum(T & iterable) { 231 N acc = 0; 232 for ( i ; iterable ) acc += i; 233 return acc; 234 } 235 236 Iterator Producers 237 ------------------ 238 Another group of library features are iterator producers, that take some 239 data and create an iterator out of it. Many of already discussed cases are 240 actually examples of this (containers, ranges and enumerations). 241 242 There are a few more cases that might be useful. Such as "repeat", which 243 takes a single element and iterates over it infinitely. This is usually 244 combined with other iterators and not used directly. 245 246 Other Improvements to Cforall 247 ---------------------------- 248 All code snippets should be considered pseudo-code because they require some 249 features that do not currently exist. There are also some that just run into 250 current bugs. For instance, my proof of concept for ERange encountered a few 251 warnings even after I got it working. 252 253 This proposal might also benefit from associated types. Both is_iterable and 254 is_iterator may not need to be generic over all their type parameters. 255 256 Tuple enhancements might also help with iterators that want to return more 257 than one value. 84 258 85 259 Related Work 86 260 ------------ 87 Python has a robust iterator tool set. It also has a `range` built-in which 88 does many of the same things as the special for loops (the finite and 89 half-open ranges). 261 Python has a robust iterator tool set. It uses the same iterator / iterable 262 divide as described above and has the interfaces (note, Python uses duck 263 typing instead of defined interfaces, so these are "imaginary"): 264 265 class Iterable: 266 267 def __iter__(self): 268 # Return an iterator over the iterable's contents. 269 270 class Iterator(Iterable): 271 272 def __iter__(self): 273 # Iterators are iterable over themselves. 274 return self 275 276 def __next__(self): 277 # Return next value if present and advance internal state. 278 # Otherwise raise StopIteration exception. 279 280 It uses this in for loops, the for loop takes an iterable gets the iterable, 281 and iterates until the iterable is exhausted. Iterators are iterables so they 282 can be passed in directly as well. 283 284 It also has a `range` built-in which handles integer loops, although only 285 increasing half-open ranges. 90 286 91 287 In addition, it has many dedicated iterator constructors and transformers, 92 288 and many containers can both produce and be constructed from iterators. 93 289 For example, the chain function goes though a series of iterators in sequence 290 an can be used by: 291 292 import itertools 293 itertools.chain(iter_a, iter_b) 294 295 Python enumerations are implemented in the standard library and are iterables 296 over all their possible values. There is no special construct for this, in 297 Python types are object, and the type of an enum type object is EnumType 298 which supports the iterable interface. 299 300 for elem in SomeEnum: 301 print(elem) 302 303 Python References 94 304 + https://docs.python.org/3/reference/datamodel.html#object.__iter__ 95 305 + https://docs.python.org/3/library/functions.html#func-range 306 + https://docs.python.org/3/library/enum.html 96 307 97 308 C++ has many iterator tools at well, except for the fact it's "iterators" are … … 104 315 They do appear to be a wrapper around the "pointer" iterators. 105 316 317 template< class T > 318 concept range = requires( T& t ) { 319 ranges::begin(t); 320 ranges::end(t); 321 }; 322 323 C++ also has "structured bindings" which can be used to unpack an object and 324 bind its components to different names. This can be used to iterate over 325 multiple values simultaneously. In this example from the Cforall compiler 326 using a custom utility function written without special support. 327 328 for ( auto const & [index, handler] : enumerate( terminate_handlers ) ) { 329 ... 330 } 331 332 C++ References 106 333 + https://en.cppreference.com/w/cpp/ranges 334 + https://en.cppreference.com/w/cpp/language/structured_binding 107 335 108 336 Rust also has a imperative implementation of a functional style of iterators, 337 with iterator tools implemented as "Provided Methods". Provided methods 109 338 including a great number of standard transformers. Otherwise, it is very 110 339 similar to Python. 111 340 112 + https://doc.rust-lang.org/std/iter/index.html 341 pub trait Iterator { 342 type Item; 343 344 fn next(&mut self) -> Option<Self::Item>; 345 346 // Many provided methods ... 347 } 348 349 Not that Rust has many of its iterator transformers stored as methods on the 350 iterator. These methods are "provided" so you only have to define next, but 351 you can redefine the other methods to provide optimizations. 352 353 Rust also has range expressions. Based around `start .. end` and 354 `start ..= end`, with the sub-expressions giving the start and end of the 355 range being optional, except for end after `..=`. If start is omitted, the 356 range has no lower bound, otherwise the range uses start as its inclusive 357 lower bound. It is the same case for end and the upper bound, except that the 358 `..=` will have an inclusive upper bound instead of an exclusive one. 359 360 Rust achieves ordering and step control by iterator transformers. `.rev()` 361 allows you to reverse a range and `.step_by(step)` provides a positive step 362 value. 363 364 Rust Reference 365 + https://doc.rust-lang.org/std/iter/trait.Iterator.html 366 + https://doc.rust-lang.org/reference/expressions/range-expr.html 367 + https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev
Note: See TracChangeset
for help on using the changeset viewer.