source: doc/proposals/tuples.md@ 86841ee

Last change on this file since 86841ee was 7a0e8c8, checked in by Andrew Beach <ajbeach@…>, 12 months ago

Finally finished the tuple proposal. Feedback is welcome. Moved the old proposal into the thesis, it could also be deleted.

  • Property mode set to 100644
File size: 26.2 KB
Line 
1Tuples
2======
3
4This proposal is to update tuples, as they were created by Rob in his thesis,
5to address problems have appeared in their use since then. This proposal is
6an attempt to address some of those problems. Adding new functionality,
7updating existing features and removing some problematic features.
8
9The core of change is breaking the current restructurable tuples into
10unstructured tuples and structured tuples. Unstructured tuples handle most
11of the existing uses, with structured tuples filling in a few missing
12use cases.
13
14Current State of Tuples
15-----------------------
16An overview of the current design, as the starting place for the proposal.
17
18### Tuple Syntax
19
20An overview of the syntax for the main three components of tuples, the types
21of tuples, the tuple expressions/values (constructing tuples) and tuple index
22expressions (deconstructing tuples).
23
24Current syntax for tuple types:
25- Nullary: [void] or []
26- Unary: [TYPE]
27- Binary: [TYPE, TYPE]
28- The pattern continues for three or more elements.
29
30Current syntax for tuple expressions:
31- Nullary: None. (Same as `void`, use return without an expression.)
32- Unary: {EXPR} (This might be an error, but I can't make [EXPR] work.)
33- Binary: [EXPR, EXPR]
34- The pattern from binary continues for three or more elements.
35
36Current syntax for tuple index expressions, is much simpler. It uses the
37member index expression syntax, except the member name is replaced with the
38element index, which is an integer literal, showing the index of the element
39to get.
40
41Here is a brief example showing all three parts.
42
43 [char, int] tup;
44 tup = ['a', 0];
45 int value = t.1;
46
47### Mass and Multi-Assignment
48
49Two special forms of assignment can be used to set values in tuples.
50Mass Assignment assigns every element in the tuple to a single source value.
51
52 [int, long, float] dst;
53 int src = 4
54 dst = src;
55 // Expands to roughly: dst.0 = src, dst.1 = src, dst.2 = src
56
57Multi-Assignment assigns every element to the matching element of a source
58tuple (both tuples must be the same side).
59
60 [int, long, float] dst;
61 [int, long, double] src = [1, 20, 300.0];
62 dst = src;
63 // Expands to roughly: dst.0 = src.0, dst.1 = src.1, dst.2 = src.2
64 // Note that the last assignment is not an exact match,
65 // an implicit conversion will be applied.
66
67### Tuple Restructuring
68
69Tuples can be restructured as part of overload resolution. Function calls
70will unpack tuples and repack tuples to match signatures. This is a type of
71implicit conversion and is considered during overload resolution.
72
73A simple example is matching multiple parameters of a function to a single
74argument expression, each parameter is bound to a different element of the
75returned tuple.
76
77 [int, int] paramFunc();
78 void callFunc(int a, int b, int c, int d);
79
80 void restructure() {
81 callFunc(paramFunc(), paramFunc());
82 }
83 // Roughly equivilent to:
84 void restructure() {
85 [int, int] fst = paramFunc();
86 [int, int] snd = paramFunc();
87 callFunc(fst.0, fst.1, snd.0, snd.1);
88 }
89
90This is the unique feature of Cforall tuples. There are a few languages have
91multiple return values, but they are usually are a stand alone feature.
92
93### Tuple Casts
94
95C-style casts can be used on tuples. These usually conversion casts (casts
96that preform operations on the cast type, as opposed to reinterpreting the
97existing value).
98
99Tuple casts can remove elements from the end of a tuple and apply a recursive
100cast to any of the elements. As an example:
101
102 [char, char, char] x = ['a', 'b', 'c'];
103 ([int, char])x;
104
105This casts the first element type from a char to an int and drops the last
106element. The second line can be replaced the following code, that creates
107a new tuple of by casting the kept elements of the old tuple:
108
109 [(int)x.0, (char)x.1];
110
111Note, tuple casting is more restricted than the implicit tuple restructuring.
112It cannot do any restructuring beyond dropping the trailing elements of
113tuples. For example, you cannot cast `[int, [bool, char], int]` to be a
114`[int, bool, char, int]`.
115
116### Polymorphic Tuple Parameters
117
118One kind of polymorphic parameter is the tuple type parameter, previously
119noted by the `ttype` keyword and now marked by a tailing ellipses (`...`).
120Although described as a tuple type, it is usually viewed as its flattened
121sequence of types.
122
123 forall(T, Ts... | { T max(T, T); T max(Ts); })
124 T max(T arg, Ts args) {
125 return max(arg, max(args));
126 }
127
128This introduces a type name into scope. It is used as a type but because the
129tuple is flattened, the second assertion "T max(Ts);" matches types with
130multiple parameters, although it is used as a tuple function inside the
131function body. For example, in three argument case (all three of the
132same type), both assertions can match the same function. Then "Ts args"
133introduces args as a tuple, where it is past to max.
134
135Issues with the Current State
136-----------------------------
137There are a veriaty of problems with the current implementation which we
138would like to fix.
139
140### Tuples are not Objects
141
142Spoilers, this proposal actually takes them further away from being objects.
143But this illistrates why the current version is not as useful is it could be.
144
145Tuples do not have the lifetime operators (copy construction, copy assignment
146and destructor) and cannot be passed in as a bundle to polymorphic functions.
147The multi-assignment operation is not a single function and is not passed
148in as an assertion.
149
150This prevents tuples from being interwoven with regular polymorphic code.
151
152### Providing TType Arguments is Inconistent
153
154The syntax for ttype arguments is slightly inconsistent. It hasn't come up
155much yet because you do not directly provide ttype polymorphic arguments to
156functions and the are very few existing use-cases for ttype structures.
157
158Passing arguments to a function inlines the arguments into inlines the
159arguments while passing them to a polymorphic type requires them to be
160enclosed in a tuple. Compare `function(x, y, z)` with `Type(A, [B, C])`.
161
162This did not come up previously as there was little reason to explicity
163provide ttype arguments. They are implicit for functions and there is very
164little use case for a ttype on a struct or union.
165
166### Syntax Conflict
167
168The tuple syntax conflicts with designators and the new attribute syntax.
169These conflicts break C compatability goals of Cforall. Designators have had
170to their syntax change and Cforall cannot parse the new attributes.
171
172Although most of this redesign is about the semantics of tuples, but an
173update to tuple syntax that removes these conflicts that would improve the
174compatability of Cforall going forward (and also open up the new attribute
175syntax for cforall features).
176
177Change in Model
178---------------
179This proposal modifies the existing tuples to better handle its main use
180cases. There are some use cases that are no longer handled, and a new
181struct tuple is added to cover those cases.
182
183The existing tuples will be even more "unstructured" than they are now.
184Tuples are considered a sequence of types or typed entities. These packs are
185then unpacked into the surounding context. Put differently, tuples are now
186flattened as much as possible, with some places (like parameter lists) being
187treated as an implicit tuple and the tuple being flattened into that.
188
189Structred tuples are now a separate feature, a structure called "tuple".
190These are polymorphic structures, an instance should act as a structure
191except that use indices instead of field names. These structures shouldn't
192have to be used often, but fill in the use cases that unstructured tuples
193no longer support.
194
195Note that the underlying implementation might not actually look like this.
196
197Changed Features
198----------------
199Some of the concrete changes to the features of the languages.
200
201### Structured Tuple Type
202
203There are still use cases for a structured tuple type. The issues in
204supporting both were because there was one feature trying to support both
205uses. Hence, it is added back in as its own feature.
206
207There should be a standard library or built-in type named `tuple`, it doesn't
208need a special syntax to write types or instances. The type definition might
209use some primitive support, but if supported as a regular type would look
210something like this:
211
212 forall(Ts...)
213 struct tuple {
214 inline Ts all;
215 };
216
217This will be constructed the same way as most types, a list initializer with
218each tuple argument, and the lifetime functions (copy construction, copy
219assignment and destruction) work the same. Field access works two ways, the
220first is accessing the all field, effectively converting the structured
221"struct" tuple into an unstructured tuple, the other is to use tuple indexing
222directly on the structure as if it was an unstructured tuple.
223
224(If inline doesn't work, just rename all to `get`. It does make things a bit
225longer but has no change in functionality. If the all access doesn't work,
226that is a bit more of a problem, tuple slicing might provide a work around.)
227
228### Type Qualifier Distribution
229
230Because tuples are no longer object types, applying type modifiers to them,
231such as cv-qualifiers and pointers, no longer makes sense. That syntax is
232now considered distribution, applying the type modifier to each element of
233the tuple.
234
235Previously `[int, bool] &` would mean a reference to a tuple of an integer
236and a boolean. Now it would be an alias for `[int &, bool &]`, a tuple of a
237reference to an integer and a reference to a boolean. This also applies to
238polymorphic tuple type packs `Ts &` in polymorphic functions.
239
240This allows to specify restrictions on types as you would with a single
241type variable. For example, this can help replace the special cased
242tuple operations, multi-assignment (N-to-N) and mass-assignment (1-to-N).
243
244 // Multi-Assignment
245 forall(T, Us... |
246 { T & ?=?(T &, T const &); Us & ?=?(Us &, Us const &); })
247 [T, Us] & ?=?(T & dst, Us & dsts, T const & src, Us const & srcs) {
248 dst = src;
249 dsts = srcs;
250 return [dst, dsts];
251 }
252
253 // Mass-Assignment
254 forall(T, U, Vs... |
255 { U & ?=?(U &, T const &); Vs & ?=?(Vs &, T const &); })
256 [U, Vs] & ?=?(U & dst, Vs & dsts, T const & src) {
257 dst = src;
258 dsts = src;
259 return [dst, dsts];
260 }
261
262These may not work exactly as given (for one, the copy assignment assertion
263in the first function would likely be redundent/conflicting with the implicit
264assertions on the parameters), but they show the pattern. Multi-assignment
265is also would be very hard to write with simple tuple types, because the
266relationship between the two halves of the parameter list does have to line
267up, that cannot be enforced with two different tuples.
268
269### Type Packs
270
271This is not a new feature, but a reframing/extension of existing tuple tuple
272polymorphic parameters as polymorphic type packs. The `Vars...` syntax
273introduces a pack of types into scope. It can be used in many of the same
274ways as a tuple tuple, but in some new ways to.
275
276The primary existing use remains, you can use a polymorphic pack in a
277parameter list, both as part of an assertion and in the signature of the
278main function. The difference, is that this is not an enclosed tuple, but
279a series of types. The only effective difference this makes is it doesn't
280prefer to match another tuple/pack.
281
282This pattern continues to a parameter defined with a pack of types, which
283is considered a pack of parameters, and the name of it introduces is a pack
284of variables, or a flattened tuple.
285
286 forall(Params...)
287 void function(Params values);
288
289New uses cases include declarations of members and variables. For example,
290the creation the structured tuple structure:
291
292 forall(Fields...)
293 struct tuple {
294 Fields get;
295 };
296
297This is again, treated as a pack of members. They have the same layout as
298if they were written out by hand. Now, the name get is still accessed is if
299it was a regular, singular, member. The result of that expression is a
300pack expression, a tuple of all the field accesses, which can be used with a
301tuple index expression to access the underlying members. For example:
302
303 tuple(bool, char, int) data = { ... };
304 // This expression:
305 data.get.2;
306 // Is the same as (if the fields had these names):
307 data.[__anonymous0, __anonymous1, __anonymous2].2;
308
309For local declarations it works similarly, except the name introduced is
310directly usable as a tuple.
311
312 forall(Objects...)
313 void function( ??? ) {
314 ???
315 Objects objs = ???;
316 ???
317 }
318
319### Tuple Declaration/Deconstruction
320
321Declaring a tuple acts as a pack of variable declarations. When this is done
322with written out type (as opposed to a polymorphic parameter above), then the
323elements of the tuple can be named.
324
325 [int quo, int rem] ret = divmod(a, b);
326
327Here `ret` refers to the tuple, the entire pack, while `quo` and `rem`
328give explicit names to elements of the tuple. Not all the names have to be
329provided, at the very least, any element name can be omitted if the pack name
330is provided, and the pack name can be omitted if all the element names are
331provided. That would ensure every element can be accessed, but it could be
332reduced even more, effectively immediately dropping some values if there is
333no named to access it.
334
335### Tuple Casts
336
337Tuple casts are no longer allowed to do any restructuring. Internal
338restructuring would not be useful, as there is no internal structure.
339Dropping tail elements could be added back in but it is a narrow use case
340so it may be replaced with other features (for example: tuple slicing).
341
342### Forbidden Tuples
343
344The unstructured tuple cannot represent all the types that the previous
345semi-structured tuple could. These cases still exist in various ways,
346special in the internals of a polymorphic type, but general should be
347considered in their reduced form.
348
349Forbidding some of these tuples may remove ambiguity and solve some syntax
350conflicts that currently exist.
351
352Nullary, or 0 element tuples, are equivlent to void, the type that carries
353no data, because they do not either. It should either be replaced with void
354or removed entirely when it appears in a larger sequence.
355
356 // For example, this function:
357 forall(Ts...) Ts example(int first
358 // With Ts = [], should not be treated as:
359 [] example(int first, [] middle, int last);
360 // But instead:
361 void example(int first, int last);
362
363Unary, or 1 element tuples, should be the same as their element type. That
364is to say a single type in an unstructured tuple is a no-op.
365
366Lastly, nested tuples are always flattened into to form a one deep tuple.
367This means that `[bool, [char, int], float]` is resolved as
368`[bool, char, int, float]`, with the internal structure of the tuple ignored.
369
370The flatten into a large sequence rule mentioned above, is actually just an
371application of this. Unstructured tuples can already be restructured, even
372at the top level of an function call. This can be expressed by considering
373the argument list as a tuple:
374
375 call(false, ['a', -7], 3.14)
376 call([false, ['a', -7], 3.14])
377 call([false, 'a', -7, 3.14])
378 call(false, 'a', -7, 3.14)
379
380The ability to write nested tuples may remain so tuple deconstruction can be
381used to name a slice of the tuple.
382
383### Tuple Slicing (Range Tuple Indexing)
384
385(Disclaimer: this is a bit more impulsive, see end for notes.)
386
387This is an extension to tuple indexing. Currently, you may index single
388location of a tuple, extracting the element at that location. By extending
389the index expression to be a range we can grab a slice of the existing tuple
390as another smaller tuple.
391
392 [int_x, char_y] = [int_a, char_b, double_c].(0+~=1)
393
394To get the new tuple, the range has to be constructed and traversed at
395compile time. The size of the range is the size of the result tuple, with
396each element in the result tuple decided by the matching element of the
397range, which gives the index of the original tuple to place there.
398The type of the indices may be and intergal type, but all indices must be in
399range, otherwise it is a compile time error.
400
401In terms of the existing range loops, if you could write this dynamically, it
402would look something like this:
403
404 // Implementation of: output = input.RANGE
405 dyn output = []
406 for ( index : RANGE ) { output = [output, input.index] }
407
408The result of the expression is only has to be considered a tuple if the
409range has two or more values, otherwise it can be considered void or the same
410as indexing the single value in the range.
411
412Some closing notes, this is dependent on generalised range expressions.
413The iterators proposal has a section on some new range types, this feature
414is intended to be built on that feature. Not simply reuse some of the syntax
415that is also used in the special for loop. And compile-time evaluation may
416need to be improved.
417
418Implementation
419--------------
420An overview of the implementation details of the new proposal.
421
422### Structured Tuple Implementation
423Under the hood, unstructured tuples will probably just be implemented as
424structured tuples, with the restructuring code inserted wherever needed.
425In short, the base implementation should stay mostly the same.
426
427Actually inlining tuples can be done in some cases, it may even help with
428some forms like a fixed tuple decomposition. However, polymorphic tuples
429still need to be reduced to a single generic form, which would be a
430polymorphic container, a tuple.
431
432### AST Updates
433The current AST cannot represent all the new features. Particularly, an
434object declaration cannot name elements of the tuple. To this end a new
435node type, `TupleDecl`, should be added to handle tuple deconstruction
436declarations (other cases can still be handled with `ObjectDecl`).
437
438This would act much like a `FunctionDecl` except for tuples, narrowing the
439possible types, to `TupleType` instances instead of `FunctionType` instances,
440and storing some additonal information, in this case the names of the
441elements of the tuples.
442
443(I'm not actually going to decide the implementation now, but some early
444examination of the problem suggests that it might be better off wrapping a
445series of `ObjectDecl` rather than just some strings to hold the names.)
446
447### Field Packs
448Field packs in structures will probably have to be written out in full by
449the specialization pass. If they are not it could have some negative effects
450on layout, causing a structure to take up extra space. It may be able to
451reuse some of the specialization code for the existing tuples.
452
453Related Features in Other Languages
454-----------------------------------
455Other languages have similar features. Organized by the related feature.
456
457(In hindsight, I may have gone overboard with the number of examples.)
458
459### Structured Tuples
460
461There are many languages with structured tuples. Usually just called tuples,
462but they usually follow the same rules as a polymorphic anonymous structures,
463that is to say you provide N types to create a new N-arity tuple. They also
464usually have some special syntax to index the tuples, because there are no
465field names to use.
466
467#### Rust
468Rust has the standard tuples as a primitive in the language. Each arity of
469tuple is a different polymorphic type.
470
471Tuple types and expressions are written with a parenthesized, comma seperated
472list of types or expressions. To avoid confusion with the grouping "(...)",
473one element tuples must have a trailing comma (and larger tuples may have a
474trailing comma).
475
476 const empty: () = ();
477 const single: (usize,) = (12,)
478 const double: (usize, isize) = (34, -56);
479
480Element access uses similar syntax to field access (that is "."), but uses an
481interger literal instead of a field name (for example: "pair.1"). Tuples
482support pattern matching with similar syntax to their expression form.
483
484Some tuple features only apply up to 12-arity tuples.
485It is not directly stated, but I believe the difference is the more limited
486features are implemented in the standard library, for each arity of tuple.
487
488- https://doc.rust-lang.org/std/primitive.tuple.html
489- https://doc.rust-lang.org/reference/types/tuple.html
490- https://doc.rust-lang.org/reference/expressions/tuple-expr.html
491
492#### C++
493Implemented as a template type defined in the standard library. No special
494language features exist to support this, due to the power of C++'s template
495meta-programming.
496
497C++ is also one of the few languages with support for varadic polymorphic
498types, so there is one template that defines all the types. It is written
499as `std::tuple<TYPE...>`, where "TYPE..." is any number of comma separated
500types. This is the standard notation for template type instances.
501
502There is no special syntax for member access of a tuple. You have to use a
503template function for example `std::get<0>( tuple )`.
504
505C++ also has structured binding, a kind of limited pattern matching. In a
506structured binding declaration, you can write an auto typed declaration with
507a list of identifiers in a `[]` list.
508For example, `auto [first, second] = getSize2Tuple();`.
509
510- https://en.cppreference.com/w/cpp/utility/tuple
511- https://en.cppreference.com/w/cpp/language/structured_binding
512
513#### Haskell
514Haskell has a special syntax for tuples, but otherwise they are completely
515normal polymorphic types. Because of this, tuples have a maximum size.
516Haskell (98) supports tuples of up to 15 elements and the standard library
517has functions for tuples of up to 7 elements.
518
519The syntax for tuples is a comma seperated list of elements. Either element
520types to create a tuple type, or element values to create a tuple value. The
521context decides which one we are looking for. Such as `(6, "six")` or
522`(Bool, Char, Int)`.
523
524You can also remove all the elements, getting an expression like "(,)" or
525"(,,,)", which can be then be used as a function, for a type or expression.
526
527Haskell supports pattern matching as the main way to extract values from a
528tuple, although helper functions "fst" and "snd" are provided for field
529access on two element tuples.
530
531Haskell does not have 0 or 1-element tuples. The nil type, written "()" for
532both type and value, is effectively the 0 element tuple. There is also a type
533called "Solo" that is a polymorphic structure with one field and is used as
534the 1-element tuple, but has regular Haskell data type syntax.
535
536- https://www.haskell.org/onlinereport/basic.html
537
538#### OCaml
539OCaml only supports multi-element (2 or more) tuples. Tuple types are written
540as a '*' separated list of types, tuple values are written as ',' separated
541list of expressions. Pattern matching tuples is supported and uses the same
542syntax as values. Parenthesizing these lists is only needed for grouping.
543
544OCaml does not support 0 or 1 element tuples. It does however have the `unit`
545type which has one value written `()`.
546
547- https://ocaml.org/docs/basic-data-types#tuples
548
549#### Swift
550Swift has tuple types that use the basic parenthesized list of comma
551separated list of types or values. It only supports 0 and 2 or more element
552tuples (the `Void` type is an alias for the empty tuple type).
553
554Swift also supports named tuples. Names can be added to before the tuple
555element, both for the tuple type and value. The syntax is a name followed by
556a colon, for example `(first: int, second: int)`. These names are a fixed
557part of the type, and can be used as part of field access notation (otherwise
558numbers are used in-place of field names `tuple.0` vs. `tuple.first`).
559
560- https://docs.swift.org/swift-book/documentation/the-swift-programming-language/types/#Tuple-Type
561
562#### Python
563In Python tuples are immutable lists. Because they are dynamically typed
564there is only one tuple type `tuple`.
565
566It also has various named tuples. The first, namedtuple, just allows you to
567add a name the elements of the tuple. The second, NamedTuple, is actually a
568way of creating a typed record in a normally untyped language.
569
570- https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
571- https://docs.python.org/3/library/collections.html#collections.namedtuple
572- https://docs.python.org/3/library/typing.html#typing.NamedTuple
573
574#### LISP
575As LISP is dynamically typed, its cons data type is an untyped pair and is
576(perhaps infamously) the main constructor of all compound data types.
577The function `cons` takes two arguments and builds a pair. Functions `car`
578and `cdr` get the first and second elements of the pair.
579
580### Packs
581
582Packs (or unstructures tuples) are a much less common feature. In fact, there
583might just be one language, C++, that supports packs. The most common use
584case for unstructured tuples is returning multiple values, so there is a
585comparison to languages that have that as a special feature.
586
587#### Go
588Go does not have built in tuple types, but it has multi-return syntax that
589looks like the tuple syntax of many other languages.
590
591```
592func main() {
593 i, j := returnIntInt()
594 ...
595}
596
597func returnIntInt() (int, int) {
598 return 12, 34
599}
600```
601
602- https://golangdocs.com/functions-in-golang
603- https://go.dev/src/go/types/tuple.go
604
605#### Lua
606Lua is a scripting languague, is dynamically typed and stack based. Although
607the stack usually only directly visible in the C-API, it does allow any
608function to return any number of values. Even in a single return, if the
609return expression
610
611```
612local funcion f()
613 return 12, 34
614end
615
616local i, j = f()
617```
618
619#### C++
620C++ templates can take various types of parameters, including parameter
621packs. These contain series of values. These are used in pack expansion,
622which usually expand to a comma separated list, but it can also be a chain of
623boolean binary operator applications. For example, if the parameter
624`typename... Ts` is in scope, then you can declare `std::tuple<Ts...>` to
625introduce a tuple type, if Ts = bool, char, int then the type becomes
626`std::tuple<bool, char, int>`.
627
628A common use is for perfect argument forwarding, which shows some different
629uses of the pattern:
630
631```
632template<typename inner_t>
633class Outer {
634 inner_t inner;
635public:
636 template<typename... Args>
637 Outer(Args&&... args) : inner(std::foward<Args>(args)...) {}
638};
639```
640
641In the first application, `Args&&... args` both uses a pack and introduces
642another one. `Arg0&& arg0, Arg1&& arg1, Arg2&& arg2, ...` is the pattern
643it explands too. The `&&` is used to copy the argument type's reference
644qualifier (the default can strip references away).
645
646The second application, `std::forward<Args>(args)...` uses two packs. These
647are expanded in parellel, and must be the same length (in this case they
648always will be). It also helps show that the `...` is actually the bit doing
649the expansion, it is a suffix "operator" to the expansion pattern.
650
651There are also fold expressions that use binary operators to combine a pack
652into a single expression. For example, `args + ... + 0` which adds every
653element of the `args` pack together.
654
655C++ about the best you could ask for in this area, but it does a lot of work
656at compile time to make this happen.
657
658- https://en.cppreference.com/w/cpp/language/template_parameters
659- https://en.cppreference.com/w/cpp/language/parameter_pack
660- https://en.cppreference.com/w/cpp/language/fold
Note: See TracBrowser for help on using the repository browser.