Tuples ====== Tuples were introduced by Dave Till in K-W C, added to CFA by Rodolfo Esteves, and extended by Robert Schluntz. This proposal discusses updates for CFA tuples to address problems that have appeared in their usage over the past 6 years. The proposal attempts to address problems by adding new functionality, updating existing features, and removing some problematic ones. The core change is breaking the current restructurable tuples into unstructured and structured tuples. Unstructured tuples handle most of the existing uses, with structured tuples filling in a few missing use cases. Current State of Tuples ----------------------- An overview of the current tuples design is the starting place for the proposal. ### Tuple Syntax Currently, tuples have three main components: tuple types, tuple expressions/values (constructing tuples), and tuple index expressions (deconstructing tuples). Current syntax for tuple types. - Nullary: [void] or [] - Unary: [TYPE] - Nary: [TYPE, TYPE, ...] Tuple types can appear in a function return and parameter declaration, or a tuple variable declaration. Note, the Nullary tuple is only meaningful in the return context, void f(...); // equivalent [void] f(...); [] f(...); as C does not support a void declaration. int f( void x ); // disallowed int f( [void] x ); int f( [] x ); Current syntax for tuple expressions: - Nullary: (Same as `void`, use return without an expression.) [] f( ) { return; } [] f( ) { return [] ; } - Unary: [int] f( ) { return 3; } [int] f( ) { return [3]; } - Nary: [EXPR, EXPR] [int,int] f( ) { return [3,4]; } Currently, there is a parsing problem for nullary and unary tuple expression, which is being looked at. Hence, only these two forms work. [] f( ) { return; } // nullary [int] f( ) { return 3; } // unary Current syntax for tuple indexing is an integer constant, where its value selects a tuple member, e.g.: [char, int] tup; tup = ['a', 0]; char ch = t.0; // select first tuple member int value = t.1; // select second tuple member ### Mass and Multi-Assignment Two special forms of assignment can be used to set values in tuples: mass and multi. Mass assignment assigns every element in the destination tuple to a single source value. [int, long, float] dst; int src = 4 dst = src; // => dst.0 = src; dst.1 = src; dst.2 = src Multi-assignment assigns every element in the destination tuple to the corresponding element in the source tuple. Both tuples must be the same size and the elements assignment compatible => conversions. [long, int, float] dst; [int, char, double] src = [1, 'a', 300.0]; dst = src; // => dst.0 = src.0; dst.1 = src.1; dst.2 = src.1 ### Tuple Restructuring Tuples can be restructured as part of overload resolution. Function calls unpack tuples and repack tuples to match signatures. This semantics is a form of implicit conversion and is considered during overload resolution. A simple example is matching multiple parameters of a function to a single argument expression, where each parameter is bound to a different element of the returned tuple. [int, int] argFunc(); void parmFunc(int a, int b, int c, int d); parmFunc(argFunc(), argFunc()); // Roughly equivilent to: [int, int] fst = argFunc(); [int, int] snd = argFunc(); parmFunc(fst.0, fst.1, snd.0, snd.1); There are few languages supporting multiple return-values as a standalone feature (SETL). Go has multiple return-values but restricts their use in matching arguments to parameters. func argFunc() (int, int) { return 3, 7 } func parmFunc( a int, b int ) { fmt.Println(a, b ) } func main() { parmFunc2( argFunc() ); // arguments must match exactly with parameters } ### Tuple Casts C-style casts can be used on tuples. These are usually conversion casts (casts that perform operations on the cast type, as opposed to reinterpreting the existing value). Tuple casts can remove elements from the end of a tuple and apply a recursive cast to any of the elements. As an example: [char, char, char] x = ['a', 'b', 'c']; ([int, char])x; This casts the first element type of x from a char to an int and drops the last element. The second line can be replaced with the following code, which creates a new tuple by directly casting the kept elements of the old tuple: [(int)x.0, (char)x.1]; Note, tuple casting is more restricted than the implicit tuple restructuring. It cannot do any restructuring beyond dropping the trailing elements of tuples. For example, int i; char c; bool b; [i, b, c, i] = [i, [b, c], i]; [int, [bool, char], int] w; [i, b, c, i] = w; [i, b, c, i] = ([int, bool, char, int])w; // fails [i, b, c] = ([int, [bool, char]])w; // works ### Polymorphic Tuple Parameters One kind of polymorphic parameter is the tuple type parameter, previously noted by the `ttype` keyword and now marked by a tailing ellipses (`...`). Although described as a tuple type, it is usually viewed as its flattened sequence of types. forall( T | { int ?>?( T, T ); } ) T max( T v1, T v2 ) { return v1 > v2 ? v1 : v2; } // base case forall(T, Ts... | { T max(T, T); T max(Ts); }) // recursive case T max(T arg, Ts args) { return max(arg, max(args)); } This feature introduces a type name into scope (Ts). It is used as a type but because the tuple is flattened, the second assertion "T max(Ts);" matches types with multiple parameters (the `...`), although it is used as a tuple function inside the function body (max(args)). The first non-recursive max function is the polymorphic base-case for the recursion, i.e., find the maximum of two identically typed values with a greater-than (>) operator. The second recursive max function takes two parameters, a T and a Ts tuple, handling all argument lengths greater than two. The recursive function computes the maximum for the first argument and the maximum value of the rest of the tuple. The call of max with one argument is the recursive call, where the tuple is converted into two arguments by taking the first value (lisp car) from the tuple pack as the first argument (flattening) and the remaining pack becomes the second argument (lisp cdr). The recursion stops when the tuple is empty. For example, max( 2, 3, 4 ) matches with the recursive function, which performs return max( 2, max( [3, 4] ) ) and one more step yields return max( 2, max( 3, 4 ) ), so the tuple is empty. Issues with the Current State ----------------------------- There are a variety of problems with the current implementation which need to be fixed. ### Tuples are not Objects Spoilers: this proposal actually takes them even further away from being objects, but illustrates why the current version is not as useful is it could be. Because of the fluid nature of tuples (flattening/structuring), routines like constructions, destructor, or assignment do not make sense, e.g. this constructor matches multiple a tuple types: void ?{} ( [int, int, int] ? this ); [int, [int, int]] x; // all match constructor type by flattening [int, int, int] y; [[int, int], int] z; as could a similarly typed destructor or assignment operator. This prevents tuples from being interwoven with regular polymorphic code. ### Providing TType Arguments is Inconsistent The syntax for ttype arguments is slightly inconsistent. It has not come up much yet, because you do not directly provide ttype polymorphic arguments to functions and there are very few existing use-cases for ttype structures. Passing arguments to a function inlines the arguments while passing them to a polymorphic type requires them to be enclosed in a tuple. Compare `function(x, y, z)` with `Type(A, [B, C])`. This did not come up previously as there was little reason to explicitly provide ttype arguments. They are implicit for functions and there is very little use case for a ttype on a struct or union. ### Syntax Conflict The tuple syntax conflicts with designators and the new C++-style attribute syntax. struct S { int a[10]; } = { [2] = 3 }; // [2] looks like a tuple [[ unused ]] [[3, 4]]; // look ahead problem These conflicts break C compatibility goals of Cforall. Designators had to their syntax change and Cforall cannot parse the new attributes. Although most of this redesign is about the semantics of tuples, but an update to tuple syntax that removes these conflicts would improve the compatibility of Cforall going forward (and also open up the new attribute syntax for cforall features). Change in Model --------------- This proposal modifies the existing tuples to better handle its main use cases. There are some use cases that are no longer handled, and a new struct tuple is added to cover those cases. The new tuples is even more "unstructured" than before. New tuples are considered a sequence of types or typed entities. These packs are then unpacked into the surrounding context. Put differently, tuples are now flattened as much as possible, with some places (like parameter lists) being treated as an implicit tuple and the tuple being flattened into that. Structured tuples are now a separate feature: a structure called "tuple". These are polymorphic structures; an instance should act as a structure, except its fields are accessed using indices instead of field names. Experience so far is that structured tuples are not used often, but fill in the use cases that unstructured tuples no longer support. Note that the underlying implementation might not actually look like this. Changed Features ---------------- Some of the concrete changes to the features of the language. ### Structured Tuple Type There is a standard library or built-in type named `tuple`, it does not need a special syntax to write types or instances. The type definition might need some primitive support, but if supported as a regular type would look something like this: forall(Ts...) struct tuple { inline Ts all; // inline all specified fields }; This type is constructed the same way as most types, a list initializer with each tuple argument, and the lifetime functions (construction, assignment and destruction) work the same. Field access works two ways, the first is accessing the `all` field, effectively converting the structured tuple into an unstructured tuple, the other is to use tuple indexing directly on the structure as if it is an unstructured tuple. (If `inline` does not work, just rename all to `get`. It does make things a bit longer but has no change in functionality. If the `all` access does not work, that is more problematic, and tuple slicing might provide a work around.) ### Type Qualifier Distribution Because tuples are no longer object types, applying type modifiers to them, such as cv-qualifiers and pointers, no longer makes sense. That syntax is now considered distribution, applying the type modifier to each element of the tuple. Previously `const [int, bool] &` would mean a const reference to a tuple of an integer and a boolean. Now it means an alias for `[const int &, const bool &]`, a tuple of a reference to a constant integer and a reference to a constant boolean. This also applies to polymorphic tuple type packs `Ts &` in polymorphic functions. This new approach can specify restrictions on tuple variables as for a single type variable. For example, this approach can replace the special cased tuple operations multi-assignment (N-to-N) and mass-assignment (1-to-N). // Multi-Assignment forall(T, Us... | { T & ?=?(T &, T const &); Us & ?=?(Us &, Us const &); }) [T, Us] & ?=?(T & dst, Us & dsts, T const & src, Us const & srcs) { dst = src; dsts = srcs; return [dst, dsts]; } // Mass-Assignment forall(T, U, Vs... | { U & ?=?(U &, T const &); Vs & ?=?(Vs &, T const &); }) [U, Vs] & ?=?(U & dst, Vs & dsts, T const & src) { dst = src; dsts = src; return [dst, dsts]; } These may not work exactly as given (for one, the copy assignment assertion in the first function would likely be redundant/conflicting with the implicit assertions on the parameters), but they show the pattern. Multi-assignment also would be very hard to write with simple tuple types, because the relationship between the two halves of the parameter list does have to line up, that cannot be enforced with two different tuples. ### Type Packs This approach is not a new feature, but a reframing/extension of existing tuple tuple polymorphic parameters as polymorphic type packs. The old `Vars...` syntax introduces a pack of types into scope. It can be used in much the same way as a tuple, but in some new ways to. The primary existing use remains: to use a polymorphic pack in a parameter list, both as part of an assertion and in the signature of the main function. The difference is that this is not an enclosed tuple, but a series of types. The only effective difference this makes is it does not prefer to match another tuple/pack. This pattern continues to a parameter defined with a pack of types, which is considered a pack of parameters, and the name it introduces is a pack of variables, or a flattened tuple. forall(Params...) void function(Params values); New use cases include declarations of members and variables. For example, the creation of a structured tuple structure: forall(Fields...) struct tuple { Fields get; }; This is again, treated as a pack of members. They have the same layout as if they were written out by hand. Now, the name get is still accessed as if it was a regular, singular, member. The result of that expression is a pack expression, a tuple of all the field accesses, which can be used with a tuple index expression to access the underlying members. For example: tuple(bool, char, int) data = { ... }; // This expression: data.get.2; // Is the same as (if the fields had these names): data.[__anonymous0, __anonymous1, __anonymous2].2; For local declarations it works similarly, except the name introduced is directly usable as a tuple. forall(Objects...) void function( ??? ) { ??? Objects objs = ???; ??? } ### Tuple Declaration/Deconstruction Declaring a tuple acts as a pack of variable declarations. When this is done with a written out type (as opposed to a polymorphic parameter above), then the elements of the tuple can be named. [int quo, int rem] ret = divmod(a, b); Here `ret` refers to the tuple, the entire pack, while `quo` and `rem` give explicit names to elements of the tuple. Not all the names have to be provided, at the very least, any element name can be omitted if the pack name is provided, and the pack name can be omitted if all the element names are provided. That would ensure every element can be accessed, but it could be reduced even more, effectively immediately dropping some values if there is no named to access it. PAB: I do understand the point of this. ### Tuple Casts Tuple casts are no longer allowed to do any restructuring. Internal restructuring would not be useful, as there is no internal structure. Dropping tail elements could be added back in but it is a narrow use case so it may be replaced with other features (for example: tuple slicing). ### Forbidden Tuples The unstructured tuple cannot represent all the types that the previous semi-structured tuple could. These cases still exist in various ways, specifically in the internals of a polymorphic type, but in general should be considered in their reduced form. Forbidding some of these tuples may remove ambiguity and solve some syntax conflicts that currently exist. Nullary, or 0 element tuples, are equivalent to void, the type that carries no data, because they do not either. It should either be replaced with void or removed entirely when it appears in a larger sequence. // For example, this function: forall(Ts...) Ts example(int first, ????) // With Ts = [], should not be treated as: [] example(int first, [] middle, int last); // But instead: void example(int first, int last); Unary, or 1 element tuples, should be the same as their element type. That is to say a single type in an unstructured tuple is a no-op. Lastly, nested tuples are always flattened to a one-depth tuple. This means that `[bool, [char, int], float]` is resolved as `[bool, char, int, float]`, with the internal structure of the tuple ignored. The flatten into a large sequence rule mentioned above is actually just an application of this. Unstructured tuples can already be restructured, even at the top level of an function call. This can be expressed by considering the argument list as a tuple: call(false, ['a', -7], 3.14) call([false, ['a', -7], 3.14]) call([false, 'a', -7, 3.14]) call(false, 'a', -7, 3.14) The ability to write nested tuples may remain so tuple deconstruction can be used to name a slice of the tuple. ### Tuple Slicing (Range Tuple Indexing) (Disclaimer: this is a bit more impulsive, see end for notes.) This is an extension to tuple indexing. Currently, only single location of a tuple can be index, extracting the element at that location. By extending the index expression to be a list-range a multiple sub-tuples can be extracted. [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].0~2,5~9~2 produces the tuple [0, 1, 2, 5, 7, 9] (Note the position and values are the same to simplify the example.) That is, index the first 3 tuple elements, and then indexes elements 5 to 9 in steps of 2. Not all selections are possible with this mechanism (e.g., index the Fibonacci elements), but it covers many cases. To get the new tuple, the range has to be constructed and traversed at compile time. The size of the range is the size of the result tuple, with each element in the result tuple decided by the matching element of the range, which gives the index of the original tuple to place there. The type of the indices may be an integral type, but all indices must be in range, otherwise it is a compile time error. In terms of the existing range loops, if you could write this dynamically, it would look something like this: // Implementation of: output = input.RANGE dyn output = [] for ( index : RANGE ) { output = [output, input.index] } The result of the expression is only has to be considered a tuple if the range has two or more values, otherwise it can be considered void or the same as indexing the single value in the range. Some closing notes, this is dependent on generalized range expressions. The iterators proposal has a section on some new range types, this feature is intended to be built on that feature. Not simply reuse some of the syntax that is also used in the special for loop. And compile-time evaluation may need to be improved. PAB, I don't understand this last part as the index range is compile time not runtime. Implementation -------------- An overview of the implementation details of the new proposal. ### Structured Tuple Implementation Under the hood, unstructured tuples are implemented as structured tuples, with the restructuring code inserted wherever needed. In short, the base implementation should stay mostly the same. PAB: The current implementation does not use convert unstructured tuples to structured tuples. Look at the code generated for int x, y; [x, y] = 3; [x, y] = [y, x]; Actually inlining tuples can be done in some cases, it may even help with some forms like a fixed tuple decomposition. However, polymorphic tuples still need to be reduced to a single generic form, which would be a polymorphic container, a tuple. ### AST Updates The current AST cannot represent all the new features. Particularly, an object declaration cannot name elements of the tuple. To this end a new node type, `TupleDecl`, should be added to handle tuple deconstruction declarations (other cases can still be handled with `ObjectDecl`). This would act much like a `FunctionDecl` except for tuples, narrowing the possible types, to `TupleType` instances instead of `FunctionType` instances, and storing some additional information. In this case, the names of the elements of the tuples. PAB: the following parses: [int x, int y] foo( int p ); and discussed by Till. (I'm not actually going to decide the implementation now, but some early examination of the problem suggests that it might be better off wrapping a series of `ObjectDecl` rather than just some strings to hold the names.) ### Field Packs Field packs in structures probably have to be written out in full by the specialization pass. If not, it could have some negative effects on layout, causing a structure to take up extra space. It may be able to reuse some of the specialization code for the existing tuples. Related Features in Other Languages ----------------------------------- Other languages have similar features. Organized by the related feature. (In hindsight, I may have gone overboard with the number of examples.) ### Structured Tuples There are many languages with structured tuples. Usually just called tuples, but they usually follow the same rules as a polymorphic anonymous structures, that is to say N types are provided to create a new N-arity tuple. They also usually have some special syntax to index the tuples, because there are no field names. #### Rust Rust has the standard tuples as a primitive in the language. Each arity of tuple is a different polymorphic type. Tuple types and expressions are written with a parenthesized, comma separated list of types or expressions. To avoid confusion with the grouping "(...)", one-element tuples must have a trailing comma (and larger tuples may have a trailing comma). const empty: () = (); const single: (usize,) = (12,) const double: (usize, isize) = (34, -56); Element access uses similar syntax to field access (that is "."), but uses an integer literal instead of a field name (for example: "pair.1"). Tuples support pattern matching with similar syntax to their expression form. Some tuple features only apply up to 12-arity tuples. It is not directly stated, but I believe the difference in the more limited features are implemented in the standard library, one for each arity of tuple. - https://doc.rust-lang.org/std/primitive.tuple.html - https://doc.rust-lang.org/reference/types/tuple.html - https://doc.rust-lang.org/reference/expressions/tuple-expr.html #### C++ tuple Implemented as a template type defined in the standard library. No special language features exist to support this, due to the power of C++'s template meta-programming. C++ is also one of the few languages with support for variadic polymorphic types, so there is one template that defines all the types. It is written as `std::tuple`, where "TYPE..." is any number of comma separated types. This is the standard notation for template type instances. There is no special syntax for member access of a tuple. A template function is used, e.g., `std::get<0>( tuple )`. C++ also has structured binding, a kind of limited pattern matching. In a structured binding declaration, you can write an auto typed declaration with a list of identifiers in a `[]` list. For example, `auto [first, second] = getSize2Tuple();`. - https://en.cppreference.com/w/cpp/utility/tuple - https://en.cppreference.com/w/cpp/language/structured_binding PAB: I do not understand the syntax `auto [first, second]`. Where does it come from? #### C++ template C++ templates can take various types of parameters, including parameter packs. These contain series of values. These are used in pack expansion, which usually expand to a comma separated list, but it can also be a chain of boolean binary operator applications. For example, if the parameter `typename... Ts` is in scope, then you can declare `std::tuple` to introduce a tuple type, if Ts = bool, char, int then the type becomes `std::tuple`. A common use is for perfect argument forwarding, which shows some different uses of the pattern: ``` template class Outer { inner_t inner; public: template Outer(Args&&... args) : inner(std::forward(args)...) {} }; ``` In the first application, `Args&&... args` both uses a pack and introduces another one. `Arg0&& arg0, Arg1&& arg1, Arg2&& arg2, ...` is the pattern it expands too. The `&&` is used to copy the argument type's reference qualifier (the default can strip references away). The second application, `std::forward(args)...` uses two packs. These are expanded in parallel, and must be the same length (in this case they always will be). It also helps show that the `...` is actually the bit doing the expansion, it is a suffix "operator" to the expansion pattern. There are also fold expressions that use binary operators to combine a pack into a single expression. For example, `args + ... + 0` which adds every element of the `args` pack together. C++ is about the best you could ask for in this area, but it does a lot of work at compile time to make this happen. - https://en.cppreference.com/w/cpp/language/template_parameters - https://en.cppreference.com/w/cpp/language/parameter_pack - https://en.cppreference.com/w/cpp/language/fold #### Haskell Haskell has a special syntax for tuples, but otherwise they are completely normal polymorphic types. Because of this, tuples have a maximum size. Haskell (98) supports tuples of up to 15 elements and the standard library has functions for tuples of up to 7 elements. The syntax for tuples is a comma separated list of elements. Either element types to create a tuple type, or element values to create a tuple value. The context decides among them, such as `(6, "six")` or `(Bool, Char, Int)`. Also all the elements can be removed, getting an expression like "(,)" or "(,,,)", which can be then be used as a function, for a type, or an expression. Haskell supports pattern matching as the main way to extract values from a tuple, although helper functions "fst" and "snd" are provided for field access on two element tuples. Haskell does not have 0 or 1-element tuples. The nil type, written "()" for both type and value, is effectively the 0 element tuple. There is also a type called "Solo" that is a polymorphic structure with one field and is used as the 1-element tuple, but has regular Haskell data-type syntax. - https://www.haskell.org/onlinereport/basic.html #### OCaml OCaml only supports multi-element (2 or more) tuples. It does have the `unit` type, which has one value, written `()`. Tuple types are written as a '*' separated list of types, tuple values are written as ',' separated list of expressions. Pattern matching tuples is supported and uses the same syntax as values. Parenthesizing these lists is only needed for grouping. - https://ocaml.org/docs/basic-data-types#tuples #### Swift Swift has tuple types that use the basic parenthesized, comma separated list of types or values. It only supports 0 and 2 or more element tuples (the `Void` type is an alias for the empty tuple type). Swift also supports named tuples. Names can be added before the tuple element, both for the tuple type and value. The syntax is a name followed by a colon, e.g., `(first: int, second: int)`. These names are a fixed part of the type, and can be used as part of field access notation (otherwise numbers are used in-place of field names `tuple.0` vs. `tuple.first`). - https://docs.swift.org/swift-book/documentation/the-swift-programming-language/types/#Tuple-Type #### Python In Python tuples are immutable lists. Because they are dynamically typed, there is only one tuple type `tuple`. It also has various named tuples. The first, namedtuple, allows naming the elements of the tuple. The second, NamedTuple, is actually a way of creating a typed record in a normally untyped language. - https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range - https://docs.python.org/3/library/collections.html#collections.namedtuple - https://docs.python.org/3/library/typing.html#typing.NamedTuple #### LISP As LISP is dynamically typed, its `cons` data type is an untyped pair and is (perhaps infamously) the main constructor of all compound data types. The function `cons` takes two arguments and builds a pair. Functions `car` and `cdr` get the first and second elements of the pair. ### Packs Packs (or unstructured tuples) are a much less common feature. In fact, there might just be one language, C++, that supports packs. The most common use case for unstructured tuples is returning multiple values, so there is a comparison to languages that have that as a special feature. #### Go Go does not have built in tuple types, but it has multi-return syntax that looks like the tuple syntax of many other languages. ``` func main() { i, j := returnIntInt() ... } func returnIntInt() (int, int) { return 12, 34 } ``` - https://golangdocs.com/functions-in-golang - https://go.dev/src/go/types/tuple.go #### Lua Lua is a scripting language that is dynamically typed and stack based. Although the stack is usually only directly visible in the C-API, it does allow any function to return any number of values, even a single return, in the return expression ``` local funcion f() return 12, 34 end local i, j = f() ```