source: doc/proposals/tuples/tuples.tex @ d67cdb7

Last change on this file since d67cdb7 was 2298a7b8, checked in by Rob Schluntz <rschlunt@…>, 8 years ago

added ctor and tuple design docs

  • Property mode set to 100644
File size: 22.5 KB
1% inline code ©...© (copyright symbol) emacs: C-q M-)
2% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
3% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
4% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
5% LaTex escape §...§ (section symbol) emacs: C-q M-'
6% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
7% math escape $...$ (dollar symbol)
13% Latex packages used in the document (copied from CFA user manual).
14\usepackage[T1]{fontenc}                                % allow Latin1 (extended ASCII) characters
19\usepackage{upquote}                                                                    % switch curled `'" to straight
23\usepackage{varioref}                                                                   % extended references
24\usepackage{listings}                                                                   % format program code
25\usepackage[flushmargin]{footmisc}                                              % support label/reference in footnote
26\usepackage{latexsym}                                   % \Box glyph
27\usepackage{mathptmx}                                   % better math font with "times"
31\input{common}                                          % bespoke macros used in the document
36\setlength{\topmargin}{-0.45in}                                                 % move running title into header
46\Huge \vspace*{1in} Tuple Design in \CFA \\
51\huge Rob Schluntz \\
52\Large \vspace*{0.1in} \texttt{} \\
53\Large Cheriton School of Computer Science \\
54\Large University of Waterloo
63\newcommand{\bigO}[1]{O\!\left( #1 \right)}
67% changed after setting pagestyle
68\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
69\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
71% \linenumbers                                            % comment out to turn off line numbering
87This document describes my understanding of the existing tuple design~\cite{Buhr94a,Till89}, mixed with my thoughts on improvements after various discussions with Peter, Aaron, and Thierry.
89\section{Tuple Expressions}
90A tuple expression is an expression which produces a fixed-size, ordered list of values of heterogeneous types.
91The type of a tuple expression is the tuple of the subexpression types.
92In Cforall, a tuple expression is denoted by a list of expressions enclosed in square brackets.
94For example, the expression ©[5, 'x', 10.5]© has type ©[int, char, double]©.
95Tuples are a compile-time phenomenon and have little to no run-time presence.
97\subsection{Tuple Variables}
98It is possible to have variables of tuple type, pointer to tuple type, and array of tuple type.
99Tuple types can be composed of any types, except for array types \footnote{I did not see this issue mentioned at all in the original design. A tuple containing an array type seems to make sense up until you try to use a tuple containing an array type as a function parameter. At this point you lose information about the size of the array, which makes tuple assignment difficult. Rather than allowing arrays in most situations and disallowing only as function parameters, it seems like it would be better to be consistent across the board.}.
102[double, int] di;
103[double, int] * pdi
104[double, int] adi[10];
106The program above declares a variable of type ©[double, int]©, a variable of type pointer to ©[double, int]©, and an array of ten ©[double, int]©.
108\subsection{Flattening and Structuring}
109In Cforall, tuples do not have a rigid structure.
110In function call contexts, tuples support implicit flattening and restructuring \footnote{In the original tuple design, four tuple coercions were described: opening, closing, flattening, and structuring. I've combined flattening with opening and structuring with closing in my description, as the distinctions do not seem useful in Cforall since these coercions happen only as arguments to function calls, and I believe all of the semantics are properly covered by the simplified descriptions.}. Tuple flattening recursively expands a tuple into the list of its basic components. Tuple structuring packages a list of expressions into a value of tuple type.
113int f(int, int);
114int g([int, int]);
115int h(int, [int, int]);
116[int, int] x;
117int y;
120g(y, 10);
121h(x, y);
123In Cforall, each of these calls is valid.
124In the call to ©f©, ©x© is implicitly flattened so that the components of ©x© are passed as the two arguments to ©f©.
125For the call to ©g©, the values ©y© and ©10© are structured into a single argument of type ©[int, int]© to match the type of the parameter of ©g©.
126Finally, in the call to ©h©, ©y© is flattened to yield an argument list of length 3, of which the first component of ©x© is passed as the first parameter of ©h©, and the second component of ©x© and ©y© are structured into the second argument of type ©[int, int]©.
129\subsection{Argument Passing}
130In resolving a function call, all of the arguments to the call are flattened.
131While determining if a particular function/argument-list combination is valid, the arguments are structured to match the shape of each formal parameter, in order.
133For example, given a function declaration ©[int] f(int, [double, int])©, the call ©f([5, 10.2], 0)© can be satisfied by first flattening the tuple to yield the expression ©f(5, 10.2, 0)© and then structuring the argument list to match the formal parameter list structure as ©f(5, [10.2, 0])©.
135\subsection{Multiple-Return-Value Functions}
136Functions can be declared to return more than one value.
137Multiple return values are packaged into a tuple value when the function returns.
138A multiple-returning function with return type ©T© can return any expression which is implicitly convertible to ©T©.
140\subsection{Tuple Assignment}
141An assignment where the left side of the assignment operator has a tuple type is called tuple assignment.
142There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called Multiple Assignment and Mass Assignment respectively.
143Let $L_i$ for $i$ in $[0, n)$ represent each component of the flattened left side, $R_i$ represent each component of the flattened right side of a multiple assignment, and $R$ represent the right side of a mass assignment.
145For a multiple assignment to be valid, both tuples must have the same number of elements when flattened. Multiple assignment assigns $R_i$ to $L_i$ for each $i$.
146That is, ©?=?(&$L_i$, $R_i$)© must be a well-typed expression.
148Mass assignment assigns the value $R$ to each $L_i$. For a mass assignment to be valid, ©?=?(&$L_i$, $R$)© must be a well-typed expression.
149This differs from C cascading assignment (e.g. ©a=b=c©) in that conversions are applied to $R$ in each individual assignment, which prevents data loss from the chain of conversions that can happen during a cascading assignment.
151Both kinds of tuple assignment have parallel semantics, such that each value on the left side and right side is evaluated \emph{before} any assignments occur.
152Tuple assignment is an expression where the result type is the type of the left side of the assignment, as in normal assignment (i.e. the tuple of the types of the left side expressions) \footnote{This is a change from the original tuple design, wherein tuple assignment was a statement. This decision appears to have been made in an attempt to fix what was seen as a problem with assignment, but at the same time this doesn't seem to fit C or Cforall very well. In another language, tuple assignment as a statement could be reasonable, but I don't see a good justification for making this the only kind of assignment that isn't an expression. In this case, I would value consistency over idealism}.
153These semantics allow cascading tuple assignment to work out naturally in any context where a tuple is permitted.
155The following example shows multiple, mass, and cascading assignment in one expression
157  int a, b;
158  double c, d;
159  [void] f([int, int]);
160  f([c, a] = [b, d] = 1.5);
162First a mass assignment of ©1.5© into ©[b, d]©, which assigns ©1.5© into ©b©, which is truncated to ©1©, and ©1© into ©d©, producing the tuple ©[1, 1.5]© as a result.
163That tuple is used as the right side of the multiple assignment (i.e. ©[c, a] = [1, 1.5]©) which assigns ©1© into ©c© and ©1.5© into ©a©, which is truncated to ©1©, producing the result ©[1, 1]©.
164Finally, the tuple ©[1, 1]© is used as an expression in the call to ©f©.
166\subsection{Tuple Construction}
167Tuple construction and destruction follow the same rules and semantics as tuple assignment, except that in the case where there is no right side, the default constructor or destructor is called on each component of the tuple.
169It is possible to define constructors and assignment functions for tuple types that provide new semantics.
170For example, the function ©void ?{}([T, U] *, S);© can be defined to allow a tuple variable to be constructed from a value of type ©S©. Due to the structure of generated constructors, it is possible to pass a tuple to a generated constructor for a type with a member prefix that matches the type of the tuple (e.g. an instance of ©struct S { int x; double y; int z }© can be constructed with a tuple of type ©[int, double]©, `out of the box').
172\section{Other Tuple Expressions}
173\subsection{Member Tuple Expression}
174It is possible to access multiple fields from a single expression using a Member Tuple Expression \footnote{Called ``record field tuple'' in the original design, but there's no reason to limit this feature to only structs, so ``field tuple'' or ``member tuple'' feels more appropriate.}.
175The result is a single tuple-valued expression whose type is the tuple of the types of the members.
176A member tuple expression has the form ©a.[x, y, z];© where ©a© is an expression with type ©T©, where ©T© supports member access expressions, and ©x, y, z© are all members of ©T© with types ©T$_x$©, ©T$_y$©, and ©T$_z$© respectively.
177Then the type of ©a.[x, y, z]© is ©[T$_x$, T$_y$, T$_z$]©.
178It is possible for a member tuple expression to contain other member access expressions, e.g. ©a.[x, y.[i, j], z.k]©.
179This expression is equivalent to ©[a.x, [a.y.i, a.y.j], a.z.k]©.
180It is guaranteed that the aggregate expression to the left of the ©.© on a member tuple expression is evaluated once.
182\subsection{Tuple Indexing}
183Sometimes it is desirable to access a single component of a tuple-valued expression without creating unnecessary temporary variables to assign to.
184Given a tuple-valued expression ©e© and a compile-time constant integer $i$ where $0 \leq i < n$, where $n$ is the number of components in ©e©, ©e.i© will access the $i$\textsuperscript{th} component of ©e©.
185% \footnote{If this syntax cannot be parsed, we can make it ©_i©, and a semantic check ensures that ©_i© has the right form. The capability to access a component of a tuple is helpful internally, and there doesn't seem to be a disadvantage to exposing it to users. On the other hand, it is more general than casting and much more explicit, while also being less verbose.}.
186It is possible to use a member tuple expression with tuple indexing to manually restructure a tuple (rearrange components, drop components, duplicate components, etc.).
188% TODO: mention that Tuple.member_name and Aggregate.index could have sensible semantics, but introduce complexity into the model. Agg.idx could mean get the ith member of the aggregate (further, this could be extended for enumerations as well, where the LHS is a type instead of a value), but it's not clear there is a compelling use-case. Tuple.member_name can either mean "distribute the member across the elements of the tuple" [effectively a compile-time map], or alternatively array.member_name (to mean basically the same thing). The problem with this is that it takes this expression's meaning from being clear at compile-time to needing resolver support, as the member name needs to appropriately distribute across every member of the tuple, which could itself be a tuple, etc. Again, the extra complexity is not currently justified.
190For example
192  [int, double] x;
193  [double, int, double] y = [x.1, x.0, x.1];  // (1)
195  [int, int, int] f();
196  [x.0, y.1] = f().[0, 2];                    // (2)
199(1) ©y© is initialized using a tuple expression which selects components from the tuple variable ©x©.
201(2) A mass assignment of the first and third components from the return value of ©f© into the first component of ©x© and the second component of ©y©.
204A cast to tuple type is valid when $T_n \leq S_m$, where $T_n$ is the number of components in the target type and $S_m$ is the number of components in the source type, and for each $i$ in $[0, n)$, $S_i$ can be cast to $T_i$.
205Excess elements ($S_j$ for all $j$ in $[n, m)$) are evaluated, but their values are discarded so that they are not included in the result expression.
206This naturally follows the way that a cast to void works in C.
208For example,
210  [int, int, int] f();
211  [int, [int, int], int] g();
213  ([int, double])f();           // (1)
214  ([int, int, int])g();         // (2)
215  ([void, [int, int]])g();      // (3)
216  ([int, int, int, int])g();    // (4)
217  ([int, [int, int, int]])g();  // (5)
220(1) discards the last element of the return value and converts the second element to type double.
222(2) discards the second component of the second element of the return value of ©g© (if ©g© is free of side effects, this is equivalent to ©[(int)(g().0), (int)(g().1.0), (int)(g().2)]©).
224(3) discards the first and third return values (equivalent to ©[(int)(g().1.0), (int)(g().1.1)]©).
226(4) is invalid because the cast target type contains 4 components, while the source type contains only 3.
228(5) is invalid because the cast ©([int, int, int])(g().1)© is invalid (i.e. it is invalid to cast ©[int, int]© to ©[int, int, int]©)
231\section{Tuples for Variadic Functions}
232Functions with tuple parameters can be used to provide type-safe variadic functions.
233It appears that it would be possible to leverage tuples to get similar power to what \CC vardiadic templates provide, but with the ability to separately compile them.
235\subsection{Option 1: Allow type parameters to match whole tuples, rather than just their components}
236This option could be implemented with two phases of argument matching when a function contains type parameters and the argument list contains tuple arguments.
237If flattening and structuring fail to produce a match, a second attempt at matching the function and argument combination is made where tuple arguments are not expanded and structure must match exactly, modulo implicit conversions. \footnote{It may be desirable to skip the exact matching rule if flattening and structuring produce a match that fails when inferring assertion parameters, at least in the current resolver since our assertion inference appears to be very inefficient.}
239For example:
241  forall(otype T, otype U | { T g(U); })
242  void f(T, U);
244  [int, int] g([int, int, int, int]);
246  f([1, 2], [3, 4, 5, 6]);
248With flattening and structuring, the call is first transformed into ©f(1, 2, 3, 4, 5, 6)©.
249Since the first argument of type ©T© does not have a tuple type, unification decides that ©T=int© and ©1© is matched as the first parameter.
250Likewise, ©U© does not have a tuple type, so ©U=int© and ©2© is accepted as the second parameter.
251There are now no remaining formal parameters, there are remaining arguments, and the function is not variadic, so the match fails.
253With exact matching, ©T=[int,int]© and ©U=[int,int,int,int]© and so the arguments type check.
254When inferring assertion ©g©, a match is found.
255\footnote{This type of interaction between tuple arguments and type parameters is desirable for perfect forwarding, but it's not obvious to me exactly how this should interact with assertion inference. Ideally, the same rules should apply for assertion satisfaction as apply to argument matching (i.e. flattening \& structuring should be attempted, followed by an exact match attempt on failure), but this may be more complicated than it sounds for assertion satisfaction. Aaron, I'm especially interested to hear your thoughts on this with respect to efficiency in the resolver redesign.
257For example, should we allow this to match?
259  forall(otype T, otype U | { T g(U); })
260  void f(T, U);
262  [int, int] g(int, ®[int, int]®, int);
264  f([1, 2], [3, 4, 5, 6]);
266To only have an exact matching rule here feels too strict. At the very least, it would be nice to accept ©[int, int] g(int, int, int, int)©, since that would allow for argument lists to be packaged and sent off to polymorphic functions and then directly forwarded to other functions.}.
268The addition of an exact matching rule only affects the outcome for polymorphic type binding when tuples are involved.
269For non-tuple arguments, exact matching and flattening \& structuring are equivalent. For tuple arguments to a function without polymorphic formal parameters, flattening and structuring work whenever an exact match would have worked (the tuple is flattened and implicitly restructured to its original structure).
270Thus there is nothing to be gained from permitting the exact matching rule to take effect when a function does not contain polymorphism and none of the arguments are tuples.
272\subsection{Option 2: Add another type parameter kind}
273Perhaps a simpler alternative would be to add another kind of type parameter (e.g., ©ttype©).
274There should be at most one ©ttype© parameter which must occur last in a parameter list.
275Matching against a ©ttype© parameter would consume/package all remaining argument components into a tuple, and would also match no arguments.
276These semantics more closely match normal variadic semantics, while being type-safe. C variadic syntax and ©ttype© polymorphism probably should not be mixed, since it is not clear where to draw the line to decide which arguments belong where.\footnote{In fact, if we go with this proposal, it might be desirable to disallow polymorphic functions to use C variadic syntax to encourage a Cforall style. Aside from maybe calling C variadic functions, it's not obvious to me there would be anything you can do with C variadics that couldn't also be done with ©ttype© parameters. }
278Example 1: taken from Wikipedia, demonstrates variadic templates done in a Cforall style
280  void func(void) {}                           // termination version (1)
281  forall(otype T, ttype Params | { void process(const T &); void func(const Params &); })
282  void func(const T& arg1, const Params & p) { // (2)
283    process(arg1);
284    func(p);
285  }
286  void process(int);                           // (3)
287  void process(double);                        // (4)
288  func(1, 2.0, 3.5, 4);
290In the call to ©func©, the value ©1© is taken as the first parameter, so ©T© unifies with ©int©, and the arguments ©2.0©, ©3.5©, and ©4© are consumed to form a tuple argument of type ©[double, double, int]©.
291To satisfy the assertions to ©func©, the functions (3) and (2) are implicitly selected to satisfy the requirements of ©void process(const T &)© and ©void func(const Params &)© respectively.
292Since (2) requires assertion parameters, the process repeats selecting (4) and (2).
293The matching process continues recursively until reaching the base case where (3) and (1) are selected.
294The end result is semantically equivalent to ©process(1), process(2.0), process(3.5), process(4)©.
296Since (2) is not an exact match for the expected assertion parameter, a thunk is generated that wraps a call to ©func© that accepts an argument of type ©[double, double, int]©.
297This conversion already occurs in the Cforall translator, but may require some modification to handle all of the cases present here.
299Example 2: new (i.e. type-safe malloc + constructors)
301  forall(dtype T, ttype Params | sized(T) | { void ?{}(T *, Params); })
302  T * new(Params p) {
303    return malloc(){ p };
304  }
305  array(int) * x = new(1, 2, 3, 4);
307In the call to ©new©, ©array(int)© is selected to match ©T©, and ©Params© is expanded ot match ©[int, int, int, int]©. To satisfy the assertions, a constructor with an interface compatible with ©void ?{}(array(int) *, int, int, int, int)© must exist in the current scope.
309Assertion inference can also be special cased to match functions that take tuples of any structure only for ttype parameters, if desired.
313With either option, we can generate a thunk to perform the conversion from the actual argument's structure to the structure expected by the assertion parameter and that function would be passed as the assertion argument, in a manner similar to the other thunks that are already generated.
315I prefer option 2, because it is simpler and I think the semantics are clearer.
316I wouldn't be surprised if it was also noticeably more efficient, because of the lack of backtracking.
318As a side note, option 1 also requires calls to be written explicitly, e.g. ©array(int) * x = new([1, 2, 3, 4]);©, which isn't particularly appealing.
319It shifts the burden from the compiler to the programmer, which is almost always wrong, and doesn't match with the way our tuples can be used elsewhere.
320The more I think about it, the more I'm convinced option 1 is the wrong approach, but I'm putting it out anyway in case someone has a good thought on how to make it work correctly.
326%\addcontentsline{toc}{section}{\indexname} % add index name to table of contents
328%Italic page numbers give the location of the main entry for the referenced term.
329%Plain page numbers denote uses of the indexed term.
330%Entries for grammar non-terminals are italicized.
331%A typewriter font is used for grammar terminals and program identifiers.
Note: See TracBrowser for help on using the repository browser.