source: doc/proposals/tuples/tuples.tex @ 2ebcb28

Last change on this file since 2ebcb28 was 484ee53, checked in by Peter A. Buhr <pabuhr@…>, 6 years ago

update Makefiles so ${Build} is order only

  • Property mode set to 100644
File size: 22.6 KB
5% Latex packages used in the document (copied from CFA user manual).
6\usepackage[T1]{fontenc}                                % allow Latin1 (extended ASCII) characters
12\usepackage{upquote}                                    % switch curled `'" to straight
16\usepackage{varioref}                                   % extended references
17\usepackage{listings}                                   % format program code
18\usepackage[flushmargin]{footmisc}                      % support label/reference in footnote
19\usepackage{latexsym}                                   % \Box glyph
20\usepackage{mathptmx}                                   % better math font with "times"
24\input{common}                                          % bespoke macros used in the document
29\setlength{\topmargin}{-0.45in}                         % move running title into header
38\CFAStyle                                               % use default CFA format-style
39% inline code ©...© (copyright symbol) emacs: C-q M-)
40% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
41% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
42% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
43% LaTex escape §...§ (section symbol) emacs: C-q M-'
44% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
45% math escape $...$ (dollar symbol)
49\Huge \vspace*{1in} Tuple Design in \CFA \\
54\huge Rob Schluntz \\
55\Large \vspace*{0.1in} \texttt{} \\
56\Large Cheriton School of Computer Science \\
57\Large University of Waterloo
66\newcommand{\bigO}[1]{O\!\left( #1 \right)}
70% changed after setting pagestyle
71\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
72\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
74% \linenumbers                                            % comment out to turn off line numbering
90This document describes my understanding of the existing tuple design~\cite{Buhr94a,Till89}, mixed with my thoughts on improvements after various discussions with Peter, Aaron, and Thierry.
92\section{Tuple Expressions}
93A tuple expression is an expression which produces a fixed-size, ordered list of values of heterogeneous types.
94The type of a tuple expression is the tuple of the subexpression types.
95In Cforall, a tuple expression is denoted by a list of expressions enclosed in square brackets.
97For example, the expression ©[5, 'x', 10.5]© has type ©[int, char, double]©.
98Tuples are a compile-time phenomenon and have little to no run-time presence.
100\subsection{Tuple Variables}
101It is possible to have variables of tuple type, pointer to tuple type, and array of tuple type.
102Tuple types can be composed of any types, except for array types \footnote{I did not see this issue mentioned at all in the original design. A tuple containing an array type seems to make sense up until you try to use a tuple containing an array type as a function parameter. At this point you lose information about the size of the array, which makes tuple assignment difficult. Rather than allowing arrays in most situations and disallowing only as function parameters, it seems like it would be better to be consistent across the board.}.
105[double, int] di;
106[double, int] * pdi
107[double, int] adi[10];
109The program above declares a variable of type ©[double, int]©, a variable of type pointer to ©[double, int]©, and an array of ten ©[double, int]©.
111\subsection{Flattening and Structuring}
112In Cforall, tuples do not have a rigid structure.
113In function call contexts, tuples support implicit flattening and restructuring \footnote{In the original tuple design, four tuple coercions were described: opening, closing, flattening, and structuring. I've combined flattening with opening and structuring with closing in my description, as the distinctions do not seem useful in Cforall since these coercions happen only as arguments to function calls, and I believe all of the semantics are properly covered by the simplified descriptions.}. Tuple flattening recursively expands a tuple into the list of its basic components. Tuple structuring packages a list of expressions into a value of tuple type.
116int f(int, int);
117int g([int, int]);
118int h(int, [int, int]);
119[int, int] x;
120int y;
123g(y, 10);
124h(x, y);
126In Cforall, each of these calls is valid.
127In the call to ©f©, ©x© is implicitly flattened so that the components of ©x© are passed as the two arguments to ©f©.
128For the call to ©g©, the values ©y© and ©10© are structured into a single argument of type ©[int, int]© to match the type of the parameter of ©g©.
129Finally, in the call to ©h©, ©y© is flattened to yield an argument list of length 3, of which the first component of ©x© is passed as the first parameter of ©h©, and the second component of ©x© and ©y© are structured into the second argument of type ©[int, int]©.
132\subsection{Argument Passing}
133In resolving a function call, all of the arguments to the call are flattened.
134While determining if a particular function/argument-list combination is valid, the arguments are structured to match the shape of each formal parameter, in order.
136For example, given a function declaration ©[int] f(int, [double, int])©, the call ©f([5, 10.2], 0)© can be satisfied by first flattening the tuple to yield the expression ©f(5, 10.2, 0)© and then structuring the argument list to match the formal parameter list structure as ©f(5, [10.2, 0])©.
138\subsection{Multiple-Return-Value Functions}
139Functions can be declared to return more than one value.
140Multiple return values are packaged into a tuple value when the function returns.
141A multiple-returning function with return type ©T© can return any expression which is implicitly convertible to ©T©.
143\subsection{Tuple Assignment}
144An assignment where the left side of the assignment operator has a tuple type is called tuple assignment.
145There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called Multiple Assignment and Mass Assignment respectively.
146Let $L_i$ for $i$ in $[0, n)$ represent each component of the flattened left side, $R_i$ represent each component of the flattened right side of a multiple assignment, and $R$ represent the right side of a mass assignment.
148For a multiple assignment to be valid, both tuples must have the same number of elements when flattened. Multiple assignment assigns $R_i$ to $L_i$ for each $i$.
149That is, ©?=?(&$L_i$, $R_i$)© must be a well-typed expression.
151Mass assignment assigns the value $R$ to each $L_i$. For a mass assignment to be valid, ©?=?(&$L_i$, $R$)© must be a well-typed expression.
152This differs from C cascading assignment (e.g. ©a=b=c©) in that conversions are applied to $R$ in each individual assignment, which prevents data loss from the chain of conversions that can happen during a cascading assignment.
154Both kinds of tuple assignment have parallel semantics, such that each value on the left side and right side is evaluated \emph{before} any assignments occur.
155Tuple assignment is an expression where the result type is the type of the left side of the assignment, as in normal assignment (i.e. the tuple of the types of the left side expressions) \footnote{This is a change from the original tuple design, wherein tuple assignment was a statement. This decision appears to have been made in an attempt to fix what was seen as a problem with assignment, but at the same time this doesn't seem to fit C or Cforall very well. In another language, tuple assignment as a statement could be reasonable, but I don't see a good justification for making this the only kind of assignment that isn't an expression. In this case, I would value consistency over idealism}.
156These semantics allow cascading tuple assignment to work out naturally in any context where a tuple is permitted.
158The following example shows multiple, mass, and cascading assignment in one expression
160  int a, b;
161  double c, d;
162  [void] f([int, int]);
163  f([c, a] = [b, d] = 1.5);
165First a mass assignment of ©1.5© into ©[b, d]©, which assigns ©1.5© into ©b©, which is truncated to ©1©, and ©1© into ©d©, producing the tuple ©[1, 1.5]© as a result.
166That tuple is used as the right side of the multiple assignment (i.e. ©[c, a] = [1, 1.5]©) which assigns ©1© into ©c© and ©1.5© into ©a©, which is truncated to ©1©, producing the result ©[1, 1]©.
167Finally, the tuple ©[1, 1]© is used as an expression in the call to ©f©.
169\subsection{Tuple Construction}
170Tuple construction and destruction follow the same rules and semantics as tuple assignment, except that in the case where there is no right side, the default constructor or destructor is called on each component of the tuple.
172It is possible to define constructors and assignment functions for tuple types that provide new semantics.
173For example, the function ©void ?{}([T, U] *, S);© can be defined to allow a tuple variable to be constructed from a value of type ©S©. Due to the structure of generated constructors, it is possible to pass a tuple to a generated constructor for a type with a member prefix that matches the type of the tuple (e.g. an instance of ©struct S { int x; double y; int z }© can be constructed with a tuple of type ©[int, double]©, `out of the box').
175\section{Other Tuple Expressions}
176\subsection{Member Tuple Expression}
177It is possible to access multiple fields from a single expression using a Member Tuple Expression \footnote{Called ``record field tuple'' in the original design, but there's no reason to limit this feature to only structs, so ``field tuple'' or ``member tuple'' feels more appropriate.}.
178The result is a single tuple-valued expression whose type is the tuple of the types of the members.
179A member tuple expression has the form ©a.[x, y, z];© where ©a© is an expression with type ©T©, where ©T© supports member access expressions, and ©x, y, z© are all members of ©T© with types ©T$_x$©, ©T$_y$©, and ©T$_z$© respectively.
180Then the type of ©a.[x, y, z]© is ©[T$_x$, T$_y$, T$_z$]©.
181It is possible for a member tuple expression to contain other member access expressions, e.g. ©a.[x, y.[i, j], z.k]©.
182This expression is equivalent to ©[a.x, [a.y.i, a.y.j], a.z.k]©.
183It is guaranteed that the aggregate expression to the left of the ©.© on a member tuple expression is evaluated once.
185\subsection{Tuple Indexing}
186Sometimes it is desirable to access a single component of a tuple-valued expression without creating unnecessary temporary variables to assign to.
187Given a tuple-valued expression ©e© and a compile-time constant integer $i$ where $0 \leq i < n$, where $n$ is the number of components in ©e©, ©e.i© will access the $i$\textsuperscript{th} component of ©e©.
188% \footnote{If this syntax cannot be parsed, we can make it ©_i©, and a semantic check ensures that ©_i© has the right form. The capability to access a component of a tuple is helpful internally, and there doesn't seem to be a disadvantage to exposing it to users. On the other hand, it is more general than casting and much more explicit, while also being less verbose.}.
189It is possible to use a member tuple expression with tuple indexing to manually restructure a tuple (rearrange components, drop components, duplicate components, etc.).
191% TODO: mention that Tuple.member_name and Aggregate.index could have sensible semantics, but introduce complexity into the model. Agg.idx could mean get the ith member of the aggregate (further, this could be extended for enumerations as well, where the LHS is a type instead of a value), but it's not clear there is a compelling use-case. Tuple.member_name can either mean "distribute the member across the elements of the tuple" [effectively a compile-time map], or alternatively array.member_name (to mean basically the same thing). The problem with this is that it takes this expression's meaning from being clear at compile-time to needing resolver support, as the member name needs to appropriately distribute across every member of the tuple, which could itself be a tuple, etc. Again, the extra complexity is not currently justified.
193For example
195  [int, double] x;
196  [double, int, double] y = [x.1, x.0, x.1];  // (1)
198  [int, int, int] f();
199  [x.0, y.1] = f().[0, 2];                    // (2)
202(1) ©y© is initialized using a tuple expression which selects components from the tuple variable ©x©.
204(2) A mass assignment of the first and third components from the return value of ©f© into the first component of ©x© and the second component of ©y©.
207A cast to tuple type is valid when $T_n \leq S_m$, where $T_n$ is the number of components in the target type and $S_m$ is the number of components in the source type, and for each $i$ in $[0, n)$, $S_i$ can be cast to $T_i$.
208Excess elements ($S_j$ for all $j$ in $[n, m)$) are evaluated, but their values are discarded so that they are not included in the result expression.
209This naturally follows the way that a cast to void works in C.
211For example,
213  [int, int, int] f();
214  [int, [int, int], int] g();
216  ([int, double])f();           // (1)
217  ([int, int, int])g();         // (2)
218  ([void, [int, int]])g();      // (3)
219  ([int, int, int, int])g();    // (4)
220  ([int, [int, int, int]])g();  // (5)
223(1) discards the last element of the return value and converts the second element to type double.
225(2) discards the second component of the second element of the return value of ©g© (if ©g© is free of side effects, this is equivalent to ©[(int)(g().0), (int)(g().1.0), (int)(g().2)]©).
227(3) discards the first and third return values (equivalent to ©[(int)(g().1.0), (int)(g().1.1)]©).
229(4) is invalid because the cast target type contains 4 components, while the source type contains only 3.
231(5) is invalid because the cast ©([int, int, int])(g().1)© is invalid (i.e. it is invalid to cast ©[int, int]© to ©[int, int, int]©)
234\section{Tuples for Variadic Functions}
235Functions with tuple parameters can be used to provide type-safe variadic functions.
236It appears that it would be possible to leverage tuples to get similar power to what \CC vardiadic templates provide, but with the ability to separately compile them.
238\subsection{Option 1: Allow type parameters to match whole tuples, rather than just their components}
239This option could be implemented with two phases of argument matching when a function contains type parameters and the argument list contains tuple arguments.
240If flattening and structuring fail to produce a match, a second attempt at matching the function and argument combination is made where tuple arguments are not expanded and structure must match exactly, modulo implicit conversions. \footnote{It may be desirable to skip the exact matching rule if flattening and structuring produce a match that fails when inferring assertion parameters, at least in the current resolver since our assertion inference appears to be very inefficient.}
242For example:
244  forall(otype T, otype U | { T g(U); })
245  void f(T, U);
247  [int, int] g([int, int, int, int]);
249  f([1, 2], [3, 4, 5, 6]);
251With flattening and structuring, the call is first transformed into ©f(1, 2, 3, 4, 5, 6)©.
252Since the first argument of type ©T© does not have a tuple type, unification decides that ©T=int© and ©1© is matched as the first parameter.
253Likewise, ©U© does not have a tuple type, so ©U=int© and ©2© is accepted as the second parameter.
254There are now no remaining formal parameters, there are remaining arguments, and the function is not variadic, so the match fails.
256With exact matching, ©T=[int,int]© and ©U=[int,int,int,int]© and so the arguments type check.
257When inferring assertion ©g©, a match is found.
258\footnote{This type of interaction between tuple arguments and type parameters is desirable for perfect forwarding, but it's not obvious to me exactly how this should interact with assertion inference. Ideally, the same rules should apply for assertion satisfaction as apply to argument matching (i.e. flattening \& structuring should be attempted, followed by an exact match attempt on failure), but this may be more complicated than it sounds for assertion satisfaction. Aaron, I'm especially interested to hear your thoughts on this with respect to efficiency in the resolver redesign.
260For example, should we allow this to match?
262  forall(otype T, otype U | { T g(U); })
263  void f(T, U);
265  [int, int] g(int, ®[int, int]®, int);
267  f([1, 2], [3, 4, 5, 6]);
269To only have an exact matching rule here feels too strict. At the very least, it would be nice to accept ©[int, int] g(int, int, int, int)©, since that would allow for argument lists to be packaged and sent off to polymorphic functions and then directly forwarded to other functions.}.
271The addition of an exact matching rule only affects the outcome for polymorphic type binding when tuples are involved.
272For non-tuple arguments, exact matching and flattening \& structuring are equivalent. For tuple arguments to a function without polymorphic formal parameters, flattening and structuring work whenever an exact match would have worked (the tuple is flattened and implicitly restructured to its original structure).
273Thus there is nothing to be gained from permitting the exact matching rule to take effect when a function does not contain polymorphism and none of the arguments are tuples.
275\subsection{Option 2: Add another type parameter kind}
276Perhaps a simpler alternative would be to add another kind of type parameter (e.g., ©ttype©).
277There should be at most one ©ttype© parameter which must occur last in a parameter list.
278Matching against a ©ttype© parameter would consume/package all remaining argument components into a tuple, and would also match no arguments.
279These semantics more closely match normal variadic semantics, while being type-safe. C variadic syntax and ©ttype© polymorphism probably should not be mixed, since it is not clear where to draw the line to decide which arguments belong where.\footnote{In fact, if we go with this proposal, it might be desirable to disallow polymorphic functions to use C variadic syntax to encourage a Cforall style. Aside from maybe calling C variadic functions, it's not obvious to me there would be anything you can do with C variadics that couldn't also be done with ©ttype© parameters. }
281Example 1: taken from Wikipedia, demonstrates variadic templates done in a Cforall style
283  void func(void) {}                           // termination version (1)
284  forall(otype T, ttype Params | { void process(const T &); void func(const Params &); })
285  void func(const T& arg1, const Params & p) { // (2)
286    process(arg1);
287    func(p);
288  }
289  void process(int);                           // (3)
290  void process(double);                        // (4)
291  func(1, 2.0, 3.5, 4);
293In the call to ©func©, the value ©1© is taken as the first parameter, so ©T© unifies with ©int©, and the arguments ©2.0©, ©3.5©, and ©4© are consumed to form a tuple argument of type ©[double, double, int]©.
294To satisfy the assertions to ©func©, the functions (3) and (2) are implicitly selected to satisfy the requirements of ©void process(const T &)© and ©void func(const Params &)© respectively.
295Since (2) requires assertion parameters, the process repeats selecting (4) and (2).
296The matching process continues recursively until reaching the base case where (3) and (1) are selected.
297The end result is semantically equivalent to ©process(1), process(2.0), process(3.5), process(4)©.
299Since (2) is not an exact match for the expected assertion parameter, a thunk is generated that wraps a call to ©func© that accepts an argument of type ©[double, double, int]©.
300This conversion already occurs in the Cforall translator, but may require some modification to handle all of the cases present here.
302Example 2: new (i.e. type-safe malloc + constructors)
304  forall(dtype T, ttype Params | sized(T) | { void ?{}(T *, Params); })
305  T * new(Params p) {
306    return malloc(){ p };
307  }
308  array(int) * x = new(1, 2, 3, 4);
310In the call to ©new©, ©array(int)© is selected to match ©T©, and ©Params© is expanded ot match ©[int, int, int, int]©. To satisfy the assertions, a constructor with an interface compatible with ©void ?{}(array(int) *, int, int, int, int)© must exist in the current scope.
312Assertion inference can also be special cased to match functions that take tuples of any structure only for ttype parameters, if desired.
316With either option, we can generate a thunk to perform the conversion from the actual argument's structure to the structure expected by the assertion parameter and that function would be passed as the assertion argument, in a manner similar to the other thunks that are already generated.
318I prefer option 2, because it is simpler and I think the semantics are clearer.
319I wouldn't be surprised if it was also noticeably more efficient, because of the lack of backtracking.
321As a side note, option 1 also requires calls to be written explicitly, e.g. ©array(int) * x = new([1, 2, 3, 4]);©, which isn't particularly appealing.
322It shifts the burden from the compiler to the programmer, which is almost always wrong, and doesn't match with the way our tuples can be used elsewhere.
323The more I think about it, the more I'm convinced option 1 is the wrong approach, but I'm putting it out anyway in case someone has a good thought on how to make it work correctly.
329%\addcontentsline{toc}{section}{\indexname} % add index name to table of contents
331%Italic page numbers give the location of the main entry for the referenced term.
332%Plain page numbers denote uses of the indexed term.
333%Entries for grammar non-terminals are italicized.
334%A typewriter font is used for grammar terminals and program identifiers.
Note: See TracBrowser for help on using the repository browser.