Context Navigation

source: doc/rob_thesis/intro.tex @ 309be81

ADTaaron-thesisarm-ehast-experimentalcleanup-dtorsdeferred_resndemanglerenumforall-pointer-decayjacob/cs343-translationjenkins-sandboxnew-astnew-ast-unique-exprnew-envno_listpersistent-indexerpthread-emulationqualifiedEnumresolv-newwith_gc

Last change on this file since 309be81 was 9c14ae9, checked in by Rob Schluntz <rschlunt@…>, 7 years ago
add thesis source
Property mode set to `100644`
File size: 38.9 KB

Line
1	%======================================================================
2	\chapter{Introduction}
3	%======================================================================
4
5	\section{\CFA Background}
6	\label{s:background}
7	\CFA is a modern extension to the C programming language.
8	As it is an extension of C, there is already a wealth of existing C code and principles that govern the design of the language.
9	Among the goals set out in the original design of \CFA, four points stand out \cite{Bilson03}.
10	\begin{enumerate}
11	\item The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler.
12	\item Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler.
13	\item \CFA code must be at least as portable as standard C code.
14	\item Extensions introduced by \CFA must be translated in the most efficient way possible.
15	\end{enumerate}
16	Therefore, these design principles must be kept in mind throughout the design and development of new language features.
17	In order to appeal to existing C programmers, great care must be taken to ensure that new features naturally feel like C.
18	The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail. % TODO: harmonize with?
19
20	\subsection{C Background}
21	\label{sub:c_background}
22	One of the lesser-known features of standard C is \emph{designations}.
23	Designations are similar to named parameters in languages such as Python and Scala, except that they only apply to aggregate initializers.
24	\begin{cfacode}
25	struct A {
26	int w, x, y, z;
27	};
28	A a0 = { .x:4 .z:1, .x:8 };
29	A a1 = { 1, .y:7, 6 };
30	A a2[4] = { [2]:a0, [0]:a1, { .z:3 } };
31	// equvialent to
32	// A a0 = { 0, 8, 0, 1 };
33	// A a1 = { 1, 0, 7, 6 };
34	// A a2[4] = { a1, { 0, 0, 0, 3 }, a0, { 0, 0, 0, 0 } };
35	\end{cfacode}
36	Designations allow specifying the field to initialize by name, rather than by position.
37	Any field not explicitly initialized is initialized as if it had static storage duration \cite[p.~141]{C11}.
38	A designator specifies the current object for initialization, and as such any undesignated subobjects pick up where the last initialization left off.
39	For example, in the initialization of @a1@, the initializer of @y@ is @7@, and the unnamed initializer @6@ initializes the next subobject, @z@.
40	Later initializers override earlier initializers, so a subobject for which there is more than one initializer is only initailized by its last initializer.
41	This can be seen in the initialization of @a0@, where @x@ is designated twice, and thus initialized to @8@.
42	Note that in \CFA, designations use a colon separator, rather than an equals sign as in C.
43
44	C also provides \emph{compound literal} expressions, which provide a first-class mechanism for creating unnamed objects.
45	\begin{cfacode}
46	struct A { int x, y; };
47	int f(A, int);
48	int g(int *);
49
50	f((A){ 3, 4 }, (int){ 5 } = 10);
51	g((int[]){ 1, 2, 3 });
52	g(&(int){ 0 });
53	\end{cfacode}
54	Compound literals create an unnamed object, and result in an lvalue, so it is legal to assign a value into a compound literal or to take its address \cite[p.~86]{C11}.
55	Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and is never an lvalue.
56
57	\subsection{Overloading}
58	\label{sub:overloading}
59	Overloading is the ability to specify multiple entities with the same name.
60	The most common form of overloading is function overloading, wherein multiple functions can be defined with the same name, but with different signatures.
61	Like in \CC, \CFA allows overloading based both on the number of parameters and on the types of parameters.
62	\begin{cfacode}
63	void f(void); // (1)
64	void f(int); // (2)
65	void f(char); // (3)
66
67	f('A'); // selects (3)
68	\end{cfacode}
69	In this case, there are three @f@ procedures, where @f@ takes either 0 or 1 arguments, and if an argument is provided then it may be of type @int@ or of type @char@.
70	Exactly which procedure is executed depends on the number and types of arguments passed.
71	If there is no exact match available, \CFA attempts to find a suitable match by examining the C built-in conversion heuristics.
72	\begin{cfacode}
73	void g(long long);
74
75	g(12345);
76	\end{cfacode}
77	In the above example, there is only one instance of @g@, which expects a single parameter of type @long long@.
78	Here, the argument provided has type @int@, but since all possible values of type @int@ can be represented by a value of type @long long@, there is a safe conversion from @int@ to @long long@, and so \CFA calls the provided @g@ routine.
79
80	In addition to this form of overloading, \CFA also allows overloading based on the number and types of \emph{return} values.
81	This extension is a feature that is not available in \CC, but is available in other programming languages such as Ada \cite{Ada95}.
82	\begin{cfacode}
83	int g(); // (1)
84	double g(); // (2)
85
86	int x = g(); // selects (1)
87	\end{cfacode}
88	Here, the only difference between the signatures of the different versions of @g@ is in the return values.
89	The result context is used to select an appropriate routine definition.
90	In this case, the result of @g@ is assigned into a variable of type @int@, so \CFA prefers the routine that returns a single @int@, because it is an exact match.
91
92	There are times when a function should logically return multiple values.
93	Since a function in standard C can only return a single value, a programmer must either take in additional return values by address, or the function's designer must create a wrapper structure t0 package multiple return-values.
94	\begin{cfacode}
95	int f(int * ret) { // returns a value through parameter ret
96	*ret = 37;
97	return 123;
98	}
99
100	int res1, res2; // allocate return value
101	int res1 = g(&res2); // explicitly pass storage
102	\end{cfacode}
103	The former solution is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all.
104	\begin{cfacode}
105	struct A {
106	int x, y;
107	};
108	struct A g() { // returns values through a structure
109	return (struct A) { 123, 37 };
110	}
111	struct A res3 = g();
112	... res3.x ... res3.y ... // use result values
113	\end{cfacode}
114	The latter approach requires the caller to either learn the field names of the structure or learn the names of helper routines to access the individual return values.
115	Both solutions are syntactically unnatural.
116
117	In \CFA, it is possible to directly declare a function returning mutliple values.
118	This provides important semantic information to the caller, since return values are only for output.
119	\begin{cfacode}
120	[int, int] f() { // don't need to create a new type
121	return [123, 37];
122	}
123	\end{cfacode}
124	However, the ability to return multiple values requires a syntax for accepting the results from a function.
125	In standard C, return values are most commonly assigned directly into local variables, or are used as the arguments to another function call.
126	\CFA allows both of these contexts to accept multiple return values.
127	\begin{cfacode}
128	int res1, res2;
129	[res1, res2] = f(); // assign return values into local variables
130
131	void g(int, int);
132	g(f()); // pass both return values of f to g
133	\end{cfacode}
134	As seen in the example, it is possible to assign the results from a return value directly into local variables.
135	These local variables can be referenced naturally, without requiring any unpacking as in structured return values.
136	Perhaps more interesting is the fact that multiple return values can be passed to multiple parameters seamlessly, as in the call @g(f())@.
137	In this call, the return values from @f@ are linked to the parameters of @g@ so that each of the return values is passed directly to the corresponding parameter of @g@, without any explicit storing, unpacking, or additional naming.
138
139	An extra quirk introduced by multiple return values is in the resolution of function calls.
140	\begin{cfacode}
141	int f(); // (1)
142	[int, int] f(); // (2)
143
144	void g(int, int);
145
146	int x, y;
147	[x, y] = f(); // selects (2)
148	g(f()); // selects (2)
149	\end{cfacode}
150	In this example, the only possible call to @f@ that can produce the two @int@s required by @g@ is the second option.
151	A similar reasoning holds for assigning into multiple variables.
152
153	In \CFA, overloading also applies to operator names, known as \emph{operator overloading}.
154	Similar to function overloading, a single operator is given multiple meanings by defining new versions of the operator with different signatures.
155	In \CC, this can be done as follows
156	\begin{cppcode}
157	struct A { int i; };
158	int operator+(A x, A y);
159	bool operator<(A x, A y);
160	\end{cppcode}
161
162	In \CFA, the same example can be written as follows.
163	\begin{cfacode}
164	struct A { int i; };
165	int ?+?(A x, A y);
166	bool ?<?(A x, A y);
167	\end{cfacode}
168	Notably, the only difference in this example is syntax.
169	Most of the operators supported by \CC for operator overloading are also supported in \CFA.
170	Of notable exception are the logical operators (e.g. @\|\|@), the sequence operator (i.e. @,@), and the member-access operators (e.g. @.@ and \lstinline{->}).
171
172	Finally, \CFA also permits overloading variable identifiers.
173	This feature is not available in \CC.
174	\begin{cfacode} % TODO: pick something better than x? max, zero, one?
175	struct Rational { int numer, denom; };
176	int x = 3; // (1)
177	double x = 1.27; // (2)
178	Rational x = { 4, 11 }; // (3)
179
180	void g(double);
181
182	x += 1; // chooses (1)
183	g(x); // chooses (2)
184	Rational y = x; // chooses (3)
185	\end{cfacode}
186	In this example, there are three definitions of the variable @x@.
187	Based on the context, \CFA attempts to choose the variable whose type best matches the expression context.
188
189	Finally, the values @0@ and @1@ have special status in standard C.
190	In particular, the value @0@ is both an integer and a pointer literal, and thus its meaning depends on the context.
191	In addition, several operations can be redefined in terms of other operations and the values @0@ and @1@.
192	For example,
193	\begin{cfacode}
194	int x;
195	if (x) { // if (x != 0)
196	x++; // x += 1;
197	}
198	\end{cfacode}
199	Every if statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
200	Due to these rewrite rules, the values @0@ and @1@ have the types \zero and \one in \CFA, which allow for overloading various operations that connect to @0@ and @1@ \footnote{In the original design of \CFA, @0@ and @1@ were overloadable names \cite[p.~7]{cforall}.}.
201	The types \zero and \one have special built in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal.
202	\begin{cfacode}
203	// lvalue is similar to returning a reference in C++
204	lvalue Rational ?+=?(Rational *a, Rational b);
205	Rational ?=?(Rational * dst, zero_t) {
206	return *dst = (Rational){ 0, 1 };
207	}
208
209	Rational sum(Rational *arr, int n) {
210	Rational r;
211	r = 0; // use rational-zero_t assignment
212	for (; n > 0; n--) {
213	r += arr[n-1];
214	}
215	return r;
216	}
217	\end{cfacode}
218	This function takes an array of @Rational@ objects and produces the @Rational@ representing the sum of the array.
219	Note the use of an overloaded assignment operator to set an object of type @Rational@ to an appropriate @0@ value.
220
221	\subsection{Polymorphism}
222	\label{sub:polymorphism}
223	In its most basic form, polymorphism grants the ability to write a single block of code that accepts different types.
224	In particular, \CFA supports the notion of parametric polymorphism.
225	Parametric polymorphism allows a function to be written generically, for all values of all types, without regard to the specifics of a particular type.
226	For example, in \CC, the simple identity function for all types can be written as
227	\begin{cppcode}
228	template<typename T>
229	T identity(T x) { return x; }
230	\end{cppcode}
231	\CC uses the template mechanism to support parametric polymorphism. In \CFA, an equivalent function can be written as
232	\begin{cfacode}
233	forall(otype T)
234	T identity(T x) { return x; }
235	\end{cfacode}
236	Once again, the only visible difference in this example is syntactic.
237	Fundamental differences can be seen by examining more interesting examples.
238	In \CC, a generic sum function is written as follows
239	\begin{cppcode}
240	template<typename T>
241	T sum(T *arr, int n) {
242	T t;
243	for (; n > 0; n--) t += arr[n-1];
244	return t;
245	}
246	\end{cppcode}
247	Here, the code assumes the existence of a default constructor, assignment operator, and an addition operator over the provided type @T@.
248	If any of these required operators are not available, the \CC compiler produces an error message stating which operators could not be found.
249
250	A similar sum function can be written in \CFA as follows
251	\begin{cfacode}
252	forall(otype T \| R{ T ?=?(T , zero_t); T ?+=?(T , T); }R)
253	T sum(T *arr, int n) {
254	T t = 0;
255	for (; n > 0; n--) t = t += arr[n-1];
256	return t;
257	}
258	\end{cfacode}
259	The first thing to note here is that immediately following the declaration of @otype T@ is a list of \emph{type assertions} that specify restrictions on acceptable choices of @T@.
260	In particular, the assertions above specify that there must be a an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@.
261	The existence of an assignment operator from @T@ to @T@ and the ability to create an object of type @T@ are assumed implicitly by declaring @T@ with the @otype@ type-class.
262	In addition to @otype@, there are currently two other type-classes.
263	The three type parameter kinds are summarized in \autoref{table:types}
264
265	\begin{table}[h!]
266	\begin{center}
267	\begin{tabular}{\|c\|\|c\|c\|c\|\|c\|c\|c\|}
268	\hline
269	name & object type & incomplete type & function type & can assign value & can create & has size \\ \hline
270	@otype@ & X & & & X & X & X \\ \hline
271	@dtype@ & X & X & & & & \\ \hline
272	@ftype@ & & & X & & & \\ \hline
273	\end{tabular}
274	\end{center}
275	\caption{\label{table:types} The different kinds of type parameters in \CFA}
276	\end{table}
277
278	A major difference between the approaches of \CC and \CFA to polymorphism is that the set of assumed properties for a type is \emph{explicit} in \CFA.
279	One of the major limiting factors of \CC's approach is that templates cannot be separately compiled.
280	In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled.
281
282	In \CFA, a set of assertions can be factored into a \emph{trait}.
283	\begin{cfacode}
284	trait Addable(otype T) {
285	T ?+?(T, T);
286	T ++?(T);
287	T ?++(T);
288	}
289	forall(otype T \| Addable(T)) void f(T);
290	forall(otype T \| Addable(T) \| { T --?(T); }) T g(T);
291	forall(otype T, U \| Addable(T) \| { T ?/?(T, U); }) U h(T, U);
292	\end{cfacode}
293	This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration.
294
295	\section{Invariants}
296	% TODO: discuss software engineering benefits of ctor/dtors: {pre/post} conditions, invariants
297	% an important invariant is the state of the environment (memory, resources)
298	% some objects pass their contract to the object user
299	An \emph{invariant} is a logical assertion that true for some duration of a program's execution.
300	Invariants help a programmer to reason about code correctness and prove properties of programs.
301
302	In object-oriented programming languages, type invariants are typically established in a constructor and maintained throughout the object's lifetime.
303	This is typically achieved through a combination of access control modifiers and a restricted interface.
304	Typically, data which requires the maintenance of an invariant is hidden from external sources using the \emph{private} modifier, which restricts reads and writes to a select set of trusted routines, including member functions.
305	It is these trusted routines that perform all modifications to internal data in a way that is consistent with the invariant, by ensuring that the invariant holds true at the end of the routine call.
306
307	In C, the @assert@ macro is often used to ensure invariants are true.
308	Using @assert@, the programmer can check a condition and abort execution if the condition is not true.
309	This is a powerful tool that forces the programmer to deal with logical inconsistencies as they occur.
310	For production, assertions can be removed by simply defining the preprocessor macro @NDEBUG@, making it simple to ensure that assertions are 0-cost for a performance intensive application.
311	\begin{cfacode}
312	struct Rational {
313	int n, d;
314	};
315	struct Rational create_rational(int n, int d) {
316	assert(d != 0); // precondition
317	if (d < 0) {
318	n *= -1;
319	d *= -1;
320	}
321	assert(d > 0); // postcondition
322	// rational invariant: d > 0
323	return (struct Rational) { n, d };
324	}
325	struct Rational rat_abs(struct Rational r) {
326	assert(r.d > 0); // check invariant, since no access control
327	r.n = abs(r.n);
328	assert(r.d > 0); // ensure function preserves invariant on return value
329	return r;
330	}
331	\end{cfacode}
332
333	Some languages, such as D, provide language-level support for specifying program invariants.
334	In addition to providing a C-like @assert@ expression, D allows specifying type invariants that are automatically checked at the end of a constructor, beginning of a destructor, and at the beginning and end of every public member function.
335	\begin{dcode}
336	import std.math;
337	struct Rational {
338	invariant {
339	assert(d > 0, "d <= 0");
340	}
341	int n, d;
342	this(int n, int d) { // constructor
343	assert(d != 0);
344	this.n = n;
345	this.d = d;
346	// implicitly check invariant
347	}
348	Rational abs() {
349	// implicitly check invariant
350	return Rational(std.math.abs(n), d);
351	// implicitly check invariant
352	}
353	}
354	\end{dcode}
355	The D compiler is able to assume that assertions and invariants hold true and perform optimizations based on those assumptions.
356
357	An important invariant is the state of the execution environment, including the heap, the open file table, the state of global variables, etc.
358	Since resources are finite, it is important to ensure that objects clean up properly when they are finished, restoring the execution environment to a stable state so that new objects can reuse resources.
359
360	\section{Resource Management}
361	\label{s:ResMgmt}
362
363	Resource management is a problem that pervades every programming language.
364
365	In standard C, resource management is largely a manual effort on the part of the programmer, with a notable exception to this rule being the program stack.
366	The program stack grows and shrinks automatically with each function call, as needed for local variables.
367	However, whenever a program needs a variable to outlive the block it is created in, the storage must be allocated dynamically with @malloc@ and later released with @free@.
368	This pattern is extended to more complex objects, such as files and sockets, which also outlive the block where they are created, but at their core is resource management.
369	Once allocated storage escapes a block, the responsibility for deallocating the storage is not specified in a function's type, that is, that the return value is owned by the caller.
370	This implicit convention is provided only through documentation about the expectations of functions.
371
372	In other languages, a hybrid situation exists where resources escape the allocation block, but ownership is precisely controlled by the language.
373	This pattern requires a strict interface and protocol for a data structure, where the protocol consists of a pre-initialization and a post-termination call, and all intervening access is done via interface routines.
374	This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it contains a significant portion of resource management cases.
375
376	For example, \CC directly supports this pattern through class types and an idiom known as RAII \footnote{Resource Acquisition is Initialization} by means of constructors and destructors.
377	Constructors and destructors are special routines that are automatically inserted into the appropriate locations to bookend the lifetime of an object.
378	Constructors allow the designer of a type to establish invariants for objects of that type, since it is guaranteed that every object must be initialized through a constructor.
379	In particular, constructors allow a programmer to ensure that all objects are initially set to a valid state.
380	On the other hand, destructors provide a simple mechanism for tearing down an object and resetting the environment in which the object lived.
381	RAII ensures that if all resources are acquired in a constructor and released in a destructor, there are no resource leaks, even in exceptional circumstances.
382	A type with at least one non-trivial constructor or destructor will henceforth be referred to as a \emph{managed type}.
383	In the context of \CFA, a non-trivial constructor is either a user defined constructor or an auto generated constructor that calls a non-trivial constructor.
384
385	For the remaining resource ownership cases, programmer must follow a brittle, explicit protocol for freeing resources or an implicit porotocol implemented via the programming language.
386
387	In garbage collected languages, such as Java, resources are largely managed by the garbage collector.
388	Still, garbage collectors are typically focus only on memory management.
389	There are many kinds of resources that the garbage collector does not understand, such as sockets, open files, and database connections.
390	In particular, Java supports \emph{finalizers}, which are similar to destructors.
391	Sadly, finalizers come with far fewer guarantees, to the point where a completely conforming JVM may never call a single finalizer. % TODO: citation JVM spec; http://stackoverflow.com/a/2506514/2386739
392	Due to operating system resource limits, this is unacceptable for many long running tasks. % TODO: citation?
393	Instead, the paradigm in Java requires programmers manually keep track of all resource \emph{except} memory, leading many novices and experts alike to forget to close files, etc.
394	Complicating the picture, uncaught exceptions can cause control flow to change dramatically, leaking a resource which appears on first glance to be closed.
395	\begin{javacode}
396	void write(String filename, String msg) throws Exception {
397	FileOutputStream out = new FileOutputStream(filename);
398	FileOutputStream log = new FileOutputStream(filename);
399	out.write(msg.getBytes());
400	log.write(msg.getBytes());
401	log.close();
402	out.close();
403	}
404	\end{javacode}
405	Any line in this program can throw an exception.
406	This leads to a profusion of finally blocks around many function bodies, since it isn't always clear when an exception may be thrown.
407	\begin{javacode}
408	public void write(String filename, String msg) throws Exception {
409	FileOutputStream out = new FileOutputStream(filename);
410	try {
411	FileOutputStream log = new FileOutputStream("log.txt");
412	try {
413	out.write(msg.getBytes());
414	log.write(msg.getBytes());
415	} finally {
416	log.close();
417	}
418	} finally {
419	out.close();
420	}
421	}
422	\end{javacode}
423	In Java 7, a new \emph{try-with-resources} construct was added to alleviate most of the pain of working with resources, but ultimately it still places the burden squarely on the user rather than on the library designer.
424	Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources which can throw an exception on close can leak nested resources. % TODO: cite oracle article http://www.oracle.com/technetwork/articles/java/trywithresources-401775.html?
425	\begin{javacode}
426	public void write(String filename, String msg) throws Exception {
427	try (
428	FileOutputStream out = new FileOutputStream(filename);
429	FileOutputStream log = new FileOutputStream("log.txt");
430	) {
431	out.write(msg.getBytes());
432	log.write(msg.getBytes());
433	} // automatically closes out and log in every exceptional situation
434	}
435	\end{javacode}
436	On the other hand, the Java compiler generates more code if more resources are declared, meaning that users must be more familiar with each type and library designers must provide better documentation.
437
438	% D has constructors and destructors that are worth a mention (under classes) https://dlang.org/spec/spec.html
439	% also https://dlang.org/spec/struct.html#struct-constructor
440	% these are declared in the struct, so they're closer to C++ than to CFA, at least syntactically. Also do not allow for default constructors
441	% D has a GC, which already makes the situation quite different from C/C++
442	The programming language, D, also manages resources with constructors and destructors \cite{D}.
443	In D, @struct@s are stack allocated and managed via scoping like in \CC, whereas @class@es are managed automatically by the garbage collector.
444	Like Java, using the garbage collector means that destructors may never be called, requiring the use of finally statements to ensure dynamically allocated resources that are not managed by the garbage collector, such as open files, are cleaned up.
445	Since D supports RAII, it is possible to use the same techniques as in \CC to ensure that resources are released in a timely manner.
446	Finally, D provides a scope guard statement, which allows an arbitrary statement to be executed at normal scope exit with \emph{success}, at exceptional scope exit with \emph{failure}, or at normal and exceptional scope exit with \emph{exit}. % cite? https://dlang.org/spec/statement.html#ScopeGuardStatement
447	It has been shown that the \emph{exit} form of the scope guard statement can be implemented in a library in \CC. % cite: http://www.drdobbs.com/184403758
448
449	% TODO: discussion of lexical scope vs. dynamic
450	% see Peter's suggestions
451	% RAII works in both cases. Guaranteed to work in stack case, works in heap case if root is deleted (but it's dangerous to rely on this, because of exceptions)
452
453	\section{Tuples}
454	\label{s:Tuples}
455	In mathematics, tuples are finite-length sequences which, unlike sets, allow duplicate elements.
456	In programming languages, tuples are a construct that provide fixed-sized heterogeneous lists of elements.
457	Many programming languages have tuple constructs, such as SETL, \KWC, ML, and Scala.
458
459	\KWC, a predecessor of \CFA, introduced tuples to C as an extension of the C syntax, rather than as a full-blown data type \cite{Till89}.
460	In particular, Till noted that C already contains a tuple context in the form of function parameter lists.
461	The main contributions of that work were in the form of adding tuple contexts to assignment in the form of multiple assignment and mass assignment (discussed in detail in section \ref{s:TupleAssignment}), function return values (see section \ref{s:MRV_Functions}), and record field access (see section \ref{s:MemberAccessTuple}).
462	Adding tuples to \CFA has previously been explored by Esteves \cite{Esteves04}.
463
464	The design of tuples in \KWC took much of its inspiration from SETL.
465	SETL is a high-level mathematical programming language, with tuples being one of the primary data types.
466	Tuples in SETL allow a number of operations, including subscripting, dynamic expansion, and multiple assignment.
467
468	\CCeleven introduced @std::tuple@ as a library variadic template struct.
469	Tuples are a generalization of @std::pair@, in that they allow for arbitrary length, fixed-size aggregation of heterogeneous values.
470	\begin{cppcode}
471	tuple<int, int, int> triple(10, 20, 30);
472	get<1>(triple); // access component 1 => 30
473
474	tuple<int, double> f();
475	int i;
476	double d;
477	tie(i, d) = f(); // assign fields of return value into local variables
478
479	tuple<int, int, int> greater(11, 0, 0);
480	triple < greater; // true
481	\end{cppcode}
482	Tuples are simple data structures with few specific operations.
483	In particular, it is possible to access a component of a tuple using @std::get<N>@.
484	Another interesting feature is @std::tie@, which creates a tuple of references, which allows assigning the results of a tuple-returning function into separate local variables, without requiring a temporary variable.
485	Tuples also support lexicographic comparisons, making it simple to write aggregate comparators using @std::tie@.
486
487	There is a proposal for \CCseventeen called \emph{structured bindings}, that introduces new syntax to eliminate the need to pre-declare variables and use @std::tie@ for binding the results from a function call. % TODO: cite http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0144r0.pdf
488	\begin{cppcode}
489	tuple<int, double> f();
490	auto [i, d] = f(); // unpacks into new variables i, d
491
492	tuple<int, int, int> triple(10, 20, 30);
493	auto & [t1, t2, t3] = triple;
494	t2 = 0; // changes triple
495
496	struct S { int x; double y; };
497	S s = { 10, 22.5 };
498	auto [x, y] = s; // unpack s
499	\end{cppcode}
500	Structured bindings allow unpacking any struct with all public non-static data members into fresh local variables.
501	The use of @&@ allows declaring new variables as references, which is something that cannot be done with @std::tie@, since \CC references do not support rebinding.
502	This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must documented with some other mechanism.
503	Furthermore, structured bindings are not a full replacement for @std::tie@, as it always declares new variables.
504
505	Like \CC, D provides tuples through a library variadic template struct.
506	In D, it is possible to name the fields of a tuple type, which creates a distinct type.
507	\begin{dcode} % TODO: cite http://dlang.org/phobos/std_typecons.html
508	Tuple!(float, "x", float, "y") point2D;
509	Tuple!(float, float) float2; // different types
510
511	point2D[0]; // access first element
512	point2D.x; // access first element
513
514	float f(float x, float y) {
515	return x+y;
516	}
517
518	f(point2D.expand);
519	\end{dcode}
520	Tuples are 0-indexed and can be subscripted using an integer or field name, if applicable.
521	The @expand@ method produces the components of the tuple as a list of separate values, making it possible to call a function that takes $N$ arguments using a tuple with $N$ components.
522
523	Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML.
524	A function in SML always accepts exactly one argument.
525	There are two ways to mimic multiple argument functions: the first through currying and the second by accepting tuple arguments.
526	\begin{smlcode}
527	fun fact (n : int) =
528	if (n = 0) then 1
529	else n*fact(n-1)
530
531	fun binco (n: int, k: int) =
532	real (fact n) / real (fact k * fact (n-k))
533	\end{smlcode}
534	Here, the function @binco@ appears to take 2 arguments, but it actually takes a single argument which is implicitly decomposed via pattern matching.
535	Tuples are a foundational tool in SML, allowing the creation of arbitrarily complex structured data types.
536
537	Scala, like \CC, provides tuple types through the standard library.
538	Scala provides tuples of size 1 through 22 inclusive through generic data structures.
539	Tuples support named access and subscript access, among a few other operations.
540	\begin{scalacode}
541	val a = new Tuple3[Int, String, Double](0, "Text", 2.1) // explicit creation
542	val b = (6, 'a', 1.1f) // syntactic sugar for Tuple3[Int, Char, Float]
543	val (i, _, d) = triple // extractor syntax, ignore middle element
544
545	println(a._2) // named access => print "Text"
546	println(b.productElement(0)) // subscript access => print 6
547	\end{scalacode}
548	In Scala, tuples are primarily used as simple data structures for carrying around multiple values or for returning multiple values from a function.
549	The 22-element restriction is an odd and arbitrary choice, but in practice it doesn't cause problems since large tuples are uncommon.
550	Subscript access is provided through the @productElement@ method, which returns a value of the top-type @Any@, since it is impossible to receive a more precise type from a general subscripting method due to type erasure.
551	The disparity between named access beginning at @_1@ and subscript access starting at @0@ is likewise an oddity, but subscript access is typically avoided since it discards type information.
552	Due to the language's pattern matching facilities, it is possible to extract the values from a tuple into named variables, which is a more idiomatic way of accessing the components of a tuple.
553
554
555	\Csharp has similarly strange limitations, allowing tuples of size up to 7 components. % TODO: cite https://msdn.microsoft.com/en-us/library/system.tuple(v=vs.110).aspx
556	The officially supported workaround for this shortcoming is to nest tuples in the 8th component.
557	\Csharp allows accessing a component of a tuple by using the field @Item$N$@ for components 1 through 7, and @Rest@ for the nested tuple.
558
559
560	% TODO: cite 5.3 https://docs.python.org/3/tutorial/datastructures.html
561	In Python, tuples are immutable sequences that provide packing and unpacking operations.
562	While the tuple itself is immutable, and thus does not allow the assignment of components, there is nothing preventing a component from being internally mutable.
563	The components of a tuple can be accessed by unpacking into multiple variables, indexing, or via field name, like D.
564	Tuples support multiple assignment through a combination of packing and unpacking, in addition to the common sequence operations.
565
566	% TODO: cite https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/Types.html#//apple_ref/doc/uid/TP40014097-CH31-ID448
567	Swift, like D, provides named tuples, with components accessed by name, index, or via extractors.
568	Tuples are primarily used for returning multiple values from a function.
569	In Swift, @Void@ is an alias for the empty tuple, and there are no single element tuples.
570
571	\section{Variadic Functions}
572	\label{sec:variadic_functions}
573	In statically-typed programming languages, functions are typically defined to receive a fixed number of arguments of specified types.
574	Variadic argument functions provide the ability to define a function that can receive a theoretically unbounded number of arguments.
575
576	C provides a simple implementation of variadic functions.
577	A function whose parameter list ends with @, ...@ is a variadic function.
578	Among the most common variadic functions is @printf@.
579	\begin{cfacode}
580	int printf(const char * fmt, ...);
581	printf("%d %g %c %s", 10, 3.5, 'X', "a string");
582	\end{cfacode}
583	Through the use of a format string, @printf@ allows C programmers to print any of the standard C data types.
584	Still, @printf@ is extremely limited, since the format codes are specified by the C standard, meaning users cannot define their own format codes to extend @printf@ for new data types or new formatting rules.
585
586	C provides manipulation of variadic arguments through the @va_list@ data type, which abstracts details of the manipulation of variadic arguments.
587	Since the variadic arguments are untyped, it is up to the function to interpret any data that is passed in.
588	Additionally, the interface to manipulate @va_list@ objects is essentially limited to advancing to the next argument, without any built-in facility to determine when the last argument is read.
589	This requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types.
590	The format string in @printf@ is one such example of an argument descriptor.
591	\begin{cfacode}
592	int f(const char * fmt, ...) {
593	va_list args;
594	va_start(args, fmt); // initialize va_list
595	for (const char * c = fmt; *c != '\0'; ++c) {
596	if (*c == '%') {
597	++c;
598	switch (*c) {
599	case 'd': {
600	int i = va_arg(args, int); // have to specify type
601	// ...
602	break;
603	}
604	case 'g': {
605	double d = va_arg(args, double);
606	// ...
607	break;
608	}
609	...
610	}
611	}
612	}
613	va_end(args);
614	return ...;
615	}
616	\end{cfacode}
617	Every case must be handled explicitly, since the @va_arg@ macro requires a type argument to determine how the next set of bytes is to be interpreted.
618	Furthermore, if the user makes a mistake, compile-time checking is typically restricted to standard format codes and their corresponding types.
619	In general, this means that C's variadic functions are not type-safe, making them difficult to use properly.
620
621	% When arguments are passed to a variadic function, they undergo \emph{default argument promotions}.
622	% Specifically, this means that
623
624	\CCeleven added support for \emph{variadic templates}, which add much needed type-safety to C's variadic landscape.
625	It is possible to use variadic templates to define variadic functions and variadic data types.
626	\begin{cppcode}
627	void print(int);
628	void print(char);
629	void print(double);
630	...
631
632	void f() {} // base case
633
634	template<typename T, typename... Args>
635	void f(const T & arg, const Args &... rest) {
636	print(arg); // print the current element
637	f(rest...); // handle remaining arguments recursively
638	}
639	\end{cppcode}
640	Variadic templates work largely through recursion on the \emph{parameter pack}, which is the argument with @...@ following its type.
641	A parameter pack matches 0 or more elements, which can be types or expressions depending on the context.
642	Like other templates, variadic template functions rely on an implicit set of constraints on a type, in this example a @print@ routine.
643	That is, it is possible to use the @f@ routine any any type provided there is a corresponding @print@ routine, making variadic templates fully open to extension, unlike variadic functions in C.
644
645	Recent \CC standards (\CCfourteen, \CCseventeen) expand on the basic premise by allowing variadic template variables and providing convenient expansion syntax to remove the need for recursion in some cases, amongst other things.
646
647	% D has variadic templates that deserve a mention http://dlang.org/ctarguments.html
648
649	In Java, a variadic function appears similar to a C variadic function in syntax.
650	\begin{javacode}
651	int sum(int... args) {
652	int s = 0;
653	for (int x : args) {
654	s += x;
655	}
656	return s;
657	}
658
659	void print(Object... objs) {
660	for (Object obj : objs) {
661	System.out.print(obj);
662	}
663	}
664
665	print("The sum from 1 to 10 is ", sum(1,2,3,4,5,6,7,8,9,10), ".\n");
666	\end{javacode}
667	The key difference is that Java variadic functions are type-safe, because they specify the type of the argument immediately prior to the ellipsis.
668	In Java, variadic arguments are syntactic sugar for arrays, allowing access to length, subscripting operations, and for-each iteration on the variadic arguments, among other things.
669	Since the argument type is specified explicitly, the top-type @Object@ can be used to accept arguments of any type, but to do anything interesting on the argument requires a down-cast to a more specific type, landing Java in a similar situation to C in that writing a function open to extension is difficult.
670
671	The other option is to restrict the number of types that can be passed to the function by using a more specific type.
672	Unfortunately, Java's use of nominal inheritance means that types must explicitly inherit from classes or interfaces in order to be considered a subclass.
673	The combination of these two issues greatly restricts the usefulness of variadic functions in Java.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: