source: doc/rob_thesis/intro.tex @ 4635c79

Last change on this file since 4635c79 was 0111dc7, checked in by Rob Schluntz <rschlunt@…>, 8 years ago

penultimate thesis draft

%======================================================================
\chapter{Introduction}
%======================================================================

\section{\CFA Background}
\label{s:background}
\CFA\footnote{Pronounced ``C-for-all'', and written \CFA or Cforall.} is a modern non-object-oriented extension to the C programming language.
As it is an extension of C, there is already a wealth of existing C code and principles that govern the design of the language.
Among the goals set out in the original design of \CFA, four points stand out \cite{Bilson03}.
\begin{enumerate}
\item The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler.
\item Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler.
\item \CFA code must be at least as portable as standard C code.
\item Extensions introduced by \CFA must be translated in the most efficient way possible.
\end{enumerate}
Therefore, these design principles must be kept in mind throughout the design and development of new language features.
In order to appeal to existing C programmers, great care must be taken to ensure that new features naturally feel like C.
These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used.
Unfortunately, \CC is actively diverging from C, so incrementally converting C code to \CC requires significant effort and training, compounded by multiple legacy design-choices that cannot be updated.

The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail.

\subsection{C Background}
\label{sub:c_background}
One of the lesser-known features of standard C is \emph{designations}.
Designations are similar to named parameters in languages such as Python and Scala, except that they only apply to aggregate initializers.
\begin{cfacode}
struct A {
  int w, x, y, z;
};
A a0 = { .x:4, .z:1, .x:8 };
A a1 = { 1, .y:7, 6 };
A a2[4] = { [2]:a0, [0]:a1, { .z:3 } };
// equivalent to
// A a0 = { 0, 8, 0, 1 };
// A a1 = { 1, 0, 7, 6 };
// A a2[4] = { a1, { 0, 0, 0, 3 }, a0, { 0, 0, 0, 0 } };
\end{cfacode}
Designations allow specifying the field to initialize by name, rather than by position.
Any field not explicitly initialized is initialized as if it had static storage duration, \ie to zero for the @int@ fields here \cite[p.~141]{C11}.
A designator specifies the current object for initialization, and as such any undesignated sub-objects pick up where the last initialization left off.
For example, in the initialization of @a1@, the initializer of @y@ is @7@, and the unnamed initializer @6@ initializes the next sub-object, @z@.
Later initializers override earlier initializers, so a sub-object for which there is more than one initializer is only initialized by its last initializer.
These semantics can be seen in the initialization of @a0@, where @x@ is designated twice, and thus initialized to @8@.
Note that in \CFA, designations use a colon separator, rather than an equals sign as in C, because this syntax is one of the few places that conflicts with the new language features.
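For comparison, the same three initializations can be written in standard C, where designators use @=@ instead of a colon. In the following minimal sketch the objects are built inside a function, because an initializer that names another object, such as @[2] = a0@, is only permitted for objects with automatic storage duration; the helpers @init_examples@ and @same@ are hypothetical names introduced for illustration. Some compilers warn about the repeated @.x@ designator, although the override is well-defined.

```c
#include <assert.h>
#include <string.h>

struct A { int w, x, y, z; };

/* Builds the three example objects with C's '=' designator syntax.
   The initializers mirror the CFA example above. */
void init_examples(struct A *out0, struct A *out1, struct A out2[4]) {
    struct A a0 = { .x = 4, .z = 1, .x = 8 };  /* last .x wins: {0, 8, 0, 1} */
    struct A a1 = { 1, .y = 7, 6 };            /* 6 continues after y: z = 6 */
    /* { .z = 3 } follows [0], so it initializes element [1]; [3] is zeroed. */
    struct A a2[4] = { [2] = a0, [0] = a1, { .z = 3 } };
    *out0 = a0;
    *out1 = a1;
    memcpy(out2, a2, sizeof a2);               /* sizeof the local array */
}

/* Member-wise equality, since C has no == for structures. */
int same(struct A p, struct A q) {
    return p.w == q.w && p.x == q.x && p.y == q.y && p.z == q.z;
}
```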

C also provides \emph{compound literal} expressions, which provide a first-class mechanism for creating unnamed objects.
\begin{cfacode}
struct A { int x, y; };
int f(A, int);
int g(int *);

f((A){ 3, 4 }, (int){ 5 } = 10);
g((int[]){ 1, 2, 3 });
g(&(int){ 0 });
\end{cfacode}
Compound literals create an unnamed object, and result in an lvalue, so it is legal to assign a value into a compound literal or to take its address \cite[p.~86]{C11}.
Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and coercions and is never an lvalue.
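The calls above can be made runnable in standard C by spelling out @struct A@, since C requires the @struct@ keyword, and by giving @f@ and @g@ hypothetical bodies chosen only so that the results can be checked. The assignment @(int){ 5 } = 10@ is legal precisely because a compound literal is an lvalue, and the value passed to @f@ is the assigned value @10@.

```c
#include <assert.h>

struct A { int x, y; };

/* Hypothetical bodies, chosen only so the calls have checkable results. */
int f(struct A a, int n) { return a.x + a.y + n; }
int g(int *p) { return p[0]; }

/* Runs the three calls from the example and combines their results. */
int demo(void) {
    int r1 = f((struct A){ 3, 4 }, (int){ 5 } = 10);  /* 3 + 4 + 10 = 17 */
    int r2 = g((int[]){ 1, 2, 3 });                  /* array decays: 1 */
    int r3 = g(&(int){ 0 });                         /* address taken: 0 */
    return r1 + r2 + r3;                             /* 18 */
}
```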

\subsection{Overloading}
\label{sub:overloading}
Overloading is the ability to specify multiple entities with the same name.
The most common form of overloading is function overloading, wherein multiple functions can be defined with the same name, but with different signatures.
C provides a small amount of built-in overloading, \eg @+@ is overloaded for the basic types.
As in \CC, \CFA allows user-defined overloading based both on the number of parameters and on the types of parameters.
  \begin{cfacode}
  void f(void);  // (1)
  void f(int);   // (2)
  void f(char);  // (3)

  f('A');        // selects (3)
  \end{cfacode}
In this case, there are three @f@ procedures, where @f@ takes either 0 or 1 arguments, and if an argument is provided then it may be of type @int@ or of type @char@.
Exactly which procedure is executed depends on the number and types of arguments passed.
If there is no exact match available, \CFA attempts to find a suitable match by examining the C built-in conversion rules.
  \begin{cfacode}
  void g(long long);

  g(12345);
  \end{cfacode}
In the above example, there is only one instance of @g@, which expects a single parameter of type @long long@.
Here, the argument provided has type @int@, but since all possible values of type @int@ can be represented by a value of type @long long@, there is a safe conversion from @int@ to @long long@, and so \CFA calls the provided @g@ routine.
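Standard C has no user-defined overloading, but C11's @_Generic@ selection can approximate the type-based choice among the @f@ overloads shown earlier. The sketch below is only an approximation: the per-type names @f_int@ and @f_char@ and the macro @f@ are hypothetical, @_Generic@ cannot dispatch on the number of arguments, and an argument whose type matches no association (such as @long long@) is a compile-time error rather than being converted as in the call to @g@.

```c
#include <assert.h>

/* Hypothetical stand-ins for the overloads f(int) and f(char);
   each returns a tag identifying which one was selected. */
int f_int(int x)   { (void)x; return 2; }
int f_char(char c) { (void)c; return 3; }

/* _Generic picks a function from the static type of the argument. */
#define f(x) _Generic((x), char: f_char, int: f_int)(x)
```

One difference from the \CFA example is visible when calling the macro: in C a character literal such as @'A'@ has type @int@, so @f('A')@ selects the @int@ version unless the argument is cast to @char@.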

In addition to this form of overloading, \CFA also allows overloading based on the number and types of \emph{return} values.
This feature is not available in \CC, but is available in other programming languages such as Ada \cite{Ada95}.
  \begin{cfacode}
  int g();         // (1)
  double g();      // (2)

  int x = g();     // selects (1)
  \end{cfacode}
Here, the only difference between the signatures of the different versions of @g@ is in the return values.
The result context is used to select an appropriate routine definition.
In this case, the result of @g@ is assigned into a variable of type @int@, so \CFA prefers the routine that returns a single @int@, because it is an exact match.

There are times when a function should logically return multiple values.
Since a function in standard C can only return a single value, a programmer must either return additional values through pointer parameters, or the function's designer must create a wrapper structure to package multiple return-values.
For example, the first approach:
\begin{cfacode}
int f(int * ret) {        // returns a value through parameter ret
  *ret = 37;
  return 123;
}

int res1, res2;           // allocate storage for return values
res1 = f(&res2);          // explicitly pass storage
\end{cfacode}
is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all.
The second approach:
\begin{cfacode}
struct A {
  int x, y;
};
struct A g() {            // returns values through a structure
  return (struct A) { 123, 37 };
}
struct A res3 = g();
... res3.x ... res3.y ... // use result values
\end{cfacode}
is awkward because the caller has to either learn the field names of the structure or learn the names of helper routines to access the individual return values.
Both approaches are syntactically unnatural.

In \CFA, it is possible to directly declare a function returning multiple values.
This extension provides important semantic information to the caller, since return values are only for output.
\begin{cfacode}
[int, int] f() {       // no new type
  return [123, 37];
}
\end{cfacode}
However, the ability to return multiple values is useless without a syntax for accepting the results from the function.

In standard C, return values are most commonly assigned directly into local variables, or are used as the arguments to another function call.
\CFA allows both of these contexts to accept multiple return values.
\begin{cfacode}
int res1, res2;
[res1, res2] = f();    // assign return values into local variables

void g(int, int);
g(f());                // pass both return values of f to g
\end{cfacode}
As seen in the example, it is possible to assign the results of a function call directly into local variables.
These local variables can be referenced naturally, without requiring any unpacking as in structured return values.
Perhaps more interesting is the fact that multiple return values can be passed to multiple parameters seamlessly, as in the call @g(f())@.
In this call, the return values from @f@ are linked to the parameters of @g@ so that each of the return values is passed directly to the corresponding parameter of @g@, without any explicit storing, unpacking, or additional naming.
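To make the idea concrete, the following sketch shows one plausible way to express @[int, int] f()@, the assignment @[res1, res2] = f()@, and the call @g(f())@ in plain C: the pair of results travels in a structure, and the translator, not the programmer, unpacks it. This is an illustrative assumption, not a description of the code the \CFA translator actually generates; the names @f_ret@, @field_0@, @field_1@, @assign_from_f@, and @call_g_with_f@ are hypothetical, and @g@ is given an @int@ result only so that its arguments can be observed.

```c
#include <assert.h>

/* Hypothetical lowering of [int, int] f(): the results travel in a struct. */
struct f_ret { int field_0, field_1; };

struct f_ret f(void) {
    return (struct f_ret){ 123, 37 };
}

/* Hypothetical body for g(int, int); returns a value so calls can be checked. */
int g(int a, int b) { return a - b; }

/* [res1, res2] = f(); becomes a temporary plus member-wise assignment. */
void assign_from_f(int *res1, int *res2) {
    struct f_ret tmp = f();
    *res1 = tmp.field_0;
    *res2 = tmp.field_1;
}

/* g(f()); each member of the temporary becomes one argument of g. */
int call_g_with_f(void) {
    struct f_ret tmp = f();
    return g(tmp.field_0, tmp.field_1);
}
```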

Multiple return values also introduce an extra quirk in the resolution of function calls.
  \begin{cfacode}
  int f();            // (1)
  [int, int] f();     // (2)

  void g(int, int);

  int x, y;
  [x, y] = f();       // selects (2)
  g(f());             // selects (2)
  \end{cfacode}
In this example, the only possible call to @f@ that can produce the two @int@s required for assigning into the variables @x@ and @y@ is the second option.
Similar reasoning holds for the call to @g@, which requires two @int@ arguments.

In \CFA, overloading also applies to operator names, known as \emph{operator overloading}.
Similar to function overloading, a single operator is given multiple meanings by defining new versions of the operator with different signatures.
In \CC, this can be done as follows.
  \begin{cppcode}
  struct A { int i; };
  int operator+(A x, A y);
  bool operator<(A x, A y);
  \end{cppcode}

In \CFA, the same example can be written as follows.
  \begin{cfacode}
  struct A { int i; };
  int ?+?(A x, A y);    // '?'s represent operands
  bool ?<?(A x, A y);
  \end{cfacode}
Notably, the only difference is syntax.
Most of the operators supported by \CC for operator overloading are also supported in \CFA.
Notable exceptions are the logical operators (\eg @||@), the sequence operator (\ie @,@), and the member-access operators (\eg @.@ and \lstinline{->}).

\CFA also permits overloading variable identifiers.
This feature is not available in \CC.
  \begin{cfacode}
  struct Rational { int numer, denom; };
  int x = 3;               // (1)
  double x = 1.27;         // (2)
  Rational x = { 4, 11 };  // (3)

  void g(double);

  x += 1;                  // chooses (1)
  g(x);                    // chooses (2)
  Rational y = x;          // chooses (3)
  \end{cfacode}
In this example, there are three definitions of the variable @x@.
Based on the context, \CFA attempts to choose the variable whose type best matches the expression context.
When used judiciously, this feature allows names like @MAX@, @MIN@, and @PI@ to apply across many types.

Finally, the values @0@ and @1@ have special status in standard C.
In particular, the value @0@ is both an integer and a pointer literal, and thus its meaning depends on the context.
In addition, several operations can be redefined in terms of other operations and the values @0@ and @1@.
For example,
\begin{cfacode}
int x;
if (x) {  // if (x != 0)
  x++;    //   x += 1;
}
\end{cfacode}
Every if- and iteration-statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
Due to these rewrite rules, the values @0@ and @1@ have the types \zero and \one in \CFA, which allow for overloading various operations that connect to @0@ and @1@\footnote{In the original design of \CFA, @0@ and @1@ were overloadable names \cite[p.~7]{cforall}.}.
The types \zero and \one have special built-in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal.
  \begin{cfacode}
  // lvalue is similar to returning a reference in C++
  lvalue Rational ?+=?(Rational *a, Rational b);
  Rational ?=?(Rational * dst, zero_t) {
    return *dst = (Rational){ 0, 1 };
  }

  Rational sum(Rational *arr, int n) {
    Rational r;
    r = 0;     // use rational-zero_t assignment
    for (; n > 0; n--) {
      r += arr[n-1];
    }
    return r;
  }
  \end{cfacode}
This function takes an array of @Rational@ objects and produces the @Rational@ representing the sum of the array.
Note the use of an overloaded assignment operator to set an object of type @Rational@ to an appropriate @0@ value.
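Without overloading, the same function in standard C needs a distinct name for each operation that \CFA spells @r = 0@ and @r += arr[n-1]@. The following sketch shows the calls that the overloaded operators stand in for; the helpers @rat_set_zero@ and @rat_add_assign@ are hypothetical, and the sum is deliberately left unreduced to keep the example short.

```c
#include <assert.h>

struct Rational { int numer, denom; };

/* Stands in for ?=?(Rational *, zero_t): sets *dst to 0/1. */
void rat_set_zero(struct Rational *dst) {
    *dst = (struct Rational){ 0, 1 };
}

/* Stands in for ?+=?: a/b += c/d gives (a*d + c*b)/(b*d), unreduced. */
struct Rational *rat_add_assign(struct Rational *a, struct Rational b) {
    a->numer = a->numer * b.denom + b.numer * a->denom;
    a->denom = a->denom * b.denom;
    return a;
}

struct Rational sum(const struct Rational *arr, int n) {
    struct Rational r;
    rat_set_zero(&r);                  /* CFA: r = 0; */
    for (; n > 0; n--) {
        rat_add_assign(&r, arr[n-1]);  /* CFA: r += arr[n-1]; */
    }
    return r;
}
```

For example, summing @{ 1/2, 1/3 }@ first adds @1/3@ to @0/1@, giving @1/3@, and then adds @1/2@, giving @5/6@.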

\subsection{Polymorphism}
\label{sub:polymorphism}
In its most basic form, polymorphism grants the ability to write a single block of code that accepts different types.
In particular, \CFA supports the notion of parametric polymorphism.
Parametric polymorphism allows a function to be written generically, for all values of all types, without regard to the specifics of a particular type.
For example, in \CC, the simple identity function for all types can be written as
  \begin{cppcode}
  template<typename T>
  T identity(T x) { return x; }
  \end{cppcode}
\CC uses the template mechanism to support parametric polymorphism. In \CFA, an equivalent function can be written as
  \begin{cfacode}
  forall(otype T)
  T identity(T x) { return x; }
  \end{cfacode}
Once again, the only visible difference in this example is syntactic.
Fundamental differences can be seen by examining more interesting examples.
In \CC, a generic sum function is written as follows
  \begin{cppcode}
  template<typename T>
  T sum(T *arr, int n) {
    T t{};  // value-initialize => 0 for arithmetic types
    for (; n > 0; n--) t += arr[n-1];
    return t;
  }
  \end{cppcode}
Here, the code assumes the existence of a default constructor, an addition-assignment operator, and a copy constructor (to return the result) over the provided type @T@.
If any of these required operations is not available for a particular @T@, the \CC compiler produces an error message, at the point where @sum@ is instantiated with that type, stating which operators could not be found.

A similar sum function can be written in \CFA as follows
  \begin{cfacode}
  forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
  T sum(T *arr, int n) {
    T t = 0;
    for (; n > 0; n--) t += arr[n-1];
    return t;
  }
  \end{cfacode}
The first thing to note here is that immediately following the declaration of @otype T@ is a list of \emph{type assertions} that specify restrictions on acceptable choices of @T@.
In particular, the assertions above specify that there must be an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@.
The existence of an assignment operator from @T@ to @T@ and the ability to create an object of type @T@ are assumed implicitly by declaring @T@ with the @otype@ type-class.
In addition to @otype@, there are currently two other type-classes.
The three type parameter kinds are summarized in \autoref{table:types}.

\begin{table}[h!]
  \begin{center}
    \begin{tabular}{|c||c|c|c||c|c|c|}
    \hline
    name    & object type & incomplete type & function type & can assign value & can create & has size \\ \hline
    @otype@ & X           &                 &               & X                & X          & X        \\ \hline
    @dtype@ & X           & X               &               &                  &            &          \\ \hline
    @ftype@ &             &                 & X             &                  &            &          \\ \hline
    \end{tabular}
  \end{center}
  \caption{\label{table:types} The different kinds of type parameters in \CFA}
\end{table}

A major difference between the approaches of \CC and \CFA to polymorphism is that the set of assumed properties for a type is \emph{explicit} in \CFA.
One of the major limiting factors of \CC's approach is that templates cannot be separately compiled.
In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled, as the function prototype states all necessary requirements separate from the implementation.
For example, the prototype for the previous sum function is
  \begin{cfacode}
  forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
  T sum(T *arr, int n);
  \end{cfacode}
With this prototype, a caller in another translation unit knows all of the constraints on @T@, and thus knows all of the operations that need to be made available to @sum@.
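One way to see why explicit assertions permit separate compilation is to hand-translate @sum@ into C using \emph{dictionary passing}: each assertion becomes a function-pointer parameter, and the size of @T@ is passed explicitly so that the body can index the array without knowing @T@. The sketch below makes that assumption concrete; all names are hypothetical, and it is not claimed to match the translation performed by the \CFA compiler, only to show that such a body can be compiled once, without seeing any caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical assertion parameters: one function pointer per assertion. */
typedef void (*assign_zero_fn)(void *dst);                /* T ?=?(T *, zero_t) */
typedef void (*add_assign_fn)(void *dst, const void *src); /* T ?+=?(T *, T)     */

/* Generic sum over n elements of size elem_size; the result is written to
   *out because the size of T is only known at run time. */
void sum_generic(void *out, const void *arr, int n, size_t elem_size,
                 assign_zero_fn assign_zero, add_assign_fn add_assign) {
    const char *base = arr;
    assign_zero(out);                                         /* t = 0 */
    for (; n > 0; n--) {
        add_assign(out, base + (size_t)(n - 1) * elem_size);  /* t += arr[n-1] */
    }
}

/* Operations a caller would supply for T = int ... */
void int_zero(void *dst) { *(int *)dst = 0; }
void int_add(void *dst, const void *src) { *(int *)dst += *(const int *)src; }

/* ... and for T = double. */
void dbl_zero(void *dst) { *(double *)dst = 0.0; }
void dbl_add(void *dst, const void *src) { *(double *)dst += *(const double *)src; }
```

A caller passes exactly the operations the prototype lists, for example @sum_generic(&r, xs, 4, sizeof(int), int_zero, int_add)@; a \CC template, by contrast, needs the body of @sum@ at every point of instantiation.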

In \CFA, a set of assertions can be factored into a \emph{trait}.
\begin{cfacode}
  trait Addable(otype T) {
    T ?+?(T, T);
    T ++?(T);
    T ?++(T);
  }
  forall(otype T | Addable(T)) void f(T);
  forall(otype T | Addable(T) | { T --?(T); }) T g(T);
  forall(otype T, U | Addable(T) | { T ?/?(T, U); }) U h(T, U);
\end{cfacode}
This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration.

An interesting application of return-type resolution and polymorphism is a type-safe version of @malloc@.
\begin{cfacode}
forall(dtype T | sized(T))
T * malloc() {
  return (T*)malloc(sizeof(T)); // call C malloc
}
int * x = malloc();     // malloc(sizeof(int))
double * y = malloc();  // malloc(sizeof(double))

struct S { ... };
S * s = malloc();       // malloc(sizeof(S))
\end{cfacode}
The built-in trait @sized@ ensures that size and alignment information for @T@ is available in the body of @malloc@ through @sizeof@ and @_Alignof@ expressions respectively.
In calls to @malloc@, the type @T@ is bound based on call-site information, allowing \CFA code to allocate memory without the potential for errors introduced by manually specifying the size of the allocated block.
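In standard C, the size passed to @malloc@ is written independently of the pointer type, so @double * y = malloc(sizeof(int));@ compiles without complaint. A common C idiom reduces, but does not eliminate, this risk by computing the size from the pointer being initialized; writing @sizeof s@ instead of @sizeof *s@ by mistake still compiles. The sketch below uses the idiom with a hypothetical payload type @struct S@ and helper @make_S@.

```c
#include <assert.h>
#include <stdlib.h>

struct S { int a; double b; };  /* hypothetical payload type */

/* sizeof *s is sizeof(struct S), derived from the declared type of s,
   so the element type is written only once. Returns NULL on failure;
   the caller owns the result and must free() it. */
struct S *make_S(void) {
    struct S *s = malloc(sizeof *s);
    if (s != NULL) {
        s->a = 1;
        s->b = 2.0;
    }
    return s;
}
```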

\section{Invariants}
An \emph{invariant} is a logical assertion that is true for some duration of a program's execution.
Invariants help a programmer to reason about code correctness and prove properties of programs.

\begin{sloppypar}
In object-oriented programming languages, type invariants are typically established in a constructor and maintained throughout the object's lifetime.
This maintenance is achieved through a combination of access-control modifiers and a restricted interface.
Data that requires the maintenance of an invariant is usually hidden from external sources using the \emph{private} modifier, which restricts reads and writes to a select set of trusted routines, including member functions.
It is these trusted routines that perform all modifications to internal data in a way that is consistent with the invariant, by ensuring that the invariant holds true at the end of the routine call.
\end{sloppypar}

In C, the @assert@ macro is often used to ensure invariants are true.
Using @assert@, the programmer can check a condition and abort execution if the condition is not true.
This powerful tool forces the programmer to deal with logical inconsistencies as they occur.
For production builds, assertions can be removed by defining the preprocessor macro @NDEBUG@, making them zero-cost in performance-intensive applications.
\begin{cfacode}
#include <assert.h>  // assert
#include <stdlib.h>  // abs

struct Rational {
  int n, d;
};
struct Rational create_rational(int n, int d) {
  assert(d != 0);  // precondition
  if (d < 0) {
    n *= -1;
    d *= -1;
  }
  assert(d > 0);  // postcondition
  // rational invariant: d > 0
  return (struct Rational) { n, d };
}
struct Rational rat_abs(struct Rational r) {
  assert(r.d > 0); // check invariant, since no access control
  r.n = abs(r.n);
  assert(r.d > 0); // ensure function preserves invariant on return value
  return r;
}
\end{cfacode}

Some languages, such as D, provide language-level support for specifying program invariants.
In addition to providing a C-like @assert@ expression, D allows specifying type invariants that are automatically checked at the end of a constructor, beginning of a destructor, and at the beginning and end of every public member function.
\begin{dcode}
import std.math;
struct Rational {
  invariant {
    assert(d > 0, "d <= 0");
  }
  int n, d;
  this(int n, int d) {  // constructor
    assert(d != 0);
    this.n = n;
    this.d = d;
    // implicitly check invariant
  }
  Rational abs() {
    // implicitly check invariant
    return Rational(std.math.abs(n), d);
    // implicitly check invariant
  }
}
\end{dcode}
The D compiler is able to assume that assertions and invariants hold true and perform optimizations based on those assumptions.
Note that these invariants are internal to the type's correct behaviour.

Types also have external invariants involving the state of the execution environment, including the heap, the open-file table, and the state of global variables.
Since resources are finite and shared (concurrency), it is important to ensure that objects clean up properly when they are finished, restoring the execution environment to a stable state so that new objects can reuse resources.

\section{Resource Management}
\label{s:ResMgmt}

Resource management is a problem that pervades every programming language.

In standard C, resource management is largely a manual effort on the part of the programmer, the notable exception being the program stack.
The program stack grows and shrinks automatically with each function call, as needed for local variables.
However, whenever a program needs a variable to outlive the block it is created in, the storage must be allocated dynamically with @malloc@ and later released with @free@.
This pattern is extended to more complex objects, such as files and sockets, which can also outlive the block where they are created, and thus require their own resource management.
Once allocated storage escapes\footnote{In garbage collected languages, such as Java, escape analysis \cite{Choi:1999:EAJ:320385.320386} is used to determine when dynamically allocated objects are strictly contained within a function, which allows the optimizer to allocate them on the stack.} a block, responsibility for deallocating the storage is not specified in a function's type; for example, nothing in a function's type states that its return value is owned by the caller.
This implicit convention is provided only through documentation about the expectations of functions.
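The following C function illustrates the convention: its return type, @char *@, says nothing about who must release the returned storage, so the obligation is recorded only in a comment. The helper @dup_string@ is a hypothetical example; POSIX and C23 provide a similar @strdup@.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if allocation fails.
   Ownership (by documentation only): the caller must free() the result.
   Nothing in the return type char * enforces or even records this. */
char *dup_string(const char *s) {
    size_t len = strlen(s) + 1;  /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```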

In other languages, a hybrid situation exists where resources escape the allocation block, but ownership is precisely controlled by the language.
This pattern requires a strict interface and protocol for a data structure, consisting of a pre-initialization and a post-termination call, and all intervening access is done via interface routines.
This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it takes care of a significant portion of resource-management cases.

For example, \CC directly supports this pattern through class types and an idiom known as RAII\footnote{Resource Acquisition is Initialization} by means of constructors and destructors.
Constructors and destructors are special routines whose calls are automatically inserted at the appropriate locations to bookend the lifetime of an object.
Constructors allow the designer of a type to establish invariants for objects of that type, since it is guaranteed that every object must be initialized through a constructor.
In particular, constructors allow a programmer to ensure that all objects are initially set to a valid state.
On the other hand, destructors provide a simple mechanism for tearing down an object and resetting the environment in which the object lived.
RAII ensures that if all resources are acquired in a constructor and released in a destructor, there are no resource leaks, even in exceptional circumstances.
A type with at least one non-trivial constructor or destructor is henceforth referred to as a \emph{managed type}.
In the context of \CFA, a non-trivial constructor is either a user-defined constructor or an auto-generated constructor that calls a non-trivial constructor.

For the remaining resource ownership cases, a programmer must follow a brittle, explicit protocol for freeing resources or an implicit protocol enforced by the programming language.
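In C, such an explicit protocol is typically written by hand with the \emph{goto-cleanup} idiom, sketched below for a function that writes a message to two files, the same task used in the Java examples in this section. Every early exit must jump to the label that releases exactly the resources acquired so far, and adding a third resource means revisiting every exit path, which is what makes the protocol brittle. The function name @write_msg@ and its return convention are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write msg to filename and to "log.txt".
   Returns 0 on success and -1 if any open, write, or close fails. */
int write_msg(const char *filename, const char *msg) {
    int rc = -1;
    FILE *log_file = NULL;
    FILE *out_file = fopen(filename, "w");
    if (out_file == NULL) goto done;        /* nothing acquired yet */
    log_file = fopen("log.txt", "w");
    if (log_file == NULL) goto close_out;   /* release out_file only */
    if (fputs(msg, out_file) != EOF && fputs(msg, log_file) != EOF) rc = 0;
    if (fclose(log_file) == EOF) rc = -1;
close_out:
    if (fclose(out_file) == EOF) rc = -1;
done:
    return rc;
}
```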

In garbage collected languages, such as Java, resources are largely managed by the garbage collector.
Still, garbage collectors typically focus only on memory management.
There are many kinds of resources that the garbage collector does not understand, such as sockets, open files, and database connections.
Java does provide \emph{finalizers}, which are similar to destructors.
Unfortunately, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious.
Due to operating-system resource-limits, this is unacceptable for many long-running programs.
Instead, the paradigm in Java requires programmers to manually keep track of all resources \emph{except} memory, leading many novices and experts alike to forget to close files, etc.
Complicating the picture, uncaught exceptions can cause control flow to change dramatically, leaking a resource that appears at first glance to be released.
\begin{javacode}
void write(String filename, String msg) throws Exception {
  FileOutputStream out = new FileOutputStream(filename);
  FileOutputStream log = new FileOutputStream("log.txt");
  out.write(msg.getBytes());
  log.write(msg.getBytes());
  log.close();
  out.close();
}
\end{javacode}
Any line in this program can throw an exception; for example, if opening @log.txt@ fails, @out@ is never closed.
Guarding against such leaks leads to a profusion of finally blocks around many function bodies, since it is not always clear when an exception may be thrown.
\begin{javacode}
public void write(String filename, String msg) throws Exception {
  FileOutputStream out = new FileOutputStream(filename);
  try {
    FileOutputStream log = new FileOutputStream("log.txt");
    try {
      out.write(msg.getBytes());
      log.write(msg.getBytes());
    } finally {
      log.close();
    }
  } finally {
    out.close();
  }
}
\end{javacode}
In Java 7, a new \emph{try-with-resources} construct was added to alleviate most of the pain of working with resources, but ultimately it still places the burden squarely on the user rather than on the library designer.
Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources that can throw an exception on close can leak nested resources \cite{TryWithResources}.
\begin{javacode}
public void write(String filename, String msg) throws Exception {
  try (  // try-with-resources
    FileOutputStream out = new FileOutputStream(filename);
    FileOutputStream log = new FileOutputStream("log.txt");
  ) {
    out.write(msg.getBytes());
    log.write(msg.getBytes());
  } // automatically closes out and log in every exceptional situation
}
\end{javacode}
Variables declared as part of a try-with-resources statement must conform to the @AutoCloseable@ interface, and the compiler implicitly calls @close@ on each of the variables at the end of the block.
Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null; therefore, the cleanup for these variables at the end is automatically guarded and conditionally executed to prevent null-pointer exceptions.

While Rust \cite{Rust} does not use a garbage collector, it does not leave memory management entirely to the programmer either: its strict ownership model automatically frees allocated memory and prevents common memory management errors.
In particular, a variable has ownership over its associated value, which is freed automatically when the owner goes out of scope.
Furthermore, values are \emph{moved} by default on assignment, rather than copied, which invalidates the previous variable binding.
\begin{rustcode}
struct S {
  x: i32
}
let s = S { x: 123 };
let z = s;           // move, invalidate s
println!("{}", s.x); // error, s has been moved
\end{rustcode}
Types can be made copyable by implementing the @Copy@ trait.

Rust allows multiple unowned views into an object through references, also known as borrows, provided that a reference does not outlive its referent.
A mutable reference is allowed only if it is the only reference to its referent, preventing data race errors and iterator invalidation errors.
\begin{rustcode}
let mut x = 10;
{
  let y = &x;
  let z = &x;
  println!("{} {}", y, z); // prints 10 10
}
{
  let y = &mut x;
  // let z1 = &x;     // not allowed, have mutable reference
  // let z2 = &mut x; // not allowed, have mutable reference
  *y = 5;
  println!("{}", y); // prints 5
}
println!("{}", x); // prints 5
\end{rustcode}
Since references are not owned, they do not release resources when they go out of scope.
These restrictions impose no runtime cost, since they are enforced at compile time.

Rust provides RAII through the @Drop@ trait, allowing arbitrary code to execute when the object goes out of scope, providing automatic cleanup of auxiliary resources, much like a \CC program.
\begin{rustcode}
struct S {
  name: &'static str
}

impl Drop for S {  // RAII for S
  fn drop(&mut self) {  // destructor
    println!("dropped {}", self.name);
  }
}

{
  let x = S { name: "x" };
  let y = S { name: "y" };
} // prints "dropped y" "dropped x"
\end{rustcode}

519% D has constructors and destructors that are worth a mention (under classes) https://dlang.org/spec/spec.html
520%  also https://dlang.org/spec/struct.html#struct-constructor
521% these are declared in the struct, so they're closer to C++ than to CFA, at least syntactically. Also do not allow for default constructors
522% D has a GC, which already makes the situation quite different from C/C++
523The programming language, D, also manages resources with constructors and destructors \cite{D}.
524In D, @struct@s are stack allocated and managed via scoping like in \CC, whereas @class@es are managed automatically by the garbage collector.
525Like Java, using the garbage collector means that destructors are called indeterminately, requiring the use of finally statements to ensure dynamically allocated resources that are not managed by the garbage collector, such as open files, are cleaned up.
526Since D supports RAII, it is possible to use the same techniques as in \CC to ensure that resources are released in a timely manner.
527Finally, D provides a scope guard statement, which allows an arbitrary statement to be executed at normal scope exit with \emph{success}, at exceptional scope exit with \emph{failure}, or at normal and exceptional scope exit with \emph{exit}. % https://dlang.org/spec/statement.html#ScopeGuardStatement
528It has been shown that the \emph{exit} form of the scope guard statement can be implemented in a library in \CC \cite{ExceptSafe}.
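The following is a minimal sketch of such a library-level \emph{exit} guard in \CC. It assumes \CCseventeen for class template argument deduction, and the name @ScopeExit@ is hypothetical, chosen for illustration only. The guard stores an action and runs it from its destructor, so the action executes whenever the enclosing scope is left, whether normally or because of an exception.
\begin{cppcode}
#include <stdexcept>
#include <utility>

// Hypothetical guard: runs the stored action when destroyed, i.e., at both
// normal and exceptional scope exit (the exit form of D's scope guard).
template<typename F>
class ScopeExit {
  F action;
public:
  explicit ScopeExit(F f) : action(std::move(f)) {}
  ~ScopeExit() { action(); }
  ScopeExit(const ScopeExit &) = delete;  // non-copyable: action runs once
  ScopeExit & operator=(const ScopeExit &) = delete;
};

int cleanups = 0;  // counts how many times the guarded action has run

void work(bool fail) {
  ScopeExit guard([] { ++cleanups; });  // F deduced from the lambda
  if (fail) throw std::runtime_error("failure");
}
\end{cppcode}
This sketch covers only the \emph{exit} form; the \emph{success} and \emph{failure} forms would additionally need to detect whether the scope is being left because of an exception.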
529
530To provide managed types in \CFA, new kinds of constructors and destructors are added to \CFA and discussed in Chapter 2.
531
532\section{Tuples}
533\label{s:Tuples}
534In mathematics, tuples are finite-length sequences which, unlike sets, are ordered and allow duplicate elements.
In programming languages, tuples provide fixed-size heterogeneous lists of elements.
536Many programming languages have tuple constructs, such as SETL, \KWC, ML, and Scala.
537
538\KWC, a predecessor of \CFA, introduced tuples to C as an extension of the C syntax, rather than as a full-blown data type \cite{Till89}.
539In particular, Till noted that C already contains a tuple context in the form of function parameter lists.
The main contributions of that work were adding tuple contexts to assignment, in the form of multiple assignment and mass assignment (discussed in detail in section \ref{s:TupleAssignment}), to function return values (see section \ref{s:MRV_Functions}), and to record field access (see section \ref{s:MemberAccessTuple}).
541Adding tuples to \CFA has previously been explored by Esteves \cite{Esteves04}.
542
543The design of tuples in \KWC took much of its inspiration from SETL \cite{SETL}.
544SETL is a high-level mathematical programming language, with tuples being one of the primary data types.
545Tuples in SETL allow a number of operations, including subscripting, dynamic expansion, and multiple assignment.
546
547\CCeleven introduced @std::tuple@ as a library variadic template struct.
548Tuples are a generalization of @std::pair@, in that they allow for arbitrary length, fixed-size aggregation of heterogeneous values.
549\begin{cppcode}
550tuple<int, int, int> triple(10, 20, 30);
551get<1>(triple); // access component 1 => 20
552
553tuple<int, double> f();
554int i;
555double d;
556tie(i, d) = f(); // assign fields of return value into local variables
557
558tuple<int, int, int> greater(11, 0, 0);
559triple < greater; // true
560\end{cppcode}
561Tuples are simple data structures with few specific operations.
562In particular, it is possible to access a component of a tuple using @std::get<N>@.
563Another interesting feature is @std::tie@, which creates a tuple of references, allowing assignment of the results of a tuple-returning function into separate local variables, without requiring a temporary variable.
564Tuples also support lexicographic comparisons, making it simple to write aggregate comparators using @std::tie@.
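As a sketch of this idiom, the following compares a hypothetical @Date@ structure lexicographically by year, then month, then day. Because @std::tie@ builds tuples of references, no members are copied to perform the comparison.
\begin{cppcode}
#include <tuple>

struct Date { int year, month, day; };  // hypothetical example type

// Lexicographic comparison: year first, then month, then day.
bool operator<(const Date & a, const Date & b) {
  return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}
\end{cppcode}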
565
\CCseventeen introduces \emph{structured bindings} \cite{StructuredBindings}, which provide new syntax that eliminates the need to pre-declare variables and use @std::tie@ for binding the results from a function call.
567\begin{cppcode}
568tuple<int, double> f();
569auto [i, d] = f(); // unpacks into new variables i, d
570
571tuple<int, int, int> triple(10, 20, 30);
572auto & [t1, t2, t3] = triple;
573t2 = 0; // changes middle element of triple
574
575struct S { int x; double y; };
576S s = { 10, 22.5 };
577auto [x, y] = s; // unpack s
578\end{cppcode}
579Structured bindings allow unpacking any structure with all public non-static data members into fresh local variables.
580The use of @&@ allows declaring new variables as references, which is something that cannot be done with @std::tie@, since \CC references do not support rebinding.
581This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism.
Furthermore, structured bindings are not a full replacement for @std::tie@, as they always declare new variables.
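For instance, a loop that repeatedly updates the same variables still needs @std::tie@, because a structured binding would declare fresh variables rather than update the existing ones. The following sketch of an iterative Fibonacci computation illustrates this; the name @fib@ is illustrative.
\begin{cppcode}
#include <tuple>

// Iterative Fibonacci: tie assigns into the existing variables a and b.
long fib(int n) {
  long a = 0, b = 1;
  for (int i = 0; i < n; ++i) {
    // make_tuple copies the old values before tie writes the new ones
    std::tie(a, b) = std::make_tuple(b, a + b);
  }
  return a;
}
\end{cppcode}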
583
584Like \CC, D provides tuples through a library variadic-template structure.
585In D, it is possible to name the fields of a tuple type, which creates a distinct type.
586% http://dlang.org/phobos/std_typecons.html
587\begin{dcode}
588Tuple!(float, "x", float, "y") point2D;
589Tuple!(float, float) float2;  // different type from point2D
590
591point2D[0]; // access first element
592point2D.x;  // access first element
593
594float f(float x, float y) {
595  return x+y;
596}
597
598f(point2D.expand);
599\end{dcode}
Tuples are 0-indexed and can be accessed by integer subscript or, if applicable, by field name.
601The @expand@ method produces the components of the tuple as a list of separate values, making it possible to call a function that takes $N$ arguments using a tuple with $N$ components.
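\CCseventeen offers a comparable facility in the library function @std::apply@, which calls a function with the components of a tuple as separate arguments. The following minimal sketch mirrors the D example. Since standard \CC tuples have no named fields, a plain @std::tuple@ stands in for @point2D@, and the helper @call_with_point@ is hypothetical.
\begin{cppcode}
#include <tuple>

float f(float x, float y) {
  return x + y;
}

// Unpacks the two components of point2D into the two parameters of f.
float call_with_point() {
  std::tuple<float, float> point2D(1.5f, 2.5f);
  return std::apply(f, point2D);
}
\end{cppcode}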
602
603Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML \cite{sml}.
604A function in SML always accepts exactly one argument.
There are two ways to mimic multiple-argument functions: the first through currying and the second by accepting tuple arguments.
606\begin{smlcode}
607fun fact (n : int) =
608  if (n = 0) then 1
609  else n*fact(n-1)
610
611fun binco (n: int, k: int) =
612  real (fact n) / real (fact k * fact (n-k))
613\end{smlcode}
Here, the function @binco@ appears to take 2 arguments, but it actually takes a single tuple argument, which is implicitly decomposed via pattern matching.
615Tuples are a foundational tool in SML, allowing the creation of arbitrarily-complex structured data-types.
616
617Scala, like \CC, provides tuple types through the standard library \cite{Scala}.
618Scala provides tuples of size 1 through 22 inclusive through generic data structures.
619Tuples support named access and subscript access, among a few other operations.
620\begin{scalacode}
621val a = new Tuple3(0, "Text", 2.1) // explicit creation
622val b = (6, 'a', 1.1f)       // syntactic sugar: Tuple3[Int, Char, Float]
val (i, _, d) = a            // extractor syntax, ignore middle element
624
625println(a._2)                // named access => print "Text"
626println(b.productElement(0)) // subscript access => print 6
627\end{scalacode}
628In Scala, tuples are primarily used as simple data structures for carrying around multiple values or for returning multiple values from a function.
629The 22-element restriction is an odd and arbitrary choice, but in practice it does not cause problems since large tuples are uncommon.
630Subscript access is provided through the @productElement@ method, which returns a value of the top-type @Any@, since it is impossible to receive a more precise type from a general subscripting method due to type erasure.
631The disparity between named access beginning at @_1@ and subscript access starting at @0@ is likewise an oddity, but subscript access is typically avoided since it discards type information.
632Due to the language's pattern matching facilities, it is possible to extract the values from a tuple into named variables, which is a more idiomatic way of accessing the components of a tuple.
633
634
\Csharp also has tuples, but with a similarly strange limitation: a tuple can have at most 7 components. % https://msdn.microsoft.com/en-us/library/system.tuple(v=vs.110).aspx
636The officially supported workaround for this shortcoming is to nest tuples in the 8th component.
637\Csharp allows accessing a component of a tuple by using the field @Item$N$@ for components 1 through 7, and @Rest@ for the nested tuple.
638
639In Python \cite{Python}, tuples are immutable sequences that provide packing and unpacking operations.
640While the tuple itself is immutable, and thus does not allow the assignment of components, there is nothing preventing a component from being internally mutable.
The components of a tuple can be accessed by unpacking into multiple variables or by indexing; named tuples, created with @collections.namedtuple@, also allow access via field name, like D.
642Tuples support multiple assignment through a combination of packing and unpacking, in addition to the common sequence operations.
643
644Swift \cite{Swift}, like D, provides named tuples, with components accessed by name, index, or via extractors.
645Tuples are primarily used for returning multiple values from a function.
In Swift, @Void@ is an alias for the empty tuple, and there are no single-element tuples.
647
648Tuples comparable to those described above are added to \CFA and discussed in Chapter 3.
649
650\section{Variadic Functions}
651\label{sec:variadic_functions}
652In statically-typed programming languages, functions are typically defined to receive a fixed number of arguments of specified types.
653Variadic argument functions provide the ability to define a function that can receive a theoretically unbounded number of arguments.
654
655C provides a simple implementation of variadic functions.
656A function whose parameter list ends with @, ...@ is a variadic function.
657Among the most common variadic functions is @printf@.
658\begin{cfacode}
659int printf(const char * fmt, ...);
660printf("%d %g %c %s", 10, 3.5, 'X', "a string");
661\end{cfacode}
Through the use of a format string, C programmers can communicate argument type information to @printf@, allowing them to print any of the standard C data types.
Still, @printf@ is extremely limited: the format codes are fixed by the C standard, so users cannot define their own format codes to extend @printf@ for new data types or new formatting rules.
664
665\begin{sloppypar}
C provides access to variadic arguments through the @va_list@ data type, which abstracts the details of accessing the variadic arguments.
667Since the variadic arguments are untyped, it is up to the function to interpret any data that is passed in.
668Additionally, the interface to manipulate @va_list@ objects is essentially limited to advancing to the next argument, without any built-in facility to determine when the last argument is read.
669This limitation requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types.
670The format string in @printf@ is one such example of an argument descriptor.
671\begin{cfacode}
672int f(const char * fmt, ...) {
673  va_list args;
674  va_start(args, fmt);  // initialize va_list
675  for (const char * c = fmt; *c != '\0'; ++c) {
676    if (*c == '%') {
677      ++c;
678      switch (*c) {
679        case 'd': {
680          int i = va_arg(args, int);  // have to specify type
681          // ...
682          break;
683        }
684        case 'g': {
685          double d = va_arg(args, double);
686          // ...
687          break;
688        }
689        ...
690      }
691    }
692  }
693  va_end(args);
694  return ...;
695}
696\end{cfacode}
697Every case must be handled explicitly, since the @va_arg@ macro requires a type argument to determine how the next set of bytes is to be interpreted.
698Furthermore, if the user makes a mistake, compile-time checking is typically restricted to standard format codes and their corresponding types.
699In general, this means that C's variadic functions are not type-safe, making them difficult to use properly.
700\end{sloppypar}
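An argument descriptor need not be a format string. The following minimal sketch is written in \CC using the C @<cstdarg>@ facilities. A leading count tells the hypothetical function @sum_ints@ how many @int@ arguments follow, and nothing checks that either the count or the argument types are correct.
\begin{cppcode}
#include <cstdarg>

// The first parameter is the argument descriptor: the number of ints that follow.
int sum_ints(int count, ...) {
  va_list args;
  va_start(args, count);  // initialize va_list after the last named parameter
  int total = 0;
  for (int i = 0; i < count; ++i) {
    total += va_arg(args, int);  // caller must really pass ints
  }
  va_end(args);
  return total;
}
\end{cppcode}
A call such as @sum_ints(2, 1.5, 2.5)@ still compiles, but reading the @double@ arguments with @va_arg(args, int)@ has undefined behaviour.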
701
702% When arguments are passed to a variadic function, they undergo \emph{default argument promotions}.
703% Specifically, this means that
704
\CCeleven added support for \emph{variadic templates}, which add much-needed type-safety to C's variadic landscape.
706It is possible to use variadic templates to define variadic functions and variadic data types.
707\begin{cppcode}
708void print(int);
709void print(char);
710void print(double);
711...
712
713void f() {}    // base case
714
715template<typename T, typename... Args>
716void f(const T & arg, const Args &... rest) {
717  print(arg);  // print the current element
718  f(rest...);  // handle remaining arguments recursively
719}
720\end{cppcode}
Variadic templates work largely through recursion on the \emph{parameter pack}, which is the parameter declared with @...@ following its type.
722A parameter pack matches 0 or more elements, which can be types or expressions depending on the context.
723Like other templates, variadic template functions rely on an implicit set of constraints on a type, in this example a @print@ routine.
724That is, it is possible to use the @f@ routine on any type provided there is a corresponding @print@ routine, making variadic templates fully open to extension, unlike variadic functions in C.
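To illustrate this openness, the following minimal sketch lets a user-defined type work with the variadic @f@ simply by providing its own @print@ overload. As assumptions for the example, @print@ appends to a @std::string@ buffer instead of writing to standard output, and the @Point@ type and @out@ buffer are hypothetical.
\begin{cppcode}
#include <string>

std::string out;  // hypothetical output buffer used instead of stdout

void print(int i) { out += std::to_string(i) + " "; }
void print(double d) { out += std::to_string(d) + " "; }

struct Point { int x, y; };  // user-defined type
void print(const Point & p) {  // extension point: f itself is unchanged
  out += "(" + std::to_string(p.x) + "," + std::to_string(p.y) + ") ";
}

void f() {}  // base case

template<typename T, typename... Args>
void f(const T & arg, const Args &... rest) {
  print(arg);  // print the current element
  f(rest...);  // handle remaining arguments recursively
}
\end{cppcode}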
725
Recent \CC standards (\CCfourteen, \CCseventeen) expand on the basic premise by allowing variadic variable templates and by providing a convenient pack-expansion syntax, \emph{fold expressions}, that removes the need for recursion in some cases, amongst other things.
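For example, a \CCseventeen fold expression expands a parameter pack over an operator without a recursive base case. The following minimal sketch rewrites the recursive @f@ as a fold over the comma operator and adds a summing function. The names @out@ and @sum@ are illustrative, and @print@ again appends to a buffer rather than printing.
\begin{cppcode}
#include <string>

std::string out;  // hypothetical output buffer used instead of stdout
void print(int i) { out += std::to_string(i) + " "; }
void print(double d) { out += std::to_string(d) + " "; }

// Same effect as the recursive f, written as a unary fold over the comma
// operator; no base case is needed, and an empty pack does nothing.
template<typename... Args>
void f(const Args &... args) {
  (print(args), ...);
}

// Binary left fold over +: ((0 + a1) + a2) + ...; an empty pack yields 0.
template<typename... Args>
auto sum(const Args &... args) {
  return (0 + ... + args);
}
\end{cppcode}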
727
728% D has variadic templates that deserve a mention http://dlang.org/ctarguments.html
729
730In Java, a variadic function appears similar to a C variadic function in syntax.
731\begin{javacode}
732int sum(int... args) {
733  int s = 0;
734  for (int x : args) {
735    s += x;
736  }
737  return s;
738}
739
740void print(Object... objs) {
741  for (Object obj : objs) {
742    System.out.print(obj);
743  }
744}
745
746print("The sum from 1 to 10 is ", sum(1,2,3,4,5,6,7,8,9,10), ".\n");
747\end{javacode}
748The key difference is that Java variadic functions are type-safe, because they specify the type of the argument immediately prior to the ellipsis.
749In Java, variadic arguments are syntactic sugar for arrays, allowing access to length, subscripting operations, and for-each iteration on the variadic arguments, among other things.
Since the argument type is specified explicitly, the top-type @Object@ can be used to accept arguments of any type, but doing anything interesting with such an argument requires a down-cast to a more specific type.
This leaves Java in a situation similar to C's: writing a variadic function that is open to extension is difficult.
751
752The other option is to restrict the number of types that can be passed to the function by using a more specific type.
Unfortunately, Java's use of nominal inheritance means that types must explicitly inherit from classes or implement interfaces in order to be considered subtypes.
754The combination of these two issues greatly restricts the usefulness of variadic functions in Java.
755
756Type-safe variadic functions are added to \CFA and discussed in Chapter 4.