Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/rob_thesis/intro.tex

    r0111dc7 r12d3187  
    1919Unfortunately, \CC is actively diverging from C, so incremental additions require significant effort and training, coupled with multiple legacy design-choices that cannot be updated.
    2020
    21 The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail.
     21The current implementation of \CFA is a source-to-source translator from \CFA to GNU C \cite{GCCExtensions}.
     22
     23The remainder of this section describes some of the important features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail.
    2224
    2325\subsection{C Background}
    2426\label{sub:c_background}
     27In the context of this work, the term \emph{object} refers to a region of data storage in the execution environment, the contents of which can represent values \cite[p.~6]{C11}.
     28
    2529One of the lesser-known features of standard C is \emph{designations}.
    2630Designations are similar to named parameters in languages such as Python and Scala, except that they only apply to aggregate initializers.
     31Note that in \CFA, designations use a colon separator, rather than an equals sign as in C, because this syntax is one of the few places that conflicts with the new language features.
    2732\begin{cfacode}
    2833struct A {
     
    4348Later initializers override earlier initializers, so a sub-object for which there is more than one initializer is only initialized by its last initializer.
    4449These semantics can be seen in the initialization of @a0@, where @x@ is designated twice, and thus initialized to @8@.
    45 Note that in \CFA, designations use a colon separator, rather than an equals sign as in C, because this syntax is one of the few places that conflicts with the new language features.
    4650
    4751C also provides \emph{compound literal} expressions, which provide a first-class mechanism for creating unnamed objects.
     
    5761Compound literals create an unnamed object, and result in an lvalue, so it is legal to assign a value into a compound literal or to take its address \cite[p.~86]{C11}.
    5862Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and coercions and is never an lvalue.
     63
     64The \CFA translator makes use of several GNU C extensions, including \emph{nested functions} and \emph{attributes}.
     65Nested functions make it possible to access data that is lexically in scope in the nested function's body.
     66\begin{cfacode}
     67int f() {
     68  int x = 0;
     69  void g() {
     70    x++;
     71  }
     72  g();  // changes x
     73}
     74\end{cfacode}
     75Nested functions come with the usual C caveat that they should not leak into the containing environment, since they are only valid as long as the containing function's stack frame is active.
     76
     77Attributes make it possible to inform the compiler of certain properties of the code.
     78For example, a function can be marked as deprecated, so that legacy APIs can be identified and slowly removed, or as \emph{hot}, so that the compiler knows the function is called frequently and should be aggresively optimized.
     79\begin{cfacode}
     80__attribute__((deprecated("foo is deprecated, use bar instead")))
     81void foo();
     82__attribute__((hot)) void bar(); // heavily optimized
     83
     84foo();  // warning
     85bar();
     86\end{cfacode}
    5987
    6088\subsection{Overloading}
     
    6492C provides a small amount of built-in overloading, \eg + is overloaded for the basic types.
    6593Like in \CC, \CFA allows user-defined overloading based both on the number of parameters and on the types of parameters.
    66   \begin{cfacode}
    67   void f(void);  // (1)
    68   void f(int);   // (2)
    69   void f(char);  // (3)
    70 
    71   f('A');        // selects (3)
    72   \end{cfacode}
     94\begin{cfacode}
     95void f(void);  // (1)
     96void f(int);   // (2)
     97void f(char);  // (3)
     98
     99f('A');        // selects (3)
     100\end{cfacode}
    73101In this case, there are three @f@ procedures, where @f@ takes either 0 or 1 arguments, and if an argument is provided then it may be of type @int@ or of type @char@.
    74102Exactly which procedure is executed depends on the number and types of arguments passed.
    75103If there is no exact match available, \CFA attempts to find a suitable match by examining the C built-in conversion heuristics.
    76   \begin{cfacode}
    77   void g(long long);
    78 
    79   g(12345);
    80   \end{cfacode}
     104The \CFA expression resolution algorithm uses a cost function to determine the interpretation that uses the fewest conversions and polymorphic type bindings.
     105\begin{cfacode}
     106void g(long long);
     107
     108g(12345);
     109\end{cfacode}
    81110In the above example, there is only one instance of @g@, which expects a single parameter of type @long long@.
    82111Here, the argument provided has type @int@, but since all possible values of type @int@ can be represented by a value of type @long long@, there is a safe conversion from @int@ to @long long@, and so \CFA calls the provided @g@ routine.
    83112
     113Overloading solves the problem present in C where there can only be one function with a given name, requiring multiple names for functions that perform the same operation but take in different types.
     114This can be seen in the example of the absolute value functions C:
     115\begin{cfacode}
     116// stdlib.h
     117int abs(int);
     118long int labs(long int);
     119long long int llabs(long long int);
     120\end{cfacode}
     121In \CFA, the functions @labs@ and @llabs@ are replaced by appropriate overloads of @abs@.
     122
    84123In addition to this form of overloading, \CFA also allows overloading based on the number and types of \emph{return} values.
    85124This extension is a feature that is not available in \CC, but is available in other programming languages such as Ada \cite{Ada95}.
    86   \begin{cfacode}
    87   int g();         // (1)
    88   double g();      // (2)
    89 
    90   int x = g();     // selects (1)
    91   \end{cfacode}
     125\begin{cfacode}
     126int g();         // (1)
     127double g();      // (2)
     128
     129int x = g();     // selects (1)
     130\end{cfacode}
    92131Here, the only difference between the signatures of the different versions of @g@ is in the return values.
    93132The result context is used to select an appropriate routine definition.
    94133In this case, the result of @g@ is assigned into a variable of type @int@, so \CFA prefers the routine that returns a single @int@, because it is an exact match.
     134
     135Return-type overloading solves similar problems to parameter-list overloading, in that multiple functions that perform similar operations can have the same, but produce different values.
     136One use case for this feature is to provide two versions of the @bsearch@ routine:
     137\begin{cfacode}
     138forall(otype T | { int ?<?( T, T ); })
     139T * bsearch(T key, const T * arr, size_t dimension) {
     140  int comp(const void * t1, const void * t2) {
     141    return *(T *)t1 < *(T *)t2 ? -1 : *(T *)t2 < *(T *)t1 ? 1 : 0;
     142  }
     143  return (T *)bsearch(&key, arr, dimension, sizeof(T), comp);
     144}
     145forall(otype T | { int ?<?( T, T ); })
     146unsigned int bsearch(T key, const T * arr, size_t dimension) {
     147  T *result = bsearch(key, arr, dimension);
     148  // pointer subtraction includes sizeof(T)
     149  return result ? result - arr : dimension;
     150}
     151double key = 5.0;
     152double vals[10] = { /* 10 floating-point values */ };
     153
     154double * val = bsearch( 5.0, vals, 10 ); // selection based on return type
     155int posn = bsearch( 5.0, vals, 10 );
     156\end{cfacode}
     157The first version provides a thin wrapper around the C @bsearch@ routine, converting untyped @void *@ to the polymorphic type @T *@, allowing the \CFA compiler to catch errors when the type of @key@, @arr@, and the target at the call-site do not agree.
     158The second version provides an alternate return of the index in the array of the selected element, rather than its address.
    95159
    96160There are times when a function should logically return multiple values.
     
    145209
    146210An extra quirk introduced by multiple return values is in the resolution of function calls.
    147   \begin{cfacode}
    148   int f();            // (1)
    149   [int, int] f();     // (2)
    150 
    151   void g(int, int);
    152 
    153   int x, y;
    154   [x, y] = f();       // selects (2)
    155   g(f());             // selects (2)
    156   \end{cfacode}
     211\begin{cfacode}
     212int f();            // (1)
     213[int, int] f();     // (2)
     214
     215void g(int, int);
     216
     217int x, y;
     218[x, y] = f();       // selects (2)
     219g(f());             // selects (2)
     220\end{cfacode}
    157221In this example, the only possible call to @f@ that can produce the two @int@s required for assigning into the variables @x@ and @y@ is the second option.
    158222A similar reasoning holds calling the function @g@.
     223
     224This duality between aggregation and aliasing can be seen in the C standard library in the @div@ and @remquo@ functions, which return the quotient and remainder for a division of integer and floating-point values, respectively.
     225\begin{cfacode}
     226typedef struct { int quo, rem; } div_t; // from stdlib.h
     227div_t div( int num, int den );
     228double remquo( double num, double den, int * quo );
     229div_t qr = div( 13, 5 );            // return quotient/remainder aggregate
     230int q;
     231double r = remquo( 13.5, 5.2, &q ); // return remainder, alias quotient
     232\end{cfacode}
     233@div@ aggregates the quotient/remainder in a structure, while @remquo@ aliases a parameter to an argument.
     234Alternatively, a programming language can directly support returning multiple values, \eg in \CFA:
     235\begin{lstlisting}
     236[int, int] div(int num, int den);               // return two integers
     237[double, double] div( double num, double den ); // return two doubles
     238int q, r;                     // overloaded variable names
     239double q, r;
     240[q, r] = div(13, 5);          // select appropriate div and q, r
     241[q, r] = div(13.5, 5.2);
     242\end{lstlisting}
    159243
    160244In \CFA, overloading also applies to operator names, known as \emph{operator overloading}.
    161245Similar to function overloading, a single operator is given multiple meanings by defining new versions of the operator with different signatures.
    162246In \CC, this can be done as follows
    163   \begin{cppcode}
    164   struct A { int i; };
    165   int operator+(A x, A y);
    166   bool operator<(A x, A y);
    167   \end{cppcode}
     247\begin{cppcode}
     248struct A { int i; };
     249A operator+(A x, A y);
     250bool operator<(A x, A y);
     251\end{cppcode}
    168252
    169253In \CFA, the same example can be written as follows.
    170   \begin{cfacode}
    171   struct A { int i; };
    172   int ?+?(A x, A y);    // '?'s represent operands
    173   bool ?<?(A x, A y);
    174   \end{cfacode}
     254\begin{cfacode}
     255struct A { int i; };
     256A ?+?(A x, A y);    // '?'s represent operands
     257int ?<?(A x, A y);
     258\end{cfacode}
    175259Notably, the only difference is syntax.
    176260Most of the operators supported by \CC for operator overloading are also supported in \CFA.
     
    179263Finally, \CFA also permits overloading variable identifiers.
    180264This feature is not available in \CC.
    181   \begin{cfacode}
    182   struct Rational { int numer, denom; };
    183   int x = 3;               // (1)
    184   double x = 1.27;         // (2)
    185   Rational x = { 4, 11 };  // (3)
    186 
    187   void g(double);
    188 
    189   x += 1;                  // chooses (1)
    190   g(x);                    // chooses (2)
    191   Rational y = x;          // chooses (3)
    192   \end{cfacode}
     265\begin{cfacode}
     266struct Rational { int numer, denom; };
     267int x = 3;               // (1)
     268double x = 1.27;         // (2)
     269Rational x = { 4, 11 };  // (3)
     270
     271void g(double);
     272
     273x += 1;                  // chooses (1)
     274g(x);                    // chooses (2)
     275Rational y = x;          // chooses (3)
     276\end{cfacode}
    193277In this example, there are three definitions of the variable @x@.
    194278Based on the context, \CFA attempts to choose the variable whose type best matches the expression context.
     
    208292Due to these rewrite rules, the values @0@ and @1@ have the types \zero and \one in \CFA, which allow for overloading various operations that connect to @0@ and @1@ \footnote{In the original design of \CFA, @0@ and @1@ were overloadable names \cite[p.~7]{cforall}.}.
    209293The types \zero and \one have special built-in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal.
    210   \begin{cfacode}
    211   // lvalue is similar to returning a reference in C++
    212   lvalue Rational ?+=?(Rational *a, Rational b);
    213   Rational ?=?(Rational * dst, zero_t) {
    214     return *dst = (Rational){ 0, 1 };
    215   }
    216 
    217   Rational sum(Rational *arr, int n) {
    218     Rational r;
    219     r = 0;     // use rational-zero_t assignment
    220     for (; n > 0; n--) {
    221       r += arr[n-1];
    222     }
    223     return r;
    224   }
    225   \end{cfacode}
     294\begin{cfacode}
     295// lvalue is similar to returning a reference in C++
     296lvalue Rational ?+=?(Rational *a, Rational b);
     297Rational ?=?(Rational * dst, zero_t) {
     298  return *dst = (Rational){ 0, 1 };
     299}
     300
     301Rational sum(Rational *arr, int n) {
     302  Rational r;
     303  r = 0;     // use rational-zero_t assignment
     304  for (; n > 0; n--) {
     305    r += arr[n-1];
     306  }
     307  return r;
     308}
     309\end{cfacode}
    226310This function takes an array of @Rational@ objects and produces the @Rational@ representing the sum of the array.
    227311Note the use of an overloaded assignment operator to set an object of type @Rational@ to an appropriate @0@ value.
     
    232316In particular, \CFA supports the notion of parametric polymorphism.
    233317Parametric polymorphism allows a function to be written generically, for all values of all types, without regard to the specifics of a particular type.
    234 For example, in \CC, the simple identity function for all types can be written as
    235   \begin{cppcode}
    236   template<typename T>
    237   T identity(T x) { return x; }
    238   \end{cppcode}
    239 \CC uses the template mechanism to support parametric polymorphism. In \CFA, an equivalent function can be written as
    240   \begin{cfacode}
    241   forall(otype T)
    242   T identity(T x) { return x; }
    243   \end{cfacode}
     318For example, in \CC, the simple identity function for all types can be written as:
     319\begin{cppcode}
     320template<typename T>
     321T identity(T x) { return x; }
     322\end{cppcode}
     323\CC uses the template mechanism to support parametric polymorphism. In \CFA, an equivalent function can be written as:
     324\begin{cfacode}
     325forall(otype T)
     326T identity(T x) { return x; }
     327\end{cfacode}
    244328Once again, the only visible difference in this example is syntactic.
    245329Fundamental differences can be seen by examining more interesting examples.
    246 In \CC, a generic sum function is written as follows
    247   \begin{cppcode}
    248   template<typename T>
    249   T sum(T *arr, int n) {
    250     T t;  // default construct => 0
    251     for (; n > 0; n--) t += arr[n-1];
    252     return t;
    253   }
    254   \end{cppcode}
     330In \CC, a generic sum function is written as follows:
     331\begin{cppcode}
     332template<typename T>
     333T sum(T *arr, int n) {
     334  T t;  // default construct => 0
     335  for (; n > 0; n--) t += arr[n-1];
     336  return t;
     337}
     338\end{cppcode}
    255339Here, the code assumes the existence of a default constructor, assignment operator, and an addition operator over the provided type @T@.
    256340If any of these required operators are not available, the \CC compiler produces an error message stating which operators could not be found.
    257341
    258 A similar sum function can be written in \CFA as follows
    259   \begin{cfacode}
    260   forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
    261   T sum(T *arr, int n) {
    262     T t = 0;
    263     for (; n > 0; n--) t = t += arr[n-1];
    264     return t;
    265   }
    266   \end{cfacode}
     342A similar sum function can be written in \CFA as follows:
     343\begin{cfacode}
     344forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
     345T sum(T *arr, int n) {
     346  T t = 0;
     347  for (; n > 0; n--) t = t += arr[n-1];
     348  return t;
     349}
     350\end{cfacode}
    267351The first thing to note here is that immediately following the declaration of @otype T@ is a list of \emph{type assertions} that specify restrictions on acceptable choices of @T@.
    268352In particular, the assertions above specify that there must be an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@.
    269353The existence of an assignment operator from @T@ to @T@ and the ability to create an object of type @T@ are assumed implicitly by declaring @T@ with the @otype@ type-class.
    270354In addition to @otype@, there are currently two other type-classes.
     355
     356@dtype@, short for \emph{data type}, serves as the top type for object types; any object type, complete or incomplete, can be bound to a @dtype@ type variable.
     357To contrast, @otype@, short for \emph{object type}, is a @dtype@ with known size, alignment, and an assignment operator, and thus bind only to complete object types.
     358With this extra information, complete objects can be used in polymorphic code in the same way they are used in monomorphic code, providing familiarity and ease of use.
     359The third type-class is @ftype@, short for \emph{function type}, matching only function types.
    271360The three type parameter kinds are summarized in \autoref{table:types}
    272361
     
    275364    \begin{tabular}{|c||c|c|c||c|c|c|}
    276365                                                                                                    \hline
    277     name    & object type & incomplete type & function type & can assign value & can create & has size \\ \hline
     366    name    & object type & incomplete type & function type & can assign & can create & has size \\ \hline
    278367    @otype@ & X           &                 &               & X                & X          & X        \\ \hline
    279368    @dtype@ & X           & X               &               &                  &            &          \\ \hline
     
    288377In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled, as the function prototype states all necessary requirements separate from the implementation.
    289378For example, the prototype for the previous sum function is
    290   \begin{cfacode}
    291   forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
    292   T sum(T *arr, int n);
    293   \end{cfacode}
     379\begin{cfacode}
     380forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
     381T sum(T *arr, int n);
     382\end{cfacode}
    294383With this prototype, a caller in another translation unit knows all of the constraints on @T@, and thus knows all of the operations that need to be made available to @sum@.
    295384
    296385In \CFA, a set of assertions can be factored into a \emph{trait}.
    297386\begin{cfacode}
    298   trait Addable(otype T) {
    299     T ?+?(T, T);
    300     T ++?(T);
    301     T ?++(T);
    302   }
    303   forall(otype T | Addable(T)) void f(T);
    304   forall(otype T | Addable(T) | { T --?(T); }) T g(T);
    305   forall(otype T, U | Addable(T) | { T ?/?(T, U); }) U h(T, U);
     387trait Addable(otype T) {
     388  T ?+?(T, T);
     389  T ++?(T);
     390  T ?++(T);
     391}
     392forall(otype T | Addable(T)) void f(T);
     393forall(otype T | Addable(T) | { T --?(T); }) T g(T);
     394forall(otype T, U | Addable(T) | { T ?/?(T, U); }) U h(T, U);
    306395\end{cfacode}
    307396This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration.
    308397
    309 An interesting application of return-type resolution and polymorphism is a type-safe version of @malloc@.
     398An interesting application of return-type resolution and polymorphism is a polymorphic version of @malloc@.
    310399\begin{cfacode}
    311400forall(dtype T | sized(T))
     
    321410The built-in trait @sized@ ensures that size and alignment information for @T@ is available in the body of @malloc@ through @sizeof@ and @_Alignof@ expressions respectively.
    322411In calls to @malloc@, the type @T@ is bound based on call-site information, allowing \CFA code to allocate memory without the potential for errors introduced by manually specifying the size of the allocated block.
     412
     413\subsection{Planned Features}
     414
     415One of the planned features \CFA is \emph{reference types}.
     416At a high level, the current proposal is to add references as a way to cleanup pointer syntax.
     417With references, it will be possible to store any address, as with a pointer, with the key difference being that references are automatically dereferenced.
     418\begin{cfacode}
     419int x = 0;
     420int * p = &x;  // needs &
     421int & ref = x; // no &
     422
     423printf("%d %d\n", *p, ref); // pointer needs *, ref does not
     424\end{cfacode}
     425
     426It is possible to add new functions or shadow existing functions for the duration of a scope, using normal C scoping rules.
     427One application of this feature is to reverse the order of @qsort@.
     428\begin{cfacode}
     429forall(otype T | { int ?<?( T, T ); })
     430void qsort(const T * arr, size_t size) {
     431  int comp(const void * t1, const void * t2) {
     432    return *(T *)t1 < *(T *)t2 ? -1 : *(T *)t2 < *(T *)t1 ? 1 : 0;
     433  }
     434  qsort(arr, dimension, sizeof(T), comp);
     435
     436}
     437double vals[10] = { ... };
     438qsort(vals, 10);                // ascending order
     439{
     440  int ?<?(double x, double y) { // locally override behaviour
     441    return x > y;
     442  }
     443  qsort(vals, 10);              // descending sort
     444}
     445\end{cfacode}
     446Currently, there is no way to \emph{remove} a function from consideration from the duration of a scope.
     447For example, it may be desirable to eliminate assignment from a scope, to reduce accidental mutation.
     448To address this desire, \emph{deleted functions} are a planned feature for \CFA.
     449\begin{cfacode}
     450forall(otype T) void f(T *);
     451
     452int x = 0;
     453f(&x);  // might modify x
     454{
     455  int ?=?(int *, int) = delete;
     456  f(&x);   // error, no assignment for int
     457}
     458\end{cfacode}
     459Now, if the deleted function is chosen as the best match, the expression resolver emits an error.
    323460
    324461\section{Invariants}
     
    450587\end{javacode}
    451588In Java 7, a new \emph{try-with-resources} construct was added to alleviate most of the pain of working with resources, but ultimately it still places the burden squarely on the user rather than on the library designer.
    452 Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources that can throw an exception on close can leak nested resources \cite{TryWithResources}.
     589Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources that can throw an exception on close can leak nested resources \footnote{Since close is only guaranteed to be called on objects declared in the try-list and not objects passed as constructor parameters, the @B@ object may not be closed in @new A(new B())@ if @A@'s close raises an exception.} \cite{TryWithResources}.
    453590\begin{javacode}
    454591public void write(String filename, String msg) throws Exception {
     
    521658% these are declared in the struct, so they're closer to C++ than to CFA, at least syntactically. Also do not allow for default constructors
    522659% D has a GC, which already makes the situation quite different from C/C++
    523 The programming language, D, also manages resources with constructors and destructors \cite{D}.
    524 In D, @struct@s are stack allocated and managed via scoping like in \CC, whereas @class@es are managed automatically by the garbage collector.
     660The programming language D also manages resources with constructors and destructors \cite{D}.
     661In D, @struct@s are stack allocatable and managed via scoping like in \CC, whereas @class@es are managed automatically by the garbage collector.
    525662Like Java, using the garbage collector means that destructors are called indeterminately, requiring the use of finally statements to ensure dynamically allocated resources that are not managed by the garbage collector, such as open files, are cleaned up.
    526663Since D supports RAII, it is possible to use the same techniques as in \CC to ensure that resources are released in a timely manner.
     
    755892
    756893Type-safe variadic functions are added to \CFA and discussed in Chapter 4.
     894
     895\section{Contributions}
     896\label{s:contributions}
     897
     898No prior work on constructors or destructors had been done for \CFA.
     899I did both the design and implementation work.
     900While the overall design is based on constructors and destructors in object-oriented C++, it had to be re-engineered into non-object-oriented \CFA.
     901I also had to make changes to the \CFA expression-resolver to integrate constructors and destructors into the type system.
     902
     903Prior work on the design of tuples for \CFA was done by Till, and some initial implementation work by Esteves.
     904I largely took the Till design but added tuple indexing, which exists in a number of programming languages with tuples, simplified the implicit tuple conversions, and integrated with the \CFA polymorphism and assertion satisfaction model.
     905I did a new implementation of tuples, and extensively
     906augmented initial work by Bilson to incorporate tuples into the \CFA expression-resolver and type-unifier.
     907
     908No prior work on variadic functions had been done for \CFA.
     909I did both the design and implementation work.
     910While the overall design is based on variadic templates in C++, my design is novel in the way it is incorporated into the \CFA polymorphism model, and is engineered into \CFA so it dovetails with tuples.
Note: See TracChangeset for help on using the changeset viewer.