Timestamp:
Apr 12, 2017, 3:54:28 PM (7 years ago)
Author:
Rob Schluntz <rschlunt@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
e869e434
Parents:
eaa2f3a1
Message:

thesis updates based on Peter's feedback

File:
1 edited

  • doc/rob_thesis/intro.tex

--- doc/rob_thesis/intro.tex (reaa2f3a1)
+++ doc/rob_thesis/intro.tex (r0eb18557)
@@ -16,4 +16,7 @@
 Therefore, these design principles must be kept in mind throughout the design and development of new language features.
 In order to appeal to existing C programmers, great care must be taken to ensure that new features naturally feel like C.
+These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used.
+Unfortunately, \CC is actively diverging from C, so incremental additions require significant effort and training, coupled with multiple legacy design-choices that cannot be updated.
+
 The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail.
 
@@ -53,5 +56,5 @@
 \end{cfacode}
 Compound literals create an unnamed object, and result in an lvalue, so it is legal to assign a value into a compound literal or to take its address \cite[p.~86]{C11}.
-Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and is never an lvalue.
+Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and coercions and is never an lvalue.
 
 \subsection{Overloading}
@@ -59,5 +62,6 @@
 Overloading is the ability to specify multiple entities with the same name.
 The most common form of overloading is function overloading, wherein multiple functions can be defined with the same name, but with different signatures.
-Like in \CC, \CFA allows overloading based both on the number of parameters and on the types of parameters.
+C provides a small amount of built-in overloading, \eg + is overloaded for the basic types.
+Like in \CC, \CFA allows user-defined overloading based both on the number of parameters and on the types of parameters.
   \begin{cfacode}
   void f(void);  // (1)
@@ -92,4 +96,5 @@
 There are times when a function should logically return multiple values.
 Since a function in standard C can only return a single value, a programmer must either take in additional return values by address, or the function's designer must create a wrapper structure to package multiple return-values.
+For example, the first approach:
 \begin{cfacode}
 int f(int * ret) {        // returns a value through parameter ret
@@ -101,6 +106,6 @@
 int res1 = g(&res2);      // explicitly pass storage
 \end{cfacode}
-The former solution is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all.
-The latter approach:
+is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all.
+The second approach:
 \begin{cfacode}
 struct A {
@@ -113,6 +118,6 @@
 ... res3.x ... res3.y ... // use result values
 \end{cfacode}
-requires the caller to either learn the field names of the structure or learn the names of helper routines to access the individual return values.
-Both solutions are syntactically unnatural.
+is awkward because the caller has to either learn the field names of the structure or learn the names of helper routines to access the individual return values.
+Both approaches are syntactically unnatural.
 
 In \CFA, it is possible to directly declare a function returning multiple values.
@@ -165,10 +170,10 @@
   \begin{cfacode}
   struct A { int i; };
-  int ?+?(A x, A y);
+  int ?+?(A x, A y);    // '?'s represent operands
   bool ?<?(A x, A y);
   \end{cfacode}
 Notably, the only difference is syntax.
 Most of the operators supported by \CC for operator overloading are also supported in \CFA.
-Of notable exception are the logical operators (e.g. @||@), the sequence operator (i.e. @,@), and the member-access operators (e.g. @.@ and \lstinline{->}).
+Of notable exception are the logical operators (\eg @||@), the sequence operator (\ie @,@), and the member-access operators (\eg @.@ and \lstinline{->}).
 
 Finally, \CFA also permits overloading variable identifiers.
@@ -243,5 +248,5 @@
   template<typename T>
   T sum(T *arr, int n) {
-    T t;
+    T t;  // default construct => 0
     for (; n > 0; n--) t += arr[n-1];
     return t;
@@ -261,5 +266,5 @@
   \end{cfacode}
 The first thing to note here is that immediately following the declaration of @otype T@ is a list of \emph{type assertions} that specify restrictions on acceptable choices of @T@.
-In particular, the assertions above specify that there must be a an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@.
+In particular, the assertions above specify that there must be an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@.
 The existence of an assignment operator from @T@ to @T@ and the ability to create an object of type @T@ are assumed implicitly by declaring @T@ with the @otype@ type-class.
 In addition to @otype@, there are currently two other type-classes.
@@ -281,5 +286,11 @@
 A major difference between the approaches of \CC and \CFA to polymorphism is that the set of assumed properties for a type is \emph{explicit} in \CFA.
 One of the major limiting factors of \CC's approach is that templates cannot be separately compiled.
-In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled.
+In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled, as the function prototype states all necessary requirements separate from the implementation.
+For example, the prototype for the previous sum function is
+  \begin{cfacode}
+  forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
+  T sum(T *arr, int n);
+  \end{cfacode}
+With this prototype, a caller in another translation unit knows all of the constraints on @T@, and thus knows all of the operations that need to be made available to @sum@.
 
 In \CFA, a set of assertions can be factored into a \emph{trait}.
@@ -296,5 +307,5 @@
 This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration.
 
-An interesting application of return-type resolution and polymorphism is with type-safe @malloc@.
+An interesting application of return-type resolution and polymorphism is a type-safe version of @malloc@.
 \begin{cfacode}
 forall(dtype T | sized(T))
@@ -316,5 +327,5 @@
 
 In object-oriented programming languages, type invariants are typically established in a constructor and maintained throughout the object's lifetime.
-These assertions are typically achieved through a combination of access control modifiers and a restricted interface.
+These assertions are typically achieved through a combination of access-control modifiers and a restricted interface.
 Typically, data which requires the maintenance of an invariant is hidden from external sources using the \emph{private} modifier, which restricts reads and writes to a select set of trusted routines, including member functions.
 It is these trusted routines that perform all modifications to internal data in a way that is consistent with the invariant, by ensuring that the invariant holds true at the end of the routine call.
@@ -388,5 +399,5 @@
 In other languages, a hybrid situation exists where resources escape the allocation block, but ownership is precisely controlled by the language.
 This pattern requires a strict interface and protocol for a data structure, consisting of a pre-initialization and a post-termination call, and all intervening access is done via interface routines.
-This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it takes care of a significant portion of resource management cases.
+This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it takes care of a significant portion of resource-management cases.
 
 For example, \CC directly supports this pattern through class types and an idiom known as RAII \footnote{Resource Acquisition is Initialization} by means of constructors and destructors.
@@ -399,11 +410,11 @@
 In the context of \CFA, a non-trivial constructor is either a user defined constructor or an auto-generated constructor that calls a non-trivial constructor.
 
-For the remaining resource ownership cases, programmer must follow a brittle, explicit protocol for freeing resources or an implicit protocol implemented via the programming language.
+For the remaining resource ownership cases, a programmer must follow a brittle, explicit protocol for freeing resources or an implicit protocol enforced by the programming language.
 
 In garbage collected languages, such as Java, resources are largely managed by the garbage collector.
-Still, garbage collectors are typically focus only on memory management.
+Still, garbage collectors typically focus only on memory management.
 There are many kinds of resources that the garbage collector does not understand, such as sockets, open files, and database connections.
 In particular, Java supports \emph{finalizers}, which are similar to destructors.
-Sadly, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious.
+Unfortunately, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious.
 Due to operating-system resource-limits, this is unacceptable for many long running programs.
 Instead, the paradigm in Java requires programmers to manually keep track of all resources \emph{except} memory, leading many novices and experts alike to forget to close files, etc.
@@ -450,5 +461,5 @@
 \end{javacode}
 Variables declared as part of a try-with-resources statement must conform to the @AutoClosable@ interface, and the compiler implicitly calls @close@ on each of the variables at the end of the block.
-Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null, therefore, the cleanup for these variables at the end is appropriately guarded and conditionally executed to prevent null-pointer exceptions.
+Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null, therefore, the cleanup for these variables at the end is automatically guarded and conditionally executed to prevent null-pointer exceptions.
 
 While Rust \cite{Rust} does not enforce the use of a garbage collector, it does provide a manual memory management environment, with a strict ownership model that automatically frees allocated memory and prevents common memory management errors.
@@ -486,5 +497,5 @@
 There is no runtime cost imposed on these restrictions, since they are enforced at compile-time.
 
-Rust provides RAII through the @Drop@ trait, allowing arbitrary code to execute when the object goes out of scope, allowing Rust programs to automatically clean up auxiliary resources much like a \CC program.
+Rust provides RAII through the @Drop@ trait, allowing arbitrary code to execute when the object goes out of scope, providing automatic clean up of auxiliary resources, much like a \CC program.
 \begin{rustcode}
 struct S {
@@ -493,5 +504,5 @@
 
 impl Drop for S {  // RAII for S
-  fn drop(&mut self) {
+  fn drop(&mut self) {  // destructor
     println!("dropped {}", self.name);
   }
@@ -558,5 +569,5 @@
 tuple<int, int, int> triple(10, 20, 30);
 auto & [t1, t2, t3] = triple;
-t2 = 0; // changes triple
+t2 = 0; // changes middle element of triple
 
 struct S { int x; double y; };
@@ -564,10 +575,10 @@
 auto [x, y] = s; // unpack s
 \end{cppcode}
-Structured bindings allow unpacking any struct with all public non-static data members into fresh local variables.
+Structured bindings allow unpacking any structure with all public non-static data members into fresh local variables.
 The use of @&@ allows declaring new variables as references, which is something that cannot be done with @std::tie@, since \CC references do not support rebinding.
 This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism.
 Furthermore, structured bindings are not a full replacement for @std::tie@, as it always declares new variables.
 
-Like \CC, D provides tuples through a library variadic template struct.
+Like \CC, D provides tuples through a library variadic-template structure.
 In D, it is possible to name the fields of a tuple type, which creates a distinct type.
 % http://dlang.org/phobos/std_typecons.html
@@ -600,5 +611,5 @@
 \end{smlcode}
 Here, the function @binco@ appears to take 2 arguments, but it actually takes a single argument which is implicitly decomposed via pattern matching.
-Tuples are a foundational tool in SML, allowing the creation of arbitrarily complex structured data types.
+Tuples are a foundational tool in SML, allowing the creation of arbitrarily-complex structured data-types.
 
 Scala, like \CC, provides tuple types through the standard library \cite{Scala}.
@@ -653,5 +664,5 @@
 Since the variadic arguments are untyped, it is up to the function to interpret any data that is passed in.
 Additionally, the interface to manipulate @va_list@ objects is essentially limited to advancing to the next argument, without any built-in facility to determine when the last argument is read.
-This requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types.
+This limitation requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types.
 The format string in @printf@ is one such example of an argument descriptor.
 \begin{cfacode}
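As a minimal sketch of the argument-descriptor pattern discussed in the last hunk of this changeset, the following standard C function reads a descriptor string to learn how many variadic arguments follow and what type each one has. The name `sum_described` and the descriptor letters `i` and `d` are hypothetical, chosen only for illustration; they do not appear in the thesis.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper (not from the thesis): sums the variadic arguments
 * described by desc, where 'i' means the next argument is an int and
 * 'd' means it is a double. The descriptor is the only source of
 * information about the number and types of the arguments. */
double sum_described(const char *desc, ...) {
    assert(desc != NULL);

    va_list args;
    double total = 0.0;

    va_start(args, desc);
    for (const char *p = desc; *p != '\0'; p++) {
        if (*p == 'i') {
            total += va_arg(args, int);      /* advance past one int */
        } else if (*p == 'd') {
            total += va_arg(args, double);   /* advance past one double */
        } else {
            break;  /* unknown letter: the next argument cannot be interpreted */
        }
    }
    va_end(args);
    return total;
}
```

For instance, `sum_described("idi", 1, 2.5, 3)` reads an int, a double, and an int, and returns 6.5. As with @printf@, nothing checks the descriptor against the arguments actually passed, so a mismatch is undefined behaviour.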