Changeset 7493339 for doc/rob_thesis/intro.tex
- Timestamp:
- Apr 3, 2017, 7:04:30 PM (6 years ago)
- Branches:
- aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- fbd7ad6
- Parents:
- ae6cc8b
- File:
-
- 1 edited
doc/rob_thesis/intro.tex
rae6cc8b r7493339 5 5 \section{\CFA Background} 6 6 \label{s:background} 7 \CFA is a modern extension to the C programming language. 7 \CFA is a modern non-object-oriented extension to the C programming language. 8 8 As it is an extension of C, there is already a wealth of existing C code and principles that govern the design of the language. 9 9 Among the goals set out in the original design of \CFA, four points stand out \cite{Bilson03}. … … 16 16 Therefore, these design principles must be kept in mind throughout the design and development of new language features. 17 17 In order to appeal to existing C programmers, great care must be taken to ensure that new features naturally feel like C. 18 The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail. % TODO: harmonize with? 18 The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail. 19 19 20 20 \subsection{C Background} … … 39 39 For example, in the initialization of @a1@, the initializer of @y@ is @7@, and the unnamed initializer @6@ initializes the next subobject, @z@. 40 40 Later initializers override earlier initializers, so a subobject for which there is more than one initializer is only initialized by its last initializer. 41 This can be seen in the initialization of @a0@, where @x@ is designated twice, and thus initialized to @8@. 42 Note that in \CFA, designations use a colon separator, rather than an equals sign as in C. 41 These semantics can be seen in the initialization of @a0@, where @x@ is designated twice, and thus initialized to @8@.
42 Note that in \CFA, designations use a colon separator, rather than an equals sign as in C, because this syntax is one of the few places that conflicts with the new language features. 43 43 44 44 C also provides \emph{compound literal} expressions, which provide a first-class mechanism for creating unnamed objects. … … 91 91 92 92 There are times when a function should logically return multiple values. 93 Since a function in standard C can only return a single value, a programmer must either take in additional return values by address, or the function's designer must create a wrapper structure t 0package multiple return-values. 93 Since a function in standard C can only return a single value, a programmer must either take in additional return values by address, or the function's designer must create a wrapper structure to package multiple return-values. 94 94 \begin{cfacode} 95 95 int f(int * ret) { // returns a value through parameter ret … … 102 102 \end{cfacode} 103 103 The former solution is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all. 104 The latter approach: 104 105 \begin{cfacode} 105 106 struct A { … … 112 113 ... res3.x ... res3.y ... // use result values 113 114 \end{cfacode} 114 The latter approach requires the caller to either learn the field names of the structure or learn the names of helper routines to access the individual return values. 115 requires the caller to either learn the field names of the structure or learn the names of helper routines to access the individual return values. 115 116 Both solutions are syntactically unnatural. 116 117 117 118 In \CFA, it is possible to directly declare a function returning multiple values.
118 This provides important semantic information to the caller, since return values are only for output. 119 \begin{cfacode} 120 [int, int] f() { // don't need to create a new type 119 This extension provides important semantic information to the caller, since return values are only for output. 120 \begin{cfacode} 121 [int, int] f() { // no new type 121 122 return [123, 37]; 122 123 } 123 124 \end{cfacode} 124 However, the ability to return multiple values requires a syntax for accepting the results from a function. 125 However, the ability to return multiple values is useless without a syntax for accepting the results from the function. 126 125 127 In standard C, return values are most commonly assigned directly into local variables, or are used as the arguments to another function call. 126 128 \CFA allows both of these contexts to accept multiple return values. … … 148 150 g(f()); // selects (2) 149 151 \end{cfacode} 150 In this example, the only possible call to @f@ that can produce the two @int@s required by @g@ is the second option. 151 A similar reasoning holds for assigning into multiple variables. 152 In this example, the only possible call to @f@ that can produce the two @int@s required for assigning into the variables @x@ and @y@ is the second option. 153 A similar reasoning holds for calling the function @g@. 152 154 153 155 In \CFA, overloading also applies to operator names, known as \emph{operator overloading}. … … 166 168 bool ?<?(A x, A y); 167 169 \end{cfacode} 168 Notably, the only difference in this example is syntax. 170 Notably, the only difference is syntax. 169 171 Most of the operators supported by \CC for operator overloading are also supported in \CFA. 170 172 Of notable exception are the logical operators (e.g. @||@), the sequence operator (i.e. @,@), and the member-access operators (e.g. @.@ and \lstinline{->}). … … 172 174 Finally, \CFA also permits overloading variable identifiers. 173 175 This feature is not available in \CC.
174 \begin{cfacode} % TODO: pick something better than x? max, zero, one?176 \begin{cfacode} 175 177 struct Rational { int numer, denom; }; 176 178 int x = 3; // (1) … … 186 188 In this example, there are three definitions of the variable @x@. 187 189 Based on the context, \CFA attempts to choose the variable whose type best matches the expression context. 190 When used judiciously, this feature allows names like @MAX@, @MIN@, and @PI@ to apply across many types. 188 191 189 192 Finally, the values @0@ and @1@ have special status in standard C. … … 197 200 } 198 201 \end{cfacode} 199 Every if 202 Every if- and iteration-statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result. 200 203 Due to these rewrite rules, the values @0@ and @1@ have the types \zero and \one in \CFA, which allow for overloading various operations that connect to @0@ and @1@ \footnote{In the original design of \CFA, @0@ and @1@ were overloadable names \cite[p.~7]{cforall}.}. 201 The types \zero and \one have special built 204 The types \zero and \one have special built-in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal. 202 205 \begin{cfacode} 203 206 // lvalue is similar to returning a reference in C++ … … 293 296 This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration. 294 297 298 An interesting application of return-type resolution and polymorphism is with type-safe @malloc@. 
299 \begin{cfacode} 300 forall(dtype T | sized(T)) 301 T * malloc() { 302 return (T*)malloc(sizeof(T)); // call C malloc 303 } 304 int * x = malloc(); // malloc(sizeof(int)) 305 double * y = malloc(); // malloc(sizeof(double)) 306 307 struct S { ... }; 308 S * s = malloc(); // malloc(sizeof(S)) 309 \end{cfacode} 310 The built-in trait @sized@ ensures that size and alignment information for @T@ is available to @malloc@ through @sizeof@ and @_Alignof@ expressions respectively. 311 In calls to @malloc@, the type @T@ is bound based on call-site information, allowing \CFA code to allocate memory without the potential for errors introduced by manually specifying the size of the allocated block. 312 295 313 \section{Invariants} 296 % TODO: discuss software engineering benefits of ctor/dtors: {pre/post} conditions, invariants 297 % an important invariant is the state of the environment (memory, resources) 298 % some objects pass their contract to the object user 299 An \emph{invariant} is a logical assertion that true for some duration of a program's execution. 314 An \emph{invariant} is a logical assertion that is true for some duration of a program's execution. 300 315 Invariants help a programmer to reason about code correctness and prove properties of programs. 301 316 302 317 In object-oriented programming languages, type invariants are typically established in a constructor and maintained throughout the object's lifetime. 303 Th is istypically achieved through a combination of access control modifiers and a restricted interface.318 These assertions are typically achieved through a combination of access control modifiers and a restricted interface. 304 319 Typically, data which requires the maintenance of an invariant is hidden from external sources using the \emph{private} modifier, which restricts reads and writes to a select set of trusted routines, including member functions. 
305 320 It is these trusted routines that perform all modifications to internal data in a way that is consistent with the invariant, by ensuring that the invariant holds true at the end of the routine call. … … 307 322 In C, the @assert@ macro is often used to ensure invariants are true. 308 323 Using @assert@, the programmer can check a condition and abort execution if the condition is not true. 309 This is a powerful tool that forces the programmer to deal with logical inconsistencies as they occur. 324 This powerful tool forces the programmer to deal with logical inconsistencies as they occur. 310 325 For production, assertions can be removed by simply defining the preprocessor macro @NDEBUG@, making it simple to ensure that assertions are 0-cost for a performance intensive application. 311 326 \begin{cfacode} … … 354 369 \end{dcode} 355 370 The D compiler is able to assume that assertions and invariants hold true and perform optimizations based on those assumptions. 356 357 An important invariant is the state of the execution environment, including the heap, the open file table, the state of global variables, etc. 358 Since resources are finite, it is important to ensure that objects clean up properly when they are finished, restoring the execution environment to a stable state so that new objects can reuse resources. 371 Note, these invariants are internal to the type's correct behaviour. 372 373 Types also have external invariants with the state of the execution environment, including the heap, the open file-table, the state of global variables, etc. 374 Since resources are finite and shared (concurrency), it is important to ensure that objects clean up properly when they are finished, restoring the execution environment to a stable state so that new objects can reuse resources.
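The @assert@-based checking of internal invariants described in this section can be sketched as follows. This is a minimal \CC illustration rather than code from the thesis: the type @BoundedStack@ and the routines @invariant@, @push@, and @pop@ are hypothetical names chosen for the example. Each trusted routine checks the invariant on entry and again on exit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bounded stack, used only for illustration.
struct BoundedStack {
    static constexpr std::size_t capacity = 4;
    int data[capacity];
    std::size_t size = 0;
};

// Internal invariant: the element count never exceeds the capacity.
inline bool invariant(const BoundedStack & s) {
    return s.size <= BoundedStack::capacity;
}

// Trusted routine: checks the invariant on entry and on exit.
inline void push(BoundedStack & s, int value) {
    assert(invariant(s));
    assert(s.size < BoundedStack::capacity);   // precondition: room for one more element
    s.data[s.size++] = value;
    assert(invariant(s));
}

// Trusted routine: removes and returns the most recently pushed element.
inline int pop(BoundedStack & s) {
    assert(invariant(s));
    assert(s.size > 0);                         // precondition: stack is not empty
    return s.data[--s.size];
}
```

Compiling with @NDEBUG@ defined turns every @assert@ into a no-op, so the checks cost nothing in a production build.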
359 375 360 376 \section{Resource Management} … … 367 383 However, whenever a program needs a variable to outlive the block it is created in, the storage must be allocated dynamically with @malloc@ and later released with @free@. 368 384 This pattern is extended to more complex objects, such as files and sockets, which also outlive the block where they are created, but at their core is resource management. 369 Once allocated storage escapes a block, the responsibility for deallocating the storage is not specified in a function's type, that is, that the return value is owned by the caller.385 Once allocated storage escapes\footnote{In garbage collected languages, such as Java, escape analysis \cite{Choi:1999:EAJ:320385.320386} is used to determine when dynamically allocated objects are strictly contained within a function, which allows the optimizer to allocate them on the stack.} a block, the responsibility for deallocating the storage is not specified in a function's type, that is, that the return value is owned by the caller. 370 386 This implicit convention is provided only through documentation about the expectations of functions. 371 387 … … 380 396 On the other hand, destructors provide a simple mechanism for tearing down an object and resetting the environment in which the object lived. 381 397 RAII ensures that if all resources are acquired in a constructor and released in a destructor, there are no resource leaks, even in exceptional circumstances. 382 A type with at least one non-trivial constructor or destructor will henceforth bereferred to as a \emph{managed type}.398 A type with at least one non-trivial constructor or destructor is henceforth referred to as a \emph{managed type}. 383 399 In the context of \CFA, a non-trivial constructor is either a user defined constructor or an auto generated constructor that calls a non-trivial constructor. 
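As a rough illustration of a managed type, the following \CC sketch acquires a resource in a constructor and releases it in a destructor. The names @Resource@, @live_resources@, and @use_resource@ are hypothetical, and the counter merely stands in for an external resource such as an entry in the open file-table.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for an external, finite resource (assumption made for this example).
static int live_resources = 0;

// Managed type: non-trivial constructor and destructor.
class Resource {
public:
    Resource() { ++live_resources; }                              // acquire
    ~Resource() { assert(live_resources > 0); --live_resources; } // release
    Resource(const Resource &) = delete;                          // one owner per resource
    Resource & operator=(const Resource &) = delete;
};

// No explicit cleanup code: the destructor runs on normal return
// and during stack unwinding when the exception propagates.
void use_resource(bool fail) {
    Resource r;
    if (fail) throw std::runtime_error("failure while holding the resource");
}
```

Whether @use_resource@ returns normally or exits by exception, the count of live resources returns to its previous value, which is the guarantee RAII provides.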
384 400 … … 389 405 There are many kinds of resources that the garbage collector does not understand, such as sockets, open files, and database connections. 390 406 In particular, Java supports \emph{finalizers}, which are similar to destructors. 391 Sadly, finalizers come with far fewer guarantees, to the point where a completely conforming JVM may never call a single finalizer. % TODO: citation JVM spec; http://stackoverflow.com/a/2506514/2386739392 Due to operating system resource limits, this is unacceptable for many long running tasks. % TODO: citation?393 Instead, the paradigm in Java requires programmers manually keep track of all resource\emph{except} memory, leading many novices and experts alike to forget to close files, etc.394 Complicating the picture, uncaught exceptions can cause control flow to change dramatically, leaking a resource which appears on first glance to be closed.407 Sadly, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious. 408 Due to operating-system resource-limits, this is unacceptable for many long running programs. % TODO: citation? 409 Instead, the paradigm in Java requires programmers to manually keep track of all resources \emph{except} memory, leading many novices and experts alike to forget to close files, etc. 410 Complicating the picture, uncaught exceptions can cause control flow to change dramatically, leaking a resource that appears on first glance to be released. 395 411 \begin{javacode} 396 412 void write(String filename, String msg) throws Exception { … … 403 419 } 404 420 \end{javacode} 405 Any line in this program can throw an exception. 406 This leads to a profusion of finally blocks around many function bodies, since it isn't always clear when an exception may be thrown. 
421 Any line in this program can throw an exception, which leads to a profusion of finally blocks around many function bodies, since it is not always clear when an exception may be thrown. 407 422 \begin{javacode} 408 423 public void write(String filename, String msg) throws Exception { … … 422 437 \end{javacode} 423 438 In Java 7, a new \emph{try-with-resources} construct was added to alleviate most of the pain of working with resources, but ultimately it still places the burden squarely on the user rather than on the library designer. 424 Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources which can throw an exception on close can leak nested resources. % TODO: cite oracle article http://www.oracle.com/technetwork/articles/java/trywithresources-401775.html? 439 Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources that can throw an exception on close can leak nested resources \cite{TryWithResources}. 425 440 \begin{javacode} 426 441 public void write(String filename, String msg) throws Exception { 427 try ( 442 try ( // try-with-resources 428 443 FileOutputStream out = new FileOutputStream(filename); 429 444 FileOutputStream log = new FileOutputStream("log.txt"); … … 434 449 } 435 450 \end{javacode} 436 On the other hand, the Java compiler generates more code if more resources are declared, meaning that users must be more familiar with each type and library designers must provide better documentation. 451 Variables declared as part of a try-with-resources statement must conform to the @AutoCloseable@ interface, and the compiler implicitly calls @close@ on each of the variables at the end of the block.
452 Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null, therefore, the cleanup for these variables at the end is appropriately guarded and conditionally executed to prevent null-pointer exceptions. 453 454 % TODO: discuss Rust? 455 % Like \CC, Rust \cite{Rust} provides RAII through constructors and destructors. 456 % Smart pointers are deeply integrated in the Rust type-system. 437 457 438 458 % D has constructors and destructors that are worth a mention (under classes) https://dlang.org/spec/spec.html … … 444 464 Like Java, using the garbage collector means that destructors may never be called, requiring the use of finally statements to ensure dynamically allocated resources that are not managed by the garbage collector, such as open files, are cleaned up. 445 465 Since D supports RAII, it is possible to use the same techniques as in \CC to ensure that resources are released in a timely manner. 446 Finally, D provides a scope guard statement, which allows an arbitrary statement to be executed at normal scope exit with \emph{success}, at exceptional scope exit with \emph{failure}, or at normal and exceptional scope exit with \emph{exit}. % cite? https://dlang.org/spec/statement.html#ScopeGuardStatement 447 It has been shown that the \emph{exit} form of the scope guard statement can be implemented in a library in \CC. % cite: http://www.drdobbs.com/184403758 448 449 % TODO: discussion of lexical scope vs. dynamic 450 % see Peter's suggestions 451 % RAII works in both cases. Guaranteed to work in stack case, works in heap case if root is deleted (but it's dangerous to rely on this, because of exceptions) 466 Finally, D provides a scope guard statement, which allows an arbitrary statement to be executed at normal scope exit with \emph{success}, at exceptional scope exit with \emph{failure}, or at normal and exceptional scope exit with \emph{exit}. % TODO: cite? 
https://dlang.org/spec/statement.html#ScopeGuardStatement 467 It has been shown that the \emph{exit} form of the scope guard statement can be implemented in a library in \CC \cite{ExceptSafe}. 468 469 To provide managed types in \CFA, new kinds of constructors and destructors are added to C and discussed in Chapter 2. 452 470 453 471 \section{Tuples} 454 472 \label{s:Tuples} 455 473 In mathematics, tuples are finite-length sequences which, unlike sets, allow duplicate elements. 456 In programming languages, tuples are a construct thatprovide fixed-sized heterogeneous lists of elements.474 In programming languages, tuples provide fixed-sized heterogeneous lists of elements. 457 475 Many programming languages have tuple constructs, such as SETL, \KWC, ML, and Scala. 458 476 … … 462 480 Adding tuples to \CFA has previously been explored by Esteves \cite{Esteves04}. 463 481 464 The design of tuples in \KWC took much of its inspiration from SETL .482 The design of tuples in \KWC took much of its inspiration from SETL \cite{SETL}. 465 483 SETL is a high-level mathematical programming language, with tuples being one of the primary data types. 466 484 Tuples in SETL allow a number of operations, including subscripting, dynamic expansion, and multiple assignment. … … 470 488 \begin{cppcode} 471 489 tuple<int, int, int> triple(10, 20, 30); 472 get<1>(triple); // access component 1 => 30490 get<1>(triple); // access component 1 => 20 473 491 474 492 tuple<int, double> f(); … … 482 500 Tuples are simple data structures with few specific operations. 483 501 In particular, it is possible to access a component of a tuple using @std::get<N>@. 
484 Another interesting feature is @std::tie@, which creates a tuple of references, which allows assigningthe results of a tuple-returning function into separate local variables, without requiring a temporary variable.502 Another interesting feature is @std::tie@, which creates a tuple of references, allowing assignment of the results of a tuple-returning function into separate local variables, without requiring a temporary variable. 485 503 Tuples also support lexicographic comparisons, making it simple to write aggregate comparators using @std::tie@. 486 504 487 There is a proposal for \CCseventeen called \emph{structured bindings} , that introduces new syntax to eliminate the need to pre-declare variables and use @std::tie@ for binding the results from a function call. % TODO: cite http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0144r0.pdf505 There is a proposal for \CCseventeen called \emph{structured bindings} \cite{StructuredBindings}, that introduces new syntax to eliminate the need to pre-declare variables and use @std::tie@ for binding the results from a function call. 488 506 \begin{cppcode} 489 507 tuple<int, double> f(); … … 500 518 Structured bindings allow unpacking any struct with all public non-static data members into fresh local variables. 501 519 The use of @&@ allows declaring new variables as references, which is something that cannot be done with @std::tie@, since \CC references do not support rebinding. 502 This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must documented with some other mechanism.520 This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism. 503 521 Furthermore, structured bindings are not a full replacement for @std::tie@, as it always declares new variables. 
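The contrast between @std::tie@ and structured bindings can be sketched as follows, assuming a compiler with \CCseventeen support. The function @f@ and its return values, the @Date@ type, and the helper names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical tuple-returning function.
std::tuple<int, double> f() { return std::make_tuple(3, 2.5); }

// std::tie builds a tuple of references, so it assigns into variables that already exist.
double sum_with_tie() {
    int i = 0;
    double d = 0.0;
    std::tie(i, d) = f();
    return i + d;
}

// Structured bindings (C++17) always declare fresh variables, with types inferred by auto.
double sum_with_bindings() {
    auto [i, d] = f();
    return i + d;
}

// Lexicographic comparison of an aggregate through tuples of references.
struct Date { int year, month, day; };

bool earlier(const Date & a, const Date & b) {
    return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}
```

The two summing routines compute the same value; the difference is only whether the destination variables are pre-declared or introduced by the binding itself.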
504 522 505 523 Like \CC, D provides tuples through a library variadic template struct. 506 524 In D, it is possible to name the fields of a tuple type, which creates a distinct type. 507 \begin{dcode} % TODO: cite http://dlang.org/phobos/std_typecons.html 525 % TODO: cite http://dlang.org/phobos/std_typecons.html 526 \begin{dcode} 508 527 Tuple!(float, "x", float, "y") point2D; 509 Tuple!(float, float) float2; // different type s528 Tuple!(float, float) float2; // different type from point2D 510 529 511 530 point2D[0]; // access first element … … 521 540 The @expand@ method produces the components of the tuple as a list of separate values, making it possible to call a function that takes $N$ arguments using a tuple with $N$ components. 522 541 523 Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML .542 Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML \cite{sml}. 524 543 A function in SML always accepts exactly one argument. 525 544 There are two ways to mimic multiple argument functions: the first through currying and the second by accepting tuple arguments. … … 535 554 Tuples are a foundational tool in SML, allowing the creation of arbitrarily complex structured data types. 536 555 537 Scala, like \CC, provides tuple types through the standard library .556 Scala, like \CC, provides tuple types through the standard library \cite{Scala}. 538 557 Scala provides tuples of size 1 through 22 inclusive through generic data structures. 539 558 Tuples support named access and subscript access, among a few other operations. … … 547 566 \end{scalacode} 548 567 In Scala, tuples are primarily used as simple data structures for carrying around multiple values or for returning multiple values from a function. 
549 The 22-element restriction is an odd and arbitrary choice, but in practice it does n't cause problems since large tuples are uncommon.568 The 22-element restriction is an odd and arbitrary choice, but in practice it does not cause problems since large tuples are uncommon. 550 569 Subscript access is provided through the @productElement@ method, which returns a value of the top-type @Any@, since it is impossible to receive a more precise type from a general subscripting method due to type erasure. 551 570 The disparity between named access beginning at @_1@ and subscript access starting at @0@ is likewise an oddity, but subscript access is typically avoided since it discards type information. … … 553 572 554 573 555 \Csharp has similarly strange limitations, allowing tuples of size up to 7 components. % TODO: cite https://msdn.microsoft.com/en-us/library/system.tuple(v=vs.110).aspx574 \Csharp also has tuples, but has similarly strange limitations, allowing tuples of size up to 7 components. % TODO: cite https://msdn.microsoft.com/en-us/library/system.tuple(v=vs.110).aspx 556 575 The officially supported workaround for this shortcoming is to nest tuples in the 8th component. 557 576 \Csharp allows accessing a component of a tuple by using the field @Item$N$@ for components 1 through 7, and @Rest@ for the nested tuple. 558 577 559 560 % TODO: cite 5.3 https://docs.python.org/3/tutorial/datastructures.html 561 In Python, tuples are immutable sequences that provide packing and unpacking operations. 578 In Python \cite{Python}, tuples are immutable sequences that provide packing and unpacking operations. 562 579 While the tuple itself is immutable, and thus does not allow the assignment of components, there is nothing preventing a component from being internally mutable. 563 580 The components of a tuple can be accessed by unpacking into multiple variables, indexing, or via field name, like D. 
564 581 Tuples support multiple assignment through a combination of packing and unpacking, in addition to the common sequence operations. 565 582 566 % TODO: cite https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/Types.html#//apple_ref/doc/uid/TP40014097-CH31-ID448 567 Swift, like D, provides named tuples, with components accessed by name, index, or via extractors. 583 Swift \cite{Swift}, like D, provides named tuples, with components accessed by name, index, or via extractors. 568 584 Tuples are primarily used for returning multiple values from a function. 569 585 In Swift, @Void@ is an alias for the empty tuple, and there are no single element tuples. 586 587 % TODO: this statement feels like it's too strong 588 Tuples as powerful as the above languages are added to C and discussed in Chapter 3. 570 589 571 590 \section{Variadic Functions} … … 641 660 A parameter pack matches 0 or more elements, which can be types or expressions depending on the context. 642 661 Like other templates, variadic template functions rely on an implicit set of constraints on a type, in this example a @print@ routine. 643 That is, it is possible to use the @f@ routine anyany type provided there is a corresponding @print@ routine, making variadic templates fully open to extension, unlike variadic functions in C.662 That is, it is possible to use the @f@ routine on any type provided there is a corresponding @print@ routine, making variadic templates fully open to extension, unlike variadic functions in C. 644 663 645 664 Recent \CC standards (\CCfourteen, \CCseventeen) expand on the basic premise by allowing variadic template variables and providing convenient expansion syntax to remove the need for recursion in some cases, amongst other things. … … 672 691 Unfortunately, Java's use of nominal inheritance means that types must explicitly inherit from classes or interfaces in order to be considered a subclass. 
673 692 The combination of these two issues greatly restricts the usefulness of variadic functions in Java. 693 694 Type-safe variadic functions are added to C and discussed in Chapter 4.