Changeset 6250a312 for doc/rob_thesis/intro.tex
- Timestamp:
- May 10, 2017, 5:00:47 PM (8 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 8514fe19, dbfb35d
- Parents:
- 0f9bef3 (diff), 29cf9c8 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/rob_thesis/intro.tex
r0f9bef3 r6250a312 19 19 Unfortunately, \CC is actively diverging from C, so incremental additions require significant effort and training, coupled with multiple legacy design-choices that cannot be updated. 20 20 21 The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail. 21 The current implementation of \CFA is a source-to-source translator from \CFA to GNU C \cite{GCCExtensions}. 22 23 The remainder of this section describes some of the important features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail. 22 24 23 25 \subsection{C Background} 24 26 \label{sub:c_background} 27 In the context of this work, the term \emph{object} refers to a region of data storage in the execution environment, the contents of which can represent values \cite[p.~6]{C11}. 28 25 29 One of the lesser-known features of standard C is \emph{designations}. 26 30 Designations are similar to named parameters in languages such as Python and Scala, except that they only apply to aggregate initializers. 31 Note that in \CFA, designations use a colon separator, rather than an equals sign as in C, because this syntax is one of the few places that conflicts with the new language features. 27 32 \begin{cfacode} 28 33 struct A { … … 43 48 Later initializers override earlier initializers, so a sub-object for which there is more than one initializer is only initialized by its last initializer. 44 49 These semantics can be seen in the initialization of @a0@, where @x@ is designated twice, and thus initialized to @8@. 45 Note that in \CFA, designations use a colon separator, rather than an equals sign as in C, because this syntax is one of the few places that conflicts with the new language features.46 50 47 51 C also provides \emph{compound literal} expressions, which provide a first-class mechanism for creating unnamed objects. … … 57 61 Compound literals create an unnamed object, and result in an lvalue, so it is legal to assign a value into a compound literal or to take its address \cite[p.~86]{C11}. 58 62 Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and coercions and is never an lvalue. 63 64 The \CFA translator makes use of several GNU C extensions, including \emph{nested functions} and \emph{attributes}. 65 Nested functions make it possible to access data that is lexically in scope in the nested function's body. 66 \begin{cfacode} 67 int f() { 68 int x = 0; 69 void g() { 70 x++; 71 } 72 g(); // changes x 73 } 74 \end{cfacode} 75 Nested functions come with the usual C caveat that they should not leak into the containing environment, since they are only valid as long as the containing function's stack frame is active. 76 77 Attributes make it possible to inform the compiler of certain properties of the code. 78 For example, a function can be marked as deprecated, so that legacy APIs can be identified and slowly removed, or as \emph{hot}, so that the compiler knows the function is called frequently and should be aggresively optimized. 79 \begin{cfacode} 80 __attribute__((deprecated("foo is deprecated, use bar instead"))) 81 void foo(); 82 __attribute__((hot)) void bar(); // heavily optimized 83 84 foo(); // warning 85 bar(); 86 \end{cfacode} 59 87 60 88 \subsection{Overloading} … … 64 92 C provides a small amount of built-in overloading, \eg + is overloaded for the basic types. 65 93 Like in \CC, \CFA allows user-defined overloading based both on the number of parameters and on the types of parameters. 66 67 68 69 70 71 72 94 \begin{cfacode} 95 void f(void); // (1) 96 void f(int); // (2) 97 void f(char); // (3) 98 99 f('A'); // selects (3) 100 \end{cfacode} 73 101 In this case, there are three @f@ procedures, where @f@ takes either 0 or 1 arguments, and if an argument is provided then it may be of type @int@ or of type @char@. 74 102 Exactly which procedure is executed depends on the number and types of arguments passed. 75 103 If there is no exact match available, \CFA attempts to find a suitable match by examining the C built-in conversion heuristics. 76 \begin{cfacode} 77 void g(long long); 78 79 g(12345); 80 \end{cfacode} 104 The \CFA expression resolution algorithm uses a cost function to determine the interpretation that uses the fewest conversions and polymorphic type bindings. 105 \begin{cfacode} 106 void g(long long); 107 108 g(12345); 109 \end{cfacode} 81 110 In the above example, there is only one instance of @g@, which expects a single parameter of type @long long@. 82 111 Here, the argument provided has type @int@, but since all possible values of type @int@ can be represented by a value of type @long long@, there is a safe conversion from @int@ to @long long@, and so \CFA calls the provided @g@ routine. 83 112 113 Overloading solves the problem present in C where there can only be one function with a given name, requiring multiple names for functions that perform the same operation but take in different types. 114 This can be seen in the example of the absolute value functions C: 115 \begin{cfacode} 116 // stdlib.h 117 int abs(int); 118 long int labs(long int); 119 long long int llabs(long long int); 120 \end{cfacode} 121 In \CFA, the functions @labs@ and @llabs@ are replaced by appropriate overloads of @abs@. 122 84 123 In addition to this form of overloading, \CFA also allows overloading based on the number and types of \emph{return} values. 85 124 This extension is a feature that is not available in \CC, but is available in other programming languages such as Ada \cite{Ada95}. 86 87 88 89 90 91 125 \begin{cfacode} 126 int g(); // (1) 127 double g(); // (2) 128 129 int x = g(); // selects (1) 130 \end{cfacode} 92 131 Here, the only difference between the signatures of the different versions of @g@ is in the return values. 93 132 The result context is used to select an appropriate routine definition. 94 133 In this case, the result of @g@ is assigned into a variable of type @int@, so \CFA prefers the routine that returns a single @int@, because it is an exact match. 134 135 Return-type overloading solves similar problems to parameter-list overloading, in that multiple functions that perform similar operations can have the same, but produce different values. 136 One use case for this feature is to provide two versions of the @bsearch@ routine: 137 \begin{cfacode} 138 forall(otype T | { int ?<?( T, T ); }) 139 T * bsearch(T key, const T * arr, size_t dimension) { 140 int comp(const void * t1, const void * t2) { 141 return *(T *)t1 < *(T *)t2 ? -1 : *(T *)t2 < *(T *)t1 ? 1 : 0; 142 } 143 return (T *)bsearch(&key, arr, dimension, sizeof(T), comp); 144 } 145 forall(otype T | { int ?<?( T, T ); }) 146 unsigned int bsearch(T key, const T * arr, size_t dimension) { 147 T *result = bsearch(key, arr, dimension); 148 // pointer subtraction includes sizeof(T) 149 return result ? result - arr : dimension; 150 } 151 double key = 5.0; 152 double vals[10] = { /* 10 floating-point values */ }; 153 154 double * val = bsearch( 5.0, vals, 10 ); // selection based on return type 155 int posn = bsearch( 5.0, vals, 10 ); 156 \end{cfacode} 157 The first version provides a thin wrapper around the C @bsearch@ routine, converting untyped @void *@ to the polymorphic type @T *@, allowing the \CFA compiler to catch errors when the type of @key@, @arr@, and the target at the call-site do not agree. 158 The second version provides an alternate return of the index in the array of the selected element, rather than its address. 95 159 96 160 There are times when a function should logically return multiple values. … … 145 209 146 210 An extra quirk introduced by multiple return values is in the resolution of function calls. 147 148 149 150 151 152 153 154 155 156 211 \begin{cfacode} 212 int f(); // (1) 213 [int, int] f(); // (2) 214 215 void g(int, int); 216 217 int x, y; 218 [x, y] = f(); // selects (2) 219 g(f()); // selects (2) 220 \end{cfacode} 157 221 In this example, the only possible call to @f@ that can produce the two @int@s required for assigning into the variables @x@ and @y@ is the second option. 158 222 A similar reasoning holds calling the function @g@. 223 224 This duality between aggregation and aliasing can be seen in the C standard library in the @div@ and @remquo@ functions, which return the quotient and remainder for a division of integer and floating-point values, respectively. 225 \begin{cfacode} 226 typedef struct { int quo, rem; } div_t; // from stdlib.h 227 div_t div( int num, int den ); 228 double remquo( double num, double den, int * quo ); 229 div_t qr = div( 13, 5 ); // return quotient/remainder aggregate 230 int q; 231 double r = remquo( 13.5, 5.2, &q ); // return remainder, alias quotient 232 \end{cfacode} 233 @div@ aggregates the quotient/remainder in a structure, while @remquo@ aliases a parameter to an argument. 234 Alternatively, a programming language can directly support returning multiple values, \eg in \CFA: 235 \begin{lstlisting} 236 [int, int] div(int num, int den); // return two integers 237 [double, double] div( double num, double den ); // return two doubles 238 int q, r; // overloaded variable names 239 double q, r; 240 [q, r] = div(13, 5); // select appropriate div and q, r 241 [q, r] = div(13.5, 5.2); 242 \end{lstlisting} 159 243 160 244 In \CFA, overloading also applies to operator names, known as \emph{operator overloading}. 161 245 Similar to function overloading, a single operator is given multiple meanings by defining new versions of the operator with different signatures. 162 246 In \CC, this can be done as follows 163 164 165 intoperator+(A x, A y);166 167 247 \begin{cppcode} 248 struct A { int i; }; 249 A operator+(A x, A y); 250 bool operator<(A x, A y); 251 \end{cppcode} 168 252 169 253 In \CFA, the same example can be written as follows. 170 171 172 int?+?(A x, A y); // '?'s represent operands173 bool?<?(A x, A y);174 254 \begin{cfacode} 255 struct A { int i; }; 256 A ?+?(A x, A y); // '?'s represent operands 257 int ?<?(A x, A y); 258 \end{cfacode} 175 259 Notably, the only difference is syntax. 176 260 Most of the operators supported by \CC for operator overloading are also supported in \CFA. … … 179 263 Finally, \CFA also permits overloading variable identifiers. 180 264 This feature is not available in \CC. 181 182 183 184 185 186 187 188 189 190 191 192 265 \begin{cfacode} 266 struct Rational { int numer, denom; }; 267 int x = 3; // (1) 268 double x = 1.27; // (2) 269 Rational x = { 4, 11 }; // (3) 270 271 void g(double); 272 273 x += 1; // chooses (1) 274 g(x); // chooses (2) 275 Rational y = x; // chooses (3) 276 \end{cfacode} 193 277 In this example, there are three definitions of the variable @x@. 194 278 Based on the context, \CFA attempts to choose the variable whose type best matches the expression context. … … 208 292 Due to these rewrite rules, the values @0@ and @1@ have the types \zero and \one in \CFA, which allow for overloading various operations that connect to @0@ and @1@ \footnote{In the original design of \CFA, @0@ and @1@ were overloadable names \cite[p.~7]{cforall}.}. 209 293 The types \zero and \one have special built-in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal. 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 294 \begin{cfacode} 295 // lvalue is similar to returning a reference in C++ 296 lvalue Rational ?+=?(Rational *a, Rational b); 297 Rational ?=?(Rational * dst, zero_t) { 298 return *dst = (Rational){ 0, 1 }; 299 } 300 301 Rational sum(Rational *arr, int n) { 302 Rational r; 303 r = 0; // use rational-zero_t assignment 304 for (; n > 0; n--) { 305 r += arr[n-1]; 306 } 307 return r; 308 } 309 \end{cfacode} 226 310 This function takes an array of @Rational@ objects and produces the @Rational@ representing the sum of the array. 227 311 Note the use of an overloaded assignment operator to set an object of type @Rational@ to an appropriate @0@ value. … … 232 316 In particular, \CFA supports the notion of parametric polymorphism. 233 317 Parametric polymorphism allows a function to be written generically, for all values of all types, without regard to the specifics of a particular type. 234 For example, in \CC, the simple identity function for all types can be written as 235 236 237 238 239 \CC uses the template mechanism to support parametric polymorphism. In \CFA, an equivalent function can be written as 240 241 242 243 318 For example, in \CC, the simple identity function for all types can be written as: 319 \begin{cppcode} 320 template<typename T> 321 T identity(T x) { return x; } 322 \end{cppcode} 323 \CC uses the template mechanism to support parametric polymorphism. In \CFA, an equivalent function can be written as: 324 \begin{cfacode} 325 forall(otype T) 326 T identity(T x) { return x; } 327 \end{cfacode} 244 328 Once again, the only visible difference in this example is syntactic. 245 329 Fundamental differences can be seen by examining more interesting examples. 246 In \CC, a generic sum function is written as follows 247 248 249 250 251 252 253 254 330 In \CC, a generic sum function is written as follows: 331 \begin{cppcode} 332 template<typename T> 333 T sum(T *arr, int n) { 334 T t; // default construct => 0 335 for (; n > 0; n--) t += arr[n-1]; 336 return t; 337 } 338 \end{cppcode} 255 339 Here, the code assumes the existence of a default constructor, assignment operator, and an addition operator over the provided type @T@. 256 340 If any of these required operators are not available, the \CC compiler produces an error message stating which operators could not be found. 257 341 258 A similar sum function can be written in \CFA as follows 259 260 261 262 263 264 265 266 342 A similar sum function can be written in \CFA as follows: 343 \begin{cfacode} 344 forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**) 345 T sum(T *arr, int n) { 346 T t = 0; 347 for (; n > 0; n--) t = t += arr[n-1]; 348 return t; 349 } 350 \end{cfacode} 267 351 The first thing to note here is that immediately following the declaration of @otype T@ is a list of \emph{type assertions} that specify restrictions on acceptable choices of @T@. 268 352 In particular, the assertions above specify that there must be an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@. 269 353 The existence of an assignment operator from @T@ to @T@ and the ability to create an object of type @T@ are assumed implicitly by declaring @T@ with the @otype@ type-class. 270 354 In addition to @otype@, there are currently two other type-classes. 355 356 @dtype@, short for \emph{data type}, serves as the top type for object types; any object type, complete or incomplete, can be bound to a @dtype@ type variable. 357 To contrast, @otype@, short for \emph{object type}, is a @dtype@ with known size, alignment, and an assignment operator, and thus bind only to complete object types. 358 With this extra information, complete objects can be used in polymorphic code in the same way they are used in monomorphic code, providing familiarity and ease of use. 359 The third type-class is @ftype@, short for \emph{function type}, matching only function types. 271 360 The three type parameter kinds are summarized in \autoref{table:types} 272 361 … … 275 364 \begin{tabular}{|c||c|c|c||c|c|c|} 276 365 \hline 277 name & object type & incomplete type & function type & can assign value& can create & has size \\ \hline366 name & object type & incomplete type & function type & can assign & can create & has size \\ \hline 278 367 @otype@ & X & & & X & X & X \\ \hline 279 368 @dtype@ & X & X & & & & \\ \hline … … 288 377 In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled, as the function prototype states all necessary requirements separate from the implementation. 289 378 For example, the prototype for the previous sum function is 290 291 292 293 379 \begin{cfacode} 380 forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**) 381 T sum(T *arr, int n); 382 \end{cfacode} 294 383 With this prototype, a caller in another translation unit knows all of the constraints on @T@, and thus knows all of the operations that need to be made available to @sum@. 295 384 296 385 In \CFA, a set of assertions can be factored into a \emph{trait}. 297 386 \begin{cfacode} 298 299 300 301 302 303 304 305 387 trait Addable(otype T) { 388 T ?+?(T, T); 389 T ++?(T); 390 T ?++(T); 391 } 392 forall(otype T | Addable(T)) void f(T); 393 forall(otype T | Addable(T) | { T --?(T); }) T g(T); 394 forall(otype T, U | Addable(T) | { T ?/?(T, U); }) U h(T, U); 306 395 \end{cfacode} 307 396 This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration. 308 397 309 An interesting application of return-type resolution and polymorphism is a type-safeversion of @malloc@.398 An interesting application of return-type resolution and polymorphism is a polymorphic version of @malloc@. 310 399 \begin{cfacode} 311 400 forall(dtype T | sized(T)) … … 321 410 The built-in trait @sized@ ensures that size and alignment information for @T@ is available in the body of @malloc@ through @sizeof@ and @_Alignof@ expressions respectively. 322 411 In calls to @malloc@, the type @T@ is bound based on call-site information, allowing \CFA code to allocate memory without the potential for errors introduced by manually specifying the size of the allocated block. 412 413 \subsection{Planned Features} 414 415 One of the planned features \CFA is \emph{reference types}. 416 At a high level, the current proposal is to add references as a way to cleanup pointer syntax. 417 With references, it will be possible to store any address, as with a pointer, with the key difference being that references are automatically dereferenced. 418 \begin{cfacode} 419 int x = 0; 420 int * p = &x; // needs & 421 int & ref = x; // no & 422 423 printf("%d %d\n", *p, ref); // pointer needs *, ref does not 424 \end{cfacode} 425 426 It is possible to add new functions or shadow existing functions for the duration of a scope, using normal C scoping rules. 427 One application of this feature is to reverse the order of @qsort@. 428 \begin{cfacode} 429 forall(otype T | { int ?<?( T, T ); }) 430 void qsort(const T * arr, size_t size) { 431 int comp(const void * t1, const void * t2) { 432 return *(T *)t1 < *(T *)t2 ? -1 : *(T *)t2 < *(T *)t1 ? 1 : 0; 433 } 434 qsort(arr, dimension, sizeof(T), comp); 435 436 } 437 double vals[10] = { ... }; 438 qsort(vals, 10); // ascending order 439 { 440 int ?<?(double x, double y) { // locally override behaviour 441 return x > y; 442 } 443 qsort(vals, 10); // descending sort 444 } 445 \end{cfacode} 446 Currently, there is no way to \emph{remove} a function from consideration from the duration of a scope. 447 For example, it may be desirable to eliminate assignment from a scope, to reduce accidental mutation. 448 To address this desire, \emph{deleted functions} are a planned feature for \CFA. 449 \begin{cfacode} 450 forall(otype T) void f(T *); 451 452 int x = 0; 453 f(&x); // might modify x 454 { 455 int ?=?(int *, int) = delete; 456 f(&x); // error, no assignment for int 457 } 458 \end{cfacode} 459 Now, if the deleted function is chosen as the best match, the expression resolver emits an error. 323 460 324 461 \section{Invariants} … … 450 587 \end{javacode} 451 588 In Java 7, a new \emph{try-with-resources} construct was added to alleviate most of the pain of working with resources, but ultimately it still places the burden squarely on the user rather than on the library designer. 452 Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources that can throw an exception on close can leak nested resources \ cite{TryWithResources}.589 Furthermore, for complete safety this pattern requires nested objects to be declared separately, otherwise resources that can throw an exception on close can leak nested resources \footnote{Since close is only guaranteed to be called on objects declared in the try-list and not objects passed as constructor parameters, the @B@ object may not be closed in @new A(new B())@ if @A@'s close raises an exception.} \cite{TryWithResources}. 453 590 \begin{javacode} 454 591 public void write(String filename, String msg) throws Exception { … … 521 658 % these are declared in the struct, so they're closer to C++ than to CFA, at least syntactically. Also do not allow for default constructors 522 659 % D has a GC, which already makes the situation quite different from C/C++ 523 The programming language , D,also manages resources with constructors and destructors \cite{D}.524 In D, @struct@s are stack allocat edand managed via scoping like in \CC, whereas @class@es are managed automatically by the garbage collector.660 The programming language D also manages resources with constructors and destructors \cite{D}. 661 In D, @struct@s are stack allocatable and managed via scoping like in \CC, whereas @class@es are managed automatically by the garbage collector. 525 662 Like Java, using the garbage collector means that destructors are called indeterminately, requiring the use of finally statements to ensure dynamically allocated resources that are not managed by the garbage collector, such as open files, are cleaned up. 526 663 Since D supports RAII, it is possible to use the same techniques as in \CC to ensure that resources are released in a timely manner. … … 755 892 756 893 Type-safe variadic functions are added to \CFA and discussed in Chapter 4. 894 895 \section{Contributions} 896 \label{s:contributions} 897 898 No prior work on constructors or destructors had been done for \CFA. 899 I did both the design and implementation work. 900 While the overall design is based on constructors and destructors in object-oriented C++, it had to be re-engineered into non-object-oriented \CFA. 901 I also had to make changes to the \CFA expression-resolver to integrate constructors and destructors into the type system. 902 903 Prior work on the design of tuples for \CFA was done by Till, and some initial implementation work by Esteves. 904 I largely took the Till design but added tuple indexing, which exists in a number of programming languages with tuples, simplified the implicit tuple conversions, and integrated with the \CFA polymorphism and assertion satisfaction model. 905 I did a new implementation of tuples, and extensively 906 augmented initial work by Bilson to incorporate tuples into the \CFA expression-resolver and type-unifier. 907 908 No prior work on variadic functions had been done for \CFA. 909 I did both the design and implementation work. 910 While the overall design is based on variadic templates in C++, my design is novel in the way it is incorporated into the \CFA polymorphism model, and is engineered into \CFA so it dovetails with tuples.
Note: See TracChangeset
for help on using the changeset viewer.