Changeset ec71a50 for doc/theses/aaron_moss_PhD/phd/background.tex
- Timestamp:
- Sep 21, 2018, 11:26:31 PM (6 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, no_list, persistent-indexer, pthread-emulation, qualifiedEnum
- Children:
- 3b1825b, fcc57ba
- Parents:
- 031a88a9 (diff), 371ef1d (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
- File:
-
- 1 edited
doc/theses/aaron_moss_PhD/phd/background.tex
--- doc/theses/aaron_moss_PhD/phd/background.tex (r031a88a9)
+++ doc/theses/aaron_moss_PhD/phd/background.tex (rec71a50)
@@ -1,3 +1,3 @@
-\chapter{Background}
+\chapter{\CFA{}}
 
 \CFA{} adds a number of features to C, some of them providing significant increases to the expressive power of the language, but all designed to maintain the existing procedural programming paradigm of C and to be as orthogonal as possible to each other.
@@ -21,5 +21,5 @@
 It is important to note that \CFA{} is not an object-oriented language.
 This is a deliberate choice intended to maintain the applicability of the mental model and language idioms already possessed by C programmers.
-This choice is in marked contrast to \CC{}, which, though it has backward-compatibility with C on the source code level, is a much larger and more complex language, and requires extensive developer re-training before they can write idiomatic, efficient code in \CC{}'s object-oriented paradigm.
+This choice is in marked contrast to \CC{}, which, though it has backward-compatibility with C on the source code level, is a much larger and more complex language, and requires extensive developer re-training to write idiomatic, efficient code in \CC{}'s object-oriented paradigm.
 
 \CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions''; while these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
@@ -62,9 +62,9 @@
 struct counter { int x; };
 
-counter& `++?`(counter& c) { ++c.x; return c; } $\C{// pre-increment}$
-counter `?++`(counter& c) { $\C{// post-increment}$
+counter& `++?`(counter& c) { ++c.x; return c; } $\C[2in]{// pre-increment}$
+counter `?++`(counter& c) { $\C[2in]{// post-increment}$
     counter tmp = c; ++c; return tmp;
 }
-bool `?<?`(const counter& a, const counter& b) { $\C{// comparison}$
+bool `?<?`(const counter& a, const counter& b) { $\C[2in]{// comparison}$
     return a.x < b.x;
 }
@@ -73,5 +73,21 @@
 Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions''\cit{}, so as to present user programmers with only a single set of overloading rules.
 
-\subsection{Polymorphic Functions}
+\subsubsection{Special Literal Types}
+
+Literal !0! is also used polymorphically in C; it may be either integer zero or the null value of any pointer type.
+\CFA{} provides a special type for the !0! literal, !zero_t!, so that users can define a zero value for their own types without being forced to create a conversion from an integer or pointer type (though \CFA{} also includes implicit conversions from !zero_t! to the integer and pointer types for backward compatibility).
+
+According to the C standard\cit{}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
+By this rule, boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
+\CFACC{} applies this rewriting in all boolean contexts, so any type !T! can be made ``truthy'' (that is, given a boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}; unlike \CC{} prior to the addition of explicit casts in \CCeleven{}, this design does not add comparability or convertibility to arbitrary integer types.
+
+\CFA{} also includes a special type for !1!, !one_t!; like !zero_t!, !one_t! has built-in implicit conversions to the various integral types so that !1! maintains its expected semantics in legacy code.
+The addition of !one_t! allows generic algorithms to handle the unit value uniformly for types where it is meaningful; a simple example of this is that polymorphic functions\footnote{discussed in Section~\ref{poly-func-sec}} in the \CFA{} prelude define !++x! and !x++! in terms of !x += 1!, allowing users to idiomatically define all forms of increment for a type !T! by defining the single function !T& ?+=?(T&, one_t)!; analogous overloads for the decrement operators are also present, and programmers can override any of these functions for a particular type if desired.
+
+\CFA{} previously allowed !0! and !1! to be the names of polymorphic variables, with separate overloads for !int 0!, !int 1!, and !forall(dtype T) T* 0!.
+As revealed in my own work on generic types (Chapter~\ref{generic-chap}), the parametric polymorphic zero variable was not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
+As such, variables that could represent !0! and !1! were phased out in favour of functions that could generate those values for a given type as appropriate.
+
+\subsection{Polymorphic Functions} \label{poly-func-sec}
 
 The most significant feature \CFA{} adds is parametric-polymorphic functions.
@@ -91,5 +107,5 @@
 One benefit of this design is that it allows polymorphic functions to be separately compiled.
 The forward declaration !forall(otype T) T identity(T);! uniquely defines a single callable function, which may be implemented in a different file.
-The fact that there is only one implementation of each polymorphic function also reduces compile times relative to the template-expansion approach taken by \CC{}, as well as reducing binary sizes and runtime pressure on instruction cache at by re-using a single version of each function.
+The fact that there is only one implementation of each polymorphic function also reduces compile times relative to the template-expansion approach taken by \CC{}, as well as reducing binary sizes and runtime pressure on instruction cache by re-using a single version of each function.
 
 \subsubsection{Type Assertions}
@@ -117,10 +133,10 @@
 
 This version of !twice! works for any type !S! that has an addition operator defined for it, and it could be used to satisfy the type assertion on !four_times!.
-\CFACC{} accomplishes this by creating a wrapper function calling !twice //(2)! with !S! bound to !double!, then providing this wrapper function to !four_times!\footnote{\lstinline{twice // (2)} could also have had a type parameter named \lstinline{T}; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline{T} of \lstinline{four_times}}.
-
-Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration \emph{in the current scope}.
-If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that that function is examined as a candidate for its own type assertion unboundedly repeatedly.
+\CFACC{} accomplishes this by creating a wrapper function calling !twice//(2)! with !S! bound to !double!, then providing this wrapper function to !four_times!\footnote{\lstinline{twice // (2)} could also have had a type parameter named \lstinline{T}; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline{T} of \lstinline{four_times}}.
+
+Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope.
+If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that that function is examined as a candidate for its own assertion unboundedly repeatedly.
 To avoid such infinite loops, \CFACC{} imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC{} compilers for template expansion; this restriction means that there are some semantically well-typed expressions that cannot be resolved by \CFACC{}.
-\TODO{Update this with final state} One contribution made in the course of this thesis was modifying \CFACC{} to use the more flexible expression resolution algorithm for assertion matching, rather than the previous simpler approach of unification on the types of the functions.
+\TODO{Update this with final state} One contribution made in the course of this thesis was modifying \CFACC{} to use the more flexible expression resolution algorithm for assertion matching, rather than the simpler but limited previous approach of unification on the types of the functions.
 
 \subsubsection{Deleted Declarations}
@@ -175,10 +191,10 @@
 \begin{cfa}
 trait pointer_like(`otype Ptr, otype El`) {
-    El& *?(Ptr); $\C{Ptr can be dereferenced to El}$
+    El& *?(Ptr); $\C{// Ptr can be dereferenced to El}$
 };
 
 struct list {
     int value;
-    list* next; $\C{may omit struct on type names}$
+    list* next; $\C{// may omit struct on type names}$
 };
 
@@ -200,6 +216,6 @@
 
 In addition to the multiple interpretations of an expression produced by name overloading and polymorphic functions, for backward compatibility \CFA{} must support all of the implicit conversions present in C, producing further candidate interpretations for expressions.
-As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions' '\cit{} define which of the built-in tyhpes are implicitly convertable to which other types, and the relative cost of any pair of such conversions from a single source type.
-\CFA{} adds to the usual arithmetic conversions rules defining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg{} !int! to !char!, but more expensive than any \emph{safe} (widening) conversion, \eg{} !int! to !double!.
+As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions''\cit{} define which of the built-in types are implicitly convertible to which other types, and the relative cost of any pair of such conversions from a single source type.
+\CFA{} adds rules to the usual arithmetic conversions defining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg{} !int! to !char!, but more expensive than any \emph{safe} (widening) conversion, \eg{} !int! to !double!.
 One contribution of this thesis, discussed in Section \TODO{add to resolution chapter}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
 
@@ -208,14 +224,132 @@
 Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
 For instance, in the example in Section~\ref{overloading-sec}, !max(max, -max)! cannot be unambiguously resolved, but !int m = max(max, -max)! has a single minimal-cost resolution.
-While the interpretation !int m = (int)max((double)max, -(double)max)! is also a valid interpretation, it is not minimal-cost due to the unsafe cast from the !double! result of !max! to the !int!\footnote{The two \lstinline{double} casts function as type ascriptions selecting \lstinline{double max} rather than casts from \lstinline{int max} to \lstinline{double}, and as such are zero-cost.}.
-These contextual effects make the expression resolution problem for \CFA{} both theoretically and practically difficult, but the observation driving the work in Chapter~\ref{resolution-chap} is that of the many top-level expressions in a given program, most will likely be straightforward and idiomatic so that programmers writing and maintaining the code can easily understand them; it follows that effective heuristics for common cases can bring down compiler runtime enough that a small proportion of harder-to-resolve expressions should not increase compiler runtime or memory usage inordinately.
+While the interpretation !int m = (int)max((double)max, -(double)max)! is also a valid interpretation, it is not minimal-cost due to the unsafe cast from the !double! result of !max! to !int!\footnote{The two \lstinline{double} casts function as type ascriptions selecting \lstinline{double max} rather than casts from \lstinline{int max} to \lstinline{double}, and as such are zero-cost.}.
+These contextual effects make the expression resolution problem for \CFA{} both theoretically and practically difficult, but the observation driving the work in Chapter~\ref{resolution-chap} is that of the many top-level expressions in a given program, most are straightforward and idiomatic so that programmers writing and maintaining the code can easily understand them; it follows that effective heuristics for common cases can bring down compiler runtime enough that a small proportion of harder-to-resolve expressions does not inordinately increase overall compiler runtime or memory usage.
 
 \subsection{Type Features} \label{type-features-sec}
 
+The name overloading and polymorphism features of \CFA{} have the greatest effect on language design and compiler runtime, but there are a number of other features in the type system which have a smaller effect but are useful for code examples.
+These features are described here.
+
 \subsubsection{Reference Types}
 
-% TODO mention contribution on reference rebind
-
-\subsubsection{Lifetime Management}
-
-\subsubsection{0 and 1 Literals}
+One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz\cite{Schluntz17}.
+Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer.
+These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy aliasing of nested values with a similarly convenient syntax.
+A particular improvement is removing syntactic special cases for operators which take or return mutable values; for example, the use !a += b! of a compound assignment operator now matches its signature, !int& ?+=?(int&, int)!, as opposed to the previous syntactic special cases to automatically take the address of the first argument to !+=! and to mark its return value as mutable.
+
+The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named.
+In \CFA{}, the distinction between lvalue and rvalue can be reframed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
+\CFA{} references preserve the existing qualifier-dropping implicit lvalue-to-rvalue conversion from C (\eg{} a !const volatile int&! can be implicitly copied to a bare !int!).
+To make reference types more easily usable in legacy pass-by-value code, \CFA{} also adds an implicit rvalue-to-lvalue conversion, implemented by storing the value in a fresh compiler-generated temporary variable and passing a reference to that temporary.
+To mitigate the ``!const! hell'' problem present in \CC{}, there is also a qualifier-dropping lvalue-to-lvalue conversion, also implemented by copying into a temporary:
+
+\begin{cfa}
+const int magic = 42;
+
+void inc_print( int& x ) { printf("%d\n", ++x); }
+
+inc_print( magic ); $\C{// legal; implicitly generated code in red below:}$
+
+`int tmp = magic;` $\C{// to safely strip const-qualifier}$
+`inc_print( tmp );` $\C{// tmp is incremented, magic is unchanged}$
+\end{cfa}
+
+Despite the similar syntax, \CFA{} references are significantly more flexible than \CC{} references.
+The primary issue with \CC{} references is that it is impossible to extract the address of the reference variable rather than the address of the referred-to variable.
+This breaks a number of the usual compositional properties of the \CC{} type system, \eg{} a reference cannot be re-bound to another variable, nor is it possible to take a pointer to, array of, or reference to a reference.
+\CFA{} supports all of these use cases \TODO{test array} without further added syntax.
+The key to this syntax-free feature support is an observation made by the author that the address of a reference is an lvalue.
+In C, the address-of operator !&x! can only be applied to lvalue expressions, and always produces an immutable rvalue; \CFA{} supports reference re-binding by assignment to the address of a reference, and pointers to references by repeating the address-of operator:
+
+\begin{cfa}
+int x = 2, y = 3;
+int& r = x; $\C{// r aliases x}$
+&r = &y; $\C{// r now aliases y}$
+int** p = &&r; $\C{// p points to r}$
+\end{cfa}
+
+For better compatibility with C, the \CFA{} team has chosen not to differentiate function overloads based on top-level reference types, and as such their contribution to the difficulty of \CFA{} expression resolution is largely restricted to the implementation details of normalization conversions and adapters.
+
+\subsubsection{Resource Management}
+
+\CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz\cite{Schluntz17}.
+This idiom allows a safer and more principled approach to resource management by tying acquisition of a resource to object initialization, with the corresponding resource release executed automatically at object finalization.
+A wide variety of conceptual resources may be conveniently managed by this scheme, including heap memory, file handles, and software locks.
+
+\CFA{}'s implementation of RAII is based on special constructor and destructor operators, available via the !x{ ... }! constructor syntax and !^x{ ... }! destructor syntax.
+Each type has an overridable compiler-generated zero-argument constructor, copy constructor, assignment operator, and destructor, as well as a field-wise constructor for each appropriate prefix of the member fields of !struct! types.
+For !struct! types the default versions of these operators call their equivalents on each field of the !struct!.
+The main implication of these object lifetime functions for expression resolution is that they are all included as implicit type assertions for !otype! type variables, with a secondary effect being an increase in code size due to the compiler-generated operators.
+Due to these implicit type assertions, assertion resolution is pervasive in \CFA{} polymorphic functions, even those without explicit type assertions.
+Implicitly-generated code is shown in red in the following example:
+
+\begin{cfa}
+struct kv {
+    int key;
+    char* value;
+};
+
+`void ?{} (kv& this) {` $\C[3in]{// default constructor}$
+`    this.key{};` $\C[3in]{// call recursively on members}$
+`    this.value{};
+}
+
+void ?{} (kv& this, int key) {` $\C[3in]{// partial field constructor}$
+`    this.key{ key };
+    this.value{};` $\C[3in]{// default-construct missing fields}$
+`}
+
+void ?{} (kv& this, int key, char* value) {` $\C[3in]{// complete field constructor}$
+`    this.key{ key };
+    this.value{ value };
+}
+
+void ?{} (kv& this, kv that) {` $\C[3in]{// copy constructor}$
+`    this.key{ that.key };
+    this.value{ that.value };
+}
+
+kv ?=? (kv& this, kv that) {` $\C[3in]{// assignment operator}$
+`    this.key = that.key;
+    this.value = that.value;
+}
+
+void ^?{} (kv& this) {` $\C[3in]{// destructor}$
+`    ^this.key{};
+    ^this.value{};
+}`
+
+forall(otype T `| { void ?{}(T&); void ?{}(T&, T); T ?=?(T&, T); void ^?{}(T&); }`)
+void foo(T);
+\end{cfa}
+
+\subsubsection{Tuple Types}
+
+\CFA{} adds \emph{tuple types} to C, a syntactic facility for referring to lists of values anonymously or with a single identifier.
+An identifier may name a tuple, a function may return one, and a tuple may be implicitly \emph{destructured} into its component values.
+The implementation of tuples in \CFACC{}'s code generation is based on the generic types introduced in Chapter~\ref{generic-chap}, with one compiler-generated generic type for each tuple arity.
+This allows tuples to take advantage of the same runtime optimizations available to generic types, while reducing code bloat.
+An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example shows the basics:
+
+\begin{cfa}
+[char, char] x = ['!', '?']; $\C{// (1); tuple type and expression syntax}$
+int x = 2; $\C{// (2)}$
+
+forall(otype T)
+[T, T] swap( T a, T b ) { $\C{// (3)}$
+    return [b, a]; $\C{// one-line swap syntax}$
+}
+
+x = swap( x ); $\C{// destructure [char, char] x into two elements}$
+$\C{// cannot use int x, not enough arguments}$
+
+void swap( int, char, char ); $\C{// (4)}$
+
+swap( x, x ); $\C{// (4) on (2), (1)}$
+$\C{// not (3) on (2), (2) due to polymorphism cost}$
+\end{cfa}
+
+Tuple destructuring breaks the one-to-one relationship between identifiers and values.
+This precludes some argument-parameter matching strategies for expression resolution, as well as cheap interpretation filters based on comparing number of parameters and arguments.
+As an example, in the call to !swap( x, x )! above, the second !x! can be resolved starting at the second or third parameter of !swap!, depending which interpretation of !x! was chosen for the first argument.