Changeset 397848f5 for doc/theses/aaron_moss_PhD/phd
- Timestamp:
- Apr 16, 2019, 2:50:33 PM (6 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children:
- f1240b0
- Parents:
- 0e54654
- Location:
- doc/theses/aaron_moss_PhD/phd
- Files:
-
- 5 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/aaron_moss_PhD/phd/background.tex
r0e54654 r397848f5 5 5 To provide background for the contributions in subsequent chapters, this chapter provides a summary of the features of \CFA{} at the time this work was conducted. 6 6 7 The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.7 Glen Ditchfield laid out the core design of \CFA{} in his 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model. 8 8 Building on Ditchfield's design for contextual polymorphism as well as KW-C \cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson \cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's. 9 9 This early \CFACC{} provided basic functionality, but incorporated a number of algorithmic choices that have failed to scale as \CFA{} has developed, lacking the runtime performance for practical use; this thesis is substantially concerned with rectifying those deficits. … … 27 27 28 28 Another aspect of \CFA{}'s procedural paradigm is that it retains C's translation-unit-based encapsulation model, rather than class-based encapsulation such as in \CC{}. 29 This design choice implies that separate compilation must be maintained to allow headers to act as an encapsulation boundary, rather than the header-only libraries used by \CC{} templates. 29 As such, any language feature that requires code to be exposed in header files (\eg{} \CC{} templates) also eliminates encapsulation in \CFA{}. 30 Given this constraint, \CFA{} is carefully designed to allow separate compilation for its added language features under the existing C usage patterns. 30 31 31 32 \section{Name Overloading} \label{overloading-sec} … … 115 116 \subsection{Type Assertions} 116 117 117 Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{ Subscript not included in source code.\label{sub-foot}}:118 Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{This example introduces a convention used throughout this thesis of disambiguating overloaded names with subscripts; the subscripts do not appear in \CFA{} source code.}: 118 119 119 120 \begin{cfa} … … 129 130 130 131 Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions. 131 For instance, !twice! could have been defined like this \footref{sub-foot}:132 For instance, !twice! could have been defined like this: 132 133 133 134 \begin{cfa} … … 241 242 Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer. 242 243 These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy aliasing of nested values with a similarly convenient syntax. 243 A particular improvement is removing syntactic special cases for operators that take or return mutable values; for example, the use !a += b! of a compound assignment operator now matches its signature, !int& ?+=?(int&, int)!, as opposed to the syntactic special cases in earlier versions of \CFA{} to automatically take the address of the first argument to !+=! and to mark its return value as mutable. 244 The addition of reference types also eliminated two syntactic special-cases present in previous versions of \CFA{}. 245 Considering a call !a += b! to a compound assignment operator, the previous declaration for that operator was !lvalue int ?+=?(int*, int)! -- to mutate the left argument, the built-in operators were special-cased to implicitly take the address of that argument, while the special !lvalue! syntax was used to mark the return type of a function as a mutable reference. 246 With references, this declaration can be re-written as !int& ?+=?(int&, int)! -- the reference semantics generalize the implicit address-of on the left argument and allow it to be used in user-declared functions, while also subsuming the (now removed) !lvalue! syntax for function return types. 244 247 245 248 The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named in the standard. … … 276 279 For better compatibility with C, the \CFA{} team has chosen not to differentiate function overloads based on top-level reference types, and as such their contribution to the difficulty of \CFA{} expression resolution is largely restricted to the implementation details of normalization conversions and adapters. 277 280 278 \subsection{Resource Management} 281 \subsection{Resource Management} \label{ctor-sec} 279 282 280 283 \CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz \cite{Schluntz17}. -
doc/theses/aaron_moss_PhD/phd/experiments.tex
r0e54654 r397848f5 7 7 8 8 \CFACC{} can generate realistic test inputs for the resolver prototype from equivalent \CFA{} code; 9 the generated test inputs currently comprise all \CFA{} code currently extant, $9,000$ lines drawn primarily from the standard library and compiler test suite.9 the generated test inputs currently comprise all \CFA{} code currently in existence, $9,000$ lines drawn primarily from the standard library and compiler test suite. 10 10 \CFACC{} is also instrumented to produce a number of code metrics. 11 11 These metrics were used to construct synthetic test inputs during development of the resolver prototype; these synthetic inputs provided useful design guidance, but the performance results presented in this chapter are based on the more realistic directly-generated inputs. … … 170 170 This instrumented resolver is then run on a set of difficult test instances; to limit the data collection task, these runs are restricted to the best-performing \textsc{bu-dca-per} algorithm and test inputs taking more than 1~s to complete. 171 171 172 The 13 test inputs thus selected contain 20 632 top-level expressions among them, which are separated into order-of-magnitude bins by runtime in Figure~\ref{per-prob-histo-fig}.172 The 13 test inputs thus selected contain 20,632 top-level expressions among them, which are separated into order-of-magnitude bins by runtime in Figure~\ref{per-prob-histo-fig}. 173 173 As can be seen from this figure, overall runtime is dominated by a few particularly difficult problem instances --- the 60\% of expressions that resolve in under 0.1~ms collectively take less time to resolve than any of the 0.2\% of expressions that take at least 100~ms to resolve. 174 174 On the other hand, the 46 expressions in that 0.2\% take 38\% of the overall time in this difficult test suite, while the 201 expressions that take between 10 and 100~ms to resolve consume another 30\%. -
doc/theses/aaron_moss_PhD/phd/generic-types.tex
r0e54654 r397848f5 333 333 \end{cfa} 334 334 335 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code ), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.335 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code\footnote{A GCC extension allows arithmetic on \lstinline{void*}, calculated as if \lstinline{sizeof(void) == 1}.}), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types. 336 336 !_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as: 337 337 … … 350 350 Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller does not know how large to allocate the array of member offsets. 351 351 352 The C standard does not specify a memory layout for structs, but the System V ABI \cite{SysVABI} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.352 The C standard does not specify a memory layout for structs, but the System V ABI \cite{SysVABI} does; compatibility with this standard is sufficient for \CFA{}'s currently-supported architectures, though future ports may require different layout-function generation algorithms. 353 353 This algorithm, sketched below in pseudo-\CFA{}, is a straightforward mapping of consecutive fields into the first properly-aligned offset in the !struct! layout; layout functions for !union! types omit the offset array and simply calculate the maximum size and alignment over all union variants. 354 354 Since \CFACC{} generates a distinct layout function for each type, constant-folding and loop unrolling are applied. … … 423 423 424 424 To validate the practicality of this generic type design, microbenchmark-based tests were conducted against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}. 425 Since allthese languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.425 Since these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code. 426 426 A more illustrative comparison measures the costs of idiomatic usage of each language's features. 427 427 The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other other languages, code for which is included in Appendix~\ref{generic-bench-app}. -
doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex
r0e54654 r397848f5 5 5 % "valid" instead of "feasible" interpretations 6 6 7 The main task of the \CFACC{} type-checker is \emph{expression resolution} ,determining which declarations the identifiers in each expression correspond to.8 Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{} the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.7 The main task of the \CFACC{} type-checker is \emph{expression resolution}: determining which declarations the identifiers in each expression correspond to. 8 Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier. 9 9 A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations. 10 10 To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations. … … 122 122 \subsection{Expression Cost} \label{expr-cost-sec} 123 123 124 The mapping from \CFA{} expressions to cost tuples is described by Bilson in \cite{Bilson03}, and remains effectively unchanged modulothe refinements to the cost tuple described above.124 The mapping from \CFA{} expressions to cost tuples is described by Bilson in \cite{Bilson03}, and remains effectively unchanged with the exception of the refinements to the cost tuple described above. 125 125 Nonetheless, some salient details are repeated here for the sake of completeness. 126 126 … … 129 129 In terms of the core argument-parameter matching algorithm, overloaded variables are handled the same as zero-argument function calls, aside from a different pool of candidate declarations and setup for different code generation. 130 130 Similarly, an aggregate member expression !a.m! can be modelled as a unary function !m! that takes one argument of the aggregate type. 131 Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types (!42! is !int!, !"hello"! is !char*!, \etc{}), though struct literals (\eg{} !(S){ 1, 2, 3 }! for some struct !S!) require resolution of the implied constructor call.131 Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S}) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}. 132 132 133 133 Since most expressions can be treated as function calls, nested function calls are the primary component of complexity in expression resolution. … … 147 147 \end{cfa} 148 148 149 Considered independently, !g!$_1$!(42)! is the cheapest interpretation of !g(42)!, with cost $(0,0,0,0,0,0 )$ since the argument type is an exact match.150 However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0 )$ for !f( g!$_1$!(42) )!.151 If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0 )$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen.149 Considered independently, !g!$_1$!(42)! is the cheapest interpretation of !g(42)!, with cost $(0,0,0,0,0,0,0)$ since the argument type is an exact match. 150 However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0,0)$ for !f( g!$_1$!(42) )!. 151 If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen. 152 152 Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C, where each expression can be resolved given only the resolution of its immediate subexpressions. 153 153 -
doc/theses/aaron_moss_PhD/phd/type-environment.tex
r0e54654 r397848f5 12 12 For purposes of this chapter, a \emph{type environment} $T$ is a set of \emph{type classes} $\myset{T_1, T_2, \cdots, T_{|T|}}$. 13 13 Each type class $T_i$ contains a set of \emph{type variables} $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$. 14 Since the type classes represent an equivalence relation over the type variables the sets of variables contained in two distinct classes in the same environment must be disjoint.14 Since the type classes represent an equivalence relation over the type variables the sets of variables contained in two distinct classes in the same environment must be \emph{disjoint}. 15 15 Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} that the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples). 16 16 … … 42 42 The other two query operations act on type classes, where $report(T_i)$ produces the set $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$ of all type variables in a class $T_i$ and $bound(T_i)$ produces the bound $b_i$ of that class, or a sentinel indicating no bound is set. 43 43 44 The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property, $v_{i,1}$ cannot belong to any other type class in $T$.44 The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property, $v_{i,1}$ must not belong to any other type class in $T$. 45 45 The $add(T_i, v_{i,j})$ operation adds a new type variable $v_{i,j}$ to class $T_i$; again, $v_{i,j}$ cannot exist elsewhere in $T$. 46 46 $bind(T_i, b_i)$ mutates the bound for a type class, setting or updating the current bound. … … 106 106 In particular, the type-class bound cannot be easily included in the union-find data structure, as the requirement to make it the class representative breaks the balancing properties of $union$, and requires too-close integration of the type environment $unifyBound$ internal operation. 107 107 This issue can be solved by including a side map from class representatives to the type-class bound. 108 If placeholder values are inserted in this map for type classes without bounds th an this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements.108 If placeholder values are inserted in this map for type classes without bounds then this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements. 109 109 110 110 \subsection{Union-Find with Classes} \label{env-union-find-classes-approach}
Note: See TracChangeset
for help on using the changeset viewer.