Changeset 397848f5


Ignore:
Timestamp:
Apr 16, 2019, 2:50:33 PM (3 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr
Children:
f1240b0
Parents:
0e54654
Message:

thesis: typos from Gregor

Location:
doc/theses/aaron_moss_PhD/phd
Files:
5 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/background.tex

    r0e54654 r397848f5  
    55To provide background for the contributions in subsequent chapters, this chapter provides a summary of the features of \CFA{} at the time this work was conducted.
    66
    7 The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
     7Glen Ditchfield laid out the core design of \CFA{} in his 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
    88Building on Ditchfield's design for contextual polymorphism as well as KW-C \cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson \cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
    99This early \CFACC{} provided basic functionality, but incorporated a number of algorithmic choices that have failed to scale as \CFA{} has developed, lacking the runtime performance for practical use; this thesis is substantially concerned with rectifying those deficits.
     
    2727
    2828Another aspect of \CFA{}'s procedural paradigm is that it retains C's translation-unit-based encapsulation model, rather than class-based encapsulation such as in \CC{}.
    29 This design choice implies that separate compilation must be maintained to allow headers to act as an encapsulation boundary, rather than the header-only libraries used by \CC{} templates.
     29As such, any language feature that requires code to be exposed in header files (\eg{} \CC{} templates) also eliminates encapsulation in \CFA{}.
     30Given this constraint, \CFA{} is carefully designed to allow separate compilation for its added language features under the existing C usage patterns.
    3031
    3132\section{Name Overloading} \label{overloading-sec}
     
    115116\subsection{Type Assertions}
    116117
    117 Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{Subscript not included in source code.\label{sub-foot}}:
     118Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{This example introduces a convention used throughout this thesis of disambiguating overloaded names with subscripts; the subscripts do not appear in \CFA{} source code.}:
    118119
    119120\begin{cfa}
     
    129130
    130131Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions.
    131 For instance, !twice! could have been defined like this\footref{sub-foot}:
     132For instance, !twice! could have been defined like this:
    132133
    133134\begin{cfa}
     
    241242Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer.
    242243These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy aliasing of nested values with a similarly convenient syntax.
    243 A particular improvement is removing syntactic special cases for operators that take or return mutable values; for example, the use !a += b! of a compound assignment operator now matches its signature, !int& ?+=?(int&, int)!, as opposed to the syntactic special cases in earlier versions of \CFA{} to automatically take the address of the first argument to !+=! and to mark its return value as mutable.
     244The addition of reference types also eliminated two syntactic special-cases present in previous versions of \CFA{}.
     245Considering a call !a += b! to a compound assignment operator, the previous declaration for that operator was !lvalue int ?+=?(int*, int)! -- to mutate the left argument, the built-in operators were special-cased to implicitly take the address of that argument, while the special !lvalue! syntax was used to mark the return type of a function as a mutable reference.
     246With references, this declaration can be re-written as !int& ?+=?(int&, int)! -- the reference semantics generalize the implicit address-of on the left argument and allow it to be used in user-declared functions, while also subsuming the (now removed) !lvalue! syntax for function return types.
    244247
    245248The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named in the standard.
     
    276279For better compatibility with C, the \CFA{} team has chosen not to differentiate function overloads based on top-level reference types, and as such their contribution to the difficulty of \CFA{} expression resolution is largely restricted to the implementation details of normalization conversions and adapters.
    277280
    278 \subsection{Resource Management}
     281\subsection{Resource Management} \label{ctor-sec}
    279282
    280283\CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz \cite{Schluntz17}.
  • doc/theses/aaron_moss_PhD/phd/experiments.tex

    r0e54654 r397848f5  
    77
    88\CFACC{} can generate realistic test inputs for the resolver prototype from equivalent \CFA{} code;
    9 the generated test inputs currently comprise all \CFA{} code currently extant, $9,000$ lines drawn primarily from the standard library and compiler test suite.
     9the generated test inputs currently comprise all \CFA{} code currently in existence, $9,000$ lines drawn primarily from the standard library and compiler test suite.
    1010\CFACC{} is also instrumented to produce a number of code metrics.
    1111These metrics were used to construct synthetic test inputs during development of the resolver prototype; these synthetic inputs provided useful design guidance, but the performance results presented in this chapter are based on the more realistic directly-generated inputs.
     
    170170This instrumented resolver is then run on a set of difficult test instances; to limit the data collection task, these runs are restricted to the best-performing \textsc{bu-dca-per} algorithm and test inputs taking more than 1~s to complete.
    171171
    172 The 13 test inputs thus selected contain 20632 top-level expressions among them, which are separated into order-of-magnitude bins by runtime in Figure~\ref{per-prob-histo-fig}.
     172The 13 test inputs thus selected contain 20,632 top-level expressions among them, which are separated into order-of-magnitude bins by runtime in Figure~\ref{per-prob-histo-fig}.
    173173As can be seen from this figure, overall runtime is dominated by a few particularly difficult problem instances --- the 60\% of expressions that resolve in under 0.1~ms collectively take less time to resolve than any of the 0.2\% of expressions that take at least 100~ms to resolve.
    174174On the other hand, the 46 expressions in that 0.2\% take 38\% of the overall time in this difficult test suite, while the 201 expressions that take between 10 and 100~ms to resolve consume another 30\%.
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    r0e54654 r397848f5  
    333333\end{cfa}
    334334
    335 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
     335Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code\footnote{A GCC extension allows arithmetic on \lstinline{void*}, calculated as if \lstinline{sizeof(void) == 1}.}), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
    336336!_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as:
    337337
     
    350350Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller does not know how large to allocate the array of member offsets.
    351351
    352 The C standard does not specify a memory layout for structs, but the System V ABI \cite{SysVABI} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
     352The C standard does not specify a memory layout for structs, but the System V ABI \cite{SysVABI} does; compatibility with this standard is sufficient for \CFA{}'s currently-supported architectures, though future ports may require different layout-function generation algorithms.
    353353This algorithm, sketched below in pseudo-\CFA{}, is a straightforward mapping of consecutive fields into the first properly-aligned offset in the !struct! layout; layout functions for !union! types omit the offset array and simply calculate the maximum size and alignment over all union variants.
    354354Since \CFACC{} generates a distinct layout function for each type, constant-folding and loop unrolling are applied.
     
    423423
    424424To validate the practicality of this generic type design, microbenchmark-based tests were conducted against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}.
    425 Since all these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
     425Since these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
    426426A more illustrative comparison measures the costs of idiomatic usage of each language's features.
    427427The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other other languages, code for which is included in Appendix~\ref{generic-bench-app}.
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    r0e54654 r397848f5  
    55% "valid" instead of "feasible" interpretations
    66
    7 The main task of the \CFACC{} type-checker is \emph{expression resolution}, determining which declarations the identifiers in each expression correspond to.
    8 Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{} the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
     7The main task of the \CFACC{} type-checker is \emph{expression resolution}: determining which declarations the identifiers in each expression correspond to.
     8Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
    99A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
    1010To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
     
    122122\subsection{Expression Cost} \label{expr-cost-sec}
    123123
    124 The mapping from \CFA{} expressions to cost tuples is described by Bilson in \cite{Bilson03}, and remains effectively unchanged modulo the refinements to the cost tuple described above.
     124The mapping from \CFA{} expressions to cost tuples is described by Bilson in \cite{Bilson03}, and remains effectively unchanged with the exception of the refinements to the cost tuple described above.
    125125Nonetheless, some salient details are repeated here for the sake of completeness.
    126126
     
    129129In terms of the core argument-parameter matching algorithm, overloaded variables are handled the same as zero-argument function calls, aside from a different pool of candidate declarations and setup for different code generation.
    130130Similarly, an aggregate member expression !a.m! can be modelled as a unary function !m! that takes one argument of the aggregate type.
    131 Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types (!42! is !int!, !"hello"! is !char*!, \etc{}), though struct literals (\eg{} !(S){ 1, 2, 3 }! for some struct !S!) require resolution of the implied constructor call.
     131Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S}) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}.
    132132
    133133Since most expressions can be treated as function calls, nested function calls are the primary component of complexity in expression resolution.
     
    147147\end{cfa}
    148148
    149 Considered independently, !g!$_1$!(42)! is the cheapest interpretation of !g(42)!, with cost $(0,0,0,0,0,0)$ since the argument type is an exact match.
    150 However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0)$ for !f( g!$_1$!(42) )!.
    151 If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen.
     149Considered independently, !g!$_1$!(42)! is the cheapest interpretation of !g(42)!, with cost $(0,0,0,0,0,0,0)$ since the argument type is an exact match.
     150However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0,0)$ for !f( g!$_1$!(42) )!.
     151If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen.
    152152Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C, where each expression can be resolved given only the resolution of its immediate subexpressions.
    153153
  • doc/theses/aaron_moss_PhD/phd/type-environment.tex

    r0e54654 r397848f5  
    1212For purposes of this chapter, a \emph{type environment} $T$ is a set of \emph{type classes} $\myset{T_1, T_2, \cdots, T_{|T|}}$.
    1313Each type class $T_i$ contains a set of \emph{type variables} $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$.
    14 Since the type classes represent an equivalence relation over the type variables the sets of variables contained in two distinct classes in the same environment must be disjoint.
     14Since the type classes represent an equivalence relation over the type variables the sets of variables contained in two distinct classes in the same environment must be \emph{disjoint}.
    1515Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} that the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples).
    1616
     
    4242The other two query operations act on type classes, where $report(T_i)$ produces the set $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$ of all type variables in a class $T_i$ and $bound(T_i)$ produces the bound $b_i$ of that class, or a sentinel indicating no bound is set.
    4343
    44 The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property, $v_{i,1}$ cannot belong to any other type class in $T$.
     44The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property, $v_{i,1}$ must not belong to any other type class in $T$.
    4545The $add(T_i, v_{i,j})$ operation adds a new type variable $v_{i,j}$ to class $T_i$; again, $v_{i,j}$ cannot exist elsewhere in $T$.
    4646$bind(T_i, b_i)$ mutates the bound for a type class, setting or updating the current bound.
     
    106106In particular, the type-class bound cannot be easily included in the union-find data structure, as the requirement to make it the class representative breaks the balancing properties of $union$, and requires too-close integration of the type environment $unifyBound$ internal operation.
    107107This issue can be solved by including a side map from class representatives to the type-class bound.
    108 If placeholder values are inserted in this map for type classes without bounds than this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements.
     108If placeholder values are inserted in this map for type classes without bounds then this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements.
    109109
    110110\subsection{Union-Find with Classes} \label{env-union-find-classes-approach}
Note: See TracChangeset for help on using the changeset viewer.