Timestamp:
Apr 24, 2019, 10:49:48 PM (5 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
f343c6b, f845e80
Parents:
5d3a952
Message:

thesis: spelling and grammar fixes

Location:
doc/theses/aaron_moss_PhD/phd
Files:
6 edited

  • doc/theses/aaron_moss_PhD/phd/background.tex

    r5d3a952 r69c37cc  
    112 112 The type variable !T! is transformed into a set of additional implicit parameters to !identity!, which encode sufficient information about !T! to create and return a variable of that type.
    113 113 \CFA{} passes the size and alignment of the type represented by an !otype! parameter, as well as a default constructor, copy constructor, assignment operator, and destructor.
    114     Types which do not have one or more of these operators visible cannot be bound to !otype! parameters, but may be bound to un-constrained !dtype! (``data type'') type variables.
     114 Types that do not have one or more of these operators visible cannot be bound to !otype! parameters, but may be bound to un-constrained !dtype! (``data type'') type variables.
    115 115 In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; the experiments in Chapter~\ref{generic-chap} indicate that this overhead is comparable to that of \CC{} virtual function calls.
    116 116 % \TODO{rerun experiments, possibly look at vtable variant}
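
For readers of this changeset, a rough C++ sketch of the implicit-parameter lowering described in the hunk above may be helpful; all names here are hypothetical illustrations, not \CFACC{}'s actual generated code. The !otype! parameter of a polymorphic function such as !identity! becomes explicit size, alignment, and lifetime-function arguments:

{{{
#include <cstddef>
#include <cstring>

// Roughly what   forall(otype T) T identity(T x)   might lower to:
void identity(std::size_t size, std::size_t align,
              void (*ctor)(void*), void (*copy)(void*, const void*),
              void (*assign)(void*, const void*), void (*dtor)(void*),
              void* ret, const void* x) {
    (void)size; (void)align; (void)ctor; (void)assign; (void)dtor;  // unused in this tiny body
    copy(ret, x);   // construct the return value as a copy of the argument
}

// Caller-side glue a compiler might synthesize for T = int:
static void int_ctor(void* p) { int z = 0; std::memcpy(p, &z, sizeof(int)); }
static void int_copy(void* dst, const void* src) { std::memcpy(dst, src, sizeof(int)); }
static void int_assign(void* dst, const void* src) { std::memcpy(dst, src, sizeof(int)); }
static void int_dtor(void*) {}

int main() {
    int x = 42, r = 0;
    identity(sizeof(int), alignof(int),
             int_ctor, int_copy, int_assign, int_dtor, &r, &x);
    return r == 42 ? 0 : 1;   // r now holds a copy of x
}
}}}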
     
    143 143 \end{cfa}
    144 144
    145     Specializing this polymorphic function with !S = double! produces a monomorphic function which can  be used to satisfy the type assertion on !four_times!.
     145 Specializing this polymorphic function with !S = double! produces a monomorphic function which can be used to satisfy the type assertion on !four_times!.
    146 146 \CFACC{} accomplishes this by creating a wrapper function calling !twice!$_2$ with !S! bound to !double!, then providing this wrapper function to !four_times!\footnote{\lstinline{twice}$_2$ could also have had a type parameter named \lstinline{T}; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline{T} of \lstinline{four_times}}.
    147 147 However, !twice!$_2$ also works for any type !S! that has an addition operator defined for it.
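
The wrapper mechanism described in the hunk above can be sketched as a C++ analogue; the names twice2, twice_double, and four_times here are hypothetical stand-ins, with the  T twice(T)  type assertion lowered to an explicit function-pointer parameter:

{{{
// stands in for !twice!$_2$: generic over any S with an addition operator
template<typename S>
S twice2(S x) { return x + x; }

// monomorphic wrapper binding S = double, usable wherever a double(double) function is required
double twice_double(double x) { return twice2<double>(x); }

// four_times with its  T twice(T)  assertion passed as an explicit function pointer
double four_times(double x, double (*twice_fn)(double)) {
    return twice_fn(twice_fn(x));
}

int main() {
    // the wrapper satisfies the assertion, so four_times(10.5, ...) computes 4 * 10.5
    return four_times(10.5, twice_double) == 42.0 ? 0 : 1;
}
}}}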
  • doc/theses/aaron_moss_PhD/phd/conclusion.tex

    r5d3a952 r69c37cc  
    9 9 Further improvements to the \CFA{} type system are still possible, however.
    10 10 One area suggested by this work is development of a scheme for user-defined conversions; to integrate properly with the \CFA{} conversion model, there would need to be a distinction between safe and unsafe conversions, and possibly a way to denote conversions as explicit-only or non-chainable.
    11    Another place for ongoing effort is improvement of compilation performance; I believe the most promising direction for that is rebuilding the \CFA{} compiler on a different framework than Bilson's \CFACC{}.
     11 Another place for ongoing effort is improvement of compilation performance; I believe the most promising direction for that effort is rebuilding the \CFA{} compiler on a different framework than Bilson's \CFACC{}.
    12 12 The resolver prototype presented in this work has good performance and already has the basics of \CFA{} semantics implemented, as well as many of the necessary core data structures, and would be a viable candidate for a new compiler architecture.
    13 13 An alternate approach would be to fork an existing C compiler such as Clang~\cite{Clang}, which would need to be modified to use one of the resolution algorithms discussed here, as well as various other features introduced by Bilson~\cite{Bilson03}.
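
The safe/unsafe, explicit-only, and non-chainable distinctions mentioned in the hunk above already have rough counterparts in \CC{}, which may help fix the terminology; the unit types below are purely illustrative:

{{{
struct Celsius {
    double v;
    operator double() const { return v; }             // ordinary (implicitly applied) user conversion
};
struct Kelvin {
    double v;
    explicit operator double() const { return v; }    // "explicit-only" conversion
};
struct Fahrenheit {
    double v;
    Fahrenheit(double d) : v(d * 9.0 / 5.0 + 32.0) {} // converting constructor
};

int main() {
    Celsius c{20.0};
    double d = c;                          // fine: one user-defined conversion
    Fahrenheit f = 68.0;                   // fine: one user-defined conversion
    // Fahrenheit g = c;                   // error: would chain two user-defined conversions
    Kelvin k{293.15};
    // double e1 = k;                      // error: this conversion is explicit-only
    double e2 = static_cast<double>(k);    // fine when requested explicitly
    return d + f.v + e2 > 0.0 ? 0 : 1;
}
}}}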
  • doc/theses/aaron_moss_PhD/phd/experiments.tex

    r5d3a952 r69c37cc  
    42 42 The main area for future expansion in the design of the resolver prototype is conversions.
    43 43 Cast expressions are implemented in the output language of the resolver, but cannot be expressed in the input.
    44    The only implicit conversions supported are among the arithmetic-like concrete types, which captures most, but not all, of \CFA{}'s built-in implicit conversions\footnote{Notable absences include \lstinline{void*} to other pointer types, or \lstinline{0} to pointer types.}.
     44 The only implicit conversions supported are among the arithmetic-like concrete types, which capture most, but not all, of \CFA{}'s built-in implicit conversions\footnote{Notable absences include \lstinline{void*} to other pointer types, or \lstinline{0} to pointer types.}.
    45 45 Future work should include a way to express implicit (and possibly explicit) conversions in the input language, with an investigation of the most efficient way to handle implicit conversions, and potentially a design for user-defined conversions.
    46 46
     
    53 53 The primary architectural difference between the resolver prototype and \CFACC{} is that the prototype system uses a simple mark-and-sweep garbage collector for memory management, while \CFACC{} uses a manual memory-management approach.
    54 54 This architectural difference affects the mutation patterns used by both systems: \CFACC{} frequently makes deep clones of multi-node object graphs to ensure that there is a single ``owner'' for each object which can safely delete it later; the prototype system, by contrast, relies on its garbage collector to handle ownership, and can often copy pointers rather than cloning objects.
    55    The resolver prototype thus only needs to clone nodes which it modifies, and can share un-modified children between clones; the tree mutator abstraction in the prototype is designed to take advantage of this property.
     55 The resolver prototype thus only needs to clone nodes that it modifies, and can share un-modified children between clones; the tree mutator abstraction in the prototype is designed to take advantage of this property.
    56 56 The key design decision enabling this is that all child nodes are held by !const! pointer, and thus cannot be mutated once they have been stored in a parent node.
    57 57 With minimal programming discipline, it can thus be ensured that any expression is either mutable or shared, but never both; the Dotty research compiler for Scala takes a similar architectural approach \cite{Dotty-github}.
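
A minimal C++ sketch of the mutable-xor-shared discipline described in the hunk above (the Node type and with_label helper are hypothetical, not the prototype's actual data structures): children are held by pointer-to-const, and a mutator clones only the node it changes while sharing untouched children:

{{{
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::string label;
    std::vector<std::shared_ptr<const Node>> children;  // const: stored subtrees cannot be mutated
};

// "mutate" by cloning the changed node; unchanged children are shared, not deep-copied
std::shared_ptr<const Node> with_label(const std::shared_ptr<const Node>& n, std::string new_label) {
    auto copy = std::make_shared<Node>(*n);   // shallow copy: child pointers are shared
    copy->label = std::move(new_label);
    return copy;
}

int main() {
    auto leaf = std::make_shared<const Node>(Node{"leaf", {}});
    auto root = std::make_shared<const Node>(Node{"root", {leaf, leaf}});
    auto root2 = with_label(root, "renamed");
    // root is untouched and root2 shares both children with it
    return (root->label == "root" && root2->children[0] == leaf) ? 0 : 1;
}
}}}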
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    r5d3a952 r69c37cc  
    481 481
    482 482 The C and \CCV{} variants are generally the slowest and have the largest memory footprint, due to their less-efficient memory layout and the pointer indirection necessary to implement generic types in those languages; this inefficiency is exacerbated by the second level of generic types in the pair benchmarks.
    483     By contrast, the \CFA{} and \CC{} variants run in noticably less time for both the integer and pair because of the equivalent storage layout, with the inlined libraries (\ie{} no separate compilation) and greater maturity of the \CC{} compiler contributing to its lead.
     483 By contrast, the \CFA{} and \CC{} variants run in noticeably less time for both the integer and pair because of the equivalent storage layout, with the inlined libraries (\ie{} no separate compilation) and greater maturity of the \CC{} compiler contributing to its lead.
    484 484 \CCV{} is slower than C largely due to the cost of runtime type checking of downcasts (implemented with !dynamic_cast!); the outlier for \CFA{}, pop !pair!, results from the complexity of the generated-C polymorphic code.
    485 485 The gcc compiler is unable to optimize some dead code and condense nested calls; a compiler designed for \CFA{} could more easily perform these optimizations.
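
The storage-layout difference behind these measurements can be illustrated with a small C++ sketch (hypothetical types, not the benchmark code): a C-style generic pair boxes its elements behind void* pointers, while a template pair stores them inline, as the \CFA{} and \CC{} variants do:

{{{
struct c_pair {              // C-style genericity: elements live elsewhere, reached through pointers
    void* first;
    void* second;
};

template<typename T, typename S>
struct cc_pair {             // \CC{}/\CFA{}-style genericity: elements stored inline
    T first;
    S second;
};

int main() {
    int a = 1; double b = 2.0;
    c_pair p1 { &a, &b };                 // every access needs a dereference
    cc_pair<int, double> p2 { 1, 2.0 };   // flat layout, no extra indirection
    return (*(int*)p1.first + p2.first) == 2 ? 0 : 1;
}
}}}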
  • doc/theses/aaron_moss_PhD/phd/introduction.tex

    r5d3a952 r69c37cc  
    39 39 \cbstart
    40 40 The prototype system, which implements the algorithmic contributions of this thesis, is the first performant type-checker implementation for a \CFA{}-style type system.
    41    As the existence of an efficient compiler is necessary for the practical viability of a programming language, the contributions of this thesis comprise a validation of the \CFA{} language design which was previously lacking.
     41 As the existence of an efficient compiler is necessary for the practical viability of a programming language, the contributions of this thesis comprise a validation of the \CFA{} language design that was previously lacking.
    42 42 \cbend
    43 43
     
    45 45 In particular, with the addition of \emph{concepts} in \CCtwenty{}~\cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
    46 46 \cbstart
    47    Much of the difficulty of type-checking \CFA{} stems from the language design choice to allow type inference from the context of a function call in addition to its arguments; this feature allows the programmers to specify fewer redundant type annotations on functions which are polymorphic in their return type.
     47 Much of the difficulty of type-checking \CFA{} stems from the language design choice to allow type inference from the context of a function call in addition to its arguments; this feature allows the programmers to specify fewer redundant type annotations on functions that are polymorphic in their return type.
    48 48 \cbend \cbstartx
    49 49 The !auto! keyword in \CCeleven{} supports a similar but sharply restricted form of this contextual inference -- the demonstration of the richer inference in \CFA{} raises possibilities for similar features in future versions of \CC{}.
    50 50 By contrast, Java~8~\cite{Java8} and Scala~\cite{Scala} use comparably powerful forms of type inference, so the algorithmic techniques in this thesis may be effective for those languages' compiler implementors.
    51 51 \cbendx
    52    Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust~\cite{Rust}) which perform type inference; the type environment presented here may be useful to those language implementors.
     52 Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust~\cite{Rust}) that perform type inference; the type environment presented here may be useful to those language implementors.
    53 53
    54 54 \cbstarty
    55    One area of inquiry which is outside the scope of this thesis is formalization of the \CFA{} type system.
     55 One area of inquiry that is outside the scope of this thesis is formalization of the \CFA{} type system.
    56 56 Ditchfield~\cite{Ditchfield92} defined the $F_\omega^\ni$ polymorphic lambda calculus which is the theoretical basis for the \CFA{} type system.
    57 57 Ditchfield did not, however, prove any soundness or completeness properties for $F_\omega^\ni$; such proofs remain future work.
    58 58 A number of formalisms other than $F_\omega^\ni$ could potentially be adapted to model \CFA{}.
    59    One promising candidate is \emph{local type inference} \cite{Pierce00,Odersky01}, which describes similar contextual propagation of type information; another is the polymorphic conformity-based model of the Emerald~\cite{Black90} programming language, which defines a subtyping relation on types which are conceptually similar to \CFA{} traits.
     59 One promising candidate is \emph{local type inference} \cite{Pierce00,Odersky01}, which describes similar contextual propagation of type information; another is the polymorphic conformity-based model of the Emerald~\cite{Black90} programming language, which defines a subtyping relation on types that are conceptually similar to \CFA{} traits.
    60 60 These modelling approaches could potentially be used to extend an existing formal semantics for C such as Cholera \cite{Norrish98}, CompCert \cite{Leroy09}, or Formalin \cite{Krebbers14}.
    61 61 \cbendy
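
The restriction contrasted in the hunk above can be seen directly in standard \CC{}: a template parameter that appears only in the return type must be supplied explicitly, whereas the analogous \CFA{} call is resolved from the context of the initialization. A small sketch (the create function is hypothetical):

{{{
#include <cstdlib>

template<typename T>
T* create() {                    // T occurs only in the return type
    return static_cast<T*>(std::malloc(sizeof(T)));
}

int main() {
    int* p = create<int>();      // C++: the <int> annotation is required
    // int* q = create();        // does not compile in C++; the analogous \CFA{} call
                                 // would be resolved from the int* context
    std::free(p);
    return 0;
}
}}}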
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    r5d3a952 r69c37cc  
    20 20 As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
    21 21 The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions.
    22    One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters which only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function).
     22 One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters that only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function).
    23 23 Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost.
    24 24 Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}.
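
The greedy-versus-contextual distinction drawn in the hunk above can be illustrated with a small \CC{} example (all names hypothetical): the inner call is resolved in isolation, so \CC{} commits to its best interpretation even though only the other overload would let the enclosing call type-check:

{{{
struct A {};
struct B {};

A h(int)  { return A{}; }   // exact match for h(0): chosen greedily by C++
B h(long) { return B{}; }   // would let the whole expression resolve, but is never considered
void use(B) {}

int main() {
    // use(h(0));   // error in C++: h(0) is resolved to the A-returning overload first;
                    // a resolver with contextual information, as described for \CFA{}, could
                    // instead accept the costlier h(long) interpretation to satisfy use()
    use(h(0L));     // works once the argument itself selects the B-returning overload
    return 0;
}
}}}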
     
    122 122 For multi-argument generic types, the least-specialized polymorphic parameter sets the specialization cost, \eg{} the specialization cost of !pair(T, S*)! is $-1$ (from !T!) rather than $-2$ (from !S!).
    123 123 Specialization cost is not counted on the return type list; since $specialization$ is a property of the function declaration, a lower specialization cost prioritizes one declaration over another.
    124     User programmers can choose between functions with varying parameter lists by adjusting the arguments, but the same is not true in general of varying return types\footnote{In particular, as described in Section~\ref{expr-cost-sec}, cast expressions take the cheapest valid and convertable interpretation of the argument expression, and expressions are resolved as a cast to \lstinline{void}. As a result of this, including return types in the $specialization$ cost means that a function with return type \lstinline{T*} for some polymorphic type \lstinline{T} would \emph{always} be chosen over a function with the same parameter types returning \lstinline{void}, even for \lstinline{void} contexts, an unacceptably counter-intuitive result.}, so the return types are omitted from the $specialization$ element.
     124 User programmers can choose between functions with varying parameter lists by adjusting the arguments, but the same is not true in general of varying return types\footnote{In particular, as described in Section~\ref{expr-cost-sec}, cast expressions take the cheapest valid and convertible interpretation of the argument expression, and expressions are resolved as a cast to \lstinline{void}. As a result of this, including return types in the $specialization$ cost means that a function with return type \lstinline{T*} for some polymorphic type \lstinline{T} would \emph{always} be chosen over a function with the same parameter types returning \lstinline{void}, even for \lstinline{void} contexts, an unacceptably counter-intuitive result.}, so the return types are omitted from the $specialization$ element.
    125 125 Since both $vars$ and $specialization$ are properties of the declaration rather than any particular interpretation, they are prioritized less than the interpretation-specific conversion costs from Bilson's original 3-tuple.
    126 126
     
    198 198 In C semantics, this example is unambiguously upcasting !32! to !unsigned long long!, performing the shift, then downcasting the result to !unsigned!, at cost $(1,0,3,1,0,0,0)$.
    199 199 If ascription were allowed to be a first-class interpretation of a cast expression, it would be cheaper to select the !unsigned! interpretation of !?>>?! by downcasting !x! to !unsigned! and upcasting !32! to !unsigned!, at a total cost of $(1,0,1,1,0,0,0)$.
    200     However, this break from C semantics is not backwards compatibile, so to maintain C compatibility, the \CFA{} resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to !unsigned long long! in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.
     200 However, this break from C semantics is not backwards compatible, so to maintain C compatibility, the \CFA{} resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to !unsigned long long! in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.
    201 201 For example, in !int x; double x; (int)x;!, both declarations have zero-cost interpretations as !x!, but the !int x! interpretation is cheaper to cast to !int!, and is thus selected.
    202 202 Thus, in contrast to the lazy resolution of nested function-call expressions discussed above, where final interpretations for each subexpression are not chosen until the top-level expression is reached, cast expressions introduce eager resolution of their argument subexpressions, as if that argument was itself a top-level expression.
     
    309 309 Some other language designs solve the matching problem by forcing a bottom-up order.
    310 310 \CC{}, for instance, defines its overload-selection algorithm in terms of a partial order between function overloads given a fixed list of argument candidates, implying that the arguments must be selected before the function.
    311     This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call which is polymorphic in its return type, and these annotations are often redundant.
     311 This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call that is polymorphic in its return type, and these annotations are often redundant.
    312 312 \CFA{}, by contrast, saves programmers from redundant annotations with its richer inference:
    313 313
     
    395 395
    396 396 \cbstart
    397     One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgements between top-level expressions.
     397 One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgments between top-level expressions.
    398 398 This approach would be difficult to correctly implement in a \CFA{} compiler, due to the lack of a closed set of operations for a given type.
    399 399 New declarations related to a particular type can be introduced in any lexical scope in \CFA{}, and these added declarations may cause an assertion that was previously satisfiable to fail due to an introduced ambiguity.
    400     Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgement depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types.
     400 Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgment depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types.
    401 401 Given these concerns, correctly invalidating a cross-expression assertion satisfaction cache for \CFA{} is a non-trivial problem, and the overhead of such an approach may possibly outweigh any benefits from such caching.
    402 402 \cbend
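
A small \CC{} analogue of the invalidation hazard described in the hunk above (names hypothetical): a call that resolves unambiguously can become ambiguous once one more declaration is visible, which is exactly the kind of change a cross-expression satisfaction cache would have to detect:

{{{
void f(long) {}

// void f(float) {}   // introducing this later makes the call below ambiguous in C++,
                      // because int -> long and int -> float are conversions of equal rank

int main() {
    f(0);   // unambiguous with only f(long) visible; ambiguous once f(float) is also declared
    return 0;
}
}}}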
     
    424 424 The main challenge to implement this approach in \CFACC{} is applying the implicit conversions generated by the resolution process in the code-generation for the thunk functions that \CFACC{} uses to pass type assertions to their requesting functions with the proper signatures.
    425 425
    426     One \CFA{} feature that could be added to improve the ergonomics of overload selection is an \emph{ascription cast}; as discussed in Section~\ref{expr-cost-sec}, the semantics of the C cast operator are to choose the cheapest argument interpretation which is convertable to the target type, using the conversion cost as a tie-breaker.
     426 One \CFA{} feature that could be added to improve the ergonomics of overload selection is an \emph{ascription cast}; as discussed in Section~\ref{expr-cost-sec}, the semantics of the C cast operator are to choose the cheapest argument interpretation which is convertible to the target type, using the conversion cost as a tie-breaker.
    427 427 An ascription cast would reverse these priorities, choosing the argument interpretation with the cheapest conversion to the target type, only using interpretation cost to break ties\footnote{A possible stricter semantics would be to select the cheapest interpretation with a zero-cost conversion to the target type, reporting a compiler error otherwise.}.
    428 428 This would allow ascription casts to the desired return type to be used for overload selection: