Changeset f845e80 for doc


Ignore:
Timestamp:
Apr 25, 2019, 2:23:48 PM (5 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
98b4b12
Parents:
69c37cc
Message:

thesis: apply round 2 revisions and strip change bars

Location:
doc/theses/aaron_moss_PhD/phd
Files:
10 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/background.tex

    r69c37cc rf845e80  
    1313Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors \cite{Schluntz17}, improved support for tuples \cite{Schluntz17}, reference types \cite{Moss18}, first-class concurrent and parallel programming support \cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library \cite{Moss18}.
    1414
    15 \cbstart
    1615This thesis is primarily concerned with the \emph{expression resolution} portion of \CFA{} type-checking; resolution is discussed in more detail in Chapter~\ref{resolution-chap}, but is essentially determining which declarations the identifiers in each expression correspond to.
    17 Given that no simultaneously-visible C declarations share identifiers, expression resolution in C is not difficult, but the added features of \CFA{} make its resolution algorithms significantly more complex.
    18 Due to this complexity, the expression-resolution pass in \CFACC{} requires 95\% of compiler runtime on some source files, making an efficient procedure for expression resolution a requirement for a performant \CFA{} compiler.
    19 \cbend
     16In C, no simultaneously-visible declarations share identifiers, hence expression resolution in C is not difficult.
     17In \CFA{}, multiple added features make the resolution process significantly more complex.
     18Due to this complexity, the expression-resolution pass in \CFACC{} requires 95\% of compiler runtime on some source files, making a new, more efficient procedure for expression resolution a requirement for a performant \CFA{} compiler.
    2019
    2120The features presented in this chapter are chosen to elucidate the design constraints of the work presented in this thesis.
     
    208207Given a declaration !list_iterator it!, !*it! can be either an !int! or a !list!, with the meaning disambiguated by context (\eg{} !int x = *it;! interprets !*it! as !int!, while !(*it).value = 42;! interprets !*it! as !list!).
    209208While a nominal-inheritance system with associated types could model one of those two relationships by making !El! an associated type of !Ptr! in the !pointer_like! implementation,
    210 \cbstart
    211 the author is unaware of any nominal-inheritance system that could model both relationships simultaneously.
     209I am unaware of any nominal-inheritance system that can model both relationships simultaneously.
    212210Further comparison of \CFA{} polymorphism with other languages can be found in Section~\ref{generic-related-sec}.
    213 \cbend
    214211
    215212The flexibility of \CFA{}'s implicit trait-satisfaction mechanism provides programmers with a great deal of power, but also blocks some optimization approaches for expression resolution.
    216213The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgments difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
    217214
    218 \cbstart
    219215\subsection{Deleted Declarations}
    220216
    221 Particular type combinations can be exempted from matching a given polymorphic function through use of a \emph{deleted function declaration}:
     217Particular type combinations can be exempted from matching a given polymorphic function through the use of a \emph{deleted function declaration}:
    222218
    223219\begin{cfa}
     
    228224Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
    229225If this deleted declaration is selected as the unique minimal-cost interpretation of an expression then an error is produced, allowing \CFA{} programmers to guide the expression resolver away from undesirable solutions.
    230 \cbend
    231226
    232227\section{Implicit Conversions} \label{implicit-conv-sec}
     
    254249Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer.
    255250These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy aliasing of nested values with a similarly convenient syntax.
    256 \cbstart
    257251The addition of reference types also eliminated two syntactic special-cases present in previous versions of \CFA{}.
    258 Considering a call !a += b! to a compound assignment operator, the previous declaration for that operator was !lvalue int ?+=?(int*, int)! -- to mutate the left argument, the built-in operators were special-cased to implicitly take the address of that argument, while the special !lvalue! syntax was used to mark the return type of a function as a mutable reference.
    259 With references, this declaration can be re-written as !int& ?+=?(int&, int)! -- the reference semantics generalize the implicit address-of on the left argument and allow it to be used in user-declared functions, while also subsuming the (now removed) !lvalue! syntax for function return types.
    260 \cbend
     252Consider the a call !a += b! to a compound assignment operator.
     253The previous declaration for that operator is !lvalue int ?+=?(int*, int)!.
     254To mutate the left argument, the built-in operators were special-cased to implicitly take the address of that argument, while the special !lvalue! syntax was used to mark the return type of a function as a mutable reference.
     255With references, this declaration is re-written as !int& ?+=?(int&, int)!.
     256The reference semantics generalize the implicit address-of on the left argument and allow it to be used in user-declared functions, while also subsuming the (now removed) !lvalue! syntax for function return types.
    261257
    262258The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named in the standard.
     
    281277\CFA{} supports all of these use cases without further added syntax.
    282278The key to this syntax-free feature support is an observation made by the author that the address of a reference is a lvalue.
    283 In C, the address-of operator !&x! can only be applied to lvalue expressions, and always produces an immutable rvalue; \CFA{} supports reference re-binding by assignment to the address of a reference\footnote{\cbstart The syntactic difference between reference initialization and reference assignment is unfortunate, but preserves the ability to pass function arguments by reference (a reference initialization context) without added syntax. \cbend }, and pointers to references by repeating the address-of operator:
     279In C, the address-of operator !&x! can only be applied to lvalue expressions, and always produces an immutable rvalue; \CFA{} supports reference re-binding by assignment to the address of a reference\footnote{The syntactic difference between reference initialization and reference assignment is unfortunate, but preserves the ability to pass function arguments by reference (a reference initialization context) without added syntax.}, and pointers to references by repeating the address-of operator:
    284280
    285281\begin{cfa}
  • doc/theses/aaron_moss_PhD/phd/conclusion.tex

    r69c37cc rf845e80  
    1313An alternate approach would be to fork an existing C compiler such as Clang~\cite{Clang}, which would need to be modified to use one of the resolution algorithms discussed here, as well as various other features introduced by Bilson~\cite{Bilson03}.
    1414
    15 \cbstart
    1615More generally, the algorithmic techniques described in this thesis may be useful to implementors of other programming languages.
    1716In particular, the demonstration of practical performance for polymorphic return-type inference suggests the possibility of eliding return-type-only template parameters in \CC{} function calls, though integrating such an extension into \CC{} expression resolution in a backwards-compatible manner may be challenging.
    18 The \CFA{} expression resolution problem also bears some similarity to the \emph{local type inference} model put forward by Pierce \& Turner \cite{Pierce00} and Odersky \etal{} \cite{Odersky01}; compiler implementors for languages such as Scala \cite{Scala} that perform type inference based on this model may be able to profitably adapt the algorithms and data structures presented in this thesis.
    19 \cbend
     17The \CFA{} expression resolution problem also bears some similarity to the \emph{local type inference} model put forward by Pierce \& Turner \cite{Pierce00} and Odersky \etal{} \cite{Odersky01}; compiler implementors for languages like Scala \cite{Scala}, which performs type inference based on this model, may be able to profitably adapt the algorithms and data structures presented in this thesis.
  • doc/theses/aaron_moss_PhD/phd/experiments.tex

    r69c37cc rf845e80  
    77
    88\CFACC{} can generate realistic test inputs for the resolver prototype from equivalent \CFA{} code;
    9 the generated test inputs currently comprise all \CFA{} code currently in existence\footnote{ \cbstart Though \CFA{} is backwards-compatible with C, the lack of \lstinline{forall} functions and name overloading in C mean that the larger corpus of C code does not provide challenging test instances for \CFACC{} \cbend }, 9,000 lines drawn primarily from the standard library and compiler test suite.
    10 \cbstart
     9the generated test inputs currently comprise all \CFA{} code currently in existence\footnote{Though \CFA{} is backwards-compatible with C, the lack of \lstinline{forall} functions and name overloading in C mean that the larger corpus of C code does not provide challenging test instances for \CFACC{}.}, 9,000 lines drawn primarily from the standard library and compiler test suite.
    1110This code includes a substantial degree of name overloading for common library functions and a number of fundamental polymorphic abstractions, including iterators and streaming input/output.
    12 \cbend
    1311\CFACC{} is also instrumented to produce a number of code metrics.
    1412These metrics were used to construct synthetic test inputs during development of the resolver prototype; these synthetic inputs provided useful design guidance, but the performance results presented in this chapter are based on the more realistic directly-generated inputs.
     
    212210A top-down algorithm was not attempted in \CFACC{} due to its poor performance in the prototype.
    213211The second round of modifications addressed assertion satisfaction, taking Bilson's original \textsc{cfa-imm} algorithm and modifying it to use the deferred approach \textsc{cfa-def}.
    214 \cbstart
    215 Due to time constraints a deferred-cached assertion satisfaction algorithm for \CFACC{} could not be completed, but both preliminary results from this effort and the averaged prototype results from Section~\ref{proto-exp-sec} indicate that assertion satisfaction caching is not likely to be a fruitful optimization for \CFACC{}.
    216 \cbend
     212Due to time constraints, a deferred-cached assertion satisfaction algorithm for \CFACC{} could not be completed, but both preliminary results from this effort and the averaged prototype results from Section~\ref{proto-exp-sec} indicate that assertion satisfaction caching is not likely to be a fruitful optimization for \CFACC{}.
    217213The new environment data structures discussed in Section~\ref{proto-exp-sec} have not been successfully merged into \CFACC{} due to their dependencies on the garbage-collection framework in the prototype; I spent several months modifying \CFACC{} to use similar garbage collection, but due to \CFACC{} not being designed to use such memory management the performance of the modified compiler was non-viable.
    218214It is possible that the persistent union-find environment could be modified to use a reference-counted pointer internally without changing the entire memory-management framework of \CFACC{}, but such an attempt is left to future work.
     
    221217The results from \CFACC{} for \textsc{cfa-co} \vs{} \textsc{cfa-bu} do not mirror those from the prototype; I conjecture this is mostly due to the different memory-management schemes and sorts of data required to run type unification and assertion satisfaction calculations, as \CFACC{} performance has proven to be particularly sensitive to the amount of heap allocation performed.
    222218This data also shows a noticeable regression in compiler performance in the eleven months between \textsc{cfa-bu} and \textsc{cfa-imm}, which use the same resolution algorithms; this approximate doubling in runtime is not due to expression resolution, as no integration work happened in this time, but I am unable to ascertain its actual cause.
    223 \cbstart
    224219To isolate the effects of the algorithmic changes from this unrelated performance regression, the speedup results in Figure~\ref{cfa-speedup-fig} are shown with respect to the start of each modification round, \textsc{cfa-bu} \vs{} \textsc{cfa-co} and \textsc{cfa-def} \vs{} \textsc{cfa-imm}.
    225 \cbend
    226220It should also be noted with regard to the peak memory results in Figure~\ref{cfa-mem-fig} that the peak memory usage does not always occur during the resolution phase of the compiler.
    227221
     
    232226\end{figure}
    233227
    234 \cbstart
    235228\begin{figure}
    236229\centering
     
    238231\caption[\CFACC{} speedup.]{\CFACC{} speedup against against \textsc{cfa-co} baseline runtime. To isolate the effect of algorithmic changes, \textsc{cfa-bu} speedup is \vs{} \textsc{cfa-co} while \textsc{cfa-def} speedup is \vs{} \textsc{cfa-imm}. The `inter-round' series shows slowdown between \textsc{cfa-bu} and \textsc{cfa-imm}.} \label{cfa-speedup-fig}
    239232\end{figure}
    240 \cbend
    241233
    242234\begin{figure}
  • doc/theses/aaron_moss_PhD/phd/frontpgs.tex

    r69c37cc rf845e80  
    141141        The compilation performance improvements have all been experimentally validated with a new prototype system that encapsulates the key aspects of the \CFA{} language; this prototype is a promising basis for future research and a technical contribution of this work.
    142142
    143         \cbstart
    144143        \CFA{}, extended and refined in this thesis, presents both an independently interesting combination of language features and a comprehensive approach to the modernization of C.
    145144        This work demonstrates the hitherto unproven compiler-implementation viability of the \CFA{} language design, and provides a number of useful tools to implementors of other languages.
    146         \cbend
    147145
    148146\cleardoublepage
  • doc/theses/aaron_moss_PhD/phd/generic-bench.tex

    r69c37cc rf845e80  
    11\chapter{Generic Stack Benchmarks} \label{generic-bench-app}
    22
    3 \cbstart
    43This appendix includes the generic stack code for all four language variants discussed in Section~\ref{generic-performance-sec}. Throughout, !/***/! designates a counted redundant type annotation; these include !sizeof! on a known type, repetition of a type name in initialization or return statements, and type-specific helper functions.
    5 \cbend
    64The code is reformatted slightly for brevity.
    75
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    r69c37cc rf845e80  
    333333\end{cfa}
    334334
    335 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code\footnote{ \cbstart A GCC extension allows arithmetic on \lstinline{void*}, calculated as if \lstinline{sizeof(void) == 1}. \cbend }), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
     335Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code\footnote{A GCC extension allows arithmetic on \lstinline{void*}, calculated as if \lstinline{sizeof(void) == 1}.}), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
    336336!_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as:
    337337
  • doc/theses/aaron_moss_PhD/phd/introduction.tex

    r69c37cc rf845e80  
    3737\end{itemize}
    3838
    39 \cbstart
    4039The prototype system, which implements the algorithmic contributions of this thesis, is the first performant type-checker implementation for a \CFA{}-style type system.
    4140As the existence of an efficient compiler is necessary for the practical viability of a programming language, the contributions of this thesis comprise a validation of the \CFA{} language design that was previously lacking.
    42 \cbend
    4341
    4442Though the direction and experimental validation of this work is fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
    4543In particular, with the addition of \emph{concepts} in \CCtwenty{}~\cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
    46 \cbstart
    47 Much of the difficulty of type-checking \CFA{} stems from the language design choice to allow type inference from the context of a function call in addition to its arguments; this feature allows the programmers to specify fewer redundant type annotations on functions that are polymorphic in their return type.
    48 \cbend \cbstartx
    49 The !auto! keyword in \CCeleven{} supports a similar but sharply restricted form of this contextual inference -- the demonstration of the richer inference in \CFA{} raises possibilities for similar features in future versions of \CC{}.
    50 By contrast, Java~8~\cite{Java8} and Scala~\cite{Scala} use comparably powerful forms of type inference, so the algorithmic techniques in this thesis may be effective for those languages' compiler implementors.
    51 \cbendx
     44Much of the difficulty of type-checking \CFA{} stems from the language design choice to allow overload selection from the context of a function call based on function return type in addition to the type of the arguments to the call; this feature allows the programmers to specify fewer redundant type annotations on functions that are polymorphic in their return type.
     45As an example in \CC{}:
     46\begin{C++}
     47template<typename T> T* zero() { return new T( 0 ); }
     48
     49int* z = zero<int>();  $\C{// must specify int twice}$
     50\end{C++}
     51
     52\CFA{} allows !int* z = zero()!, which elides the second !int!.
     53While the !auto! keyword in \CCeleven{} supports similar inference in a limited set of contexts (\eg{} !auto z = zero<int>()!), the demonstration of the richer inference in \CFA{} raises possibilities for similar features in future versions of \CC{}.
     54By contrast to \CC{}, Java~8~\cite{Java8} and Scala~\cite{Scala} use comparably powerful forms of type inference to \CFA{}, so the algorithmic techniques in this thesis may be effective for those languages' compiler implementors.
    5255Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust~\cite{Rust}) that perform type inference; the type environment presented here may be useful to those language implementors.
    5356
    54 \cbstarty
    5557One area of inquiry that is outside the scope of this thesis is formalization of the \CFA{} type system.
    56 Ditchfield~\cite{Ditchfield92} defined the $F_\omega^\ni$ polymorphic lambda calculus which is the theoretical basis for the \CFA{} type system.
     58Ditchfield~\cite{Ditchfield92} defined the $F_\omega^\ni$ polymorphic lambda calculus, which is the theoretical basis for the \CFA{} type system.
    5759Ditchfield did not, however, prove any soundness or completeness properties for $F_\omega^\ni$; such proofs remain future work.
    5860A number of formalisms other than $F_\omega^\ni$ could potentially be adapted to model \CFA{}.
    59 One promising candidate is \emph{local type inference} \cite{Pierce00,Odersky01}, which describes similar contextual propagation of type information; another is the polymorphic conformity-based model of the Emerald~\cite{Black90} programming language, which defines a subtyping relation on types that are conceptually similar to \CFA{} traits.
     61One promising candidate is \emph{local type inference} \cite{Pierce00,Odersky01}, which describes similar contextual propagation of type information; another is the polymorphic conformity-based model of the Emerald~\cite{Black90} programming language, which defines a subtyping relation on types that is conceptually similar to \CFA{} traits.
    6062These modelling approaches could potentially be used to extend an existing formal semantics for C such as Cholera \cite{Norrish98}, CompCert \cite{Leroy09}, or Formalin \cite{Krebbers14}.
    61 \cbendy
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    r69c37cc rf845e80  
    66A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
    77To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
    8 \cbstart
    98This conversion cost is summed over all subexpression interpretations in the interpretation of a top-level expression.
    10 \cbend
    119Hence, the expression resolution problem is to find the unique minimal-cost interpretation for an expression, reporting an error if no such unique interpretation exists.
    1210
    1311\section{Expression Resolution}
    1412
    15 \cbstart
    16 The expression resolution pass in \CFACC{} must traverse the input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates.
    17 Once the set of valid interpretations for the top-level expression has been found, the expression resolver selects the unique minimal-cost candidate or reports an error.
     13The expression resolution pass in \CFACC{} must traverse an input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates.
     14Once the set of valid interpretations for the top-level expression is found, the expression resolver selects the unique minimal-cost candidate or reports an error.
    1815
    1916The expression resolution problem in \CFA{} is more difficult than the analogous problems in C or \CC{}.
    20 As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
     17As mentioned above, the lack of name overloading in C (except for built-in operators) makes its resolution problem substantially easier.
     18A comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
    2119The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions.
    22 One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters that only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function).
     20One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template parameters that only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context(and in fact does not allow explicit specification of type parameters to a function), as in the following example:
     21
     22\begin{cfa}
     23forall(dtype T) T& deref(T*); $\C{// dereferences pointer}$
     24forall(otype T) T* def(); $\C{// new heap-allocated default-initialized value}$
     25
     26int& i = deref( def() );
     27\end{cfa}
     28
     29In this example, the \CFA{} compiler infers the type arguments of !deref! and !def! from the !int&! type of !i!; \CC{}, by contrast, requires a type parameter on !def!\footnote{The type parameter of \lstinline{deref} can be inferred from its argument.}, \ie{} !deref( def<int>() )!.
    2330Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost.
    2431Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}.
    2532Additionally, until the introduction of concepts in \CCtwenty{} \cite{C++Concepts}, \CC{} expression resolution has no analogue to \CFA{} assertion satisfaction checking, a further  complication for a \CFA{} compiler.
    2633The precise definition of \CFA{} expression resolution in this section further expands on the challenges of this problem.
    27 \cbend
    2834
    2935\subsection{Type Unification}
     
    3743\subsection{Conversion Cost} \label{conv-cost-sec}
    3844
    39 \cbstart
    4045\CFA{}, like C, allows inexact matches between the type of function parameters and function call arguments.
    4146Both languages insert \emph{implicit conversions} in these situations to produce an exact type match, and \CFA{} also uses the relative \emph{cost} of different conversions to select among overloaded function candidates.
    42 \cbend
    4347C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to apply define one implicitly.
    4448The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
     
    148152In terms of the core argument-parameter matching algorithm, overloaded variables are handled the same as zero-argument function calls, aside from a different pool of candidate declarations and setup for different code generation.
    149153Similarly, an aggregate member expression !a.m! can be modelled as a unary function !m! that takes one argument of the aggregate type.
    150 Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{ \cbstart Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S} \cbend ) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}.
     154Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S}) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}.
    151155
    152156Since most expressions can be treated as function calls, nested function calls are the primary component of complexity in expression resolution.
     
    306310This approach of filtering out invalid types is unsuited to \CFA{} expression resolution, however, due to the presence of polymorphic functions and implicit conversions.
    307311
    308 \cbstart
    309312Some other language designs solve the matching problem by forcing a bottom-up order.
    310313\CC{}, for instance, defines its overload-selection algorithm in terms of a partial order between function overloads given a fixed list of argument candidates, implying that the arguments must be selected before the function.
    311 This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call that is polymorphic in its return type, and these annotations are often redundant.
    312 \CFA{}, by contrast, saves programmers from redundant annotations with its richer inference:
     314This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call that is polymorphic in its return type, and these annotations are often redundant:
     315
     316\begin{C++}
     317template<typename T> T* malloc() { /* ... */ }
     318
     319int* p = malloc<int>(); $\C{// T = int must be explicitly supplied}$
     320\end{C++}
     321
     322\CFA{} saves programmers from redundant annotations with its richer inference:
    313323
    314324\begin{cfa}
     
    317327int* p = malloc(); $\C{// Infers T = int from left-hand side}$
    318328\end{cfa}
    319 \cbend
    320329
    321330Baker~\cite{Baker82} left empirical comparison of different overload resolution algorithms to future work; Bilson~\cite{Bilson03} described an extension of Baker's algorithm to handle implicit conversions and polymorphism, but did not further explore the space of algorithmic approaches to handle both overloaded names and implicit conversions.
     
    394403This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
    395404
    396 \cbstart
    397 One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgments between top-level expressions.
     405One superficially-promising optimization, which I did not pursue, is caching assertion-satisfaction judgments among top-level expressions.
    398406This approach would be difficult to correctly implement in a \CFA{} compiler, due to the lack of a closed set of operations for a given type.
    399407New declarations related to a particular type can be introduced in any lexical scope in \CFA{}, and these added declarations may cause an assertion that was previously satisfiable to fail due to an introduced ambiguity.
    400408Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgment depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types.
    401409Given these concerns, correctly invalidating a cross-expression assertion satisfaction cache for \CFA{} is a non-trivial problem, and the overhead of such an approach may possibly outweigh any benefits from such caching.
    402 \cbend
    403410
    404411The assertion satisfaction aspect of \CFA{} expression resolution bears some similarity to satisfiability problems from logic, and as such other languages with similar trait and assertion mechanisms make use of logic-program solvers in their compilers.
     
    406413The combination of the assertion satisfaction elements of the problem with the conversion-cost model of \CFA{} makes this logic-solver approach difficult to apply in \CFACC{}, however.
    407414Expressing assertion resolution as a satisfiability problem ignores the cost optimization aspect, which is necessary to decide among what are often many possible satisfying assignments of declarations to assertions.
    408 \cbstartx
    409415(MaxSAT solvers \cite{Morgado13}, which allow weights on solutions to satisfiability problems, may be a productive avenue for future investigation.)
    410 \cbendx
    411416On the other hand, the deeply-recursive nature of the satisfiability problem makes it difficult to adapt to optimizing solver approaches such as linear programming.
    412417To maintain a well-defined programming language, any optimization algorithm used must provide an exact (rather than approximate) solution; this constraint also rules out a whole class of approximately-optimal generalized solvers.
  • doc/theses/aaron_moss_PhD/phd/thesis.tex

    r69c37cc rf845e80  
    1616\usepackage{ifthen}
    1717\newboolean{PrintVersion}
    18 \setboolean{PrintVersion}{false}
     18\setboolean{PrintVersion}{true}
    1919% CHANGE THIS VALUE TO "true" as necessary, to improve printed results for hard copies
    2020% by overriding some options of the hyperref package below.
     
    3131\usepackage{caption} % for subfigure
    3232\usepackage{subcaption}
    33 
    34 \usepackage[color]{changebar}
    35 \cbcolor{blue}
    36 \newcommand{\cbstartx}{\cbcolor{red} \cbstart}
    37 \newcommand{\cbendx}{\cbend \cbcolor{blue}}
    38 \newcommand{\cbstarty}{\cbcolor{green} \cbstart}
    39 \newcommand{\cbendy}{\cbend \cbcolor{blue}}
    4033
    4134% Hyperlinks make it very easy to navigate an electronic document.
  • doc/theses/aaron_moss_PhD/phd/type-environment.tex

    r69c37cc rf845e80  
    1515Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} that the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples).
    1616
    17 \cbstart
    1817The following example demonstrates the use of a type environment for unification:
    1918
     
    2625
    2726Expression resolution starts from an empty type environment; from this empty environment, the calls to !g! can be independently resolved.
    28 These resolutions result in two new type environments, $T = \{ \myset{\mathsf{G}_1} \rightarrow$ !int!$\}$ and $T' = \{ \myset{\mathsf{G}_2} \rightarrow$ !int!$\}$; the calls to !g! have generated distinct type variables !G!$_1$ and !G!$_2$, each bound to !int! by unification with their argument type.
     27These resolutions result in two new type environments, $T = \{ \myset{\mathsf{G}_1} \rightarrow$ !int!$\}$ and $T' = \{ \myset{\mathsf{G}_2} \rightarrow$ !int!$\}$; the calls to !g! have generated distinct type variables !G!$_1$ and !G!$_2$, each bound to !int! by unification with the type of its argument (!10! and !20!, both !int!).
    2928To complete resolution of the call to !f!, both environments must be combined; resolving the first argument to !f! produces a new type environment $T'' = \{ \myset{\mathsf{G}_1, \mathsf{F}_1} \rightarrow$ !int!$\}$: the new type variable !F!$_1$ has been introduced and unified with !G!$_1$ (the return type of !g(10)!), and consequently bound to !int!.
    3029To resolve the second argument to !f!, $T''$ must be checked for compatibility with $T'$; since !F!$_1$ unifies with !G!$_2$, their type classes must be merged.
    3130Since both !F!$_1$ and !G!$_2$ are bound to !int!, this merge succeeds, producing the final environment $T'' = \{ \myset{\mathsf{G}_1, \mathsf{F}_1, \mathsf{G}_2} \rightarrow$ !int!$\}$.
    32 \cbend
    3331
    3432\begin{table}
     
    300298However, allowing multiple threads concurrent access to the persistent data structure is likely to result in ``reroot thrashing'', as different threads reroot the data structure to their own versions of interest.
    301299This contention could be mitigated by partitioning the data structure into separate subtrees for each thread, with each subtree having its own root node, and the boundaries among them implemented with a lock-equipped !ThreadBoundary! edit node.
    302 \cbstartx
    303300Alternatively, the concurrent hash trie of Prokopec \etal{} \cite{Prokopec11,Prokopec12} may be a useful hash-table replacement.
    304 \cbendx
Note: See TracChangeset for help on using the changeset viewer.