Ignore:
Timestamp:
Apr 23, 2019, 3:41:51 PM (6 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
3b40801b
Parents:
cf01d0b
Message:

thesis: integrate defence answers as clarifications into thesis

Location:
doc/theses/aaron_moss_PhD/phd
Files:
5 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/experiments.tex

    rcf01d0b rc4b5486  
    88\CFACC{} can generate realistic test inputs for the resolver prototype from equivalent \CFA{} code;
    99the generated test inputs currently comprise all \CFA{} code currently in existence\footnote{Though \CFA{} is backwards-compatible with C, the lack of \lstinline{forall} functions and name overloading in C mean that the larger corpus of C code does not provide challenging test instances for \CFACC{}}, 9,000 lines drawn primarily from the standard library and compiler test suite.
     10This code includes a substantial degree of name overloading for common library functions and a number of fundamental polymorphic abstractions, including iterators and streaming input/output.
    1011\CFACC{} is also instrumented to produce a number of code metrics.
    1112These metrics were used to construct synthetic test inputs during development of the resolver prototype; these synthetic inputs provided useful design guidance, but the performance results presented in this chapter are based on the more realistic directly-generated inputs.
    12 
    13 % There are three sources of problem instances for the resolver prototype.
    14 % The first is small, hand-written tests designed to test the expressive power and correctness of the prototype.
    15 % These tests are valuable for regression testing, but not time-consuming enough to be useful performance tests.
    16 % The second sort of problem instances are procedurally generated according to a set of parameters (distributions of polymorphic versus monomorphic functions, number of function arguments, number of types, \etc{}); the procedural problem generator can be used to explore the behaviour of an algorithm with respect to certain sorts of problem instances by varying the input parameters.
    17 % I have implemented a flagged \CFACC{} pass which outputs information which can be used to initialize the procedural generator's parameters to realistic values.
    18 % The final sort of problem instances are derived from actual \CFA{} code.
    19 % The prototype has a rich enough representation of \CFA{} that actual instances of expression resolution can be expressed with good fidelity, and I have implemented a compiler pass for \CFACC{} which can generate instances from \CFA{} code.
    20 % Since at this juncture all development in \CFA{} is done by our research team, I have tested the prototype system on all \CFA{} code currently extant, primarily the standard library and compiler test suite.
    2113
    2214\section{Resolver Prototype Features} \label{rp-features-sec}
  • doc/theses/aaron_moss_PhD/phd/frontpgs.tex

    rcf01d0b rc4b5486  
    142142
    143143        \CFA{}, extended and refined in this thesis, presents both an independently interesting combination of language features and a comprehensive approach to the modernization of C.
    144         This work demonstrates the hitherto unproven practical viability of the \CFA{} language design, and provides a number of useful tools to implementors of other languages.
     144        This work demonstrates the hitherto unproven compiler-implementation viability of the \CFA{} language design, and provides a number of useful tools to implementors of other languages.
    145145
    146146\cleardoublepage
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    rcf01d0b rc4b5486  
    143143\end{figure}
    144144
    145 \CC{}, Java, and other languages use \emph{generic types} to produce type-safe abstract data types.
     145\CC{}, Java, and other languages use \emph{generic types} (or \emph{parameterized types}) to produce type-safe abstract data types.
    146146Design and implementation of generic types for \CFA{} is the first major contribution of this thesis, a summary of which is published in \cite{Moss18} and on which this chapter is closely based.
    147147\CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation.
  • doc/theses/aaron_moss_PhD/phd/introduction.tex

    rcf01d0b rc4b5486  
    11\chapter{Introduction}
    22
    3 The C programming language has had a wide-ranging impact on the design of software and programming languages.
    4 In the 30 years since its first standardization, it has consistently been one of the most popular programming languages, with billions of lines of C code still in active use, and tens of thousands of trained programmers producing it. The TIOBE index\cite{TIOBE} tracks popularity of programming languages over time, and C has never dropped below second place:
     3The C programming language~\cite{C11} has had a wide-ranging impact on the design of software and programming languages.
     4In the 30 years since its first standardization, it has consistently been one of the most popular programming languages, with billions of lines of C code still in active use, and tens of thousands of trained programmers producing it. The TIOBE index~\cite{TIOBE} tracks popularity of programming languages over time, and C has never dropped below second place:
    55
    66\begin{table}[h]
     
    1919
    2020The impact of C on programming language design is also obvious from Table~\ref{tiobe-table}; with the exception of Python, all of the top five languages use C-like syntax and control structures.
    21 \CC{} is even a largely backwards-compatible extension of C.
     21\CC{}~\cite{C++} is even a largely backwards-compatible extension of C.
    2222Though its lasting popularity and wide impact on programming language design point to the continued relevance of C, there is also widespread desire of programmers for languages with more expressive power and programmer-friendly features; accommodating both maintenance of legacy C code and development of the software of the future is a difficult task for a single programming language.
    2323
     
    3737\end{itemize}
    3838
    39 Though the direction and validation of this work is fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
    40 In particular, with the addition of \emph{concepts} in \CCtwenty{} \cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
    41 Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust) which perform type inference; the type environment presented here may be useful to those language implementers.
     39The prototype system, which implements the algorithmic contributions of this thesis, is the first performant type-checker implementation for a \CFA{}-style type system.
     40As the existence of an efficient compiler is necessary for the practical viability of a programming language, the contributions of this thesis comprise a validation of the \CFA{} language design which was previously lacking.
     41
     42Though the direction and experimental validation of this work is fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
     43In particular, with the addition of \emph{concepts} in \CCtwenty{}~\cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
     44Much of the difficulty of type-checking \CFA{} stems from the language design choice to allow type inference from the context of a function call in addition to its arguments; this feature allows the programmers to specify fewer redundant type annotations on functions which are polymorphic in their return type.
     45Java's diamond type-inference operator~\cite{Java-diamond} and the !auto! keyword in \CCeleven{} support similar but sharply restricted forms of this contextual inference -- the demonstration of the richer inference in \CFA{} raises possibilities for similar features in future versions of these languages.
     46Scala~\cite{Scala}, by contrast, already supports a similarly-powerful \emph{local type inference} model~\cite{Pierce00,Odersky01}, so the algorithmic techniques in this thesis may be effective for Scala compiler implementors.
     47Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust~\cite{Rust}) which perform type inference; the type environment presented here may be useful to those language implementors.
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    rcf01d0b rc4b5486  
    66A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
    77To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
     8This conversion cost is summed over all subexpression interpretations in the interpretation of a top-level expression.
    89Hence, the expression resolution problem is to find the unique minimal-cost interpretation for an expression, reporting an error if no such unique interpretation exists.
    910
     
    299300This approach of filtering out invalid types is unsuited to \CFA{} expression resolution, however, due to the presence of polymorphic functions and implicit conversions.
    300301
     302Some other language designs solve the matching problem by forcing a bottom-up order.
     303\CC{}, for instance, defines its overload-selection algorithm in terms of a partial order between function overloads given a fixed list of argument candidates, implying that the arguments must be selected before the function.
     304This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call which is polymorphic in its return type, and these annotations are often redundant.
     305\CFA{}, by contrast, saves programmers from redundant annotations with its richer inference:
     306
     307\begin{cfa}
     308forall(dtype T | sized(T)) T* malloc();
     309
     310int* p = malloc(); $\C{// Infers T = int from left-hand side}$
     311\end{cfa}
     312
    301313Baker~\cite{Baker82} left empirical comparison of different overload resolution algorithms to future work; Bilson~\cite{Bilson03} described an extension of Baker's algorithm to handle implicit conversions and polymorphism, but did not further explore the space of algorithmic approaches to handle both overloaded names and implicit conversions.
    302314This thesis closes that gap in the literature by performing performance comparisons of both top-down and bottom-up expression resolution algorithms, with results reported in Chapter~\ref{expr-chap}.
     
    372384During the course of checking the assertions of a single top-level expression, the results are cached for each assertion checked.
    373385The search key for this cache is the assertion declaration with its type variables substituted according to the type environment to distinguish satisfaction of the same assertion for different types.
    374 This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
     386This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
     387
     388One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgements between top-level expressions.
     389This approach would be difficult to correctly implement in a \CFA{} compiler, due to the lack of a closed set of operations for a given type.
     390New declarations related to a particular type can be introduced in any lexical scope in \CFA{}, and these added declarations may cause an assertion that was previously satisfiable to fail due to an introduced ambiguity.
     391Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgement depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types.
     392Given these concerns, correctly invalidating a cross-expression assertion satisfaction cache for \CFA{} is a non-trivial problem, and the overhead of such an approach may possibly outweigh any benefits from such caching.
    375393
    376394The assertion satisfaction aspect of \CFA{} expression resolution bears some similarity to satisfiability problems from logic, and as such other languages with similar trait and assertion mechanisms make use of logic-program solvers in their compilers.
Note: See TracChangeset for help on using the changeset viewer.