Context Navigation

← Previous Change
Next Change →

Changeset c4b5486 for doc/theses/aaron_moss_PhD

Timestamp:

Apr 23, 2019, 3:41:51 PM (7 years ago)

Author:

Aaron Moss <a3moss@…>

Branches:

ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum

Children:

Parents:

Message:

thesis: integrate defence answers as clarifications into thesis

Location:

doc/theses/aaron_moss_PhD/phd

Files:

: 5 edited

experiments.tex (modified) (1 diff)
frontpgs.tex (modified) (1 diff)
generic-types.tex (modified) (1 diff)
introduction.tex (modified) (3 diffs)
resolution-heuristics.tex (modified) (3 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/theses/aaron_moss_PhD/phd/experiments.tex

-              rcf01d0b
+              rc4b5486
 \CFACC{} can generate realistic test inputs for the resolver prototype from equivalent \CFA{} code;
 the generated test inputs currently comprise all \CFA{} code currently in existence\footnote{Though \CFA{} is backwards-compatible with C, the lack of \lstinline{forall} functions and name overloading in C mean that the larger corpus of C code does not provide challenging test instances for \CFACC{}}, 9,000 lines drawn primarily from the standard library and compiler test suite.
+This code includes a substantial degree of name overloading for common library functions and a number of fundamental polymorphic abstractions, including iterators and streaming input/output.
 \CFACC{} is also instrumented to produce a number of code metrics.
 These metrics were used to construct synthetic test inputs during development of the resolver prototype; these synthetic inputs provided useful design guidance, but the performance results presented in this chapter are based on the more realistic directly-generated inputs.
-% There are three sources of problem instances for the resolver prototype.
-% The first is small, hand-written tests designed to test the expressive power and correctness of the prototype.
-% These tests are valuable for regression testing, but not time-consuming enough to be useful performance tests.
-% The second sort of problem instances are procedurally generated according to a set of parameters (distributions of polymorphic versus monomorphic functions, number of function arguments, number of types, \etc{}); the procedural problem generator can be used to explore the behaviour of an algorithm with respect to certain sorts of problem instances by varying the input parameters.
-% I have implemented a flagged \CFACC{} pass which outputs information which can be used to initialize the procedural generator's parameters to realistic values.
-% The final sort of problem instances are derived from actual \CFA{} code.
-% The prototype has a rich enough representation of \CFA{} that actual instances of expression resolution can be expressed with good fidelity, and I have implemented a compiler pass for \CFACC{} which can generate instances from \CFA{} code.
-% Since at this juncture all development in \CFA{} is done by our research team, I have tested the prototype system on all \CFA{} code currently extant, primarily the standard library and compiler test suite.
 \section{Resolver Prototype Features} \label{rp-features-sec}

doc/theses/aaron_moss_PhD/phd/frontpgs.tex

rcf01d0b	rc4b5486
142	142
143	143	\CFA{}, extended and refined in this thesis, presents both an independently interesting combination of language features and a comprehensive approach to the modernization of C.
144		This work demonstrates the hitherto unproven ~~practical~~ viability of the \CFA{} language design, and provides a number of useful tools to implementors of other languages.
	144	This work demonstrates the hitherto unproven compiler-implementation viability of the \CFA{} language design, and provides a number of useful tools to implementors of other languages.
145	145
146	146	\cleardoublepage

doc/theses/aaron_moss_PhD/phd/generic-types.tex

rcf01d0b	rc4b5486
143	143	\end{figure}
144	144
145		\CC{}, Java, and other languages use \emph{generic types} to produce type-safe abstract data types.
	145	\CC{}, Java, and other languages use \emph{generic types} (or \emph{parameterized types}) to produce type-safe abstract data types.
146	146	Design and implementation of generic types for \CFA{} is the first major contribution of this thesis, a summary of which is published in \cite{Moss18} and on which this chapter is closely based.
147	147	\CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation.

doc/theses/aaron_moss_PhD/phd/introduction.tex

-              rcf01d0b
+              rc4b5486
 \chapter{Introduction}
 The C programming language has had a wide-ranging impact on the design of software and programming languages.
 In the 30 years since its first standardization, it has consistently been one of the most popular programming languages, with billions of lines of C code still in active use, and tens of thousands of trained programmers producing it. The TIOBE index\cite{TIOBE} tracks popularity of programming languages over time, and C has never dropped below second place:
+The C programming language~\cite{C11} has had a wide-ranging impact on the design of software and programming languages.
+In the 30 years since its first standardization, it has consistently been one of the most popular programming languages, with billions of lines of C code still in active use, and tens of thousands of trained programmers producing it. The TIOBE index~\cite{TIOBE} tracks popularity of programming languages over time, and C has never dropped below second place:
 \begin{table}[h]
 …
 The impact of C on programming language design is also obvious from Table~\ref{tiobe-table}; with the exception of Python, all of the top five languages use C-like syntax and control structures.
 \CC{} is even a largely backwards-compatible extension of C.
+\CC{}~\cite{C++} is even a largely backwards-compatible extension of C.
 Though its lasting popularity and wide impact on programming language design point to the continued relevance of C, there is also widespread desire of programmers for languages with more expressive power and programmer-friendly features; accommodating both maintenance of legacy C code and development of the software of the future is a difficult task for a single programming language.
 …
 \end{itemize}
+Though the direction and validation of this work is fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
+In particular, with the addition of \emph{concepts} in \CCtwenty{} \cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
+Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust) which perform type inference; the type environment presented here may be useful to those language implementers.
+The prototype system, which implements the algorithmic contributions of this thesis, is the first performant type-checker implementation for a \CFA{}-style type system.
+As the existence of an efficient compiler is necessary for the practical viability of a programming language, the contributions of this thesis comprise a validation of the \CFA{} language design which was previously lacking.
+Though the direction and experimental validation of this work is fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
+In particular, with the addition of \emph{concepts} in \CCtwenty{}~\cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
+Much of the difficulty of type-checking \CFA{} stems from the language design choice to allow type inference from the context of a function call in addition to its arguments; this feature allows the programmers to specify fewer redundant type annotations on functions which are polymorphic in their return type.
+Java's diamond type-inference operator~\cite{Java-diamond} and the !auto! keyword in \CCeleven{} support similar but sharply restricted forms of this contextual inference -- the demonstration of the richer inference in \CFA{} raises possibilities for similar features in future versions of these languages.
+Scala~\cite{Scala}, by contrast, already supports a similarly-powerful \emph{local type inference} model~\cite{Pierce00,Odersky01}, so the algorithmic techniques in this thesis may be effective for Scala compiler implementors.
+Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust~\cite{Rust}) which perform type inference; the type environment presented here may be useful to those language implementors.

doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

-              rcf01d0b
+              rc4b5486
 A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
 To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
+This conversion cost is summed over all subexpression interpretations in the interpretation of a top-level expression.
 Hence, the expression resolution problem is to find the unique minimal-cost interpretation for an expression, reporting an error if no such unique interpretation exists.
 …
 This approach of filtering out invalid types is unsuited to \CFA{} expression resolution, however, due to the presence of polymorphic functions and implicit conversions.
+Some other language designs solve the matching problem by forcing a bottom-up order.
+\CC{}, for instance, defines its overload-selection algorithm in terms of a partial order between function overloads given a fixed list of argument candidates, implying that the arguments must be selected before the function.
+This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call which is polymorphic in its return type, and these annotations are often redundant.
+\CFA{}, by contrast, saves programmers from redundant annotations with its richer inference:
+\begin{cfa}
+forall(dtype T | sized(T)) T* malloc();
+int* p = malloc(); $\C{// Infers T = int from left-hand side}$
+\end{cfa}
 Baker~\cite{Baker82} left empirical comparison of different overload resolution algorithms to future work; Bilson~\cite{Bilson03} described an extension of Baker's algorithm to handle implicit conversions and polymorphism, but did not further explore the space of algorithmic approaches to handle both overloaded names and implicit conversions.
 This thesis closes that gap in the literature by performing performance comparisons of both top-down and bottom-up expression resolution algorithms, with results reported in Chapter~\ref{expr-chap}.
 …
 During the course of checking the assertions of a single top-level expression, the results are cached for each assertion checked.
 The search key for this cache is the assertion declaration with its type variables substituted according to the type environment to distinguish satisfaction of the same assertion for different types.
+This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
+This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
+One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgements between top-level expressions.
+This approach would be difficult to correctly implement in a \CFA{} compiler, due to the lack of a closed set of operations for a given type.
+New declarations related to a particular type can be introduced in any lexical scope in \CFA{}, and these added declarations may cause an assertion that was previously satisfiable to fail due to an introduced ambiguity.
+Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgement depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types.
+Given these concerns, correctly invalidating a cross-expression assertion satisfaction cache for \CFA{} is a non-trivial problem, and the overhead of such an approach may possibly outweigh any benefits from such caching.
 The assertion satisfaction aspect of \CFA{} expression resolution bears some similarity to satisfiability problems from logic, and as such other languages with similar trait and assertion mechanisms make use of logic-program solvers in their compilers.

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: