# Changeset 4ba22b8

Ignore:
Timestamp:
Mar 11, 2019, 5:34:55 PM (4 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
811466d
Parents:
d438111
Message:

thesis: initial ch.6 revisions

File:
1 edited

### Legend:

Unmodified
 rd438111 \label{expr-chap} I have implemented a prototype system to test the practical effectiveness of the various algorithms described in Chapters~\ref{resolution-chap} and~\ref{env-chap}. I implemented a prototype system to test the practical effectiveness of the various algorithms described in Chapters~\ref{resolution-chap} and~\ref{env-chap}. This prototype system implements the expression resolution pass of the \CFA{} compiler, \CFACC{}, with a simplified version of the \CFA{} type system and a parser to read in problem instances, and is published online under a permissive licence\footnote{\url{https://github.com/cforall/resolv-proto}}. The resolver prototype allows for quicker iteration on algorithms due to its simpler language model and lack of a requirement to generate runnable code, yet captures enough of the nuances of \CFA{} to have some predictive power for the runtime performance of algorithmic variants in \CFACC{} itself. I have implemented an optional \CFACC{} pass which generates test inputs for the resolver prototype from \CFA{} translation units; since at this juncture all development in \CFA{} is done by our research team, I have tested the prototype system on all \CFA{} code currently extant, primarily the standard library and compiler test suite. The resolver prototype allows for quicker iteration on algorithms due to its simpler language model and lack of a requirement to generate runnable code, yet captures enough of the nuances of \CFA{} to have predictive power for the runtime performance of algorithmic variants in \CFACC{} itself. \CFACC{} can generate realistic test inputs for the resolver prototype from equivalent \CFA{} code; the generated test inputs currently comprise all \CFA{} code currently extant, $9,000$ lines drawn primarily from the standard library and compiler test suite. \CFACC{} is also instrumented to produce a number of code metrics. These metrics were used to construct synthetic test inputs during development of the resolver prototype; these synthetic inputs provided useful design guidance, but the performance results presented in this chapter are based on the more realistic directly-generated inputs. % There are three sources of problem instances for the resolver prototype. The resolver prototype can express most of the \CFA{} features described in Chapter~\ref{cfa-chap}. It supports both monomorphic and polymorphic functions, with type assertions for polymorphic functions. Traits are not explicitly represented, but \CFACC{} inlines traits before the resolver pass, so this is a faithful representation of the existing compiler problem. Traits are not explicitly represented, but \CFACC{} inlines traits before the resolver pass, so this is a faithful representation of the existing compiler. The prototype system supports variable declarations as well as function declarations, and has a lexical-scoping scheme and \CFA{}-like overloading rules. The type system of the resolver prototype also captures key aspects of the \CFA{} type system. \emph{Concrete types} represent the built-in arithmetic types of \CFA{}, along with the implicit conversions between them. Each concrete type is represented by an integer ID, and the conversion cost from $x$ to $y$ is $|y-x|$, a safe conversion if $y > x$, or an unsafe conversion if $y < x$. This is markedly simpler than the graph of conversion costs in \CFA{} (Figure~\ref{safe-conv-graph-fig}), but captures the essentials of the design well. \emph{Concrete types} represent the built-in arithmetic types of \CFA{}, along with the implicit conversions among them. Each concrete type is represented by an integer identifier, and the conversion cost from $x$ to $y$ is $|y-x|$, a safe conversion if $y > x$, or an unsafe conversion if $y < x$. This scheme is markedly simpler than the graph of conversion costs in \CFA{} (Figure~\ref{safe-conv-fig}), but captures the essentials of the design. For simplicity, !zero_t! and !one_t!, the types of !0! and !1!, are represented by the type corresponding to !int!. \emph{Named types} are analogues to \CFA{} aggregates, such as structs and unions; aggregate fields are encoded as unary functions from the struct type to the field type, named based on the field name. \emph{Named types} are analogues to \CFA{} aggregates, such as structs and unions; aggregate fields are encoded as unary functions from the struct type to the field type, with the function named based on the field name. Named types also support type parameters, and as such can represent generic types as well. Generic named types are used to represent the built-in parameterized types of \CFA{} as well; !T*! is encoded as \texttt{\#\$ptr}. \CFA{} arrays are also represented as pointers, to simulate array-to-pointer decay, while top-level reference types are replaced by their referent to simulate the variety of reference conversions. Function types have first-class representation in the prototype as the type of both function declarations and function pointers, though the function type in the prototype system loses information about type assertions, so polymorphic function pointers cannot be expressed. Void and tuple types are also supported in the prototype, to express the multiple-return-value functions in \CFA{}, though varargs functions and !ttype! tuple-typed type variables are absent from the prototype system. \emph{Function types} have first-class representation in the prototype as well; \CFA{} function function pointers are represented as variables with the appropriate function type, though \CFA{} polymorphic function pointers cannot be represented, as the prototype system function type does not store information about type assertions. \emph{Void} and \emph{tuple types} are also supported in the prototype, to express the multiple-return-value functions in \CFA{}, though varargs functions and !ttype! tuple-typed type variables are absent from the prototype system. The prototype system also does not represent type qualifiers (\eg{} !const!, !volatile!), so all such qualifiers are stripped during conversion to the prototype system. The resolver prototype supports three sorts of expressions in its input language. The simplest are \emph{value expressions}, which are expressions declared to be a certain type; these implement literal expressions in \CFA{}, and, already being typed, are passed through the resolver unchanged. The second sort, \emph{name expressions}, represent a variable expression in \CFA{}; these contain the name of a variable or function, and are matched to an appropriate declaration overloading that name by the resolver. The second sort, \emph{name expressions}, represent a variable expression in \CFA{}; these contain the name of a variable or function, and are matched to an appropriate declaration overloading that name. The third input expression, the \emph{function expression}, represents a call to a function, with a name and zero or more argument subexpressions. As is usual in \CFA{}, operators are represented as function calls; however, as mentioned above, the prototype system represents field access expressions !a.f! as function expressions as well. The main area for future expansion in the design of the resolver prototype is conversions. Cast expressions are implemented in the output language of the resolver, but cannot be expressed in the input. The only implicit conversions supported are between the arithmetic-like concrete types, which captures most, but not all, of \CFA{}'s built-in implicit conversions\footnote{Notable absences include \lstinline{void*} to other pointer types, or \lstinline{0} to pointer types.}. The only implicit conversions supported are among the arithmetic-like concrete types, which captures most, but not all, of \CFA{}'s built-in implicit conversions\footnote{Notable absences include \lstinline{void*} to other pointer types, or \lstinline{0} to pointer types.}. Future work should include a way to express implicit (and possibly explicit) conversions in the input language, with an investigation of the most efficient way to handle implicit conversions, and potentially a design for user-defined conversions. As discussed above, for speed of development the resolver prototype works over a simplified version of the \CFA{} type system. The build system for the resolver prototype uses a number of conditional compilation flags to switch between algorithm variants while retaining maximally shared code. The build system for the resolver prototype uses a number of conditional compilation flags to switch among algorithm variants while retaining maximally shared code. A distinct executable name is also generated for each algorithmic variant so that distinct variants can be more easily tested against each other. The primary architectural difference between the resolver prototype and \CFACC{} is that the prototype system uses a simple mark-and-sweep garbage collector for memory management, while \CFACC{} takes a manual memory management approach. This decision was made for the purpose of faster development iteration, but has proved to be a significant performance benefit as well. \CFACC{} frequently needs to make deep clones of large object graphs to ensure memory ownership (followed by eventual deletion of these clones), an unnecessarily time-consuming process. The prototype, on the other hand, only needs to clone modified nodes, and can share identical subsets of the object graph. The key design decision enabling this is that all subnodes are held by !const! pointer, and thus cannot be mutated once they have been stored in a parent node. With minimal programming discipline, it can thus be ensured that any expression is either mutable or shared, but never both; the Dotty research compiler for Scala takes a similar architectural approach\cite{Dotty-github}. The tree mutator abstraction is designed to take advantage of this, only creating new nodes if a node must actually be mutated. I attempted to port this garbage collector to \CFACC{}, but without success. The GC could be used for memory management with few changes to the code-base, but without a substantial re-write to enforce the same !const! children'' discipline \CFACC{} could not take advantage of the potential to share sub-objects; without sharing of sub-objects the GC variant of \CFACC{} must do all the same allocations and deletions and garbage-collector overhead degraded performance unacceptably (though it did fix some known memory leaks introduced by failures of the existing manual memory management scheme). Another minor architectural difference between \CFACC{} and the prototype system is that \CFACC{} makes extensive use of the pointer-based !std::list!, !std::set!, and !std::map! data structures, while the prototype uses the array-based !std::vector! and the hash-based !unordered_! variants of !set! and !map! instead. Porting the prototype to use the pointer-based data structures resulted in modest performance regressions, whereas preliminary results results from porting \CFACC{} to use !std::vector! over !std::list! also showed performance regressions, in some cases significant. The primary architectural difference between the resolver prototype and \CFACC{} is that the prototype system uses a simple mark-and-sweep garbage collector for memory management, while \CFACC{} uses a manual memory-management approach. This architectural difference affects the mutation patterns used by both systems: \CFACC{} frequently makes deep clones of multi-node object graphs to ensure that there is a single owner'' for each object which can safely delete it later; the prototype system, by contrast, relies on its garbage collector to handle ownership, and can often copy pointers rather than cloning objects. The resolver prototype thus only needs to clone nodes which it modifies, and can share un-modified children between clones; the tree mutator abstraction in the prototype is designed to take advantage of this property. The key design decision enabling this is that all child nodes are held by !const! pointer, and thus cannot be mutated once they have been stored in a parent node. With minimal programming discipline, it can thus be ensured that any expression is either mutable or shared, but never both; the Dotty research compiler for Scala takes a similar architectural approach \cite{Dotty-github}. Given the significantly better performance results from the resolver prototype than \CFACC{} and profiling data showing that memory allocation is a large component of \CFACC{} runtime, I attempted to port this garbage collector to \CFACC{}, but without success. The GC could be used for memory management with few changes to the code-base, but without a substantial re-write to enforce the same !const! children'' discipline, \CFACC{} could not take advantage of the potential to share sub-objects; without sharing of sub-objects the GC variant of \CFACC{} must do all the same allocations and deletions and garbage-collector overhead degraded performance unacceptably (though it did fix some known memory leaks introduced by failures of the existing manual memory-management scheme). Another minor architectural difference between the prototype system and \CFACC{} is that \CFACC{} makes extensive use of the pointer-based !std::list!, !std::set!, and !std::map! data structures, while the prototype uses the array-based !std::vector! and the hash-based !unordered_! variants of !set! and !map! instead. Porting the prototype to use the pointer-based data structures resulted in modest performance regressions, whereas preliminary results from porting \CFACC{} to use !std::vector! over !std::list! also showed performance regressions, in some cases significant. The relative performance impact of this architectural difference is unclear, and thus excluded from consideration. The final difference between \CFACC{} and the resolver prototype is that, as an experiment in language usability, the prototype performs resolution-based rather than unification-based assertion satisfaction, as discussed in Section~\ref{resn-conclusion-sec}. This enables coding patterns not available in \CFACC{}, \eg{} a more flexible approach to type assertion satisfaction and better handling of functions returning polymorphic type variables that do not exist in the parameter list. This change enables coding patterns not available in \CFACC{}, \eg{} a more flexible approach to type assertion satisfaction and better handling of functions returning polymorphic type variables that do not exist in the parameter list. The experimental results in Section~\ref{proto-exp-sec} indicate that this choice is not a barrier to a performant resolver. % \TODO{test performance; shouldn't be too hard to change \texttt{resolveAssertions} to use unification} \section{Prototype Experiments} \label{proto-exp-sec} The primary performance experiments for this thesis were conducted using the resolver prototype on problem instances generated from actual \CFA{} code using the method described in Section~\ref{rp-features-sec}. The prototype was compiled in 24 variants over 3 variables, with variants identified by the hyphen-separated concatenation of their short codes, \eg{} \textsc{bu-imm-bas} for bottom-up traversal, immediate assertion satisfaction, basic type environment. The primary performance experiments for this thesis are conducted using the resolver prototype on problem instances generated from actual \CFA{} code using the method described in Section~\ref{rp-features-sec}. The prototype is compiled in 24 variants over 3 variables, with variants identified by the hyphen-separated concatenation of their short codes, \eg{} \textsc{bu-imm-bas} for bottom-up traversal, immediate assertion satisfaction, basic type environment. The variables and their values are as follows: \begin{description} \item[Basic] (\textsc{bas}) Bilson-style type environment with hash-based equivalence class storage, as discussed in Section~\ref{naive-env-sec}. \item[Incremental Inheritance] (\textsc{inc}) Incremental inheritance variant sharing unmodified common parent information between environments, as discussed in Section~\ref{inc-env-sec}. \item[Persistent union-find] (\textsc{per}) Union-find-based environment, using the persistent variant discussed in Section~\ref{env-persistent-union-find} for backtracking and combination. The requirement of this type environment for common root environments for combination is incompatible with the caching used in the top-down traversal direction, and thus no \textsc{td-*-per} algorithms are tested. \item[Incremental Inheritance] (\textsc{inc}) Incremental-inheritance variant sharing unmodified common parent information among environments, as discussed in Section~\ref{inc-env-sec}. \item[Persistent union-find] (\textsc{per}) Union-find-based environment, using the persistent variant discussed in Section~\ref{env-persistent-union-find} for backtracking and combination. This variant requires that all pairs of type arguments used as arguments to$combine\$ descent from a common root environment; this requirement is incompatible with the caching used in the top-down traversal direction, and thus no \textsc{td-*-per} algorithms are tested. \end{description} \end{description} To test the various algorithms, the resolver prototype was compiled using \texttt{g++} 6.5.0 with each of the 24 valid combinations of variables\footnote{Namely, all combinations except \textsc{td-*-per}.}, and then timed running each of the \CFA{}-derived test inputs. Terminal output was suppressed for all tests to avoid confounding factors in the timing results, and all tests were run three times in series, with the median result reported in all cases. To test the various algorithms, the resolver prototype is compiled using \texttt{g++} 6.5.0 with each of the 24 valid combinations of variables\footnote{Namely, all combinations except \textsc{td-*-per}.}, and then timed running each of the \CFA{}-derived test inputs. Terminal output is suppressed for all tests to avoid confounding factors in the timing results, and all tests are run three times in series, with the median result reported in all cases. The medians are representative data points; considering test cases that took at least 0.2~s to run, the average run was within 2\% of the reported median runtime, and no run diverged by more than 20\% of median runtime or 5.5~s. The memory results are even more consistent, with no run exceeding 2\% difference from median in peak resident set size, and 93\% of tests not recording any difference within the 1~KB granularity of the measurement software. The memory results are even more consistent, with no run exceeding 2\% difference from median in peak resident set size, and 93\% of tests recording identical peak memory usage within the 1~KB granularity of the measurement software. All tests were run on a machine with 128~GB of RAM and 64 cores running at 2.2~GHz. As a matter of experimental practicality, test runs which exceeded 8~GB of peak resident memory usage were excluded from the data set. This is a reasonable real-world restriction, as a compiler which is merely slow may be accommodated with patience, but one which uses in excess of 8~GB of RAM may be impossible to run on many currently deployed computer systems. The \textsc{bu-dca-bas} and \textsc{bu-dca-per} variants were able to run all 131 test inputs to completion under this restriction, with maximum memory usage of 70~MB and 78~MB, respectively, which validates its selection as an error threshold. As a matter of experimental practicality, test runs that exceeded 8~GB of peak resident memory usage are excluded from the data set. This restriction is justifiable by real-world use, as a compiler that is merely slow may be accommodated with patience, but one that uses in excess of 8~GB of RAM may be impossible to run on many currently deployed computer systems. 8~GB of RAM is not typical of the memory usage of the best-peforming two variants, \textsc{bu-dca-bas} and \textsc{bu-dca-per}, which were able to run all 131 test inputs to completion  with maximum memory usage of 70~MB and 78~MB, respectively. However, this threshold did eliminate a significant number of algorithm-test variants, with the worst-performing variant, \textsc{td-imm-inc}, only completing 62 test inputs within the memory bound. Full results for tests completed by algorithm variant are presented in Figure~\ref{tests-completed-fig}.