% inline code ©...© (copyright symbol) emacs: C-q M-) % red highlighting ®...® (registered trademark symbol) emacs: C-q M-. % blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_ % green highlighting ¢...¢ (cent symbol) emacs: C-q M-" % LaTex escape §...§ (section symbol) emacs: C-q M-' % keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^ % math escape $...$ (dollar symbol) \documentclass[twoside,11pt]{article} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Latex packages used in the document (copied from CFA user manual). \usepackage[T1]{fontenc} % allow Latin1 (extended ASCII) characters \usepackage{textcomp} \usepackage[latin1]{inputenc} \usepackage{fullpage,times,comment} \usepackage{epic,eepic} \usepackage{upquote} % switch curled `'" to straight \usepackage{calc} \usepackage{xspace} \usepackage{graphicx} \usepackage{varioref} % extended references \usepackage{listings} % format program code \usepackage[flushmargin]{footmisc} % support label/reference in footnote \usepackage{latexsym} % \Box glyph \usepackage{mathptmx} % better math font with "times" \usepackage[usenames]{color} \usepackage[pagewise]{lineno} \renewcommand{\linenumberfont}{\scriptsize\sffamily} \input{common} % bespoke macros used in the document \usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref} \usepackage{breakurl} \renewcommand{\UrlFont}{\small\sf} \setlength{\topmargin}{-0.45in} % move running title into header \setlength{\headsep}{0.25in} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newsavebox{\LstBox} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \title{\Huge \vspace*{1in} Efficient Type Resolution in \CFA \\ \vspace*{0.25in} \huge PhD Comprehensive II Research Proposal \vspace*{1in} } \author{\huge \vspace*{0.1in} Aaron Moss \\ \Large Cheriton School of Computer Science \\ \Large University of Waterloo } \date{ \today } %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{document} \pagestyle{headings} % changed after setting pagestyle \renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}} \renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}} \pagenumbering{roman} \linenumbers % comment out to turn off line numbering \maketitle \thispagestyle{empty} \clearpage \thispagestyle{plain} \pdfbookmark[1]{Contents}{section} \tableofcontents \clearpage \thispagestyle{plain} \pagenumbering{arabic} \section{Introduction} \CFA\footnote{Pronounced ``C-for-all'', and written \CFA, CFA, or \CFL.} is an evolutionary modernization of the C programming language currently being designed and built at the University of Waterloo by a team led by Peter Buhr. Features added to C by \CFA include name overloading, user-defined operators, parametric-polymorphic routines, and type constructors and destructors, among others. These features make \CFA significantly more powerful and expressive than C, but impose a significant compile-time cost to implement, particularly in the expression resolver, which must evaluate the typing rules of a much more complex type system. The primary goal of this proposed research project is to develop a sufficiently performant expression resolution algorithm, experimentally validate its performance, and integrate it into \Index*{CFA-CC}, the \CFA reference compiler. Secondary goals of this project include the development of various new language features for \CFA; parametric-polymorphic (``generic'') types have already been designed and implemented, and reference types and user-defined conversions are under design consideration. The experimental performance-testing architecture for resolution algorithms will also be used to determine the compile-time cost of adding such new features to the \CFA type system. \section{\CFA} To make the scope of the proposed expression resolution problem more explicit, it is necessary to define the features of both C and \CFA (both current and proposed) which affect this algorithm. In some cases the interactions of multiple features make expression resolution a significantly more complex problem than any individual feature would; in others a feature which does not by itself add any complexity to expression resolution will trigger previously rare edge cases much more frequently. \subsection{Polymorphic Functions} The most significant feature \CFA adds is parametric-polymorphic functions. Such functions are written using a ©forall© clause, the feature that gave the language its name: \begin{lstlisting} forall(otype T) T identity(T x) { return x; } int forty_two = identity(42); // T is bound to int, forty_two == 42 \end{lstlisting} The ©identity© function above can be applied to any complete object type (or ``©otype©''). The type variable ©T© is transformed into a set of additional implicit parameters to ©identity© which encode sufficient information about ©T© to create and return a variable of that type. The current \CFA implementation passes the size and alignment of the type represented by an ©otype© parameter, as well as an assignment operator, constructor, copy constructor \& destructor. Since bare polymorphic types do not provide a great range of available operations, \CFA also provides a \emph{type assertion} mechanism to provide further information about a type: \begin{lstlisting} forall(otype T | { T twice(T); }) T four_times(T x) { return twice( twice(x) ); } double twice(double d) { return d * 2.0; } // (1) double magic = four_times(10.5); // T is bound to double, uses (1) to satisfy type assertion \end{lstlisting} These type assertions may be either variable or function declarations which depend on a polymorphic type variable. ©four_times© can only be called with an argument for which there exists a function named ©twice© that can take that argument and return another value of the same type; a pointer to the appropriate ©twice© function will be passed as an additional implicit parameter to the call to ©four_times©. Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions. For instance, ©twice© could have been define as below, using the \CFA syntax for operator overloading: \begin{lstlisting} forall(otype S | { S ?+?(S, S); }) S twice(S x) { return x + x; } // (2) \end{lstlisting} This version of ©twice© will work for any type ©S© that has an addition operator defined for it, and it could have been used to satisfy the type assertion on ©four_times©. The compiler accomplishes this by creating a wrapper function calling ©twice // (2)© with ©S© bound to ©double©, then providing this wrapper function to ©four_times©\footnote{©twice // (2)© could have had a type parameter named ©T©; \CFA specifies a renaming the type parameters, which would avoid the name conflict with the parameter ©T© of ©four_times©.}. Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope. If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that function will be examined as a candidate for its own type assertion unboundedly repeatedly. To avoid infinite loops, the current \Index*{CFA-CC} compiler imposes a fixed limit on the possible depth of recursion, similar to that employed by most \Index*[C++]{\CC} compilers for template expansion; this restriction means that there are some semantically well-typed expressions which cannot be resolved by {CFA-CC}. One area of potential improvement this project proposes to investigate is the possibility of using the compiler's knowledge of the current set of declarations to make a more precise judgement of when further type assertion satisfaction recursion will not produce a well-typed expression. \subsection{Name Overloading} In C, no more than one function or variable in the same scope may share the same name, and function or variable declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration. This makes finding the proper declaration to match to a function application or variable expression a simple matter of symbol table lookup, which can be easily and efficiently implemented. \CFA, on the other hand, allows overloading of variable and function names % TODO talk about uses for this \subsection{Implicit Conversions} % TODO also discuss possibility of user-generated implicit conversions here \subsection{Generic Types} \subsection{Multiple Return Values} \subsection{Reference Types} % TODO discuss rvalue-to-lvalue conversion here \section{Expression Resolution} % TODO cite Baker, Cormack, etc. \subsection{Symbol Table} % TODO not sure this is sufficiently novel, but it is an improvement to CFA-CC \section{Completion Timeline} \section{Conclusion} \newpage \bibliographystyle{plain} \bibliography{cfa} \addcontentsline{toc}{section}{\indexname} % add index name to table of contents \begin{theindex} Italic page numbers give the location of the main entry for the referenced term. Plain page numbers denote uses of the indexed term. Entries for grammar non-terminals are italicized. A typewriter font is used for grammar terminals and program identifiers. \indexspace \input{comp_II.ind} \end{theindex} \end{document}