1 | % inline code ©...© (copyright symbol) emacs: C-q M-) |
---|
2 | % red highlighting ®...® (registered trademark symbol) emacs: C-q M-. |
---|
3 | % blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_ |
---|
4 | % green highlighting ¢...¢ (cent symbol) emacs: C-q M-" |
---|
5 | % LaTex escape §...§ (section symbol) emacs: C-q M-' |
---|
6 | % keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^ |
---|
7 | % math escape $...$ (dollar symbol) |
---|
8 | |
---|
9 | \documentclass[twoside,11pt]{article} |
---|
10 | |
---|
11 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
---|
12 | |
---|
13 | % Latex packages used in the document (copied from CFA user manual). |
---|
14 | \usepackage[T1]{fontenc} % allow Latin1 (extended ASCII) characters |
---|
15 | \usepackage{textcomp} |
---|
16 | \usepackage[latin1]{inputenc} |
---|
17 | \usepackage{fullpage,times,comment} |
---|
18 | \usepackage{epic,eepic} |
---|
19 | \usepackage{upquote} % switch curled `'" to straight |
---|
20 | \usepackage{calc} |
---|
21 | \usepackage{xspace} |
---|
22 | \usepackage{graphicx} |
---|
23 | \usepackage{varioref} % extended references |
---|
24 | \usepackage{listings} % format program code |
---|
25 | \usepackage[flushmargin]{footmisc} % support label/reference in footnote |
---|
26 | \usepackage{latexsym} % \Box glyph |
---|
27 | \usepackage{mathptmx} % better math font with "times" |
---|
28 | \usepackage[usenames]{color} |
---|
29 | \usepackage[pagewise]{lineno} |
---|
30 | \renewcommand{\linenumberfont}{\scriptsize\sffamily} |
---|
31 | \input{common} % bespoke macros used in the document |
---|
32 | \usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref} |
---|
33 | \usepackage{breakurl} |
---|
34 | \renewcommand{\UrlFont}{\small\sf} |
---|
35 | |
---|
36 | \setlength{\topmargin}{-0.45in} % move running title into header |
---|
37 | \setlength{\headsep}{0.25in} |
---|
38 | |
---|
39 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
---|
40 | |
---|
41 | \newsavebox{\LstBox} |
---|
42 | |
---|
43 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
---|
44 | |
---|
45 | \title{\Huge |
---|
46 | \vspace*{1in} |
---|
47 | Efficient Type Resolution in \CFA \\ |
---|
48 | \vspace*{0.25in} |
---|
49 | \huge |
---|
50 | PhD Comprehensive II Research Proposal |
---|
51 | \vspace*{1in} |
---|
52 | } |
---|
53 | |
---|
54 | \author{\huge |
---|
55 | \vspace*{0.1in} |
---|
56 | Aaron Moss \\ |
---|
57 | \Large Cheriton School of Computer Science \\ |
---|
58 | \Large University of Waterloo |
---|
59 | } |
---|
60 | |
---|
61 | \date{ |
---|
62 | \today |
---|
63 | } |
---|
64 | |
---|
65 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
---|
66 | |
---|
67 | \begin{document} |
---|
68 | \pagestyle{headings} |
---|
69 | % changed after setting pagestyle |
---|
70 | \renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}} |
---|
71 | \renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}} |
---|
72 | \pagenumbering{roman} |
---|
73 | \linenumbers % comment out to turn off line numbering |
---|
74 | |
---|
75 | \maketitle |
---|
76 | \thispagestyle{empty} |
---|
77 | |
---|
78 | \clearpage |
---|
79 | \thispagestyle{plain} |
---|
80 | \pdfbookmark[1]{Contents}{section} |
---|
81 | \tableofcontents |
---|
82 | |
---|
83 | \clearpage |
---|
84 | \thispagestyle{plain} |
---|
85 | \pagenumbering{arabic} |
---|
86 | |
---|
87 | \section{Introduction} |
---|
88 | |
---|
89 | \CFA\footnote{Pronounced ``C-for-all'', and written \CFA, CFA, or \CFL.} is an evolutionary modernization of the C programming language currently being designed and built at the University of Waterloo by a team led by Peter Buhr. |
---|
90 | Features added to C by \CFA include name overloading, user-defined operators, parametric-polymorphic routines, and type constructors and destructors, among others. |
---|
91 | These features make \CFA significantly more powerful and expressive than C, but impose a significant compile-time cost to implement, particularly in the expression resolver, which must evaluate the typing rules of a much more complex type system. |
---|
92 | The primary goal of this proposed research project is to develop a sufficiently performant expression resolution algorithm, experimentally validate its performance, and integrate it into \Index*{CFA-CC}, the \CFA reference compiler. |
---|
93 | Secondary goals of this project include the development of various new language features for \CFA; parametric-polymorphic (``generic'') types have already been designed and implemented, and reference types and user-defined conversions are under design consideration. |
---|
94 | The experimental performance-testing architecture for resolution algorithms will also be used to determine the compile-time cost of adding such new features to the \CFA type system. |
---|
95 | |
---|
96 | \section{\CFA} |
---|
97 | |
---|
98 | To make the scope of the proposed expression resolution problem more explicit, it is necessary to define the features of both C and \CFA (both current and proposed) which affect this algorithm. |
---|
99 | In some cases the interactions of multiple features make expression resolution a significantly more complex problem than any individual feature would; in others a feature which does not by itself add any complexity to expression resolution will trigger previously rare edge cases much more frequently. |
---|
100 | |
---|
101 | \subsection{Polymorphic Functions} |
---|
102 | The most significant feature \CFA adds is parametric-polymorphic functions. |
---|
103 | Such functions are written using a ©forall© clause, the feature that gave the language its name: |
---|
104 | \begin{lstlisting} |
---|
105 | forall(otype T) |
---|
106 | T identity(T x) { |
---|
107 | return x; |
---|
108 | } |
---|
109 | |
---|
110 | int forty_two = identity(42); // T is bound to int, forty_two == 42 |
---|
111 | \end{lstlisting} |
---|
112 | The ©identity© function above can be applied to any complete object type (or ``©otype©''). |
---|
113 | The type variable ©T© is transformed into a set of additional implicit parameters to ©identity© which encode sufficient information about ©T© to create and return a variable of that type. |
---|
114 | The current \CFA implementation passes the size and alignment of the type represented by an ©otype© parameter, as well as an assignment operator, constructor, copy constructor \& destructor. |
---|
115 | |
---|
116 | Since bare polymorphic types do not provide a great range of available operations, \CFA also provides a \emph{type assertion} mechanism to provide further information about a type: |
---|
117 | \begin{lstlisting} |
---|
118 | forall(otype T | { T twice(T); }) |
---|
119 | T four_times(T x) { |
---|
120 | return twice( twice(x) ); |
---|
121 | } |
---|
122 | |
---|
123 | double twice(double d) { return d * 2.0; } // (1) |
---|
124 | |
---|
125 | double magic = four_times(10.5); // T is bound to double, uses (1) to satisfy type assertion |
---|
126 | \end{lstlisting} |
---|
127 | These type assertions may be either variable or function declarations which depend on a polymorphic type variable. |
---|
128 | ©four_times© can only be called with an argument for which there exists a function named ©twice© that can take that argument and return another value of the same type; a pointer to the appropriate ©twice© function will be passed as an additional implicit parameter to the call to ©four_times©. |
---|
129 | |
---|
130 | Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions. |
---|
131 | For instance, ©twice© could have been define as below, using the \CFA syntax for operator overloading: |
---|
132 | \begin{lstlisting} |
---|
133 | forall(otype S | { S ?+?(S, S); }) |
---|
134 | S twice(S x) { return x + x; } // (2) |
---|
135 | \end{lstlisting} |
---|
136 | This version of ©twice© will work for any type ©S© that has an addition operator defined for it, and it could have been used to satisfy the type assertion on ©four_times©. |
---|
137 | The compiler accomplishes this by creating a wrapper function calling ©twice // (2)© with ©S© bound to ©double©, then providing this wrapper function to ©four_times©\footnote{©twice // (2)© could have had a type parameter named ©T©; \CFA specifies a renaming the type parameters, which would avoid the name conflict with the parameter ©T© of ©four_times©.}. |
---|
138 | |
---|
139 | Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope. |
---|
140 | If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that function will be examined as a candidate for its own type assertion unboundedly repeatedly. |
---|
141 | To avoid infinite loops, the current \Index*{CFA-CC} compiler imposes a fixed limit on the possible depth of recursion, similar to that employed by most \Index*[C++]{\CC} compilers for template expansion; this restriction means that there are some semantically well-typed expressions which cannot be resolved by {CFA-CC}. |
---|
142 | One area of potential improvement this project proposes to investigate is the possibility of using the compiler's knowledge of the current set of declarations to make a more precise judgement of when further type assertion satisfaction recursion will not produce a well-typed expression. |
---|
143 | |
---|
144 | \subsection{Name Overloading} |
---|
145 | In C, no more than one function or variable in the same scope may share the same name, and function or variable declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration. |
---|
146 | This makes finding the proper declaration to match to a function application or variable expression a simple matter of symbol table lookup, which can be easily and efficiently implemented. |
---|
147 | \CFA, on the other hand, allows overloading of variable and function names, so long as the overloaded declarations do not have the same type, avoiding the multiplication of function names for different types common in the C standard library, as in the following example: |
---|
148 | \begin{lstlisting} |
---|
149 | int three = 3; |
---|
150 | double three = 3.0; |
---|
151 | |
---|
152 | int thrice(int i) { return i * three; } // uses int three |
---|
153 | double thrice(double d) { return d * three; } // uses double three |
---|
154 | |
---|
155 | // thrice(three); // ERROR: ambiguous |
---|
156 | int nine = thrice(three); // uses int thrice and three, based on return type |
---|
157 | double nine = thrice(three); // uses double thrice and three, based on return type |
---|
158 | \end{lstlisting} |
---|
159 | |
---|
160 | The presence of name overloading in \CFA means that simple table lookup is not sufficient to match identifiers to declarations, and a type matching algorithm must be part of expression resolution. |
---|
161 | |
---|
162 | \subsection{Implicit Conversions} |
---|
163 | In addition to the multiple interpretations of an expression produced by name overloading, \CFA also supports all of the implicit conversions present in C, producing further candidate interpretations for expressions. |
---|
164 | C does not have a traditionally-defined inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'' define which of the built-in types are implicitly convertable to which other types, and the relative cost of any pair of such conversions from a single source type. |
---|
165 | \CFA adds to the usual arithmetic conversions rules for determining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, {e.g.} ©int© to ©char©, but more expensive than any \emph{safe} (widening) conversion, {e.g.} ©int© to ©double©. |
---|
166 | The expression resolution problem, then, is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression, and which subexpression interpretation is minimal-cost may be disambiguated by context. |
---|
167 | |
---|
168 | \subsubsection{User-generated Implicit Conversions} |
---|
169 | One possible additional feature to \CFA included in this research proposal is \emph{user-generated implicit conversions}. |
---|
170 | Such a conversion system should be simple for user programmers to utilize, and fit naturally with the existing design of implicit conversions in C; ideally it would also be sufficiently powerful to encode C's usual arithmetic conversions itself, so that \CFA only has one set of rules for conversions. |
---|
171 | |
---|
172 | Glen Ditchfield \textbf{TODO CITE} has laid out a framework for using polymorphic conversion constructor functions to create a directed acyclic graph (DAG) of conversions. |
---|
173 | A monomorphic variant of these functions can be used to mark a conversion arc in the DAG as only usable as the final step in a conversion. |
---|
174 | With these two types of conversion arcs, separate DAGs can be created for the safe and the unsafe conversions, and conversion cost can be represented as path length through the DAG. |
---|
175 | Open research questions on this topic include whether a conversion graph can be generated that represents each allowable conversion in C with a unique minimal-length path, such that the path lengths accurately represent the relative costs of the conversions, whether such a graph representation can be usefully augmented to include user-defined types as well as built-in types, and whether the graph can be efficiently represented and included in the expression resolver. |
---|
176 | |
---|
177 | \subsection{Generic Types} |
---|
178 | |
---|
179 | \subsection{Multiple Return Values} |
---|
180 | |
---|
181 | \subsection{Reference Types} |
---|
182 | % TODO discuss rvalue-to-lvalue conversion here |
---|
183 | |
---|
184 | \section{Expression Resolution} |
---|
185 | % TODO cite Baker, Cormack, etc. |
---|
186 | |
---|
187 | \subsection{Symbol Table} |
---|
188 | % TODO not sure this is sufficiently novel, but it is an improvement to CFA-CC |
---|
189 | |
---|
190 | \section{Completion Timeline} |
---|
191 | |
---|
192 | \section{Conclusion} |
---|
193 | |
---|
194 | \newpage |
---|
195 | |
---|
196 | \bibliographystyle{plain} |
---|
197 | \bibliography{cfa} |
---|
198 | |
---|
199 | |
---|
200 | \addcontentsline{toc}{section}{\indexname} % add index name to table of contents |
---|
201 | \begin{theindex} |
---|
202 | Italic page numbers give the location of the main entry for the referenced term. |
---|
203 | Plain page numbers denote uses of the indexed term. |
---|
204 | Entries for grammar non-terminals are italicized. |
---|
205 | A typewriter font is used for grammar terminals and program identifiers. |
---|
206 | \indexspace |
---|
207 | \input{comp_II.ind} |
---|
208 | \end{theindex} |
---|
209 | |
---|
210 | \end{document} |
---|