\chapter{Introduction}

This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically the \newterm{type-resolver} used to satisfy call-site assertions among overloaded variable and function names to allow polymorphic routine calls.
Assertions are the operations a function uses within its body to perform its computation.
For example, a polymorphic function summing an array needs a size, zero, assignment, and plus for the array element-type, and a subscript operation for the array type.
\begin{cfa}
T sum( T a[$\,$], size_t size ) {
	@T@ total = { @0@ };			$\C[1.75in]{// size, 0 for type T}$
	for ( size_t i = 0; i < size; i += 1 )
		total @+=@ a@[@i@]@;		$\C{// + and subscript for T}\CRT$
	return total;
}
\end{cfa}
In certain cases, if the resolver fails to find an exact assertion match, it attempts to find a \emph{best} match using reasonable type conversions.
Hence, \CFA follows the current trend of replacing nominal inheritance with traits composed of assertions for type matching.
The over-arching goal in \CFA is to push the boundary on localized assertion matching, with advanced overloading resolution and type conversions that match programmer expectations in the C programming language.
Together, the resulting type-system has a number of unique features making it different from other programming languages with expressive, static type-systems.


\section{Types}

All computers have multiple types because computer architects optimize the hardware around a few basic types with well-defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
These language types are thrust upon programmers with their syntactic/semantic rules and restrictions.
These rules are then used to transform a language expression to a unique hardware expression.
Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
Expressibility, generalization, and safety are all bound up in a language's type system, and hence, directly affect the capability, build time, and correctness of program development.


\section{Overloading}

\begin{quote}
There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. --- Phil Karlton
\end{quote}
Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
Experience from \CC and \CFA developers shows the type system can implicitly and correctly disambiguate the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
In many cases, a programmer is unaware of name clashes, as they are silently resolved, simplifying the development process.

\newterm{Namespace pollution} refers to loading the global or other namespaces with many names, resulting in paranoia that the compiler could make wrong choices for overloaded names, causing failure.
This fear leads to coding styles where names are partitioned by language mechanisms and qualification is used to make names unique.
This approach defeats the purpose of overloading and places an additional coding burden on both the developer and user.
As well, many namespace systems provide a mechanism to open their scope, returning to normal overloading, \ie no qualification.
While namespace mechanisms are very important and provide a number of crucial program-development features, protection from overloading is overstated.
Similarly, lexical nesting is another place where overloading occurs.
For example, in object-oriented programming, class member names and local names within member functions can \newterm{shadow} one another.
Some programmers qualify all member names with @class::@ or @this->@ to keep them distinct from local names defined in member functions.
Even nested lexical blocks result in shadowing, \eg multiple nested loop-indices called @i@.
Again, coding styles exist requiring all variables in nested blocks to be unique to prevent name shadowing.
Depending on the language, these possible ambiguities can be reported (as warnings or errors) and resolved explicitly using some form of qualification and/or cast.

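The nested-block shadowing just described can be sketched with a small illustrative fragment (not from the thesis):
\begin{cfa}
int i = 3;				$\C[1.75in]{// outer i}$
{
	double i = 3.5;		$\C{// shadows outer int i}$
	{
		char i = 'a';	$\C{// shadows both outer i's; only this i is visible here}\CRT$
	}
}
\end{cfa}
At each point, an unqualified use of @i@ denotes the innermost declaration, and the outer variables are inaccessible without some form of qualification.
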
Formally, overloading is defined by Strachey as \newterm{ad hoc polymorphism}:
\begin{quote}
In ad hoc polymorphism there is no single systematic way of determining the type of the result from the type of the arguments.
There may be several rules of limited extent which reduce the number of cases, but these are themselves ad hoc both in scope and content.
All the ordinary arithmetic operators and functions come into this category.
It seems, moreover, that the automatic insertion of transfer functions by the compiling system is limited to this.~\cite[p.~37]{Strachey00}
\end{quote}
where a \newterm{transfer function} is an implicit conversion to help find a matching overload:
\begin{quote}
The problem of dealing with polymorphic operators is complicated by the fact that the range of types sometimes overlap.
Thus for example 3 may be an integer or a real and it may be necessary to change it from one type to the other.
The functions which perform this operation are known as transfer functions and may either be used explicitly by the programmer, or, in some systems, inserted automatically by the compiling system.~\cite[p.~35]{Strachey00}
\end{quote}
The differentiating characteristic between parametric polymorphism and overloading is often stated as: polymorphic functions use one algorithm to operate on arguments of many different types, whereas overloaded functions use a different algorithm for each type of argument.
A similar differentiation is applicable for overloading and default parameters.
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{c}{\textbf{different implementations}} & \multicolumn{1}{c}{\textbf{same implementation}} \\
\begin{cfa}
void foo( int );
void foo( int, int );
\end{cfa}
&
\begin{cfa}
void foo( int, int = 5 ); // default value

\end{cfa}
\end{tabular}
\end{cquote}
However, this distinguishing characteristic is vague.
For example, should the operation @abs@ be overloaded or polymorphic or both?
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{c}{\textbf{overloading}} & \multicolumn{1}{c}{\textbf{polymorphic}} \\
\begin{cfa}
int abs( int );
double abs( double );
\end{cfa}
&
\begin{cfa}
forall( T | { void ?{}( T &, zero_t ); int ?<?( T, T ); T -?( T ); } )
T abs( T );
\end{cfa}
\end{tabular}
\end{cquote}
Here, there are performance advantages for having specializations and code-reuse advantages for the generalization.

The Strachey definitions raise several questions.
\begin{enumerate}[leftmargin=*]
\item
Is overloading polymorphism?

\noindent
In type theory, polymorphism allows an overloaded type name to represent multiple different types.
For example, generic types overload the type name for a container type.
\begin{cfa}
@List@<int> li;  @List@<double> ld;  @List@<struct S> ls;
\end{cfa}
For subtyping, a derived type masquerades as a base type, where the base and derived names cannot be overloaded.
Instead, the mechanism relies on structural typing among the types.
In both cases, the polymorphic mechanisms apply in the type domain and the names are in the type namespace.
In the following C example:
\begin{cfa}
struct S { int S; };
struct S s;
void S( struct S S ) { S.S = 1; };
enum E { E };
enum E e = E;
\end{cfa}
the overloaded names @S@ and @E@ are separated into the type and object domain, and C uses the type kinds @struct@ and @enum@ to disambiguate the names.
In general, type names are not overloaded because inferring them is difficult to imagine in a statically-typed programming language.
\begin{cquote}
\setlength{\tabcolsep}{26pt}
\begin{tabular}{@{}ll@{}}
\begin{cfa}
struct S { int i; }; // 1
struct S { int i, j; }; // 2
union S { int i; double d; }; // 3
typedef int S; // 4
\end{cfa}
&
\begin{cfa}
S s;
if ( ... ) s = (S){ 3 }; // 1
else s = 3; // 4

\end{cfa}
\end{tabular}
\end{cquote}
On the other hand, in ad-hoc overloading of variables and/or functions, the names are in the object domain and each overloaded object name has an anonymous associated type.
\begin{cfa}
@double@ foo( @int@ );  @char@ foo( @void@ );  @int@ foo( @double, double@ );
@double@ foo;  @char@ foo;  @int@ foo;
\end{cfa}
Here, the associated type cannot be extracted using @typeof@/\lstinline[language={[11]C++}]{decltype} for typing purposes, as @typeof( foo )@ is always ambiguous.
To disambiguate the type of @foo@ requires additional information from the call-site or a cast.
\begin{cfa}
double d = foo( 3 );  char c = foo();  int i = foo( 3.5, 4.1 );	$\C{// exact match}$
d = foo;  c = foo;  i = foo;			$\C{// exact match}$
i = ((double (*)( int ))foo)( 3.5 );	$\C{// no exact match, explicit cast => conversions}$
\end{cfa}
Hence, depending on your perspective, overloading may not be polymorphism, as no single overloaded entity represents multiple types.

\item
Does ad-hoc polymorphism have a single systematic way of determining the type of the result from the type of the arguments?

\noindent
For exact type matches in overloading, there is a systematic way of matching arguments to parameters, and a function denotes its return type rather than using type inferencing.
This matching is just as robust as other polymorphic analyses.
The ad-hoc aspect might be the implicit transfer functions (conversions) applied to arguments to create an exact parameter type-match, as there may be multiple conversions leading to different exact matches.
Note, conversion issues apply to both non-overloaded and overloaded functions.
Here, the selection of the conversion functions can be considered the \emph{opinion} of the language (type system), even if the technique used is based on sound rules, like maximizing conversion accuracy (non-lossy).
A difference of opinion results when the language conversion rules differ from a programmer's expectations.
However, without implicit conversions, programmers may have to write an exponential number of functions covering all possible exact-match cases among all reasonable types.
\CFA's \emph{opinion} on conversions must match C's, and then rely on programmers to understand the effects.
That is, let the compiler do the heavy lifting of selecting a \emph{best} set of conversions that maximize safety.
Hence, removing implicit conversions from \CFA is not an option (as is true in most other programming languages), so it must do the best possible job to get it right.

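As a small, C-compatible illustration of a language \emph{opinion} differing from programmer expectation, C's usual arithmetic conversions convert a signed operand to unsigned when the two are mixed (a standard C example, not specific to \CFA):
\begin{cfa}
int i = -1;
unsigned int u = 1;
if ( i < u ) ...		$\C[1.75in]{// false: -1 converts to a large unsigned value}\CRT$
\end{cfa}
Here the comparison is performed in @unsigned int@, so @-1@ becomes @UINT_MAX@ and the test fails, which is sound by the language's conversion rules but often not what the programmer intended.
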
\item
Why are there two forms of \emph{overloading} (general and parametric) in different programming languages?

\noindent
\newterm{General overloading} occurs when the type-system \emph{knows} a function's parameter and return types (or a variable's type for variable overloading).
In functional programming-languages, there is always a return type (except for a monad).
If a return type is specified, the compiler does not have to infer it from the routine body.
For example, the compiler has complete knowledge about builtin types and their overloaded arithmetic operators.
In this context, there is a fixed set of overloads for a given name that are completely specified.
Overload resolution then involves finding an exact match between a call and the overload prototypes based on argument type(s) and possibly return context.
If an \emph{exact} match is not found, the call is either ill formed (ambiguous) or further attempts are made to find a \emph{best} match using transfer functions (conversions).
As a consequence, no additional runtime information is needed per call, \ie the call is a direct transfer (branch) with possibly converted arguments.

\noindent
\newterm{Parametric overloading} occurs when the type-system does not know a function's parameter and return types.
Types are inferred from constant types and/or constraint information that includes typing.
Parametric overloading starts with a universal/generic type used to define parameters and returns.
\begin{cfa}
forall( T ) T identity( T t ) { return t; }
int i = identity( 3 );
double d;
double e = identity( d );
\end{cfa}
Here, the type system is substituting the argument type for @T@, and ensuring the return type also matches.
This mechanism can be extended to overloaded parametric functions with multiple type parameters.
\begin{cfa}
forall( T ) [T, T] identity( T t1, T t2 ) { return [t1, t2]; } // return tuple
int j;
[i, j] = identity( 3, 4 );
forall( T, U ) [T, U] identity( T t1, U t2 ) { return [t1, t2]; }
[i, d] = identity( i, d );
\end{cfa}
Here, the type system is using techniques from general overloading, the number and types of arguments, to disambiguate the overloaded parametric functions.
However, this exercise is moot because the types are so universal they have no practical operations within the function, modulo any general operations carried by every type, like copy or print.
This kind of polymorphism is restricted to moving type instances as abstract entities;
if type erasure is used, there is no way to recover the original type to perform useful actions on its value(s).
Hence, parametric overloading requires additional information about the universal types to make them useful.

This additional information often comes as a set of operations a type must supply (@trait@/@concept@) and these operations can then be used in the body of the function.
\begin{cfa}
forall( T | { T ?@++@( T, T ); } ) T inc( T t ) { return t@++@; }
int i = 3;
i = inc( i );
double d = inc( 3.5 );
\end{cfa}
Given a qualifying trait, are its elements inferred or declared?
In the above example, the type system infers @int@ for @T@, infers it needs a @++@ operator that takes an @int@ and returns an @int@, and finds this function in the enclosing environment (\eg standard prelude).
This implicit inferencing is expensive if matched with implicit conversions when there is no exact match.
Alternatively, types opt-in to traits via declarations.
\begin{cfa}
trait postinc { T ?@++@( T, T ); }
type int provide postinc;
\end{cfa}
In this approach, after inferring @int@ for @T@, @int@ is examined for any necessary traits required in the function, so there is no searching.
Once all traits are satisfied, the compiler can inline them, pass function pointers directly, or pass a virtual table composing the trait.
\end{enumerate}
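The virtual-table option just mentioned can be sketched in C-compatible code, where the trait becomes an explicit table of function pointers passed to the polymorphic function (the names here are hypothetical illustrations, not \CFA's actual implementation):
\begin{cfa}
struct postinc_vtable { void (*inc)( void * ); };	$\C[1.75in]{// trait as a table of operations}$
void inc_n( void * t, int n, struct postinc_vtable * vt ) {
	for ( int i = 0; i < n; i += 1 ) vt->inc( t );	$\C{// generic body calls trait operation}$
}
void int_inc( void * p ) { (*(int *)p) += 1; }		$\C{// bind the trait to int}$
struct postinc_vtable int_postinc = { int_inc };
int x = 3;
inc_n( &x, 4, &int_postinc );						$\C{// x == 7}\CRT$
\end{cfa}
When the binding is known statically, a compiler generating such tables can often inline the indirect calls away.
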
The following sections illustrate how overloading is provided in \CFA.

\begin{comment}
\begin{lstlisting}[language=Haskell]
f( int, int ); f( int, float ); -- return types to be determined
g( int, int ); g( float, int );
let x = curry f( 3, _ ); -- which f
let y = curry g( _ , 3 ); -- which g
\end{lstlisting}
For the currying to succeed, there cannot be overloaded function names resulting in ambiguities.
To allow currying to succeed requires an implicit disambiguating mechanism, \ie a kind of transfer function.
A type class~\cite{typeclass} is a mechanism to convert overloading into parametric polymorphism.
Parametric polymorphism has enough information to disambiguate the overloaded names because it removes the type inferencing.
\begin{cfa}
forall( T | T has + $and$ - ) T f$\(_1\)$( T );
forall( T | T has * $and$ - ) T f$\(_2\)$( T );
x = f$\(_1\)$( x ); // if x has + and - but not *
y = f$\(_2\)$( y ); // if y has * and - but not +
\end{cfa}
Here, the types of @x@ and @y@ are combined with the type-class constraints to provide secondary information for disambiguation.
This approach handles many overloading cases because the constraints overlap completely or are disjoint.

A type class (trait) generically abstracts the set of the operations used in a function's implementation.
A type-class instance binds a specific type to the generic operations to form concrete instances, giving a named type-class.
Then qualified types concisely express the operations required to convert an overloaded
The named type-class is used as a transfer function to convert an overloaded routine into a polymorphic routine that is uniquely qualified with the named type-class.
\begin{cfa}
void foo_int_trait( special int trait for operations in this foo );
void foo_int_int_trait( special (int, int) trait for operations in this foo );
\end{cfa}

In this case, the compiler implicitly changes the overloaded function to a parametrically polymorphic one.
Hence, the programmer does not specify any additional information for the overloading to work.
Explicit overloading occurs when the compiler has to be told what operations are associated with a type: the programmer separately defines the associated type and subsequently associates the type with the overloaded name.
\end{comment}

264\begin{comment}
265Date: Mon, 24 Feb 2025 11:26:12 -0500
266Subject: Re: overloading
267To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
268CC: <f37yu@uwaterloo.ca>, <ajbeach@uwaterloo.ca>, <mlbrooks@uwaterloo.ca>,
269 <alvin.zhang@uwaterloo.ca>, <lseo@plg.uwaterloo.ca>,
270 <j82liang@uwaterloo.ca>
271From: Gregor Richards <gregor.richards@uwaterloo.ca>
272
273Yes.
274
275With valediction,
276 - Gregor Richards
277
278On 2/24/25 11:22, Peter A. Buhr wrote:
279> Gregor Richards <gregor.richards@uwaterloo.ca> writes:
280> In Haskell, `+` works for both because of typeclasses (inclusion
281> polymorphism), and so is also not an unresolved type.
282>
283> I'm making this up. The Haskell type-class is a trait, like an interface or
284> abstract class, and its usage/declaration/binding creates a specific trait
285> instance for bound types, which is a vtable filled with the typed routines
286> instantiated/located for the trait. The vtables are present at runtime and
287> passed implicitly to ad-hoc polymorphic routines allowing differentiate of
288> overloaded functions based on the number of traits and their specialization.
289> (Major geek talk, YA! 8-)
290>
291> On 2/21/25 23:04, Fangren Yu wrote:
292> > In a statically typed language I would rather have definitions like
293> > double x = x+x be ambiguous than "an unresolved type" as the latter
294> > sounds like a weaker version of a generic type, and being able to make
295> > something generic without explicitly saying so is probably not a good
296> > idea. Giving the unspecified parameter type an arbitrary preference is
297> > the second best option IMO (does ML give you a warning on such not
298> > fully inferred types?)
299> > ------------------------------------------------------------------------
300> > *From:* Gregor Richards <gregor.richards@uwaterloo.ca>
301> > *Sent:* Wednesday, February 19, 2025 9:55:23 PM
302> > *To:* Peter Buhr <pabuhr@uwaterloo.ca>
303> > *Cc:* Andrew James Beach <ajbeach@uwaterloo.ca>; Michael Leslie Brooks
304> > <mlbrooks@uwaterloo.ca>; Fangren Yu <f37yu@uwaterloo.ca>;
305> > j82liang@uwaterloo.ca <j82liang@uwaterloo.ca>; Alvin Zhang
306> > <alvin.zhang@uwaterloo.ca>; lseo@plg.uwaterloo.ca <lseo@plg.uwaterloo.ca>
307> > *Subject:* Re: overloading
308> > Jives with what I was saying, albeit not exactly the same; it's a result
309> > of the same problem.
310> >
311> > 'doubles' refers to an unresolved 'double', and the latter can't be
312> > resolved without the former, so you can't compile 'double' unless you
313> > know what its arguments are. The solutions are:
314> >
315> > * Typeclasses make it possible by compiling with a handle. When you
316> > call a function that takes a typeclass value as an argument, it takes an
317> > extra, hidden argument internally which is the typeclass handle. That
318> > handle tells the callee how to use the typeclass functions with this
319> > particular value. And, of course, you hope that some aggressive inlining
320> > gets rid of the dynamic dispatch :). But, no per se overloading
321> > supported, only inclusion polymorphism.
322> >
323> > * If you do whole-world compilation, then you just compile what you
324> > particularly need in context. If you call 'doubles' with a
325> > float,int,int, then you compile that version. But, currying is unsolved.
326> >
327> > * If you do C++-style templating, this is a less severe problem, as
328> > you compile it with the use of 'doubles', not with the definition. But,
329> > either no currying, or you have to specify the extra types explicitly so
330> > it knows what to curry, so no inference.
331> >
332> > * If you do Java-style generics, ... just kidding.
333> >
334> > In a language like Haskell or OCaml, if you want to compile this
335> > modularly and have the code with the implementation, then naively it
336> > would have to make eight implementations. But, even that is only true if
337> > (x, y, z) is a single tuple argument. Currying is still the killer. If
338> > you call `doubles 3`, the return is supposed to be some x -> y -> (int, x,
339> > y), where 'x' and 'y' are "some type on which I can call a 'double'
340> > function, but I don't know which double function yet because I don't
341> > know what type". Even *writing* that type is hard enough, but having
342> > values of that type float around at runtime? Yikes.
343> >
344> > To put it a different way: In C++ (and presumably CFA?), you can
345> > overload all you want to, but you can't get a function pointer to an
346> > unresolved overload. The function pointer is to a *particular* overload,
347> > not the set of possible overloads. Well, in a functional language,
348> > function pointers are the lifeblood of the language.
349> >
350> > With valediction,
351> > - Gregor Richards
352> >
353> > On 2/19/25 21:25, Peter A. Buhr wrote:
354> > > In the "Type Classes" chapter I sent out, the author says the
355> > following. Does it
356> > > jive with what you are saying about currying? BTW, I do not know who
357> > wrote the
358> > > book chapter.
359> > >
360> > >
361> > ==========================================================================
362> > >
363> > > Suppose we have a language that overloads addition + and
364> > multiplication *,
365> > > providing versions that work over values of type Int and type Float.
366> > Now,
367> > > consider the double function, written in terms of the overloaded
368> > addition
369> > > operation:
370> > >
371> > > double x = x + x
372> > >
373> > > What does this definition mean? A naive interpretation would be to
374> > say that
375> > > double is also overloaded, defining one function of type Int -> Int
376> > -> Int and a
377> > > second of type Float -> Float -> Float. All seems fine, until we
378> > consider the
379> > > function
380> > >
381> > > doubles (x,y,z) = (double x, double y, double z)
382> > >
383> > > Under the proposed scheme, this definition would give rise to eight
384> > different
385> > > versions! This approach has not been widely used because of the
386> > exponential
387> > > growth in the number of versions.
388> > >
389> > > To avoid this blow-up, language designers have sometimes restricted the
390> > > definition of overloaded functions. In this approach, which was
391> > adopted in
392> > > Standard ML, basic operations can be overloaded, but not functions
393> > defined in
394> > > terms of them. Instead, the language design specifies one of the
395> > possible
396> > > versions as the meaning of the function. For example, Standard ML give
397> > > preference to the type int over real, so the type (and
398> > implementation) of the
399> > > function double would be int -> int. If the programmer wanted to
400> > define a double
401> > > function over floating point numbers, she would have to explicitly
402> > write the
403> > > type of the function in its definition and give the function a name
404> > distinct
405> > > from the double function on integers. This approach is not particularly
406> > > satisfying, because it violates a general principle of language
407> > design: giving
408> > > the compiler the ability to define features that programmers cannot.
409>
410> [2:text/html Show Save:noname (10kB)]
411\end{comment}
412
413
\subsection{Operator Overloading}

Virtually all programming languages provide general overloading of the arithmetic operators across the basic computational types using the number and type of parameters and returns.
However, in some languages, arithmetic operators may not be first class, and hence, cannot be overloaded.
Like \CC, \CFA does allow general operator overloading for user-defined types.
The \CFA syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type.
\begin{cfa}
struct S { int i, j; };
S @?+?@( S op1, S op2 ) { return (S){ op1.i + op2.i, op1.j + op2.j }; }
S s1, s2;
s1 = s1 @+@ s2;			$\C[1.75in]{// infix call}$
s1 = @?+?@( s1, s2 );	$\C{// direct call}\CRT$
\end{cfa}
The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
If there are mixed-mode operands, @2 + 3.5@, the type system attempts (safe) conversions, like in C/\CC, converting the argument type(s) to the parameter type(s).
Conversions are necessary because the hardware rarely supports mixed-mode operations, so both operands must be converted to a common type.
Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
This approach matches with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
Depending on the language, mixed-mode conversions can be explicitly controlled using some form of cast.

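The conversion behaviour described above can be sketched with a small C-compatible fragment (an illustrative sketch, not from the thesis):
\begin{cfa}
int i = 2;  double d = 3.5;
d = i + d;				$\C[1.75in]{// i safely converted to double, addition in double}$
i = i + d;				$\C{// addition in double, unsafe (truncating) conversion back to int}\CRT$
\end{cfa}
The first assignment involves only a safe widening conversion, whereas the second silently discards the fractional part when narrowing back to @int@, which is the kind of \emph{safety} issue a cast can make explicit.
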
\subsection{Function Overloading}

Both \CFA and \CC allow general overloading for functions, as long as their prototypes differ in the number and type of parameters and returns.
\begin{cfa}
void f( void );			$\C[2in]{// (1): no parameter}$
void f( char );			$\C{// (2): overloaded on the number and parameter type}$
void f( int, int );		$\C{// (3): overloaded on the number and parameter type}$
f( 'A' );				$\C{// select (2)}\CRT$
\end{cfa}
The type system examines each call site and first looks for an exact match and then a best match using conversions.

Ada, Scala, and \CFA type-systems also use the return type in resolving a call, to pinpoint the best overloaded name.
Essentially, the return types are \emph{reverse curried} into output parameters of the function.
For example, in many programming languages with overloading, the following functions are ambiguous without using the return type.
\begin{cfa}
int f( int );			$\C[2in]{// (1); overloaded on return type and parameter}$
double f( int );		$\C{// (2); overloaded on return type and parameter}$
int i = f( 3 );			$\C{// select (1)}$
double d = f( 3 );		$\C{// select (2)}\CRT$
\end{cfa}
Alternatively, if the type system uses the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
This capability can be taken to the extreme, where the only differentiating factor is the return type.
\begin{cfa}
int random( void );		$\C[2in]{// (1); overloaded on return type}$
double random( void );	$\C{// (2); overloaded on return type}$
int i = random();		$\C{// select (1)}$
double d = random();	$\C{// select (2)}\CRT$
\end{cfa}
Again, there is an exact match for each call.
As for operator overloading, if there is no exact match, a set of minimal implicit conversions can be added to find a best match.
\begin{cfa}
short int si = random();	$\C[2in]{// select (1), unsafe}$
long double ld = random();	$\C{// select (2), safe}\CRT$
\end{cfa}


\subsection{Variable Overloading}

Unlike most programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
Shadow overloading is also possible for functions, in languages supporting nested-function declarations, \eg named lambda functions nested within \CC functions.
\begin{cfa}
void foo( double d );
int v;					$\C[2in]{// (1)}$
double v;				$\C{// (2) variable overloading}$
foo( v );				$\C{// select (2)}$
{
	int v;				$\C{// (3) shadow overloading}$
	double v;			$\C{// (4) and variable overloading}$
	foo( v );			$\C{// select (4)}\CRT$
}
\end{cfa}
It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
However, variable overloading within a scope is often considered dangerous, without any evidence to corroborate this claim.
In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems.

In \CFA, the type system simply treats an overloaded variable as an overloaded function returning a value with no parameters.
Hence, no effort is required to support this feature as it is available for differentiating among overloaded functions with no parameters.
\begin{cfa}
int MAX = 2147483647;	$\C[2in]{// (1); overloaded on return type}$
long int MAX = ...;		$\C{// (2); overloaded on return type}$
double MAX = ...;		$\C{// (3); overloaded on return type}$
int i = MAX;			$\C{// select (1)}$
long int i = MAX;		$\C{// select (2)}$
double d = MAX;			$\C{// select (3)}\CRT$
\end{cfa}
Hence, the name @MAX@ can replace all the C type-specific names, \eg @INT_MAX@, @LONG_MAX@, @DBL_MAX@, \etc.
The result is a significant reduction in names to access common constants.


505\subsection{Constant Overloading}
506
507\CFA is unique in providing restricted constant overloading for the values @0@ and @1@, which have special status in C.
508For example, the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
509In addition, several operations are defined in terms of values @0@ and @1@.
510For example, @if@ and iteration statements in C compare the condition with @0@, and the increment and decrement operators are semantically equivalent to adding or subtracting the value @1@.
511\begin{cfa}
512if ( x ) ++x; => if ( x @!= 0@ ) x @+= 1@;
513for ( ; x; --x ) => for ( ; x @!= 0@; x @-= 1@ )
514\end{cfa}
515To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work within the special @0@ and @1@ contexts.
516The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
517\begin{cfa}
518struct S { int i, j; };
519void ?{}( S & s, zero_t ) { s.[i,j] = 0; } $\C{// constant constructors}$
520void ?{}( S & s, one_t ) { s.[i,j] = 1; }
521S ?=?( S & dst, zero_t ) { dst.[i,j] = 0; return dst; } $\C{// constant assignments}$
522S ?=?( S & dst, one_t ) { dst.[i,j] = 1; return dst; }
523S ?+=?( S & s, one_t ) { s.[i,j] += 1; return s; } $\C{// increment/decrement each field}$
524S ?-=?( S & s, one_t ) { s.[i,j] -= 1; return s; }
int ?!=?( S s, zero_t ) { return s.i != 0 || s.j != 0; } $\C{// constant comparison}$
S s = @0@; $\C{// initialization}$
s = @0@; $\C{// assignments}$
s = @1@;
if ( @s@ ) @++s@; $\C{// unary ++/-\,- come implicitly from +=/-=}$
\end{cfa}
Here, type @S@ is first-class with respect to the basic types, working with all existing implicit C mechanisms.


\section{Type Inferencing}
\label{s:IntoTypeInferencing}

Every variable has a type, but the association between them can occur in different ways:
at the point where the variable comes into existence (declaration) and/or on each assignment to the variable.
\begin{cfa}
double x; $\C{// type only}$
float y = 3.1D; $\C{// type and initialization}$
auto z = y; $\C{// initialization only}$
z = "abc"; $\C{// assignment}$
\end{cfa}
For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
For type-and-initialization, the specified and initialization types may not agree, requiring an implicit/explicit conversion.
For initialization-only, the compiler may select the type by melding programmer and context information.
When the compiler participates in type selection, it is called \newterm{type inferencing}.
Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed variables and performs a (possibly lossy) value conversion from one type to the other.
Finally, for assignment, the current variable and expression types may not agree.
Discovering a variable or function type is complex and has limitations.
The following covers these issues, and why this scheme is not compatible with the \CFA type system.

One of the first and most powerful type-inferencing systems is Hindley--Milner~\cite{Damas82}.
Here, the type resolver starts with the types of the program constants used for initialization, and these constant types flow throughout the program, setting all variable and expression types.
\begin{cfa}
auto f() {
	x = 1; y = 3.5; $\C{// set types from constants}$
	x = // expression involving x, y and other local initialized variables
	y = // expression involving x, y and other local initialized variables
	return x, y;
}
auto w = f(); $\C{// typing flows outwards}$

void f( auto x, auto y ) {
	x = // expression involving x, y and other local initialized variables
	y = // expression involving x, y and other local initialized variables
}
s = 1; t = 3.5; $\C{// set types from constants}$
f( s, t ); $\C{// typing flows inwards}$
\end{cfa}
In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
Like template meta-programming, there can be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
Inferring type constraints by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
in this case, parametric polymorphism can allow separate compilation.
In languages with type inferencing, there is often limited overloading to reduce the search space, which introduces the naming problem.
Note, return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match.

In simpler type-inferencing systems, such as C/\CC/\CFA, there are more specific usages.
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}lll@{}}
\multicolumn{1}{c}{\textbf{gcc / \CFA}} & \multicolumn{1}{c}{\textbf{\CC}} \\
\begin{cfa}
#define expr 3.0 * i
typeof(expr) x = expr;
int y;
typeof(y) z = y;
\end{cfa}
&
\begin{cfa}

auto x = 3.0 * i;
int y;
auto z = y;
\end{cfa}
&
\begin{cfa}

// use type of initialization expression

// use type of initialization expression
\end{cfa}
\end{tabular}
\end{cquote}
The two important capabilities are:
\begin{itemize}[topsep=0pt]
\item
Not having to determine or write out long generic types, \eg, given deeply nested generic types.
\begin{cfa}
typedef T1(int).T2(float).T3(char).T @ST@; $\C{// \CFA nested type declaration}$
@ST@ x, y, z;
\end{cfa}
This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages.
\item
Ensuring the types of secondary variables match a primary variable.
\begin{cfa}
int x; $\C{// primary variable}$
typeof(x) y, z, w; $\C{// secondary variables match x's type}$
\end{cfa}
If the type of @x@ changes, the types of the secondary variables correspondingly update.
There can be strong software-engineering reasons for binding the types of these variables.
\end{itemize}
Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
\begin{cfa}
int x;
typeof(x) y = ... // complex expression
typeof(x) z = ... // complex expression
\end{cfa}
Here, the types of @y@ and @z@ are fixed (branded), whereas with type inferencing, the types of @y@ and @z@ are potentially unknown.


\subsection{Type-Inferencing Issues}

Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions, or confusions.

A convenience is having the compiler use its overarching program knowledge to select the best type for each variable based on some notion of \emph{best}, which simplifies the programming experience.

A restriction is the conundrum in type inferencing of when to \emph{brand} a type.
That is, when is the type of the variable/function more important than the type of its initialization expression(s)?
For example, if a change is made in an initialization expression, it can cascade type changes producing many other changes and/or errors.
At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or be in error when it changes.
Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the compiler can report a mismatch with the constant initialization.
\begin{cfa}
void f( @int@ x, @int@ y ) { // brand function prototype
	x = // expression involving x, y and other local initialized variables
	y = // expression involving x, y and other local initialized variables
}
s = 1; t = 3.5;
f( s, @t@ ); // type mismatch
\end{cfa}
In Haskell, it is common for programmers to brand (type) function parameters.

A confusion is blocks of code where all declarations are @auto@, as is now common in \CC.
As a result, understanding and changing the code becomes almost impossible.
Types provide important clues as to the behaviour of the code, and correspondingly how to correctly change or add new code.
In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types for them.
For example, given:
\begin{cfa}
auto x = @...@
\end{cfa}
and the need to write a routine to compute using @x@
\begin{cfa}
void rtn( @type of x@ parm );
rtn( x );
\end{cfa}
A programmer must re-engineer the type of @x@'s initialization expression, reconstructing the possibly long generic type-name.
In this situation, having the type name or its short alias is essential.

\CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
Type inferencing defeats this goal because there is no left-hand type.
Fundamentally, type inferencing tries to magic away variable types from the programmer.
However, this results in lazy programming with the potential for poor performance and safety concerns.
Types are as important as control-flow in writing a good program, and should not be masked, even if it requires the programmer to think!
A similar issue is garbage collection, where storage management is magicked away, often resulting in poor program design and performance.\footnote{
There are full-time Java consultants, who are hired to find memory-management problems in large Java programs.}
The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
Understanding space and time issues is an essential part of the programming craft.
Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, the decision was made not to support implicit type-inferencing in the type system.
Should a significant need arise, this decision can be revisited.

\section{Polymorphism}

\CFA provides polymorphic functions and types, where a polymorphic function can constrain types using assertions based on traits.


\subsection{Polymorphic Function}

The signature feature of the \CFA type-system is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96}, generalized using a @forall@ clause (giving the language its name).
\begin{cfa}
@forall( T )@ T identity( T val ) { return val; }
int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$
\end{cfa}
This @identity@ function can be applied to an \newterm{object type}, \ie a type with a known size and alignment, which is sufficient to stack allocate, default or copy initialize, assign, and delete.
The \CFA implementation passes the size and alignment for each type parameter, as well as auto-generated default and copy constructors, assignment operator, and destructor.
For an incomplete \newterm{data type}, \eg pointer/reference types, this information is not needed.
\begin{cfa}
forall( T * ) T * identity( T * val ) { return val; }
int i, * ip = identity( &i );
\end{cfa}
Unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate compilation}, avoiding the recompilation and code bloat of templates.

To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
Here, the function @twice@ works for any type @T@ with a matching addition operator.
\begin{cfa}
forall( T @| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }
int val = twice( twice( 3 ) ); $\C{// val == 12}$
\end{cfa}
Parametric polymorphism and assertions occur in existing type-unsafe (@void *@) C functions, like @qsort@ for sorting an array of unknown values.
\begin{cfa}
void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) );
\end{cfa}
Here, the polymorphism is type-erasure, and the parametric assertion is the comparison routine, which is explicitly passed.
\begin{cfa}
enum { N = 5 };
double val[N] = { 5.1, 4.1, 3.1, 2.1, 1.1 };
int cmp( const void * v1, const void * v2 ) { $\C{// compare two doubles}$
	return *(double *)v1 < *(double *)v2 ? -1 : *(double *)v2 < *(double *)v1 ? 1 : 0;
}
qsort( val, N, sizeof( double ), cmp );
\end{cfa}
The equivalent type-safe version in \CFA is a wrapper over the C version.
\begin{cfa}
forall( ET | { int @?<?@( ET, ET ); } ) $\C{// type must have < operator}$
void qsort( ET * vals, size_t dim ) {
	int cmp( const void * t1, const void * t2 ) { $\C{// nested function}$
		return *(ET *)t1 @<@ *(ET *)t2 ? -1 : *(ET *)t2 @<@ *(ET *)t1 ? 1 : 0;
	}
	qsort( vals, dim, sizeof(ET), cmp ); $\C{// call C version}$
}
qsort( val, N ); $\C{// deduce type double, and pass builtin < for double}$
\end{cfa}
The nested function @cmp@ is implicitly built and provides the interface from typed \CFA to untyped (@void *@) C.
Providing a hidden @cmp@ function in \CC is awkward as capturing lambdas do not convert to C function pointers and template declarations cannot appear in block scope.
% In addition, an alternate kind of return is made available: position versus pointer to found element.
% \CC's type system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a template @bsearch@.
Call-site inferencing and nested functions provide a localized form of inheritance.
For example, the \CFA @qsort@ can be made to sort in descending order by locally changing the behaviour of @<@.
\begin{cfa}
{
	int ?<?( double x, double y ) { return x @>@ y; } $\C{// locally override behaviour}$
	qsort( vals, 10 ); $\C{// descending sort}$
}
\end{cfa}
The local version of @?<?@ overrides the built-in @?<?@ so it is passed to @qsort@.
The local version performs @?>?@, making @qsort@ sort in descending order.
Hence, any number of assertion functions can be overridden locally to maximize the reuse of existing functions and types, without the construction of a named inheritance hierarchy.
A final example is a type-safe wrapper for C @malloc@, where the return type supplies the type/size of the allocation, which is impossible in most type systems.
\begin{cfa}
static inline forall( T & | sized(T) )
T * malloc( void ) {
	if ( _Alignof(T) <= __BIGGEST_ALIGNMENT__ ) return (T *)malloc( sizeof(T) ); // C allocation
	else return (T *)memalign( _Alignof(T), sizeof(T) );
}
// select type and size from left-hand side
int * ip = malloc(); double * dp = malloc(); [[aligned(64)]] struct S {...} * sp = malloc();
\end{cfa}
The @sized@ assertion passes the size and alignment because an incomplete data type has no implicit assertions.
Both assertions are used in @malloc@ via @sizeof@ and @_Alignof@.
In practice, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided, producing a type-safe call to @malloc@ or @memalign@.

This mechanism is used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
Here, existing C legacy code is leveraged as much as possible;
other programming languages must build supporting libraries from scratch, even in \CC.


\subsection{Traits}

\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
\begin{cquote}
\begin{tabular}{@{}l|@{\hspace{10pt}}l@{}}
\begin{cfa}
trait @sumable@( T ) {
	void @?{}@( T &, zero_t ); // 0 literal constructor
	T ?+?( T, T ); // assortment of additions
	T @?+=?@( T &, T );
	T ++?( T & );
	T ?++( T & );
};
\end{cfa}
&
\begin{cfa}
forall( T @| sumable( T )@ ) // use trait
T sum( T a[$\,$], size_t size ) {
	@T@ total = { @0@ }; // initialize by 0 constructor
	for ( size_t i = 0; i < size; i += 1 )
		total @+=@ a[i]; // select appropriate +
	return total;
}
\end{cfa}
\end{tabular}
\end{cquote}
Traits are implemented by flattening them at use points, as if written in full by the programmer.
Flattening often results in overlapping assertions, \eg operator @+@.
Hence, trait names play no part in type equivalence.
In the example, type @T@ is an object type, and hence, has the implicit internal trait @otype@.
\begin{cfa}
trait otype( T & | sized(T) ) {
	void ?{}( T & ); $\C{// default constructor}$
	void ?{}( T &, T ); $\C{// copy constructor}$
	void ?=?( T &, T ); $\C{// assignment operator}$
	void ^?{}( T & ); $\C{// destructor}$
};
\end{cfa}
These implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.

If the array type is not a builtin type, an extra type parameter and assertions are required, like subscripting.
This case is generalized in polymorphic container-types, such as a list with @insert@ and @remove@ operations, and an element type with copy and assignment.


\subsection{Generic Types}

A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
Broadly speaking, there are three approaches to implement abstract data structures in C.
\begin{enumerate}[leftmargin=*]
\item
Write bespoke data structures for each context.
While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures.
\item
Use @void *@-based polymorphism, \eg the C standard library functions @bsearch@ and @qsort@, which allow for the reuse of code with common functionality.
However, this approach eliminates the type checker's ability to ensure argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary.
\item
Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret.
Furthermore, writing and using complex preprocessor macros is difficult and inflexible.
\end{enumerate}

\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
\CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
For concrete parameters, the generic-type definition can be inlined, like \CC templates, if its definition appears in a header file (\eg @static inline@).

A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
\begin{cquote}
\begin{tabular}{@{}l|@{\hspace{10pt}}l@{}}
\begin{cfa}
@forall( F, S )@ struct pair {
	F first; S second;
};
@forall( F, S )@ // object
S second( pair( F, S ) p ) { return p.second; }
@forall( F *, S * )@ // sized
S * second( pair( F *, S * ) p ) { return p.second; }
\end{cfa}
&
\begin{cfa}
pair( double, int ) dpr = { 3.5, 42 };
int i = second( dpr );
pair( void *, int * ) vipr = { 0p, &i };
int * ip = second( vipr );
double d = 1.0;
pair( int *, double * ) idpr = { &i, &d };
double * dp = second( idpr );
\end{cfa}
\end{tabular}
\end{cquote}
\CFA generic types are \newterm{fixed} or \newterm{dynamic} sized.
Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on the type parameters.
For example, the type variable @T *@ is fixed size and is represented by @void *@ in code generation;
whereas, the type variable @T@ is dynamic and set at the point of instantiation.
The difference between fixed and dynamic is the complexity and cost of field access.
For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
For dynamic, field offsets are computed at the call site at compile time, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
See~\cite[\S~3.2]{Moss19} for complete implementation details.

Currently, \CFA generic types allow assertions.
For example, the following declaration of a sorted set-type ensures the set key supports equality and relational comparison.
\begin{cfa}
forall( Elem, @Key@ | { _Bool ?==?( Key, Key ); _Bool ?<?( Key, Key ); } )
struct Sorted_Set { Elem elem; @Key@ key; ... };
\end{cfa}
However, the operations that insert/remove elements from the set should not appear as part of the generic-type's assertions.
\begin{cfa}
forall( @Elem@ | /* any assertions on element type */ ) {
	void insert( Sorted_Set set, @Elem@ elem ) { ... }
	bool remove( Sorted_Set set, @Elem@ elem ) { ... } // false => element not present
	... // more set operations
} // distribution
\end{cfa}
(Note, the @forall@ clause can be distributed across multiple functions.)
For software-engineering reasons, the set assertions would be refactored into a trait to allow alternative implementations, like a Java \lstinline[language=java]{interface}.

In summation, the \CFA type system inherits \newterm{nominal typing} for concrete types from C, and adds \newterm{structural typing} for polymorphic types.
Traits are used like interfaces in Java or abstract base-classes in \CC, but without the nominal inheritance relationships.
Instead, each polymorphic function or generic type defines the structural type needed for its execution, which is fulfilled at each call site from the lexical environment, like Go~\cite{Go} or Rust~\cite{Rust} interfaces.
Hence, new lexical scopes and nested functions are used extensively to create local subtypes, as in the @qsort@ example, without having to manage a nominal inheritance hierarchy.


\section{Language Comparison}

\begin{table}
\caption{Overload Features in Programming Languages}
\label{t:OverloadingFeatures}
\centering
\begin{minipage}{\linewidth}
\setlength{\tabcolsep}{5pt}
\begin{tabular}{@{}r|cccccccc@{}}
Feature\,{\textbackslash}\,Language & Ada & \CC & \CFA & Java & Scala & Swift & Rust & Haskell \\
\hline
general\footnote{overloadable entities: V $\Rightarrow$ variable, O $\Rightarrow$ operator, F $\Rightarrow$ function, M $\Rightarrow$ member}
	& O\footnote{except assignment}/F & O/F/M & V/O/F & M\footnote{not universal} & O/M & O/F/M & no & no \\
general constraints\footnote{parameter \# $\Rightarrow$ number, T $\Rightarrow$ type, N $\Rightarrow$ name; R $\Rightarrow$ return type}
	& \#/T/R & \#/T & \#/T/R & \#/T & \#/T/N/R & \#/T/N/R & \#/T/N & T/R \\
type parameters & no & yes & yes & yes & yes & yes & yes & yes \\
type constraint\footnote{T $\Rightarrow$ trait/concept, B $\Rightarrow$ type bounds, TC $\Rightarrow$ type class}
	& no & T & T & T & B & T & T & TC \\
Safe/Unsafe arg. conv. & no & S/U\footnote{no conversions allowed during template parameter deduction} & S/U
	& S\footnote{unsafe (narrowing) conversion only allowed in assignment or initialization to a primitive (builtin) type} & S
	& no\footnote{literals only, Int $\rightarrow$ Double (Safe)} & no & no
\end{tabular}
\end{minipage}
% https://doc.rust-lang.org/rust-by-example/trait/impl_trait.html
% https://dl.acm.org/doi/10.1145/75277.75283
\end{table}

Because \CFA's type system is focused on overloading and overload resolution, \VRef[Table]{t:OverloadingFeatures} compares \CFA with a representative set of popular programming languages and their take on overloading.
The first row classifies whether there is general overloading, and what entities may be overloaded.
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}llll@{}}
\multicolumn{1}{c}{\textbf{variable}} & \multicolumn{1}{c}{\textbf{operator}} & \multicolumn{1}{c}{\textbf{function}} & \multicolumn{1}{c}{\textbf{member}} \\
\begin{cfa}
int foo;
double foo;


\end{cfa}
&
\begin{cfa}
int ?+?( int, int );
double ?+?( double, double );


\end{cfa}
&
\begin{cfa}
int foo( int );
double foo( int, int );


\end{cfa}
&
\begin{c++}
class C {
	int foo( int );
	double foo( int, int );
};
\end{c++}
\end{tabular}
\end{cquote}
The second row classifies the specialization mechanisms used to distinguish among the general overload capabilities.
\begin{cquote}
\begin{tabular}{@{}llll@{}}
\multicolumn{1}{c}{\textbf{number}} & \multicolumn{1}{c}{\textbf{type}} & \multicolumn{1}{c}{\textbf{name}} & \multicolumn{1}{c}{\textbf{return}} \\
\begin{lstlisting}[language=swift]
func foo( _ x : Int )
func foo( _ x : Int, _ y : Int )
foo( 3 )
foo( 3, 3 )
\end{lstlisting}
&
\begin{lstlisting}[language=swift]
func foo( _ x : Int )
func foo( _ x : Double )
foo( 3 )
foo( 3.5 )
\end{lstlisting}
&
\begin{lstlisting}[language=swift]
func foo( x : Int )
func foo( y : Int )
foo( x : 3 )
foo( y : 3 );
\end{lstlisting}
&
\begin{lstlisting}[language=swift]
func foo() -> Int
func foo() -> String
var i : Int = foo()
var s : String = foo();
\end{lstlisting}
\end{tabular}
\end{cquote}
The third row classifies if generic functions can be overloaded based on the number and differing type variables.
\begin{cfa}
forall( T ) T foo( T t );
forall( T ) T foo( T t, T s );
forall( T, U ) T foo( T t, U s );
\end{cfa}
The fourth row classifies the mechanism used to specialize the type variables to make them practical.
\begin{cfa}
forall( T | { T ?+?( T, T ); } ) T foo( T t );
\end{cfa}
The fifth row classifies if conversions are attempted beyond exact match.
\begin{cfa}
int foo( double ); // 1
double foo( int ); // 2
int i = foo( 3 ); // 1 : 3 => 3.0
double d = foo( 3.5 ); // 1 : int result => double
\end{cfa}


\section{Contributions}

The performance and type capability of the \CFA compiler have been greatly improved through my development work.
\begin{enumerate}
\item
The compilation time of various \CFA library units and test programs has been reduced by an order of magnitude, from minutes to seconds \see{\VRef[Table]{t:SelectedFileByCompilerBuild}}, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type-system features.
The details of the compiler optimization work are covered in a previous technical report~\cite{Yu20}, which essentially forms part of this thesis.
\item
The thesis presents a systematic review of the new features added to the \CFA language and its type system.
Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested during development due to the limitation of compiler performance.
Several issues arising from the interactions of various language features are identified and discussed in this thesis;
some of them I have resolved, while others are given temporary fixes and need to be reworked in the future.
\item
Finally, this thesis provides constructive ideas for fixing a number of high-level issues in the \CFA language design and implementation, and gives a path for future improvements to the language and compiler.
\end{enumerate}


\begin{comment}
From: Andrew James Beach <ajbeach@uwaterloo.ca>
To: Peter Buhr <pabuhr@uwaterloo.ca>, Michael Leslie Brooks <mlbrooks@uwaterloo.ca>,
    Fangren Yu <f37yu@uwaterloo.ca>, Jiada Liang <j82liang@uwaterloo.ca>
Subject: Re: Haskell
Date: Fri, 30 Aug 2024 16:09:06 +0000

Do you mean:

one = 1

And then write a bunch of code that assumes it is an Int or Integer (which are roughly int and Int in Cforall) and then replace it with:

one = 1.0

And have that crash? That is actually enough, for some reason Haskell is happy to narrow the type of the first literal (Num a => a) down to Integer but will not do the same for (Fractional a => a) and Rational (which is roughly Integer for real numbers). Possibly a compatibility thing since before Haskell had polymorphic literals.

Now, writing even the first version will fire a -Wmissing-signatures warning, because it does appear to be understood that just from a documentation perspective, people want to know what types are being used. Now, if you have the original case and start updating the signatures (adding one :: Fractional a => a), you can eventually get into issues, for example:

import Data.Array (Array, Ix, (!))
atOne :: (Ix a, Frational a) => Array a b -> b - - In CFA: forall(a | Ix(a) | Frational(a), b) b atOne(Array(a, b) const & array)
atOne = (! one)

Which compiles and is fine except for the slightly awkward fact that I don't know of any types that are both Ix and Fractional types. So you might never be able to find a way to actually use that function. If that is good enough you can reduce that to three lines and use it.

Something that just occurred to me, after I did the above examples, is: Are there any classic examples in literature I could adapt to Haskell?

Andrew

PS, I think it is too obvious of a significant change to work as a good example but I did mock up the structure of what I am thinking you are thinking about with a function. If this helps here it is.

doubleInt :: Int -> Int
doubleInt x = x * 2

doubleStr :: String -> String
doubleStr x = x ++ x

-- Missing Signature
action = doubleInt - replace with doubleStr

main :: IO ()
main = print $ action 4
\end{comment}