source: doc/theses/fangren_yu_MMath/intro.tex@ 2572add

Last change on this file since 2572add was 2572add, checked in by Peter A. Buhr <pabuhr@…>, 5 months ago

more proofreading

\chapter{Introduction}

This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically, the \newterm{type-resolver} used to satisfy call-site assertions among overloaded variable and function names to allow polymorphic function calls.
Assertions are the operations a function uses within its body to perform its computation.
For example, a polymorphic function summing an array needs a size, zero, assignment, and plus for the array element-type, and a subscript operation for the array type.
\begin{cfa}
T sum( T a[$\,$], size_t size ) {
	@T@ total = { @0@ };				$\C[1.75in]{// size, 0 for type T}$
	for ( i; size )
		total @+=@ a@[@i@]@;			$\C{// + and subscript for T}\CRT$
	return total;
}
\end{cfa}
In certain cases, if the resolver fails to find an exact assertion match, it attempts to find a \emph{best} match using reasonable type conversions.
Hence, \CFA follows the current trend of replacing nominal inheritance with traits composed of assertions for type matching.
The over-arching goal in \CFA is to push the boundary on localized assertion matching, with advanced overloading resolution and type conversions that match programmer expectations in the C programming language.
Together, the resulting type-system has a number of unique features making it different from other programming languages with expressive, static, type-systems.


\section{Types}

All computers have multiple types because computer architects optimize the hardware around a few basic types with well-defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
These language types are thrust upon programmers with their syntactic/semantic rules and restrictions.
These rules are then used to transform a language expression to a unique hardware expression.
Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
Expressibility, generalization, and safety are all bound up in a language's type system, and hence, directly affect the capability, build time, and correctness of program development.


\section{Overloading}

\begin{quote}
There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. --- Phil Karlton
\end{quote}
Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
Experience from \CC and \CFA developers shows the type system can implicitly and correctly disambiguate the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
In many cases, a programmer is unaware of name clashes, as they are silently resolved, simplifying the development process.

Disambiguating among overloads is implemented by examining each call site and selecting the best matching overloaded function based on criteria like the types and number of arguments and the return context.
Since the hardware does not support mixed-mode operands, @2 + 3.5@, the type system must disallow it or (safely) convert the operands to a common type.
Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
This approach matches with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
Depending on the language, mixed-mode conversions can be explicitly controlled using a cast.

\newterm{Namespace pollution} refers to loading the global or other namespaces with many names, resulting in paranoia that the compiler could make wrong choices for overloaded names, causing failure.
This fear leads to coding styles where names are partitioned by language mechanisms and qualification is used to make names unique.
This approach defeats the purpose of overloading and places an additional coding burden on both the developer and user.
As well, many namespace systems provide a mechanism to open their scope, returning to normal overloading, \ie no qualification.
While namespace mechanisms are very important and provide a number of crucial program-development features, protection from overloading is overstated.
Similarly, lexical nesting is another place where overloading occurs.
For example, in object-oriented programming, local names declared within members \newterm{shadow} class member-names.
Some programmers qualify all member names with @class::@ or @this->@ to keep them distinct from names defined in members.
Even nested lexical blocks result in shadowing, \eg multiple nested loop-indices called @i@.
Again, coding styles exist requiring all variables in nested blocks to be unique to prevent name shadowing.
Depending on the language, these possible ambiguities can be reported (as warnings or errors) and resolved explicitly using some form of qualification and/or cast.

Formally, overloading is defined by Strachey as \newterm{ad hoc polymorphism}:
\begin{quote}
In ad hoc polymorphism there is no single systematic way of determining the type of the result from the type of the arguments.
There may be several rules of limited extent which reduce the number of cases, but these are themselves ad hoc both in scope and content.
All the ordinary arithmetic operators and functions come into this category.
It seems, moreover, that the automatic insertion of transfer functions by the compiling system is limited to this.~\cite[p.~37]{Strachey00}
\end{quote}
where a \newterm{transfer function} is an implicit conversion to help find a matching overload:
\begin{quote}
The problem of dealing with polymorphic operators is complicated by the fact that the range of types sometimes overlap.
Thus for example 3 may be an integer or a real and it may be necessary to change it from one type to the other.
The functions which perform this operation are known as transfer functions and may either be used explicitly by the programmer, or, in some systems, inserted automatically by the compiling system.~\cite[p.~35]{Strachey00}
\end{quote}
The differentiating characteristic between parametric polymorphism and overloading is often stated as: polymorphic functions use one algorithm to operate on arguments of many different types, whereas overloaded functions use a different algorithm for each type of argument.
A similar differentiation is applicable for overloading and default parameters.
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}lll@{}}
\multicolumn{1}{c}{\textbf{different implementations}} & \multicolumn{1}{c}{\textbf{same implementation}} \\
\begin{cfa}
void foo( int );
void foo( int, int );
\end{cfa}
&
\begin{cfa}
void foo( int, int = 5 ); // default value

\end{cfa}
\end{tabular}
\end{cquote}
However, this distinguishing characteristic is vague.
For example, should the operation @abs@ be overloaded or polymorphic or both?
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}lll@{}}
\multicolumn{1}{c}{\textbf{overloading}} & \multicolumn{1}{c}{\textbf{polymorphic}} \\
\begin{cfa}
int abs( int );
double abs( double );
\end{cfa}
&
\begin{cfa}
forall( T | { void ?{}( T &, zero_t ); int ?<?( T, T ); T -?( T ); } )
T abs( T );
\end{cfa}
\end{tabular}
\end{cquote}
Here, there are performance advantages for having specializations and code-reuse advantages for the generalization.

The Strachey definitions raise several questions.
\begin{enumerate}[leftmargin=*]
\item
Is overloading polymorphism?

\noindent
In type theory, polymorphism allows an overloaded type name to represent multiple different types.
For example, generic types overload the type name for a container type.
\begin{cfa}
@List@<int> li;   @List@<double> ld;   @List@<struct S> ls;
\end{cfa}
For subtyping, a derived type masquerades as a base type, where the base and derived names cannot be overloaded.
Instead, the mechanism relies on structural typing among the types.
In both cases, the polymorphic mechanisms apply in the type domain and the names are in the type namespace.
In the following C example:
\begin{cfa}
struct S { int S; };
struct S s;
void S( struct S S ) { S.S = 1; }
enum E { E };
enum E e = E;
\end{cfa}
the overloaded names @S@ and @E@ are separated into the type and object domains, and C uses the type kinds @struct@ and @enum@ to disambiguate the names.
In general, types are not overloaded because inferencing them is difficult to imagine in a statically typed programming language.
\begin{cquote}
\setlength{\tabcolsep}{26pt}
\begin{tabular}{@{}ll@{}}
\begin{cfa}
struct S { int i; };  // 1
struct S { int i, j; };  // 2
union S { int i; double d; };  // 3
typedef int S; // 4
\end{cfa}
&
\begin{cfa}
S s;
if ( ... ) s = (S){ 3 }; // 1
else s = 3;  // 4

\end{cfa}
\end{tabular}
\end{cquote}
On the other hand, in ad-hoc overloading of variables and/or functions, the names are in the object domain and each overloaded object name has an anonymous associated type.
\begin{cfa}
@double@ foo( @int@ );   @char@ foo( @void@ );   @int@ foo( @double, double@ );
@double@ foo;   @char@ foo;   @int@ foo;
\end{cfa}
Here, the associated type cannot be extracted using @typeof@/\lstinline[language={[11]C++}]{decltype} for typing purposes, as @typeof( foo )@ is always ambiguous.
To disambiguate the type of @foo@ requires additional information from the call-site or a cast.
\begin{cfa}
double d = foo( 3 );   char c = foo();   int i = foo( 3.5, 4.1 );  $\C{// exact match}$
d = foo;   c = foo;   i = foo;  $\C{// exact match}$
i = ((double (*)( int ))foo)( 3.5 );  $\C{// no exact match, explicit cast => conversions}$
\end{cfa}
Hence, depending on your perspective, overloading may not be polymorphism, as no single overloaded entity represents multiple types.

\item
Does ad-hoc polymorphism have a single systematic way of determining the type of the result from the type of the arguments?

\noindent
For exact type matches in overloading, there is a systematic way of matching arguments to parameters, and a function denotes its return type rather than using type inferencing.
This matching is just as robust as other polymorphic analysis.
The ad-hoc aspect might be the implicit transfer functions (conversions) applied to arguments to create an exact parameter type-match, as there may be multiple conversions leading to different exact matches.
Note, conversion issues apply to both non-overloaded and overloaded functions.
Here, the selection of the conversion functions can be considered the \emph{opinion} of the language (type system), even if the technique used is based on sound rules, like maximizing conversion accuracy (non-lossy).
A difference in opinion results when the language conversion rules differ from a programmer's expectations.
However, without implicit conversions, programmers may have to write an exponential number of functions covering all possible exact-match cases among all reasonable types.
\CFA's \emph{opinion} on conversions must match C's, and then rely on programmers to understand the effects.
That is, let the compiler do the heavy lifting of selecting a \emph{best} set of conversions that maximizes safety.
Hence, removing implicit conversions from \CFA is not an option (as is true in most other programming languages), so it must do the best possible job to get it right.

\item
Why are there two forms of \emph{overloading} (general and parametric) in different programming languages?

\noindent
\newterm{General overloading} occurs when the type-system \emph{knows} a function's parameter and return types (or a variable's type for variable overloading).
In functional programming-languages, there is always a return type (except for a monad).
If a return type is specified, the compiler does not have to infer it from the function body.
For example, the compiler has complete knowledge about builtin types and their overloaded arithmetic operators.
In this context, there is a fixed set of overloads for a given name that are completely specified.
Overload resolution then involves finding an exact match between a call and the overload prototypes based on argument type(s) and possibly return context.
If an \emph{exact} match is not found, the call is either ill formed (ambiguous) or further attempts are made to find a \emph{best} match using transfer functions (conversions).
As a consequence, no additional runtime information is needed per call, \ie the call is a direct transfer (branch) with possibly converted arguments.

\noindent
\newterm{Parametric overloading} occurs when the type-system does not know a function's parameter and return types.
Types are inferred from constant types and/or constraint information that includes typing.
Parametric overloading starts with a universal/generic type used to define parameters and returns.
\begin{cfa}
forall( T ) T identity( T t ) { return t; }
int i = identity( 3 );
double d;
double e = identity( d );
\end{cfa}
Here, the type system is substituting the argument type for @T@, and ensuring the return type also matches.
This mechanism can be extended to overloaded parametric functions with multiple type parameters.
\begin{cfa}
forall( T ) [T, T] identity( T t1, T t2 ) { return [t1, t2]; } // return tuple
int j;
[i, j] = identity( 3, 4 );
forall( T, U ) [T, U] identity( T t1, U t2 ) { return [t1, t2]; }
[i, d] = identity( i, d );
\end{cfa}
Here, the type system is using techniques from general overloading, the number and types of arguments, to disambiguate the overloaded parametric functions.
However, this exercise is moot because the types are so universal they have no practical operations within the function, modulo any general operations carried by every type, like copy or print.
This kind of polymorphism is restricted to moving type instances as abstract entities;
if type erasure is used, there is no way to recover the original type to perform useful actions on its value(s).
Hence, parametric overloading requires additional information about the universal types to make them useful.

This additional information often comes as a set of operations a type must supply (@trait@/@concept@) and these operations can then be used in the body of the function.
\begin{cfa}
forall( T | T ?@++@( T, T ) ) T inc( T t ) { return t@++@; }
int i = 3;
i = inc( i );
double d = inc( 3.5 );
\end{cfa}
Given a qualifying trait, are its elements inferred or declared?
In the above example, the type system infers @int@ for @T@, infers it needs a @++@ operator that takes an @int@ and returns an @int@, and finds this function in the enclosing environment (\eg standard prelude).
This implicit inferencing is expensive if matched with implicit conversions when there is no exact match.
Alternatively, types opt-in to traits via declarations.
\begin{cfa}
trait postinc { T ?@++@( T, T ); };
type int provide postinc;
\end{cfa}
In this approach, after inferring @int@ for @T@, @int@ is examined for any necessary traits required in the function, so there is no searching.
Once all traits are satisfied, the compiler can inline them, pass function pointers directly, or pass a virtual table composing the trait.
\end{enumerate}
The following sections illustrate how overloading is provided in \CFA.

\begin{comment}
\begin{haskell}
f( int, int );  f( int, float );  -- return types to be determined
g( int, int );  g( float, int );
let x = curry f( 3, _ );  -- which f
let y = curry g( _ , 3 );  -- which g
\end{haskell}
For the currying to succeed, there cannot be overloaded function names resulting in ambiguities.
To allow currying to succeed requires an implicit disambiguating mechanism, \ie a kind of transfer function.
A type class~\cite{typeclass} is a mechanism to convert overloading into parametric polymorphism.
Parametric polymorphism has enough information to disambiguate the overloaded names because it removes the type inferencing.
\begin{cfa}
forall( T | T has + $and$ - ) T f$\(_1\)$( T );
forall( T | T has * $and$ - ) T f$\(_2\)$( T );
x = f$\(_1\)$( x ); // if x has + and - but not *
y = f$\(_2\)$( y ); // if y has * and - but not +
\end{cfa}
Here, the types of @x@ and @y@ are combined with the type-class constraints to provide secondary information for disambiguation.
This approach handles many overloading cases because the constraints overlap completely or are disjoint.

A type class (trait) generically abstracts the set of the operations used in a function's implementation.
A type-class instance binds a specific type to the generic operations to form concrete instances, giving a named type-class.
Qualified types concisely express the operations required to convert an overloaded function.
The named type-class is used as a transfer function to convert an overloaded function into a polymorphic function that is uniquely qualified with the named type-class.
\begin{cfa}
void foo_int_trait( special int trait for operations in this foo );
void foo_int_int_trait( special (int, int) trait for operations in this foo );
\end{cfa}

In this case, the compiler implicitly changes the overloaded function to a parametrically polymorphic one.
Hence, the programmer does not specify any additional information for the overloading to work.
Explicit overloading occurs when the compiler has to be told what operations are associated with a type: the programmer separately defines the associated type and subsequently associates the type with the overloaded name.
\end{comment}

\begin{comment}
Date: Mon, 24 Feb 2025 11:26:12 -0500
Subject: Re: overloading
To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
CC: <f37yu@uwaterloo.ca>, <ajbeach@uwaterloo.ca>, <mlbrooks@uwaterloo.ca>,
	<alvin.zhang@uwaterloo.ca>, <lseo@plg.uwaterloo.ca>,
	<j82liang@uwaterloo.ca>
From: Gregor Richards <gregor.richards@uwaterloo.ca>

Yes.

With valediction,
  - Gregor Richards

On 2/24/25 11:22, Peter A. Buhr wrote:
> Gregor Richards <gregor.richards@uwaterloo.ca> writes:
> In Haskell, `+` works for both because of typeclasses (inclusion
> polymorphism), and so is also not an unresolved type.
>
> I'm making this up. The Haskell type-class is a trait, like an interface or
> abstract class, and its usage/declaration/binding creates a specific trait
> instance for bound types, which is a vtable filled with the typed functions
> instantiated/located for the trait. The vtables are present at runtime and
> passed implicitly to ad-hoc polymorphic functions allowing differentiate of
> overloaded functions based on the number of traits and their specialization.
> (Major geek talk, YA! 8-)
>
> On 2/21/25 23:04, Fangren Yu wrote:
> > In a statically typed language I would rather have definitions like
> > double x = x+x be ambiguous than "an unresolved type" as the latter
> > sounds like a weaker version of a generic type, and being able to make
> > something generic without explicitly saying so is probably not a good
> > idea. Giving the unspecified parameter type an arbitrary preference is
> > the second best option IMO (does ML give you a warning on such not
> > fully inferred types?)
> > ------------------------------------------------------------------------
> > *From:* Gregor Richards <gregor.richards@uwaterloo.ca>
> > *Sent:* Wednesday, February 19, 2025 9:55:23 PM
> > *To:* Peter Buhr <pabuhr@uwaterloo.ca>
> > *Cc:* Andrew James Beach <ajbeach@uwaterloo.ca>; Michael Leslie Brooks
> > <mlbrooks@uwaterloo.ca>; Fangren Yu <f37yu@uwaterloo.ca>;
> > j82liang@uwaterloo.ca <j82liang@uwaterloo.ca>; Alvin Zhang
> > <alvin.zhang@uwaterloo.ca>; lseo@plg.uwaterloo.ca <lseo@plg.uwaterloo.ca>
> > *Subject:* Re: overloading
> > Jives with what I was saying, albeit not exactly the same; it's a result
> > of the same problem.
> >
> > 'doubles' refers to an unresolved 'double', and the latter can't be
> > resolved without the former, so you can't compile 'double' unless you
> > know what its arguments are. The solutions are:
> >
> >   * Typeclasses make it possible by compiling with a handle. When you
> > call a function that takes a typeclass value as an argument, it takes an
> > extra, hidden argument internally which is the typeclass handle. That
> > handle tells the callee how to use the typeclass functions with this
> > particular value. And, of course, you hope that some aggressive inlining
> > gets rid of the dynamic dispatch :). But, no per se overloading
> > supported, only inclusion polymorphism.
> >
> >   * If you do whole-world compilation, then you just compile what you
> > particularly need in context. If you call 'doubles' with a
> > float,int,int, then you compile that version. But, currying is unsolved.
> >
> >   * If you do C++-style templating, this is a less severe problem, as
> > you compile it with the use of 'doubles', not with the definition. But,
> > either no currying, or you have to specify the extra types explicitly so
> > it knows what to curry, so no inference.
> >
> >   * If you do Java-style generics, ... just kidding.
> >
> > In a language like Haskell or OCaml, if you want to compile this
> > modularly and have the code with the implementation, then naively it
> > would have to make eight implementations. But, even that is only true if
> > (x, y, z) is a single tuple argument. Currying is still the killer. If
> > you call `doubles 3`, the return is supposed to be some x -> y -> (int, x,
> > y), where 'x' and 'y' are "some type on which I can call a 'double'
> > function, but I don't know which double function yet because I don't
> > know what type". Even *writing* that type is hard enough, but having
> > values of that type float around at runtime? Yikes.
> >
> > To put it a different way: In C++ (and presumably CFA?), you can
> > overload all you want to, but you can't get a function pointer to an
> > unresolved overload. The function pointer is to a *particular* overload,
> > not the set of possible overloads. Well, in a functional language,
> > function pointers are the lifeblood of the language.
> >
> > With valediction,
> >   - Gregor Richards
> >
> > On 2/19/25 21:25, Peter A. Buhr wrote:
> > > In the "Type Classes" chapter I sent out, the author says the
> > following. Does it
> > > jive with what you are saying about currying? BTW, I do not know who
> > wrote the
> > > book chapter.
> > >
> >
> > ==========================================================================
> >
> > > Suppose we have a language that overloads addition + and
> > multiplication *,
> > > providing versions that work over values of type Int and type Float.
> > Now,
> > > consider the double function, written in terms of the overloaded
> > addition
> > > operation:
> > >
> > > double x = x + x
> > >
> > > What does this definition mean? A naive interpretation would be to
> > say that
> > > double is also overloaded, defining one function of type Int -> Int
> > -> Int and a
> > > second of type Float -> Float -> Float. All seems fine, until we
> > consider the
> > > function
> > >
> > > doubles (x,y,z) = (double x, double y, double z)
> > >
> > > Under the proposed scheme, this definition would give rise to eight
> > different
> > > versions! This approach has not been widely used because of the
> > exponential
> > > growth in the number of versions.
> > >
> > > To avoid this blow-up, language designers have sometimes restricted the
> > > definition of overloaded functions. In this approach, which was
> > adopted in
> > > Standard ML, basic operations can be overloaded, but not functions
> > defined in
> > > terms of them. Instead, the language design specifies one of the
> > possible
> > > versions as the meaning of the function. For example, Standard ML give
> > > preference to the type int over real, so the type (and
> > implementation) of the
> > > function double would be int -> int. If the programmer wanted to
> > define a double
> > > function over floating point numbers, she would have to explicitly
> > write the
> > > type of the function in its definition and give the function a name
> > distinct
> > > from the double function on integers. This approach is not particularly
> > > satisfying, because it violates a general principle of language
> > design: giving
> > > the compiler the ability to define features that programmers cannot.
>
> [2:text/html Show Save:noname (10kB)]
\end{comment}


\subsection{Operator Overloading}

Virtually all programming languages provide general overloading of the arithmetic operators across the basic computational types using the number and type of parameters and returns.
However, in some languages, arithmetic operators may not be first class, and hence, cannot be overloaded.
Like \CC, \CFA allows general operator overloading for user-defined types.
The \CFA syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type.
\begin{cfa}
struct S { int i, j; };
S @?+?@( S op1, S op2 ) { return (S){ op1.i + op2.i, op1.j + op2.j }; }
S s1, s2;
s1 = s1 @+@ s2;			$\C[1.75in]{// infix call}$
s1 = @?+?@( s1, s2 );	$\C{// direct call}\CRT$
\end{cfa}
\CFA excludes overloading the comma operator, short-circuit logical operators \lstinline{&&} and \lstinline{||}, and ternary conditional operator \lstinline{?} (\eg @max = x > y ? x : y@), all of which are short-hand control structures rather than operators.
\CC \emph{does} overload the comma and short-circuit logical operators, where overloading the comma operator is esoteric and rarely used, and the overloaded short-circuit operators do not exhibit short-circuit semantics, both of which are confusing and/or inconsistent.


\subsection{Function Overloading}

Both \CFA and \CC allow general overloading for functions, as long as their prototypes differ in the number and type of parameters and returns.
\begin{cfa}
void f( void );		$\C[2in]{// (1): no parameter}$
void f( char );		$\C{// (2): overloaded on the number and parameter type}$
void f( int, int );	$\C{// (3): overloaded on the number and parameter type}$
f( 'A' );			$\C{// select (2)}\CRT$
\end{cfa}
The type system examines each call site and first looks for an exact match and then a best match using conversions.

Ada, Scala, and \CFA type-systems also use the return type in resolving a call, to pinpoint the best overloaded name.
Essentially, the return types are \emph{reverse curried} into output parameters of the function.
For example, in many programming languages with overloading, the following functions are ambiguous without using the return type.
\begin{cfa}
int f( int );		$\C[2in]{// (1); overloaded on return type and parameter}$
double f( int );	$\C{// (2); overloaded on return type and parameter}$
int i = f( 3 );		$\C{// select (1)}$
double d = f( 3 );	$\C{// select (2)}\CRT$
\end{cfa}
Alternatively, if the type system uses the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
This capability can be taken to the extreme, where the only differentiating factor is the return type.
\begin{cfa}
int random( void );		$\C[2in]{// (1); overloaded on return type}$
double random( void );	$\C{// (2); overloaded on return type}$
int i = random();		$\C{// select (1)}$
double d = random();	$\C{// select (2)}\CRT$
\end{cfa}
Again, there is an exact match for each call.
As for operator overloading, if there is no exact match, a set of minimal implicit conversions can be added to find a best match.
\begin{cfa}
short int si = random();	$\C[2in]{// select (1), unsafe}$
long double ld = random();	$\C{// select (2), safe}\CRT$
\end{cfa}


\subsection{Variable Overloading}

Unlike most programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
Shadow overloading is also possible for functions, in languages supporting nested-function declarations, \eg \CC named, nested, lambda functions.
\begin{cfa}
void foo( double d );
int v;				$\C[2in]{// (1)}$
double v;			$\C{// (2) variable overloading}$
foo( v );			$\C{// select (2)}$
{
	int v;			$\C{// (3) shadow overloading}$
	double v;		$\C{// (4) and variable overloading}$
	foo( v );		$\C{// select (4)}\CRT$
}
\end{cfa}
It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
However, variable overloading within a scope is often considered dangerous, without any evidence to corroborate this claim.
In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems.

In \CFA, the type system simply treats an overloaded variable as an overloaded function returning a value with no parameters.
Hence, no effort is required to support this feature as it is available for differentiating among overloaded functions with no parameters.
\begin{cfa}
int MAX = 2147483647;	$\C[2in]{// (1); overloaded on return type}$
long int MAX = ...;		$\C{// (2); overloaded on return type}$
double MAX = ...;		$\C{// (3); overloaded on return type}$
int i = MAX;			$\C{// select (1)}$
long int i = MAX;		$\C{// select (2)}$
double d = MAX;			$\C{// select (3)}\CRT$
\end{cfa}
Hence, the name @MAX@ can replace all the C type-specific names, \eg @INT_MAX@, @LONG_MAX@, @DBL_MAX@, \etc.
The result is a significant reduction in names to access common constants.
505

\subsection{Constant Overloading}

\CFA is unique in providing restricted constant overloading for the values @0@ and @1@, which have special status in C.
For example, the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
In addition, several operations are defined in terms of values @0@ and @1@.
For example, @if@ and iteration statements in C compare the condition with @0@, and the increment and decrement operators are semantically equivalent to adding or subtracting the value @1@.
\begin{cfa}
if ( x ) ++x; => if ( x @!= 0@ ) x @+= 1@;
for ( ; x; --x ) => for ( ; x @!= 0@; x @-= 1@ )
\end{cfa}
To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work within the special @0@ and @1@ contexts.
The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
\begin{cfa}
struct S { int i, j; };
void ?{}( S & s, zero_t ) { s.[i,j] = 0; } $\C{// constant constructors}$
void ?{}( S & s, one_t ) { s.[i,j] = 1; }
S ?=?( S & dst, zero_t ) { dst.[i,j] = 0; return dst; } $\C{// constant assignments}$
S ?=?( S & dst, one_t ) { dst.[i,j] = 1; return dst; }
S ?+=?( S & s, one_t ) { s.[i,j] += 1; return s; } $\C{// increment/decrement each field}$
S ?-=?( S & s, one_t ) { s.[i,j] -= 1; return s; }
int ?!=?( S s, zero_t ) { return s.i != 0 || s.j != 0; } $\C{// constant comparison; nonzero if any field is nonzero}$
S s = @0@; $\C{// initialization}$
s = @0@; $\C{// assignments}$
s = @1@;
if ( @s@ ) @++s@; $\C{// unary ++/-\,- come implicitly from +=/-=}$
\end{cfa}
Here, type @S@ is first-class with respect to the basic types, working with all existing implicit C mechanisms.


\section{Type Inferencing}
\label{s:IntoTypeInferencing}

Every variable has a type, but the association between them can occur in different ways:
at the point where the variable comes into existence (declaration) and/or on each assignment to the variable.
\begin{cfa}
double x; $\C{// type only}$
float y = 3.1D; $\C{// type and initialization}$
auto z = y; $\C{// initialization only}$
z = "abc"; $\C{// assignment}$
\end{cfa}
For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
For type-and-initialization, the specified and initialization types may not agree, requiring an implicit/explicit conversion.
For initialization-only, the compiler may select the type by melding programmer and context information.
When the compiler participates in type selection, it is called \newterm{type inferencing}.
Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed variables and performs a (possibly lossy) value conversion from one type to the other.
Finally, for assignment, the current variable and expression types may not agree.
Discovering a variable or function type is complex and has limitations.
The following covers these issues, and why this scheme is not amenable to the \CFA type system.
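
The distinction is visible in plain C, using the gcc @typeof@ extension (a minimal sketch): a conversion may lossily change the value to fit an existing type, whereas inference copies the initializer's type, leaving the value unchanged.
\begin{cfa}
double d = 3.5;
int i = d; // conversion: value truncated to 3
typeof(d) j = d; // inference: j's type is discovered as double, value unchanged
\end{cfa}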

One of the first and most powerful type-inferencing systems is Hindley--Milner~\cite{Damas82}.
Here, the type resolver starts with the types of the program constants used for initialization, and these constant types flow throughout the program, setting all variable and expression types.
\begin{cfa}
auto f() {
	x = 1; y = 3.5; $\C{// set types from constants}$
	x = // expression involving x, y and other local initialized variables
	y = // expression involving x, y and other local initialized variables
	return x, y;
}
auto w = f(); $\C{// typing flows outwards}$

void f( auto x, auto y ) {
	x = // expression involving x, y and other local initialized variables
	y = // expression involving x, y and other local initialized variables
}
s = 1; t = 3.5; $\C{// set types from constants}$
f( s, t ); $\C{// typing flows inwards}$
\end{cfa}
In both versions of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
Like template meta-programming, a new function may be generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
Inferring type constraints by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
in this case, parametric polymorphism can allow separate compilation.
In languages with type inferencing, there is often limited overloading to reduce the search space, which introduces the naming problem.
Note, return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match.

In simpler type-inferencing systems, such as C/\CC/\CFA, there are more specific usages.
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}lll@{}}
\multicolumn{1}{c}{\textbf{gcc / \CFA}} & \multicolumn{1}{c}{\textbf{\CC}} \\
\begin{cfa}
#define expr 3.0 * i
typeof(expr) x = expr;
int y;
typeof(y) z = y;
\end{cfa}
&
\begin{cfa}

auto x = 3.0 * i;
int y;
auto z = y;
\end{cfa}
&
\begin{cfa}

// use type of initialization expression

// use type of initialization expression
\end{cfa}
\end{tabular}
\end{cquote}
The two important capabilities are:
\begin{itemize}[topsep=0pt]
\item
Not determining or writing long generic types, \eg, given deeply nested generic types.
\begin{cfa}
typedef T1(int).T2(float).T3(char).T @ST@; $\C{// \CFA nested type declaration}$
@ST@ x, y, z;
\end{cfa}
This issue is exaggerated with \CC templates, where type names are hundreds of characters long, resulting in unreadable error messages.
\item
Ensuring the types of secondary variables match a primary variable.
\begin{cfa}
int x; $\C{// primary variable}$
typeof(x) y, z, w; $\C{// secondary variables match x's type}$
\end{cfa}
If the type of @x@ changes, the type of the secondary variables correspondingly updates.
There can be strong software-engineering reasons for binding the types of these variables.
\end{itemize}
Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
\begin{cfa}
int x;
typeof(x) y = ... // complex expression
typeof(x) z = ... // complex expression
\end{cfa}
Here, the types of @y@ and @z@ are fixed (branded), whereas with type inferencing, the types of @y@ and @z@ are potentially unknown.


\subsection{Type-Inferencing Issues}

Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions, or confusions.

A convenience is having the compiler use its overarching program knowledge to select the best type for each variable based on some notion of \emph{best}, which simplifies the programming experience.

A restriction is the conundrum in type inferencing of when to \emph{brand} a type.
That is, when is the type of the variable/function more important than the type of its initialization expression(s)?
For example, if a change is made in an initialization expression, it can cascade type changes producing many other changes and/or errors.
At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or be in error when it changes.
Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the compiler can report a mismatch with the constant initialization.
\begin{cfa}
void f( @int@ x, @int@ y ) { // brand function prototype
	x = // expression involving x, y and other local initialized variables
	y = // expression involving x, y and other local initialized variables
}
s = 1; t = 3.5;
f( s, @t@ ); // type mismatch
\end{cfa}
In Haskell, it is common for programmers to brand (type) function parameters.

A confusion arises in blocks of code where all declarations are @auto@, as is now common in \CC.
As a result, understanding and changing the code becomes almost impossible.
Types provide important clues as to the behaviour of the code, and correspondingly how to correctly change or add new code.
In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types for them.
For example, given:
\begin{cfa}
auto x = @...@
\end{cfa}
and the need to write a function to compute using @x@
\begin{cfa}
void rtn( @type of x@ parm );
rtn( x );
\end{cfa}
a programmer must re-engineer the type of @x@'s initialization expression, reconstructing the possibly long generic type-name.
In this situation, having the type name or its short alias is essential.

\CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
Type inferencing defeats this goal because there is no left-hand type.
Fundamentally, type inferencing tries to magic away variable types from the programmer.
However, this results in lazy programming with the potential for poor performance and safety concerns.
Types are as important as control-flow in writing a good program, and should not be masked, even if it requires the programmer to think!
A similar issue is garbage collection, where storage management is magicked away, often resulting in poor program design and performance.\footnote{
There are full-time Java consultants, who are hired to find memory-management problems in large Java programs.}
The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
Understanding space and time issues is an essential part of the programming craft.
Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, the decision was made not to support implicit type-inferencing in the type system.
Should a significant need arise, this decision can be revisited.


\section{Polymorphism}

\CFA provides polymorphic functions and types, where a polymorphic function can constrain types using assertions based on traits.


\subsection{Polymorphic Function}

The signature feature of the \CFA type-system is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96}, generalized using a @forall@ clause (giving the language its name).
\begin{cfa}
@forall( T )@ T identity( T val ) { return val; }
int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$
\end{cfa}
This @identity@ function can be applied to an \newterm{object type}, \ie a type with a known size and alignment, which is sufficient to stack allocate, default or copy initialize, assign, and delete.
The \CFA implementation passes the size and alignment for each type parameter, as well as auto-generated default and copy constructors, assignment operator, and destructor.
For an incomplete \newterm{data type}, \eg pointer/reference types, this information is not needed.
\begin{cfa}
forall( T * ) T * identity( T * val ) { return val; }
int i, * ip = identity( &i );
\end{cfa}
Unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate compilation}, avoiding the compilation cost and code bloat of template expansion.

To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
Here, the function @twice@ works for any type @T@ with a matching addition operator.
\begin{cfa}
forall( T @| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }
int val = twice( twice( 3 ) ); $\C{// val == 12}$
\end{cfa}
Parametric polymorphism and assertions occur in existing type-unsafe (@void *@) C functions, like @qsort@ for sorting an array of unknown values.
\begin{cfa}
void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) );
\end{cfa}
Here, the polymorphism is type-erasure, and the parametric assertion is the comparison function, which is explicitly passed.
\begin{cfa}
enum { N = 5 };
double val[N] = { 5.1, 4.1, 3.1, 2.1, 1.1 };
int cmp( const void * v1, const void * v2 ) { $\C{// compare two doubles}$
	return *(double *)v1 < *(double *)v2 ? -1 : *(double *)v2 < *(double *)v1 ? 1 : 0;
}
qsort( val, N, sizeof( double ), cmp );
\end{cfa}
The equivalent type-safe version in \CFA is a wrapper over the C version.
\begin{cfa}
forall( ET | { int @?<?@( ET, ET ); } ) $\C{// type must have < operator}$
void qsort( ET * vals, size_t dim ) {
	int cmp( const void * t1, const void * t2 ) { $\C{// nested function}$
		return *(ET *)t1 @<@ *(ET *)t2 ? -1 : *(ET *)t2 @<@ *(ET *)t1 ? 1 : 0;
	}
	qsort( vals, dim, sizeof(ET), cmp ); $\C{// call C version}$
}
qsort( val, N ); $\C{// deduce type double, and pass builtin < for double}$
\end{cfa}
The nested function @cmp@ is implicitly built and provides the interface from typed \CFA to untyped (@void *@) C.
Providing a hidden @cmp@ function in \CC is awkward as lambdas do not use C calling conventions and template declarations cannot appear in block scope.
% In addition, an alternate kind of return is made available: position versus pointer to found element.
% \CC's type system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a template @bsearch@.
Call-site inferencing and nested functions provide a localized form of inheritance.
For example, the \CFA @qsort@ can be made to sort in descending order by locally changing the behaviour of @<@.
\begin{cfa}
{
	int ?<?( double x, double y ) { return x @>@ y; } $\C{// locally override behaviour}$
	qsort( vals, 10 ); $\C{// descending sort}$
}
\end{cfa}
The local version of @?<?@ overrides the built-in @?<?@ so it is passed to @qsort@.
The local version performs @?>?@, making @qsort@ sort in descending order.
Hence, any number of assertion functions can be overridden locally to maximize the reuse of existing functions and types, without the construction of a named inheritance hierarchy.
A final example is a type-safe wrapper for C @malloc@, where the return type supplies the type/size of the allocation, which is impossible in most type systems.
\begin{cfa}
static inline forall( T & | sized(T) )
T * malloc( void ) {
	if ( _Alignof(T) <= __BIGGEST_ALIGNMENT__ ) return (T *)malloc( sizeof(T) ); // C allocation
	else return (T *)memalign( _Alignof(T), sizeof(T) );
}
// select type and size from left-hand side
int * ip = malloc(); double * dp = malloc(); [[aligned(64)]] struct S {...} * sp = malloc();
\end{cfa}
The @sized@ assertion passes size and alignment because an incomplete data type has no implicit assertions.
Both assertions are used in @malloc@ via @sizeof@ and @_Alignof@.
In practice, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided producing a type-safe call to @malloc@ or @memalign@.

This mechanism is used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
Here, existing C legacy code is leveraged as much as possible;
other programming languages, even \CC, must build supporting libraries from scratch.


\subsection{Traits}
\label{s:Traits}

\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
\begin{cquote}
\begin{tabular}{@{}l|@{\hspace{10pt}}l@{}}
\begin{cfa}
forall( T ) trait @sumable@ {
	void @?{}@( T &, zero_t ); // 0 literal constructor
	T @?+=?@( T &, T ); // assortment of additions
	T ?+?( T, T );
	T ++?( T & );
	T ?++( T & );
};
\end{cfa}
&
\begin{cfa}
forall( T @| sumable( T )@ ) // use trait
T sum( T a[$\,$], size_t size ) {
	@T@ total = 0; // initialize by 0 constructor
	for ( i; size )
		total @+=@ a[i]; // select appropriate +
	return total;
}
\end{cfa}
\end{tabular}
\end{cquote}
Traits are implemented by flattening them at use points, as if written in full by the programmer.
Flattening often results in overlapping assertions, \eg operator @+@.
Hence, trait names play no part in type equivalence.
In the example, type @T@ is an object type, and hence, has the implicit internal trait @otype@.
\begin{cfa}
trait otype( T & | sized(T) ) {
	void ?{}( T & ); $\C{// default constructor}$
	void ?{}( T &, T ); $\C{// copy constructor}$
	void ?=?( T &, T ); $\C{// assignment operator}$
	void ^?{}( T & ); $\C{// destructor}$
};
\end{cfa}
These implicit functions are used by the @sumable@ operator @?+=?@ for the right-side operand and the return value.

If the array type is not a builtin type, an extra type parameter and assertions are required, \eg for subscripting.
This case is generalized in polymorphic container-types, such as a list with @insert@ and @remove@ operations, and an element type with copy and assignment.


\subsection{Generic Types}

A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
Broadly speaking, there are three approaches to implement abstract data structures in C.
\begin{enumerate}[leftmargin=*]
\item
Write bespoke data structures for each context.
While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures.
\item
Use @void *@-based polymorphism, \eg the C standard library functions @bsearch@ and @qsort@, which allow for the reuse of code with common functionality.
However, this approach eliminates the type checker's ability to ensure argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary.
\item
Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret.
Furthermore, writing and using complex preprocessor macros is difficult and inflexible.
\end{enumerate}
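
The weakness of the second approach can be sketched with C's @qsort@ (hypothetical comparator @int_cmp@): because both the array and the comparator's parameters decay to @void *@, the compiler accepts a comparator written for @int@ elements paired with a @double@ array without complaint, silently producing a meaningless ordering.
\begin{cfa}
int int_cmp( const void * l, const void * r ) { // written for int elements
	return *(int *)l - *(int *)r;
}
double vals[3] = { 2.5, 1.5, 3.5 };
qsort( vals, 3, sizeof(double), int_cmp ); // wrong comparator accepted: types erased
\end{cfa}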

\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
\CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
For concrete parameters, the generic-type definition can be inlined, like \CC templates, if its definition appears in a header file (\eg @static inline@).

A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
\begin{cquote}
\begin{tabular}{@{}l|@{\hspace{10pt}}l@{}}
\begin{cfa}
@forall( F, S )@ struct pair {
	F first; S second;
};
@forall( F, S )@ // object
S second( pair( F, S ) p ) { return p.second; }
@forall( F *, S * )@ // sized
S * second( pair( F *, S * ) p ) { return p.second; }
\end{cfa}
&
\begin{cfa}
pair( double, int ) dpr = { 3.5, 42 };
int i = second( dpr );
pair( void *, int * ) vipr = { 0p, &i };
int * ip = second( vipr );
double d = 1.0;
pair( int *, double * ) idpr = { &i, &d };
double * dp = second( idpr );
\end{cfa}
\end{tabular}
\end{cquote}
\CFA generic types are \newterm{fixed} or \newterm{dynamic} sized.
Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on the type parameters.
For example, the type variable @T *@ is fixed size and is represented by @void *@ in code generation;
whereas, the type variable @T@ is dynamic and set at the point of instantiation.
The difference between fixed and dynamic is the complexity and cost of field access.
For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
For dynamic, field offsets are compile-time computed at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic function.
See~\cite[\S~3.2]{Moss19} for complete implementation details.
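
The dynamic case can be sketched in plain C (hypothetical names): the call site computes an offset array from the concrete instantiation, and the polymorphic code locates each field by adding its offset to the structure's base address.
\begin{cfa}
struct pair_di { double first; int second; }; // one concrete instantiation
void * second_field( void * p, const size_t offsets[] ) { // layout unknown inside
	return (char *)p + offsets[1]; // base address plus offset of second field
}
struct pair_di p = { 3.5, 42 };
size_t offsets[] = { offsetof(struct pair_di, first), offsetof(struct pair_di, second) };
int * ip = second_field( &p, offsets ); // *ip == 42
\end{cfa}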

Currently, \CFA generic types allow assertions.
For example, the following declaration of a sorted set-type ensures the set key supports equality and relational comparison.
\begin{cfa}
forall( Elem, @Key@ | { _Bool ?==?( Key, Key ); _Bool ?<?( Key, Key ); } )
struct Sorted_Set { Elem elem; @Key@ key; ... };
\end{cfa}
However, the operations that insert/remove elements from the set should not appear as part of the generic-type's assertions.
\begin{cfa}
forall( @Elem@ | /* any assertions on element type */ ) {
	void insert( Sorted_Set set, @Elem@ elem ) { ... }
	bool remove( Sorted_Set set, @Elem@ elem ) { ... } // false => element not present
	... // more set operations
} // distribution
\end{cfa}
(Note, the @forall@ clause can be distributed across multiple functions.)
For software-engineering reasons, the set assertions would be refactored into a trait to allow alternative implementations, like a Java \lstinline[language=java]{interface}.

In summation, the \CFA type system inherits \newterm{nominal typing} for concrete types from C, and adds \newterm{structural typing} for polymorphic types.
Traits are used like interfaces in Java or abstract base-classes in \CC, but without the nominal inheritance relationships.
Instead, each polymorphic function or generic type defines the structural type needed for its execution, which is fulfilled at each call site from the lexical environment, like Go~\cite{Go} or Rust~\cite{Rust} interfaces.
Hence, new lexical scopes and nested functions are used extensively to create local subtypes, as in the @qsort@ example, without having to manage a nominal inheritance hierarchy.


\section{Language Comparison}

\begin{table}
\caption{Overload Features in Programming Languages}
\label{t:OverloadingFeatures}
\centering
\begin{minipage}{\linewidth}
\setlength{\tabcolsep}{5pt}
\begin{tabular}{@{}r|cccccccc@{}}
Feature\,{\textbackslash}\,Language & Ada & \CC & \CFA & Java & Scala & Swift & Rust & Haskell \\
\hline
general\footnote{overloadable entities: V $\Rightarrow$ variable, O $\Rightarrow$ operator, F $\Rightarrow$ function, M $\Rightarrow$ member}
	& O\footnote{except assignment}/F & O/F/M & V/O/F & M\footnote{not universal} & O/M & O/F/M & no & no \\
general constraints\footnote{T $\Rightarrow$ parameter type, \# $\Rightarrow$ parameter number, N $\Rightarrow$ parameter name, R $\Rightarrow$ return type}
	& T/\#//R\footnote{parameter names can be used to disambiguate among overloads but not create overloads}
	& T/\# & T/\#/R & T/\# & T/\#/N/R & T/\#/N/R & T/\#/N & T/R \\
univ. type constraints\footnote{C $\Rightarrow$ concept, T $\Rightarrow$ interface/trait/type-class, B $\Rightarrow$ type bounds}
	& no\footnote{generic cannot be overloaded} & C & T & B & B & T & T & T \\
Safe/Unsafe arg. conv. & no & S/U\footnote{no conversions allowed during template parameter deduction} & S/U
	& S\footnote{unsafe (narrowing) conversion only allowed in assignment or initialization to a primitive (builtin) type} & S
	& no\footnote{literals only, Int $\rightarrow$ Double (Safe)} & no & no
\end{tabular}
\end{minipage}
% https://doc.rust-lang.org/rust-by-example/trait/impl_trait.html
% https://dl.acm.org/doi/10.1145/75277.75283
\end{table}

Because \CFA's type system is focused on overloading and overload resolution, \VRef[Table]{t:OverloadingFeatures} compares \CFA with a representative set of popular programming languages and their take on overloading.
The first row classifies whether there is general overloading, and what entities may be overloaded.
\begin{cquote}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}llll@{}}
\multicolumn{1}{c}{\textbf{variable}} & \multicolumn{1}{c}{\textbf{operator}} & \multicolumn{1}{c}{\textbf{function}} & \multicolumn{1}{c}{\textbf{member}} \\
\begin{cfa}
int foo;
double foo;


\end{cfa}
&
\begin{cfa}
int ?+?( int, int );
double ?+?( double, double );


\end{cfa}
&
\begin{cfa}
int foo( int );
double foo( int, int );


\end{cfa}
&
\begin{c++}
class C {
	int foo( int );
	double foo( int, int );
};
\end{c++}
\end{tabular}
\end{cquote}

The second row classifies the specialization mechanisms used to distinguish among the general overload capabilities.
\begin{cquote}
\begin{tabular}{@{}llll@{}}
\multicolumn{1}{c}{\textbf{type}} & \multicolumn{1}{c}{\textbf{number}} & \multicolumn{1}{c}{\textbf{name}} & \multicolumn{1}{c}{\textbf{return}} \\
\begin{swift}
func foo( _ x : Int )
func foo( _ x : Double )
foo( 3 )
foo( 3.5 )
\end{swift}
&
\begin{swift}
func foo( _ x : Int )
func foo( _ x : Int, _ y : Int )
foo( 3 )
foo( 3, 3 )
\end{swift}
&
\begin{swift}
func foo( x : Int )
func foo( y : Int )
foo( x : 3 )
foo( y : 3 );
\end{swift}
&
\begin{swift}
func foo() -> Int
func foo() -> String
var i : Int = foo()
var s : String = foo();
\end{swift}
\end{tabular}
\end{cquote}

The third row classifies if generic functions can be overloaded based on differing type variables, number of type variables, and type-variable constraints.
The following examples illustrate the first two overloading cases.
\begin{swift}
func foo<T>( a : T ); $\C[2.25in]{// foo1}$
func foo<T>( a : T, b : T ); $\C{// foo2}$
func foo<T,U>( a : T, b : U ); $\C{// foo3}$
foo( a : 3.5 ); $\C{// foo1}$
foo( a : 2, b : 2 ); $\C{// foo2}$
foo( a : 2, b : 2.5 ); $\C{// foo3}\CRT$
\end{swift}
Here, the overloaded calls are disambiguated using argument types and their number.

However, the parameter operations are severely restricted because universal types have few operations.
For example, Swift provides a @print@ operation for its universal type, and the Java @Object@ class provides general methods: @toString@, @hashCode@, @equals@, @finalize@, \etc.
This restricted mechanism still supports a few useful functions, where the parameters are abstract entities, \eg:
\begin{swift}
func swap<T>( _ a : inout T, _ b : inout T ) { let temp : T = a; a = b; b = temp; }
var ix = 4, iy = 3;
swap( &ix, &iy );
var fx = 4.5, fy = 3.5;
swap( &fx, &fy );
\end{swift}
To make a universal function usable, an abstract description is needed for the operations used on the parameters within the function body.
Type matching of these operations can occur implicitly, by discovery using techniques like \CC template expansion, or by explicit statement, \eg interfaces, subtyping (inheritance), assertions (traits), type classes, and type bounds.
The mechanism chosen can affect separate compilation or require runtime type information (RTTI).
\begin{description}
\item[concept] ~
\begin{c++}
template< typename T >
concept range = requires( T & t ) {
	ranges::begin(t); // equality-preserving for forward iterators
	ranges::end(t);
};
\end{c++}
\item[traits/type-classes] ~
\begin{cfa}
forall( T | { T ?+?( T, T ) } )
\end{cfa}
\item[inheritance/type bounds] ~
\begin{scala}
class CarPark[Vehicle >: Bike <: Bus](val lot : Vehicle)
\end{scala}
\end{description}

\begin{figure}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{c}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{Haskell}} \\
\begin{cfa}
forall( T ) trait sumable {
	void ?{}( T &, zero_t );
	T ?+=?( T &, T );
};
forall( T | sumable( T ) )
T sum( T a[], size_t size ) {
	T total = 0;
	for ( i; size ) total += a[i];
	return total;
}
struct S { int i, j; };
void ?{}( S & s, zero_t ) { s.[i, j] = 0; }
void ?{}( S & s ) { s{0}; }
void ?{}( S & s, int i, int j ) { s.[i, j] = [i, j]; }
S ?+=?( S & l, S r ) { l.[i, j] += r.[i, j]; return l; }

int main() {
	int ia[] = { 1, 2, 3 };
	sout | sum( ia, 3 ); // trait inference
	double da[] = { 1.5, 2.5, 3.5 };
	sout | sum( da, 3 ); // trait inference
	S sa[] = { {1, 1}, {2, 2}, {3, 3} };
	sout | sum( sa, 3 ).[i, j]; // trait inference
}
\end{cfa}
&
\begin{haskell}
class Sumable a where
	szero :: a
	sadd :: a -> a -> a

ssum :: Sumable a $=>$ [a] -> a
ssum (x:xs) = sadd x (ssum xs)
ssum [] = szero



data S = S Int Int deriving Show
@instance Sumable Int@ where
	szero = 0
	sadd = (+)
@instance Sumable Float@ where
	szero = 0.0
	sadd = (+)
@instance Sumable S@ where
	szero = S 0 0
	sadd (S l1 l2) (S r1 r2) = S (l1 + r1) (l2 + r2)
main = do
	print ( ssum [1::Int, 2, 3] )
	print ( ssum [1.5::Float, 2.5, 3.5] )
	print ( ssum [(S 1 1), (S 2 2), (S 3 3)] )
\end{haskell}
\end{tabular}
\caption{Implicit/Explicit Trait Inferencing}
\label{f:ImplicitlyExplicitlyTraitInferencing}
\end{figure}

One differentiating feature among these specialization techniques is the ability to implicitly or explicitly infer the trait information at a call site.
\VRef[Figure]{f:ImplicitlyExplicitlyTraitInferencing} compares the @sumable@ trait and polymorphic @sum@ function \see{\VRef{s:Traits}} for \CFA and Haskell.
Here, the \CFA type system infers the trait functions at each call site, so no additional specification is necessary by the programmer.
The Haskell program requires the programmer to explicitly bind the trait to each type that can be summed.
(The extra casts in the Haskell program are necessary to differentiate between the two kinds of integers and floating-point numbers.)
Ada also requires explicit binding, creating a new function name for each generic instantiation.
\begin{ada}
function int_Twice is new Twice( Integer, "+" => "+" );
function float_Twice is new Twice( Float, "+" => "+" );
\end{ada}
Finally, there is a belief that certain type systems cannot support general overloading, \eg Haskell.
As \VRef[Table]{t:OverloadingFeatures} shows, there are multiple languages with both general and parametric overloading, so the decision to not support general overloading is based on the opinion of the language designers and the type system they choose, not any reason in type theory.

The fourth row classifies if conversions are attempted beyond exact match.
\begin{cfa}
int foo( double ); $\C[1.75in]{// 1}$
double foo( int ); $\C{// 2}$
int i = foo( 3 ); $\C{// 1 : 3 to 3.0 argument conversion => all conversions are safe}$
double d = foo( 3.5 ); $\C{// 1 : int to double result conversion => all conversions are safe}\CRT$
\end{cfa}


\section{Contributions}

The \CFA compiler's performance and type capabilities have been greatly improved through my development work.
\begin{enumerate}
\item
The compilation time of various \CFA library units and test programs has been reduced by an order of magnitude, from minutes to seconds \see{\VRef[Table]{t:SelectedFileByCompilerBuild}}, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features.
The details of the compiler optimization work are covered in a previous technical report~\cite{Yu20}, which essentially forms part of this thesis.
\item
The thesis presents a systematic review of the new features added to the \CFA language and its type system.
Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested during development due to limitations in compiler performance.
Several issues coming from the interactions of various language features are identified and discussed in this thesis;
some of them I have resolved, while others are given temporary fixes and need to be reworked in the future.
\item
Finally, this thesis provides constructive ideas for fixing a number of high-level issues in the \CFA language design and implementation, and gives a path for future improvements to the language and compiler.
\end{enumerate}

1131
\begin{comment}
From: Andrew James Beach <ajbeach@uwaterloo.ca>
To: Peter Buhr <pabuhr@uwaterloo.ca>, Michael Leslie Brooks <mlbrooks@uwaterloo.ca>,
    Fangren Yu <f37yu@uwaterloo.ca>, Jiada Liang <j82liang@uwaterloo.ca>
Subject: Re: Haskell
Date: Fri, 30 Aug 2024 16:09:06 +0000

Do you mean:

one = 1

And then write a bunch of code that assumes it is an Int or Integer (which are roughly int and Int in Cforall) and then replace it with:

one = 1.0

And have that crash? That is actually enough, for some reason Haskell is happy to narrow the type of the first literal (Num a => a) down to Integer but will not do the same for (Fractional a => a) and Rational (which is roughly Integer for real numbers). Possibly a compatibility thing since before Haskell had polymorphic literals.

Now, writing even the first version will fire a -Wmissing-signatures warning, because it does appear to be understood that just from a documentation perspective, people want to know what types are being used. Now, if you have the original case and start updating the signatures (adding one :: Fractional a => a), you can eventually get into issues, for example:

import Data.Array (Array, Ix, (!))
atOne :: (Ix a, Frational a) => Array a b -> b - - In CFA: forall(a | Ix(a) | Frational(a), b) b atOne(Array(a, b) const & array)
atOne = (! one)

Which compiles and is fine except for the slightly awkward fact that I don't know of any types that are both Ix and Fractional types. So you might never be able to find a way to actually use that function. If that is good enough you can reduce that to three lines and use it.

Something that just occurred to me, after I did the above examples, is: Are there any classic examples in literature I could adapt to Haskell?

Andrew

PS, I think it is too obvious of a significant change to work as a good example but I did mock up the structure of what I am thinking you are thinking about with a function. If this helps here it is.

doubleInt :: Int -> Int
doubleInt x = x * 2

doubleStr :: String -> String
doubleStr x = x ++ x

-- Missing Signature
action = doubleInt - replace with doubleStr

main :: IO ()
main = print $ action 4
\end{comment}