source: doc/theses/fangren_yu_MMath/intro.tex @ 2980ccb8

Last change on this file since 2980ccb8 was 2980ccb8, checked in by Peter A. Buhr <pabuhr@…>, 19 hours ago
more proofreading of introduction chapter
1	\chapter{Introduction}
2
3	This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically, the type resolver used to select polymorphic types among overloaded names.
4	Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
5	\begin{quote}
6	There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. --- Phil Karlton
7	\end{quote}
8	Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
9	In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process.
10	Depending on the language, any ambiguous cases are resolved using some form of qualification and/or casting.
11
12	Therefore, one of the key goals in \CFA is to push the boundary on overloading, and hence, overload resolution.
13	As well, \CFA follows the current trend of replacing nominal inheritance with traits.
14	Together, the resulting \CFA type-system has a number of unique features making it different from other programming languages with expressive, static type-systems.
15
16
17	\section{Types}
18
19	\begin{quote}
20	Some are born great, some achieve greatness, and some have greatness thrust upon them. --- Twelfth Night, Act II Scene 5, William Shakespeare
21	\end{quote}
22
23	All computers have multiple types because computer architects optimize the hardware around a few basic types with well-defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
24	A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
25	These language types are \emph{thrust} upon programmers, with their syntactic and semantic rules and restrictions.
26	These rules are used to transform a language expression to a hardware expression.
27	Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
28	Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined once at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
29	Expressibility, generalization, and safety are all bound up in a language's type system, and hence, directly affect the capability, build time, and correctness of program development.
30
31
32	\section{Operator Overloading}
33
34	Virtually all programming languages overload the arithmetic operators across the basic computational types using the number and type of parameters and returns.
35	Like \CC, \CFA also allows these operators to be overloaded with user-defined types.
36	The syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
37	Here, a user-defined type is extended with an addition operation with the same syntax as builtin types.
38	\begin{cfa}
39	struct S { int i, j; };
40	S @?+?@( S op1, S op2 ) { return (S){ op1.i + op2.i, op1.j + op2.j }; }
41	S s1, s2;
42	s1 = s1 @+@ s2; $\C[1.75in]{// infix call}$
43	s1 = @?+?@( s1, s2 ); $\C{// direct call}\CRT$
44	\end{cfa}
45	The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
46	If there are mixed-mode operands, @2 + 3.5@, the type system attempts (safe) conversions, like in C/\CC, converting the argument type(s) to the parameter type(s).
47	Conversions are necessary because the hardware rarely supports mixed-mode operations, so both operands must be the same type.
48	Note, without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
49	This approach does not match with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
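
For comparison, the same extension can be written in \CC, where the operator name is spelled @operator+@ rather than @?+?@.
The following is only a sketch for contrast, not \CFA code; the function @demo_operators@ is a hypothetical name introduced here to exercise the infix call, the direct call, and the mixed-mode case @2 + 3.5@.

```cpp
#include <cassert>
#include <type_traits>

// C++ counterpart of the CFA struct S above (hypothetical, for comparison only).
struct S { int i, j; };

// User-defined addition, callable with the same infix syntax as builtin types.
S operator+( S op1, S op2 ) { return S{ op1.i + op2.i, op1.j + op2.j }; }

// Mixed-mode operands: the int operand is implicitly converted to double,
// so the builtin double addition is selected and the result type is double.
static_assert( std::is_same<decltype( 2 + 3.5 ), double>::value, "2 + 3.5 has type double" );

void demo_operators() {
    S s1 = { 1, 2 }, s2 = { 3, 4 };
    S a = s1 + s2;               // infix call
    S b = operator+( s1, s2 );   // direct call
    assert( a.i == 4 && a.j == 6 );
    assert( b.i == 4 && b.j == 6 );
    assert( 2 + 3.5 == 5.5 );    // int 2 converted to double before the addition
}
```

In both languages the infix and direct forms resolve to the same overloaded function.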
50
51
52	\section{Function Overloading}
53
54	Both \CFA and \CC allow function names to be overloaded, as long as their prototypes differ in the number and type of parameters and returns.
55	\begin{cfa}
56	void f( void ); $\C[2in]{// (1): no parameter}$
57	void f( char ); $\C{// (2): overloaded on the number and parameter type}$
58	void f( int, int ); $\C{// (3): overloaded on the number and parameter type}$
59	f( 'A' ); $\C{// select (2)}\CRT$
60	\end{cfa}
61	In this case, the name @f@ is overloaded depending on the number and parameter types.
62	The type system examines each call site and selects the best match based on the number and types of the arguments.
63	Here, there is a perfect match for the call, @f( 'A' )@ with the number and parameter type of function (2).
64
65	Ada, Scala, and \CFA type-systems also use the return type in resolving a call, to pinpoint the best overloaded name.
66	For example, in many programming languages with overloading, the following functions are ambiguous without using the return type.
67	\begin{cfa}
68	int f( int ); $\C[2in]{// (1); overloaded on return type and parameter}$
69	double f( int ); $\C{// (2); overloaded on return type and parameter}$
70	int i = f( 3 ); $\C{// select (1)}$
71	double d = f( 3 ); $\C{// select (2)}\CRT$
72	\end{cfa}
73	Alternatively, if the type system looks at the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
74	This capability can be taken to the extreme, where there are no function parameters.
75	\begin{cfa}
76	int random( void ); $\C[2in]{// (1); overloaded on return type}$
77	double random( void ); $\C{// (2); overloaded on return type}$
78	int i = random(); $\C{// select (1)}$
79	double d = random(); $\C{// select (2)}\CRT$
80	\end{cfa}
81	Again, there is an exact match for each call.
82	If there is no exact match, a set of minimal, safe conversions can be added to find a best match, as for operator overloading.
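
\CC does not use the return type in overload resolution, but the effect can be approximated with a proxy object whose conversion operators are selected by the declared type at the use site.
The sketch below illustrates the idea only; @Random@, @random_value@, @random_int@, and @random_double@ are hypothetical names, and the returned values are fixed placeholders rather than random numbers.

```cpp
#include <cassert>

// Two implementations that differ only in result type (constant placeholders,
// not real random-number generators).
int random_int() { return 4; }
double random_double() { return 0.25; }

// Proxy object: its conversion operators let the declared type at the use
// site pick the implementation.
struct Random {
    operator int() const { return random_int(); }
    operator double() const { return random_double(); }
};
Random random_value() { return Random{}; }

void demo_return_selection() {
    int i = random_value();      // conversion to int is an exact match
    double d = random_value();   // conversion to double is an exact match
    assert( i == 4 );
    assert( d == 0.25 );
}
```

The proxy must enumerate every supported result type by hand, whereas a resolver that uses the return type considers every visible overload directly.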
83
84
85	\section{Variable Overloading}
86
87	Unlike most programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
88	(Shadow overloading is also possible for functions, if a language supports nested function declarations, \eg \CC named, nested, lambda functions.)
89	\begin{cfa}
90	void foo( double d );
91	int v; $\C[2in]{// (1)}$
92	double v; $\C{// (2) variable overloading}$
93	foo( v ); $\C{// select (2)}$
94	{
95	int v; $\C{// (3) shadow overloading}$
96	double v; $\C{// (4) and variable overloading}$
97	foo( v ); $\C{// select (4)}\CRT$
98	}
99	\end{cfa}
100	It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
101	However, variable overloading within a scope is often considered extremely dangerous, without any evidence to corroborate this claim.
102	In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems.
103
104	In \CFA, the type system simply treats overloaded variables as an overloaded function returning a value with no parameters.
105	Hence, supporting this feature requires no significant effort: because variables have no parameters, the return type alone is used to disambiguate.
106	\begin{cfa}
107	int MAX = 2147483647; $\C[2in]{// (1); overloaded on return type}$
108	long int MAX = ...; $\C{// (2); overloaded on return type}$
109	double MAX = ...; $\C{// (3); overloaded on return type}$
110	int i = MAX; $\C{// select (1)}$
111	long int i = MAX; $\C{// select (2)}$
112	double d = MAX; $\C{// select (3)}\CRT$
113	\end{cfa}
114	Hence, the name @MAX@ can replace all the C type-specific names, \eg @INT_MAX@, @LONG_MAX@, @DBL_MAX@, \etc.
115	The result is a significant reduction in names to access typed constants.
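
The closest \CC analogue is a variable template, sketched below under the assumption of a C++14 compiler.
It also gives a single name, here reusing @MAX@, but the type must be written at each use because \CC does not select a variable from the type of the left-hand side.
The function @demo_max@ is a hypothetical name used only to check the values against the C constants.

```cpp
#include <cassert>
#include <cfloat>
#include <climits>
#include <limits>

// One name for every typed maximum, but the type must be written explicitly.
template<typename T> constexpr T MAX = std::numeric_limits<T>::max();

void demo_max() {
    int i = MAX<int>;
    long int l = MAX<long int>;
    double d = MAX<double>;
    assert( i == INT_MAX );
    assert( l == LONG_MAX );
    assert( d == DBL_MAX );
}
```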
116
117	As an aside, C has a separate namespace for types and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
118	\begin{cfa}
119	void S() {
120	struct @S@ { int S; };
121	@struct S@ S;
122	void S( @struct S@ S ) { S.S = 1; };
123	}
124	\end{cfa}
125
126
127	\section{Constant Overloading}
128
129	\CFA is unique in providing restricted constant overloading for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
130	In addition, several operations are defined in terms of values @0@ and @1@.
131	For example, every @if@ and iteration statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
132	\begin{cfa}
133	if ( x ) ++x; => if ( x @!= 0@ ) x @+= 1@;
134	for ( ; x; --x ) => for ( ; x @!= 0@; x @-= 1@ )
135	\end{cfa}
136	To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work with the special @0@ and @1@ contexts.
137	The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
138	\begin{cfa}
139	struct S { int i, j; };
140	void ?{}( S & s, zero_t ) { s.[i,j] = 0; } $\C{// constructors}$
141	void ?{}( S & s, one_t ) { s.[i,j] = 1; }
142	S ?=?( S & dst, zero_t ) { dst.[i,j] = 0; return dst; } $\C{// assignment}$
143	S ?=?( S & dst, one_t ) { dst.[i,j] = 1; return dst; }
144	S ?+=?( S & s, one_t ) { s.[i,j] += 1; return s; } $\C{// increment/decrement each field}$
145	S ?-=?( S & s, one_t ) { s.[i,j] -= 1; return s; }
146	int ?!=?( S s, zero_t ) { return s.i != 0 && s.j != 0; } $\C{// comparison}$
147	S s = @0@; $\C{// initialization}$
148	s = @0@; $\C{// assignments}$
149	s = @1@;
150	if ( @s@ ) @++s@; $\C{// special, unary ++/-\,- come implicitly from +=/-=}$
151	\end{cfa}
152	Here, type @S@ is first-class with respect to the basic types, working with all existing implicit C mechanisms.
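
By contrast, \CC has no distinct types for the literals @0@ and @1@, so a comparable type must accept any @int@ and must define the conversion to @bool@ and the increment separately.
The sketch below is a comparison under those assumptions, not part of \CFA; the member functions chosen are one possible encoding, and @demo_constants@ is a hypothetical name.

```cpp
#include <cassert>

struct S {
    int i, j;
    S( int v ) : i( v ), j( v ) {}                               // accepts 0 and 1, but also any other int
    explicit operator bool() const { return i != 0 && j != 0; }  // plays the role of ?!=?( S, zero_t )
    S & operator+=( int n ) { i += n; j += n; return *this; }
    S & operator++() { return *this += 1; }                      // written by hand, not derived from +=
};

void demo_constants() {
    S s = 0;        // initialization through the converting constructor
    assert( ! s );
    s = 1;          // assignment through a temporary S( 1 )
    if ( s ) ++s;   // contextual conversion to bool, then prefix increment
    assert( s.i == 2 && s.j == 2 );
}
```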
153
154
155	\section{Type Inferencing}
156
157	Every variable has a type, but association between them can occur in different ways:
158	at the point where the variable comes into existence (declaration) and/or on each assignment to the variable.
159	\begin{cfa}
160	double x; $\C{// type only}$
161	float y = 3.1D; $\C{// type and initialization}$
162	auto z = y; $\C{// initialization only}$
163	z = "abc"; $\C{// assignment}$
164	\end{cfa}
165	For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
166	For type-and-initialization, the specified and initialization types may not agree.
167	For initialization-only, the compiler may select the type by melding programmer and context information.
168	When the compiler participates in type selection, it is called \newterm{type inferencing}.
169	Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed values and performs a (possibly lossy) action to convert one value to the type of the other variable.
170	Finally, for assignment, the current variable and expression types may not agree.
171
172	One of the first and most powerful type-inferencing systems is Hindley--Milner~\cite{Damas82}.
173	Here, the type resolver starts with the types of the program constants used for initialization and these constant types flow throughout the program, setting all variable and expression types.
174	\begin{cfa}
175	auto f() {
176	x = 1; y = 3.5; $\C{// set types from constants}$
177	x = // expression involving x, y and other local initialized variables
178	y = // expression involving x, y and other local initialized variables
179	return x, y;
180	}
181	auto w = f(); $\C{// typing flows outwards}$
182
183	void f( auto x, auto y ) {
184	x = // expression involving x, y and other local initialized variables
185	y = // expression involving x, y and other local initialized variables
186	}
187	s = 1; t = 3.5; $\C{// set types from constants}$
188	f( s, t ); $\C{// typing flows inwards}$
189	\end{cfa}
190	In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
191	Note, like template meta programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
192	Inferring type constraints by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
193	in this case, parametric polymorphism can allow separate compilation.
194	In languages with type inferencing, there is often limited overloading to reduce the search space, which introduces the naming problem.
195	Note, return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match.
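
The two directions of flow can be loosely imitated in \CC.
The sketch below assumes C++14 return-type deduction and uses the hypothetical names @outwards@, @inwards@, and @demo_flow@; \CC @auto@ and templates perform local deduction rather than Hindley--Milner inference, so this only illustrates the direction in which type information moves.

```cpp
#include <cassert>
#include <type_traits>

// Typing flows outwards: the return type is deduced from the body.
auto outwards() {
    auto x = 1;      // int, from the constant
    auto y = 3.5;    // double, from the constant
    return x + y;    // double
}

// Typing flows inwards: parameter types are deduced from the arguments at each call.
template<typename X, typename Y>
auto inwards( X x, Y y ) { return x + y; }

static_assert( std::is_same<decltype( outwards() ), double>::value, "deduced from the constants" );
static_assert( std::is_same<decltype( inwards( 1, 2 ) ), int>::value, "deduced from int arguments" );
static_assert( std::is_same<decltype( inwards( 1, 3.5 ) ), double>::value, "deduced from mixed arguments" );

void demo_flow() {
    assert( outwards() == 4.5 );
    assert( inwards( 1, 2 ) == 3 );
}
```

As with template meta-programming, a new instance of @inwards@ is generated for each distinct pair of argument types.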
196
197	In simpler type-inferencing systems, such as C/\CC/\CFA, there are more specific usages.
198	\begin{cquote}
199	\setlength{\tabcolsep}{10pt}
200	\begin{tabular}{@{}lll@{}}
201	\multicolumn{1}{c}{\textbf{gcc / \CFA}} & \multicolumn{1}{c}{\textbf{\CC}} \\
202	\begin{cfa}
203	#define expr 3.0 * i
204	typeof(expr) x = expr;
205	int y;
206	typeof(y) z = y;
207	\end{cfa}
208	&
209	\begin{cfa}
210
211	auto x = 3.0 * i;
212	int y;
213	auto z = y;
214	\end{cfa}
215	&
216	\begin{cfa}
217
218	// use type of initialization expression
219
220	// use type of initialization expression
221	\end{cfa}
222	\end{tabular}
223	\end{cquote}
224	The two important capabilities are:
225	\begin{itemize}[topsep=0pt]
226	\item
227	Not determining or writing long generic types, \eg, given deeply nested generic types.
228	\begin{cfa}
229	typedef T1(int).T2(float).T3(char).T @ST@; $\C{// \CFA nested type declaration}$
230	@ST@ x, y, z;
231	\end{cfa}
232	This issue is exacerbated with \CC templates, where type names are hundreds of characters long, resulting in unreadable error messages.
233	\item
234	Ensuring the types of secondary variables match a primary variable(s).
235	\begin{cfa}
236	int x; $\C{// primary variable}$
237	typeof(x) y, z, w; $\C{// secondary variables match x's type}$
238	\end{cfa}
239	If the type of @x@ changes, the type of the secondary variables correspondingly updates.
240	\end{itemize}
241	Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
242	\begin{cfa}
243	int x;
244	typeof(x) y = ... // complex expression
245	typeof(x) z = ... // complex expression
246	\end{cfa}
247	Here, the types of @y@ and @z@ are fixed (branded), whereas with type inferencing, the types of @y@ and @z@ are potentially unknown.
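
In \CC terms, the contrast is between @decltype@ and @auto@.
A minimal sketch, where @complex_expression@ is a hypothetical stand-in for the elided initializers and @demo_branding@ is a hypothetical driver:

```cpp
#include <cassert>
#include <type_traits>

double complex_expression() { return 3.75; }   // stand-in for "... complex expression"

void demo_branding() {
    int x = 0;
    decltype( x ) y = complex_expression();   // branded: y is int, value converted to 3
    auto z = complex_expression();            // inferred: z is whatever the expression yields
    static_assert( std::is_same<decltype( y ), int>::value, "y follows x" );
    static_assert( std::is_same<decltype( z ), double>::value, "z follows the initializer" );
    assert( y == 3 );
    assert( z == 3.75 );
    (void)x;
}
```

If the result type of @complex_expression@ changes, the type of @z@ silently changes with it, while @y@ keeps the type of @x@.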
248
249
250	\section{Type-Inferencing Issues}
251
252	Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions, or confusions.
253
254	A convenience is having the compiler use its overarching program knowledge to select the best type for each variable based on some notion of \emph{best}, which simplifies the programming experience.
255
256	A restriction is the conundrum in type inferencing of when to \emph{brand} a type.
257	That is, when is the type of the variable/function more important than the type of its initialization expression.
258	For example, if a change is made in an initialization expression, it can cause cascading type changes and/or errors.
259	At some point, a variable's type needs to remain constant, and the initializing expression needs to be modified or flagged as an error when it changes.
260	Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the compiler can report a mismatch with the constant initialization.
261	\begin{cfa}
262	void f( @int@ x, @int@ y ) { // brand function prototype
263	x = // expression involving x, y and other local initialized variables
264	y = // expression involving x, y and other local initialized variables
265	}
266	s = 1; t = 3.5;
267	f( s, @t@ ); // type mismatch
268	\end{cfa}
269	In Haskell, it is common for programmers to brand (type) function parameters.
270
271	A confusion is large blocks of code where all declarations are @auto@, as is now common in \CC.
272	As a result, understanding and changing the code becomes almost impossible.
273	Types provide important clues as to the behaviour of the code, and correspondingly, how to correctly change or add new code.
274	In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types for them.
275	For example, given:
276	\begin{cfa}
277	auto x = @...@
278	\end{cfa}
279	and the need to write a routine to compute using @x@
280	\begin{cfa}
281	void rtn( @type of x@ parm );
282	rtn( x );
283	\end{cfa}
284	A programmer must re-engineer the type of @x@'s initialization expression, reconstructing the possibly long generic type-name.
285	In this situation, having the type name or its short alias is essential.
286
287	The \CFA type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
288	Type inferencing defeats this goal because there is no left-hand type.
289	Fundamentally, type inferencing tries to magic away variable types from the programmer.
290	However, this results in lazy programming with the potential for poor performance and safety concerns.
291	Types are as important as control-flow in writing a good program, and should not be masked, even if it requires the programmer to think!
292	A similar issue is garbage collection, where storage management is magicked away, often resulting in poor program design and performance.\footnote{
293	There are full-time Java consultants, who are hired to find memory-management problems in large Java programs.}
294	The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
295	Understanding space and time issues is an essential part of the programming craft.
296	Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, implicit type-inferencing is unsupported.
297	Should a significant need arise, this feature can be revisited.
298
299
300	\section{Polymorphism}
301
302	\CFA provides polymorphic functions and types, where a polymorphic function can constrain types using assertions based on traits.
303
304
305	\subsection{Polymorphic Function}
306
307	The signature feature of the \CFA type-system is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96}, generalized using a @forall@ clause (giving the language its name).
308	\begin{cfa}
309	@forall( T )@ T identity( T val ) { return val; }
310	int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$
311	\end{cfa}
312	This @identity@ function can be applied to an \newterm{object type}, \ie a type with a known size and alignment, which is sufficient to stack allocate, default or copy initialize, assign, and delete.
313	The \CFA implementation passes the size and alignment for each type parameter, as well as any implicit/explicit constructor, copy constructor, assignment operator, and destructor.
314	For an incomplete \newterm{data type}, \eg pointer/reference types, this information is not needed.
315	\begin{cfa}
316	forall( T * ) T * identity( T * val ) { return val; }
317	int i, * ip = identity( &i );
318	\end{cfa}
319	Unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat.
320
321	To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
322	For example, the function @twice@ works for any type @T@ with a matching addition operator.
323	\begin{cfa}
324	forall( T @| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }
325	int val = twice( twice( 3 ) ); $\C{// val == 12}$
326	\end{cfa}
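
A rough \CC counterpart of @twice@ constrains the template with a trailing @decltype@, so the function is only viable when @x + x@ is well formed.
This is a comparison sketch, not the \CFA mechanism: the \CC template is checked at instantiation and is not separately compiled.
The name @demo_twice@ is hypothetical.

```cpp
#include <cassert>
#include <string>

// The trailing decltype plays the role of the assertion T ?+?( T, T ):
// twice is only viable for types where x + x is well formed.
template<typename T>
auto twice( T x ) -> decltype( x + x ) { return x + x; }

void demo_twice() {
    assert( twice( twice( 3 ) ) == 12 );                 // T is int
    assert( twice( std::string( "ab" ) ) == "abab" );    // T is std::string
}
```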
327	For example, parametric polymorphism and assertions occur in the existing type-unsafe (@void *@) C @qsort@ to sort an array.
328	\begin{cfa}
329	void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) );
330	\end{cfa}
331	Here, the polymorphism is type-erasure, and the parametric assertion is the comparison routine, which is explicitly passed.
332	\begin{cfa}
333	enum { N = 5 };
334	double val[N] = { 5.1, 4.1, 3.1, 2.1, 1.1 };
335	int cmp( const void * v1, const void * v2 ) { $\C{// compare two doubles}$
336	return *(double *)v1 < *(double *)v2 ? -1 : *(double *)v2 < *(double *)v1 ? 1 : 0;
337	}
338	qsort( val, N, sizeof( double ), cmp );
339	\end{cfa}
340	The equivalent type-safe version in \CFA is a wrapper over the C version.
341	\begin{cfa}
342	forall( ET | { int @?<?@( ET, ET ); } ) $\C{// type must have < operator}$
343	void qsort( ET * vals, size_t dim ) {
344	int cmp( const void * t1, const void * t2 ) { $\C{// nested function}$
345	return *(ET *)t1 @<@ *(ET *)t2 ? -1 : *(ET *)t2 @<@ *(ET *)t1 ? 1 : 0;
346	}
347	qsort( vals, dim, sizeof(ET), cmp ); $\C{// call C version}$
348	}
349	qsort( val, N ); $\C{// deduce type double, and pass builtin < for double}$
350	\end{cfa}
351	The nested function @cmp@ is implicitly built and provides the interface from typed \CFA to untyped (@void *@) C.
352	Providing a hidden @cmp@ function in \CC is awkward as lambdas do not use C calling conventions and template declarations cannot appear in block scope.
353	% In addition, an alternate kind of return is made available: position versus pointer to found element.
354	% \CC's type system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a template @bsearch@.
355	Call-site inferencing and nested functions provide a localized form of inheritance.
356	For example, the \CFA @qsort@ can be made to sort in descending order by locally changing the behaviour of @<@.
357	\begin{cfa}
358	{
359	int ?<?( double x, double y ) { return x @>@ y; } $\C{// locally override behaviour}$
360	qsort( val, N ); $\C{// descending sort}$
361	}
362	\end{cfa}
363	The local version of @?<?@ overrides the built-in @?<?@ so it is passed to @qsort@.
364	The local version performs @?>?@, making @qsort@ sort in descending order.
365	Hence, any number of assertion functions can be overridden locally to maximize the reuse of existing functions and types, without the construction of a named inheritance hierarchy.
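
For comparison, a \CC wrapper over the C @qsort@ needs the comparison routine at namespace scope.
The sketch below uses the hypothetical names @cmp_less@, @sort_array@, and @demo_sort@, and assumes the element type is trivially copyable, as required by @qsort@.

```cpp
#include <cassert>
#include <cstdlib>

// Comparator generated per element type; it must live at namespace scope.
template<typename ET>
int cmp_less( const void * t1, const void * t2 ) {
    const ET & a = *static_cast<const ET *>( t1 );
    const ET & b = *static_cast<const ET *>( t2 );
    return a < b ? -1 : b < a ? 1 : 0;
}

// Type-safe front end over the C routine; ET is deduced from the argument.
template<typename ET>
void sort_array( ET * vals, std::size_t dim ) {
    std::qsort( vals, dim, sizeof( ET ), cmp_less<ET> );
}

void demo_sort() {
    double val[5] = { 5.1, 4.1, 3.1, 2.1, 1.1 };
    sort_array( val, 5 );
    for ( int i = 0; i < 4; i += 1 ) assert( val[i] < val[i + 1] );   // ascending order
}
```

In this sketch the comparison cannot be replaced from an enclosing block; for @double@ the builtin @<@ is always the one used.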
366	A final example is a type-safe wrapper for C @malloc@, where the return type supplies the type/size of the allocation, which is impossible in most type systems.
367	\begin{cfa}
368	static inline forall( T & | sized(T) )
369	T * malloc( void ) {
370	if ( _Alignof(T) <= __BIGGEST_ALIGNMENT__ ) return (T *)malloc( sizeof(T) ); // C allocation
371	else return (T *)memalign( _Alignof(T), sizeof(T) );
372	}
373	// select type and size from left-hand side
374	int * ip = malloc(); double * dp = malloc(); $@$[aligned(64)] struct S {...} * sp = malloc();
375	\end{cfa}
376	The @sized@ assertion passes size and alignment, as a data object has no implicit assertions.
377	Both values are used in @malloc@ via @sizeof@ and @_Alignof@.
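
A similar effect can be approximated in \CC with a proxy object whose templated conversion operator is instantiated by the pointer type on the left-hand side.
The sketch uses the hypothetical names @Alloc@, @alloc@, and @demo_alloc@, and it ignores over-aligned types, which the \CFA version handles with @memalign@.

```cpp
#include <cassert>
#include <cstdlib>

// Proxy returned by alloc(); the pointer type on the left-hand side
// instantiates the conversion operator and so supplies sizeof( T ).
struct Alloc {
    template<typename T>
    operator T *() const { return static_cast<T *>( std::malloc( sizeof( T ) ) ); }
};
Alloc alloc() { return Alloc{}; }

void demo_alloc() {
    int * ip = alloc();       // T deduced as int
    double * dp = alloc();    // T deduced as double
    assert( ip != nullptr && dp != nullptr );
    *ip = 42;  *dp = 3.5;
    assert( *ip == 42 && *dp == 3.5 );
    std::free( ip );  std::free( dp );
}
```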
378
379	These mechanisms are used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
380	Hence, existing C legacy code is leveraged as much as possible;
381	other programming languages must build supporting libraries from scratch, even in \CC.
382
383
384	\subsection{Traits}
385
386	\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
387	\begin{cquote}
388	\begin{tabular}{@{}l|@{\hspace{10pt}}l@{}}
389	\begin{cfa}
390	trait @sumable@( T ) {
391	void @?{}@( T &, zero_t ); // 0 literal constructor
392	T ?+?( T, T ); // assortment of additions
393	T @?+=?@( T &, T );
394	T ++?( T & );
395	T ?++( T & );
396	};
397	\end{cfa}
398	&
399	\begin{cfa}
400	forall( T @| sumable( T )@ ) // use trait
401	T sum( T a[$\,$], size_t size ) {
402	@T@ total = { @0@ }; // initialize by 0 constructor
403	for ( size_t i = 0; i < size; i += 1 )
404	total @+=@ a[i]; // select appropriate +
405	return total;
406	}
407	\end{cfa}
408	\end{tabular}
409	\end{cquote}
410	Traits are simply flattened at the use point, as if written in full by the programmer, where traits often contain overlapping assertions, \eg operator @+@.
411	Hence, trait names play no part in type equivalence.
412	Note, the type @T@ is an object type, and hence, has the implicit internal trait @otype@.
413	\begin{cfa}
414	trait otype( T & | sized(T) ) {
415	void ?{}( T & ); $\C{// default constructor}$
416	void ?{}( T &, T ); $\C{// copy constructor}$
417	void ?=?( T &, T ); $\C{// assignment operator}$
418	void ^?{}( T & ); $\C{// destructor}$
419	};
420	\end{cfa}
421	The implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
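
For contrast, a \CC template expresses the same function with the requirements on @T@ left implicit in the body.
The sketch below is not \CFA; @Cents@ is a hypothetical user type that satisfies the requirements structurally, without naming any interface, and @demo_sum@ is a hypothetical driver.

```cpp
#include <cassert>
#include <cstddef>

// The requirements on T are implicit: construction from 0 and operator+=.
template<typename T>
T sum( const T a[], std::size_t size ) {
    T total = 0;                           // initialize from the literal 0
    for ( std::size_t i = 0; i < size; i += 1 )
        total += a[i];                     // select the += for T
    return total;
}

// A user type that satisfies the same requirements without naming any interface.
struct Cents {
    long amount;
    Cents( int v ) : amount( v ) {}
    Cents & operator+=( Cents rhs ) { amount += rhs.amount; return *this; }
};

void demo_sum() {
    int xs[3] = { 1, 2, 3 };
    Cents cs[2] = { Cents( 150 ), Cents( 275 ) };
    assert( sum( xs, 3 ) == 6 );
    assert( sum( cs, 2 ).amount == 425 );
}
```

A missing operation is reported only when @sum@ is instantiated, whereas a named trait states the required operations once in the function's declaration.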
422
423
424	\subsection{Generic Types}
425
426	A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
427	Broadly speaking, there are three approaches to implement abstract data structures in C.
428	\begin{enumerate}[leftmargin=*]
429	\item
430	Write bespoke data structures for each context they are needed.
431	While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures.
432	\item
433	Use @void *@-based polymorphism, \eg the C standard library functions @bsearch@ and @qsort@, which allow for the reuse of code with common functionality.
434	However, this approach eliminates the type checker's ability to ensure argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary.
435	\item
436	Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret.
437	Furthermore, writing and using preprocessor macros is difficult and inflexible.
438	\end{enumerate}
439
440	\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
441	\CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
442	However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates.
443
444	A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
445	\begin{cquote}
446	\begin{tabular}{@{}l|@{\hspace{10pt}}l@{}}
447	\begin{cfa}
448	@forall( F, S )@ struct pair {
449	F first; S second;
450	};
451	@forall( F, S )@ // object
452	S second( pair( F, S ) p ) { return p.second; }
453	@forall( F *, S * )@ // sized
454	S * second( pair( F *, S * ) p ) { return p.second; }
455	\end{cfa}
456	&
457	\begin{cfa}
458	pair( double, int ) dpr = { 3.5, 42 };
459	int i = second( dpr );
460	pair( void *, int * ) vipr = { 0p, &i };
461	int * ip = second( vipr );
462	double d = 1.0;
463	pair( int *, double * ) idpr = { &i, &d };
464	double * dp = second( idpr );
465	\end{cfa}
466	\end{tabular}
467	\end{cquote}
468	\CFA generic types are \newterm{fixed} or \newterm{dynamic} sized.
469	Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on their type parameters.
470	For example, the type variable @T *@ is fixed size and is represented by @void *@ in code generation;
471	whereas, the type variable @T@ is dynamic and set at the point of instantiation.
472	The difference between fixed and dynamic is the complexity and cost of field access.
473	For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
474	For dynamic, field offsets are computed at compile time at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
475	See~\cite[\S~3.2]{Moss19} for complete implementation details.
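
The dynamic case can be sketched in \CC as follows.
This is an illustration of the idea only, under the assumption of a conventional layout rule (each field is placed at the next offset that is a multiple of its alignment); it is not the \CFA code generator, and @TypeInfo@, @align_up@, @layout_pair@, @get_second@, @PairDoubleInt@, and @demo_layout@ are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Size and alignment passed for each type parameter (hypothetical record).
struct TypeInfo { std::size_t size, align; };

// Round offset up to the next multiple of align (align is a power of two).
std::size_t align_up( std::size_t offset, std::size_t align ) {
    return ( offset + align - 1 ) & ~( align - 1 );
}

// Call site: lay out pair( F, S ); offsets[0] is first, offsets[1] is second.
void layout_pair( TypeInfo f, TypeInfo s, std::size_t offsets[2] ) {
    offsets[0] = 0;
    offsets[1] = align_up( f.size, s.align );
}

// Polymorphic routine: reads field second through the passed offset.
void get_second( const void * pair, const std::size_t offsets[2], TypeInfo s, void * out ) {
    std::memcpy( out, static_cast<const char *>( pair ) + offsets[1], s.size );
}

// Concrete instance used to check the computed layout against the compiler's.
struct PairDoubleInt { double first; int second; };

void demo_layout() {
    TypeInfo d = { sizeof( double ), alignof( double ) };
    TypeInfo n = { sizeof( int ), alignof( int ) };
    std::size_t offsets[2];
    layout_pair( d, n, offsets );
    assert( offsets[1] == offsetof( PairDoubleInt, second ) );
    PairDoubleInt p = { 3.5, 42 };
    int out = 0;
    get_second( &p, offsets, n, &out );   // field access by address arithmetic
    assert( out == 42 );
}
```

The fixed case needs none of this machinery, because the displacement of each field is a compile-time constant.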
476
477	Currently, \CFA generic types allow assertions.
478	For example, the following declaration of a sorted set-type ensures the set key supports equality and relational comparison.
479	\begin{cfa}
480	forall( Elem, @Key@ | { _Bool ?==?( Key, Key ); _Bool ?<?( Key, Key ); } )
481	struct Sorted_Set { Elem elem; @Key@ key; ... };
482	\end{cfa}
483	However, the operations that insert/remove elements from the set should not appear as part of the generic-types assertions.
484	\begin{cfa}
485	forall( @Elem@ | /* any assertions on element type */ ) {
486	void insert( Sorted_Set set, @Elem@ elem ) { ... }
487	bool remove( Sorted_Set set, @Elem@ elem ) { ... } // false => element not present
488	... // more set operations
489	} // distribution
490	\end{cfa}
491	(Note, the @forall@ clause can be distributed across multiple functions.)
492	For software-engineering reasons, the set assertions would be refactored into a trait to allow alternative implementations, like a Java \lstinline[language=java]{interface}.
493
494	In summary, the \CFA type system inherits \newterm{nominal typing} for concrete types from C, and adds \newterm{structural typing} for polymorphic types.
495	Traits are used like interfaces in Java or abstract base-classes in \CC, but without the nominal inheritance relationships.
496	Instead, each polymorphic function or generic type defines the structural type needed for its execution, which is fulfilled at each call site from the lexical environment, like Go~\cite{Go} or Rust~\cite{Rust} interfaces.
497	Hence, new lexical scopes and nested functions are used extensively to create local subtypes, as in the @qsort@ example, without having to manage a nominal inheritance hierarchy.
498
499
500	\section{Contributions}
501
502	\begin{enumerate}
503	\item
504	\item
505	\item
506	\end{enumerate}
507
508
509	\begin{comment}
510	From: Andrew James Beach <ajbeach@uwaterloo.ca>
511	To: Peter Buhr <pabuhr@uwaterloo.ca>, Michael Leslie Brooks <mlbrooks@uwaterloo.ca>,
512	Fangren Yu <f37yu@uwaterloo.ca>, Jiada Liang <j82liang@uwaterloo.ca>
513	Subject: Re: Haskell
514	Date: Fri, 30 Aug 2024 16:09:06 +0000
515
516	Do you mean:
517
518	one = 1
519
520	And then write a bunch of code that assumes it is an Int or Integer (which are roughly int and Int in Cforall) and then replace it with:
521
522	one = 1.0
523
524	And have that crash? That is actually enough, for some reason Haskell is happy to narrow the type of the first literal (Num a => a) down to Integer but will not do the same for (Fractional a => a) and Rational (which is roughly Integer for real numbers). Possibly a compatibility thing since before Haskell had polymorphic literals.
525
526	Now, writing even the first version will fire a -Wmissing-signatures warning, because it does appear to be understood that just from a documentation perspective, people want to know what types are being used. Now, if you have the original case and start updating the signatures (adding one :: Fractional a => a), you can eventually get into issues, for example:
527
528	import Data.Array (Array, Ix, (!))
529	atOne :: (Ix a, Fractional a) => Array a b -> b -- In CFA: forall(a | Ix(a) | Fractional(a), b) b atOne(Array(a, b) const & array)
530	atOne = (! one)
531
532	Which compiles and is fine except for the slightly awkward fact that I don't know of any types that are both Ix and Fractional types. So you might never be able to find a way to actually use that function. If that is good enough you can reduce that to three lines and use it.
533
534	Something that just occurred to me, after I did the above examples, is: Are there any classic examples in literature I could adapt to Haskell?
535
536	Andrew
537
538	PS, I think it is too obvious of a significant change to work as a good example but I did mock up the structure of what I am thinking you are thinking about with a function. If this helps here it is.
539
540	doubleInt :: Int -> Int
541	doubleInt x = x * 2
542
543	doubleStr :: String -> String
544	doubleStr x = x ++ x
545
546	-- Missing Signature
547	action = doubleInt -- replace with doubleStr
548
549	main :: IO ()
550	main = print $ action 4
551	\end{comment}
