Context Navigation

source: doc/theses/fangren_yu_MMath/intro.tex@ bbbff10

Visit:

Last change on this file since bbbff10 was 0f070a4, checked in by Peter A. Buhr <pabuhr@…>, 10 months ago
proofread intro chapter
Property mode set to `100644`
File size: 34.2 KB

Rev	Line
[2e94f3e7]	1	\chapter{Introduction}
	2
[5a02308]	3	This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically, the \newterm{type-resolver} used to satisfy call-site assertions among overloaded variable and function names to allow polymorphic routine calls.
	4	Assertions are the operations a function uses within its body to perform its computation.
	5	For example, a polymorphic function summing an array needs a size, zero, assignment, and plus for the array element-type, and a subscript operation for the array type.
	6	\begin{cfa}
	7	T sum( T a[$\,$], size_t size ) {
[0f070a4]	8	@T@ total = { @0@ }; $\C[1.75in]{// size, 0 for type T}$
[5a02308]	9	for ( size_t i = 0; i < size; i += 1 )
[0f070a4]	10	total @+=@ a@[@i@]@; $\C{// + and subscript for T}\CRT$
[5a02308]	11	return total;
	12	}
	13	\end{cfa}
	14	In certain cases, if the resolver fails to find an exact assertion match, it attempts to find a \emph{best} match using reasonable type conversions.
	15	Hence, \CFA follows the current trend of replacing nominal inheritance with traits composed of assertions for type matching.
	16	The over-arching goal is to push the boundary on localized assertion matching, with advanced overloading resolution and type conversions that match programmer expectations in the C programming language.
	17	Together, the resulting \CFA type-system has a number of unique features making it different from other programming languages with expressive, static, type-systems.
[7f2e87a]	18
	19
[cdbb909]	20	\section{Types}
[7f2e87a]	21
[cdbb909]	22	All computers have multiple types because computer architects optimize the hardware around a few basic types with well defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
[b4c6e10]	23	A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
[0f070a4]	24	These language types are thrust upon programmers with their syntactic/semantic rules and restrictions.
	25	These rules are then used to transform a language expression to a hardware expression.
[cdbb909]	26	Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
	27	Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined once at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
	28	Expressibility, generalization, and safety are all bound up in a language's type system, and hence, directly affect the capability, build time, and correctness of program development.
	29
	30
[5a02308]	31	\section{Overloading}
	32
	33	\begin{quote}
	34	There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. --- Phil Karlton
	35	\end{quote}
[0f070a4]	36	Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
[5a02308]	37	Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
	38	In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process.
[0f070a4]	39	Depending on the language, any ambiguous cases are resolved explicitly using some form of qualification and/or cast.
[5a02308]	40
	41
	42	\subsection{Operator Overloading}
[cdbb909]	43
	44	Virtually all programming languages overload the arithmetic operators across the basic computational types using the number and type of parameters and returns.
[5a02308]	45	Like \CC, \CFA maps operators to named functions and allows these operators to be overloaded with user-defined types.
[2980ccb8]	46	The syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
[0f070a4]	47	Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type.
[7f2e87a]	48	\begin{cfa}
	49	struct S { int i, j };
	50	S @?+?@( S op1, S op2 ) { return (S){ op1.i + op2.i, op1.j + op2.j }; }
	51	S s1, s2;
	52	s1 = s1 @+@ s2; $\C[1.75in]{// infix call}$
[c82dad4]	53	s1 = @?+?@( s1, s2 ); $\C{// direct call}\CRT$
[7f2e87a]	54	\end{cfa}
[b4c6e10]	55	The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
[2980ccb8]	56	If there are mixed-mode operands, @2 + 3.5@, the type system attempts (safe) conversions, like in C/\CC, converting the argument type(s) to the parameter type(s).
[0f070a4]	57	Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be converted to a common type.
	58	Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
	59	Without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
[cdbb909]	60	This approach does not match with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
[0f070a4]	61	Depending on the language, mix-mode conversions can be explicitly controlled using some form of cast.
[7f2e87a]	62
	63
[5a02308]	64	\subsection{Function Overloading}
[7f2e87a]	65
	66	Both \CFA and \CC allow function names to be overloaded, as long as their prototypes differ in the number and type of parameters and returns.
	67	\begin{cfa}
[cdbb909]	68	void f( void ); $\C[2in]{// (1): no parameter}$
[7f2e87a]	69	void f( char ); $\C{// (2): overloaded on the number and parameter type}$
	70	void f( int, int ); $\C{// (3): overloaded on the number and parameter type}$
	71	f( 'A' ); $\C{// select (2)}\CRT$
	72	\end{cfa}
	73	In this case, the name @f@ is overloaded depending on the number and parameter types.
	74	The type system examines each call size and selects the best match based on the number and types of the arguments.
	75	Here, there is a perfect match for the call, @f( 'A' )@ with the number and parameter type of function (2).
	76
	77	Ada, Scala, and \CFA type-systems also use the return type in resolving a call, to pinpoint the best overloaded name.
[cdbb909]	78	For example, in many programming languages with overloading, the following functions are ambiguous without using the return type.
[7f2e87a]	79	\begin{cfa}
[cdbb909]	80	int f( int ); $\C[2in]{// (1); overloaded on return type and parameter}$
	81	double f( int ); $\C{// (2); overloaded on return type and parameter}$
	82	int i = f( 3 ); $\C{// select (1)}$
	83	double d = f( 3 ); $\C{// select (2)}\CRT$
[7f2e87a]	84	\end{cfa}
[0f070a4]	85	Alternatively, if the type system uses the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
	86	This capability can be taken to the extreme, where the only differentiating factor is the return type.
[cdbb909]	87	\begin{cfa}
	88	int random( void ); $\C[2in]{// (1); overloaded on return type}$
	89	double random( void ); $\C{// (2); overloaded on return type}$
	90	int i = random(); $\C{// select (1)}$
	91	double d = random(); $\C{// select (2)}\CRT$
	92	\end{cfa}
	93	Again, there is an exact match for each call.
[0f070a4]	94	As for operator overloading, if there is no exact match, a set of minimal, an implicit conversion can be added to find a best match.
	95	\begin{cfa}
	96	short int = random(); $\C[2in]{// select (1), unsafe}$
	97	long double = random(); $\C{// select (2), safe}\CRT$
	98	\end{cfa}
[cdbb909]	99
[7f2e87a]	100
[5a02308]	101	\subsection{Variable Overloading}
[7f2e87a]	102
[cdbb909]	103	Unlike most programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
[0f070a4]	104	Shadow overloading is also possible for functions, in languages supporting nested-function declarations, \eg \CC named, nested, lambda functions.
[7f2e87a]	105	\begin{cfa}
	106	void foo( double d );
[cdbb909]	107	int v; $\C[2in]{// (1)}$
[7f2e87a]	108	double v; $\C{// (2) variable overloading}$
	109	foo( v ); $\C{// select (2)}$
	110	{
	111	int v; $\C{// (3) shadow overloading}$
	112	double v; $\C{// (4) and variable overloading}$
	113	foo( v ); $\C{// select (4)}\CRT$
	114	}
	115	\end{cfa}
[b4c6e10]	116	It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
[0f070a4]	117	However, variable overloading within a scope is considered extremely dangerous, without any evidence to corroborate this claim.
[2980ccb8]	118	In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems.
[cdbb909]	119
[0f070a4]	120	In \CFA, the type system simply treats an overloaded variable as an overloaded function returning a value with no parameters.
	121	Hence, no effort is required to support this feature as it is available for differentiating among overloaded functions with no parameters.
[cdbb909]	122	\begin{cfa}
	123	int MAX = 2147483647; $\C[2in]{// (1); overloaded on return type}$
[b4c6e10]	124	long int MAX = ...; $\C{// (2); overloaded on return type}$
	125	double MAX = ...; $\C{// (3); overloaded on return type}$
[cdbb909]	126	int i = MAX; $\C{// select (1)}$
[b4c6e10]	127	long int i = MAX; $\C{// select (2)}$
	128	double d = MAX; $\C{// select (3)}\CRT$
[cdbb909]	129	\end{cfa}
[b4c6e10]	130	Hence, the name @MAX@ can replace all the C type-specific names, \eg @INT_MAX@, @LONG_MAX@, @DBL_MAX@, \etc.
	131	The result is a significant reduction in names to access typed constants.
[2980ccb8]	132
[0f070a4]	133	As an aside, C has a separate namespace for types and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
[2980ccb8]	134	\begin{cfa}
	135	void S() {
	136	struct @S@ { int S; };
	137	@struct S@ S;
	138	void S( @struct S@ S ) { S.S = 1; };
	139	}
	140	\end{cfa}
[0f070a4]	141	Here the name @S@ is an aggregate type and field, and a variable and parameter of type @S@.
[b4c6e10]	142
	143
[5a02308]	144	\subsection{Constant Overloading}
[b4c6e10]	145
[5a02308]	146	\CFA is unique in providing restricted constant overloading for the values @0@ and @1@, which have special status in C.
	147	For example, the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
[2980ccb8]	148	In addition, several operations are defined in terms of values @0@ and @1@.
[5a02308]	149	For example, @if@ and iteration statements in C compare the condition with @0@, and the increment and decrement operators are semantically equivalent to adding or subtracting the value @1@.
[b4c6e10]	150	\begin{cfa}
[2980ccb8]	151	if ( x ) ++x; => if ( x @!= 0@ ) x @+= 1@;
	152	for ( ; x; --x ) => for ( ; x @!= 0@; x @-= 1@ )
[b4c6e10]	153	\end{cfa}
[0f070a4]	154	To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work within the special @0@ and @1@ contexts.
[b4c6e10]	155	The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
	156	\begin{cfa}
	157	struct S { int i, j; };
[5a02308]	158	void ?{}( S & s, zero_t ) { s.[i,j] = 0; } $\C{// constant constructors}$
[b4c6e10]	159	void ?{}( S & s, one_t ) { s.[i,j] = 1; }
[5a02308]	160	S ?=?( S & dst, zero_t ) { dst.[i,j] = 0; return dst; } $\C{// constant assignments}$
[bc999b7]	161	S ?=?( S & dst, one_t ) { dst.[i,j] = 1; return dst; }
[2980ccb8]	162	S ?+=?( S & s, one_t ) { s.[i,j] += 1; return s; } $\C{// increment/decrement each field}$
[bc999b7]	163	S ?-=?( S & s, one_t ) { s.[i,j] -= 1; return s; }
[5a02308]	164	int ?!=?( S s, zero_t ) { return s.i != 0 && s.j != 0; } $\C{// constant comparison}$
[2980ccb8]	165	S s = @0@; $\C{// initialization}$
	166	s = @0@; $\C{// assignments}$
[bc999b7]	167	s = @1@;
[5a02308]	168	if ( @s@ ) @++s@; $\C{// unary ++/-\,- come implicitly from +=/-=}$
[b4c6e10]	169	\end{cfa}
[2980ccb8]	170	Here, type @S@ is first-class with respect to the basic types, working with all existing implicit C mechanisms.
[cdbb909]	171
	172
	173	\section{Type Inferencing}
[eb42db4]	174	\label{s:IntoTypeInferencing}
[cdbb909]	175
[c82dad4]	176	Every variable has a type, but association between them can occur in different ways:
	177	at the point where the variable comes into existence (declaration) and/or on each assignment to the variable.
	178	\begin{cfa}
	179	double x; $\C{// type only}$
	180	float y = 3.1D; $\C{// type and initialization}$
	181	auto z = y; $\C{// initialization only}$
	182	z = "abc"; $\C{// assignment}$
	183	\end{cfa}
	184	For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
[0f070a4]	185	For type-and-initialization, the specified and initialization types may not agree requiring an implicit/explicit conversion.
[b4c6e10]	186	For initialization-only, the compiler may select the type by melding programmer and context information.
[c82dad4]	187	When the compiler participates in type selection, it is called \newterm{type inferencing}.
[0f070a4]	188	Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed variables and performs a (possibly lossy) value conversion from one type to the other.
[b4c6e10]	189	Finally, for assignment, the current variable and expression types may not agree.
[5a02308]	190	Discovering a variable or function type is complex and has limitations.
[0f070a4]	191	The following covers these issues, and why this scheme is not amenable with the \CFA type system.
[c82dad4]	192
[cdbb909]	193	One of the first and powerful type-inferencing system is Hindley--Milner~\cite{Damas82}.
	194	Here, the type resolver starts with the types of the program constants used for initialization and these constant types flow throughout the program, setting all variable and expression types.
	195	\begin{cfa}
	196	auto f() {
	197	x = 1; y = 3.5; $\C{// set types from constants}$
	198	x = // expression involving x, y and other local initialized variables
	199	y = // expression involving x, y and other local initialized variables
	200	return x, y;
	201	}
	202	auto w = f(); $\C{// typing flows outwards}$
	203
	204	void f( auto x, auto y ) {
	205	x = // expression involving x, y and other local initialized variables
	206	y = // expression involving x, y and other local initialized variables
	207	}
	208	s = 1; t = 3.5; $\C{// set types from constants}$
	209	f( s, t ); $\C{// typing flows inwards}$
	210	\end{cfa}
	211	In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
[0f070a4]	212	Like template meta-programming, there can be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
[cdbb909]	213	Inferring type constraints, by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
	214	in this case, parametric polymorphism can allow separate compilation.
	215	In languages with type inferencing, there is often limited overloading to reduce the search space, which introduces the naming problem.
[b4c6e10]	216	Note, return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match.
[cdbb909]	217
[b4c6e10]	218	In simpler type-inferencing systems, such as C/\CC/\CFA, there are more specific usages.
[cdbb909]	219	\begin{cquote}
	220	\setlength{\tabcolsep}{10pt}
	221	\begin{tabular}{@{}lll@{}}
	222	\multicolumn{1}{c}{\textbf{gcc / \CFA}} & \multicolumn{1}{c}{\textbf{\CC}} \\
	223	\begin{cfa}
	224	#define expr 3.0 * i
	225	typeof(expr) x = expr;
	226	int y;
	227	typeof(y) z = y;
	228	\end{cfa}
	229	&
	230	\begin{cfa}
	231
[2980ccb8]	232	auto x = 3.0 * i;
[cdbb909]	233	int y;
	234	auto z = y;
	235	\end{cfa}
	236	&
	237	\begin{cfa}
	238
	239	// use type of initialization expression
	240
	241	// use type of initialization expression
	242	\end{cfa}
	243	\end{tabular}
	244	\end{cquote}
	245	The two important capabilities are:
	246	\begin{itemize}[topsep=0pt]
	247	\item
	248	Not determining or writing long generic types, \eg, given deeply nested generic types.
	249	\begin{cfa}
[c82dad4]	250	typedef T1(int).T2(float).T3(char).T @ST@; $\C{// \CFA nested type declaration}$
	251	@ST@ x, y, x;
[cdbb909]	252	\end{cfa}
	253	This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages.
	254	\item
[0f070a4]	255	Ensuring the type of secondary variables, match a primary variable.
[cdbb909]	256	\begin{cfa}
	257	int x; $\C{// primary variable}$
	258	typeof(x) y, z, w; $\C{// secondary variables match x's type}$
	259	\end{cfa}
[b4c6e10]	260	If the type of @x@ changes, the type of the secondary variables correspondingly updates.
[0f070a4]	261	There can be strong software-engineering reasons for binding the types of these variables.
[cdbb909]	262	\end{itemize}
	263	Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
	264	\begin{cfa}
	265	int x;
	266	type(x) y = ... // complex expression
	267	type(x) z = ... // complex expression
	268	\end{cfa}
	269	Here, the types of @y@ and @z@ are fixed (branded), whereas with type inferencing, the types of @y@ and @z@ are potentially unknown.
	270
	271
[5a02308]	272	\subsection{Type-Inferencing Issues}
[cdbb909]	273
[b4c6e10]	274	Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions, or confusions.
[c82dad4]	275
	276	A convenience is having the compiler use its overarching program knowledge to select the best type for each variable based on some notion of \emph{best}, which simplifies the programming experience.
	277
	278	A restriction is the conundrum in type inferencing of when to \emph{brand} a type.
[0f070a4]	279	That is, when is the type of the variable/function more important than the type of its initialization expression(s).
	280	For example, if a change is made in an initialization expression, it can cascade type changes producing many other changes and/or errors.
	281	At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or be in error when it changes.
[c82dad4]	282	Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the complier can report a mismatch with the constant initialization.
	283	\begin{cfa}
	284	void f( @int@ x, @int@ y ) { // brand function prototype
	285	x = // expression involving x, y and other local initialized variables
	286	y = // expression involving x, y and other local initialized variables
	287	}
	288	s = 1; t = 3.5;
	289	f( s, @t@ ); // type mismatch
	290	\end{cfa}
	291	In Haskell, it is common for programmers to brand (type) function parameters.
	292
[0f070a4]	293	A confusion is blocks of code where all declarations are @auto@, as is now common in \CC.
[cdbb909]	294	As a result, understanding and changing the code becomes almost impossible.
	295	Types provide important clues as to the behaviour of the code, and correspondingly to correctly change or add new code.
[b4c6e10]	296	In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types for them.
[c82dad4]	297	For example, given:
[cdbb909]	298	\begin{cfa}
	299	auto x = @...@
	300	\end{cfa}
	301	and the need to write a routine to compute using @x@
	302	\begin{cfa}
[c82dad4]	303	void rtn( @type of x@ parm );
[cdbb909]	304	rtn( x );
	305	\end{cfa}
	306	A programmer must re-engineer the type of @x@'s initialization expression, reconstructing the possibly long generic type-name.
	307	In this situation, having the type name or its short alias is essential.
	308
[0f070a4]	309	\CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
[cdbb909]	310	Type inferencing defeats this goal because there is no left-hand type.
[c82dad4]	311	Fundamentally, type inferencing tries to magic away variable types from the programmer.
[cdbb909]	312	However, this results in lazy programming with the potential for poor performance and safety concerns.
[c82dad4]	313	Types are as important as control-flow in writing a good program, and should not be masked, even if it requires the programmer to think!
[b4c6e10]	314	A similar issue is garbage collection, where storage management is magicked away, often resulting in poor program design and performance.\footnote{
	315	There are full-time Java consultants, who are hired to find memory-management problems in large Java programs.}
[cdbb909]	316	The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
[2980ccb8]	317	Understanding space and time issues is an essential part of the programming craft.
[0f070a4]	318	Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, the decision was made not to support implicit type-inferencing in the type system.
	319	Should a significant need arise, this decision can be revisited.
[cdbb909]	320
	321
	322	\section{Polymorphism}
	323
[2980ccb8]	324	\CFA provides polymorphic functions and types, where a polymorphic function can constrain types using assertions based on traits.
[18a7dcf1]	325
	326
[2980ccb8]	327	\subsection{Polymorphic Function}
	328
	329	The signature feature of the \CFA type-system is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96}, generalized using a @forall@ clause (giving the language its name).
[18a7dcf1]	330	\begin{cfa}
	331	@forall( T )@ T identity( T val ) { return val; }
	332	int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$
	333	\end{cfa}
[2980ccb8]	334	This @identity@ function can be applied to an \newterm{object type}, \ie a type with a known size and alignment, which is sufficient to stack allocate, default or copy initialize, assign, and delete.
[5a02308]	335	The \CFA implementation passes the size and alignment for each type parameter, as well as auto-generated default and copy constructors, assignment operator, and destructor.
[2980ccb8]	336	For an incomplete \newterm{data type}, \eg pointer/reference types, this information is not needed.
[18a7dcf1]	337	\begin{cfa}
[2980ccb8]	338	forall( T * ) T * identity( T * val ) { return val; }
	339	int i, * ip = identity( &i );
[18a7dcf1]	340	\end{cfa}
[2980ccb8]	341	Unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat.
[18a7dcf1]	342
[2980ccb8]	343	To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
[0f070a4]	344	Here, the function @twice@ works for any type @T@ with a matching addition operator.
[18a7dcf1]	345	\begin{cfa}
[2980ccb8]	346	forall( T @\| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }
	347	int val = twice( twice( 3 ) ); $\C{// val == 12}$
[18a7dcf1]	348	\end{cfa}
[0f070a4]	349	Parametric polymorphism and assertions occur in existing type-unsafe (@void *@) C functions, like @qsort@ for sorting an array of unknown values.
[18a7dcf1]	350	\begin{cfa}
[2980ccb8]	351	void qsort( void * base, size_t nmemb, size_t size, int (cmp)( const void , const void * ) );
	352	\end{cfa}
	353	Here, the polymorphism is type-erasure, and the parametric assertion is the comparison routine, which is explicitly passed.
	354	\begin{cfa}
	355	enum { N = 5 };
	356	double val[N] = { 5.1, 4.1, 3.1, 2.1, 1.1 };
	357	int cmp( const void * v1, const void * v2 ) { $\C{// compare two doubles}$
	358	return (double )v1 < (double )v2 ? -1 : (double )v2 < (double )v1 ? 1 : 0;
[18a7dcf1]	359	}
[2980ccb8]	360	qsort( val, N, sizeof( double ), cmp );
[18a7dcf1]	361	\end{cfa}
[2980ccb8]	362	The equivalent type-safe version in \CFA is a wrapper over the C version.
[18a7dcf1]	363	\begin{cfa}
[2980ccb8]	364	forall( ET \| { int @?<?@( ET, ET ); } ) $\C{// type must have < operator}$
	365	void qsort( ET * vals, size_t dim ) {
	366	int cmp( const void * t1, const void * t2 ) { $\C{// nested function}$
	367	return (ET )t1 @<@ (ET )t2 ? -1 : (ET )t2 @<@ (ET )t1 ? 1 : 0;
	368	}
	369	qsort( vals, dim, sizeof(ET), cmp ); $\C{// call C version}$
	370	}
	371	qsort( val, N ); $\C{// deduct type double, and pass builtin < for double}$
[18a7dcf1]	372	\end{cfa}
[2980ccb8]	373	The nested function @cmp@ is implicitly built and provides the interface from typed \CFA to untyped (@void *@) C.
	374	Providing a hidden @cmp@ function in \CC is awkward as lambdas do not use C calling conventions and template declarations cannot appear in block scope.
	375	% In addition, an alternate kind of return is made available: position versus pointer to found element.
	376	% \CC's type system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a template @bsearch@.
	377	Call-site inferencing and nested functions provide a localized form of inheritance.
	378	For example, the \CFA @qsort@ can be made to sort in descending order by locally changing the behaviour of @<@.
[18a7dcf1]	379	\begin{cfa}
[2980ccb8]	380	{
	381	int ?<?( double x, double y ) { return x @>@ y; } $\C{// locally override behaviour}$
[18a7dcf1]	382	qsort( vals, 10 ); $\C{// descending sort}$
	383	}
	384	\end{cfa}
[2980ccb8]	385	The local version of @?<?@ overrides the built-in @?<?@ so it is passed to @qsort@.
	386	The local version performs @?>?@, making @qsort@ sort in descending order.
	387	Hence, any number of assertion functions can be overridden locally to maximize the reuse of existing functions and types, without the construction of a named inheritance hierarchy.
	388	A final example is a type-safe wrapper for C @malloc@, where the return type supplies the type/size of the allocation, which is impossible in most type systems.
[18a7dcf1]	389	\begin{cfa}
[2980ccb8]	390	static inline forall( T & \| sized(T) )
	391	T * malloc( void ) {
	392	if ( _Alignof(T) <= __BIGGEST_ALIGNMENT__ ) return (T *)malloc( sizeof(T) ); // C allocation
	393	else return (T *)memalign( _Alignof(T), sizeof(T) );
[18a7dcf1]	394	}
[2980ccb8]	395	// select type and size from left-hand side
[0f070a4]	396	int * ip = malloc(); double * dp = malloc(); [[aligned(64)]] struct S {...} * sp = malloc();
[18a7dcf1]	397	\end{cfa}
[2980ccb8]	398	The @sized@ assertion passes size and alignment as a data object has no implicit assertions.
	399	Both assertions are used in @malloc@ via @sizeof@ and @_Alignof@.
[0f070a4]	400	In practise, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided producing a type-safe call to @malloc@ or @memalign@.
[2980ccb8]	401
[0f070a4]	402	This mechanism is used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
	403	Here, existing C legacy code is leveraged as much as possible;
[2980ccb8]	404	other programming languages must build supporting libraries from scratch, even in \CC.
[18a7dcf1]	405
	406
	407	\subsection{Traits}
	408
	409	\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
	410	\begin{cquote}
[2980ccb8]	411	\begin{tabular}{@{}l\|@{\hspace{10pt}}l@{}}
[18a7dcf1]	412	\begin{cfa}
	413	trait @sumable@( T ) {
	414	void @?{}@( T &, zero_t ); // 0 literal constructor
[2980ccb8]	415	T ?+?( T, T ); // assortment of additions
[18a7dcf1]	416	T @?+=?@( T &, T );
	417	T ++?( T & );
	418	T ?++( T & );
	419	};
	420	\end{cfa}
	421	&
	422	\begin{cfa}
	423	forall( T @\| sumable( T )@ ) // use trait
	424	T sum( T a[$\,$], size_t size ) {
	425	@T@ total = { @0@ }; // initialize by 0 constructor
	426	for ( size_t i = 0; i < size; i += 1 )
	427	total @+=@ a[i]; // select appropriate +
	428	return total;
	429	}
	430	\end{cfa}
	431	\end{tabular}
	432	\end{cquote}
[0f070a4]	433	Traits are implemented by flatten them at use points, as if written in full by the programmer.
	434	Flattening often results in overlapping assertions, \eg operator @+@.
[2980ccb8]	435	Hence, trait names play no part in type equivalence.
[0f070a4]	436	In the example, type @T@ is an object type, and hence, has the implicit internal trait @otype@.
[18a7dcf1]	437	\begin{cfa}
[2980ccb8]	438	trait otype( T & \| sized(T) ) {
[18a7dcf1]	439	void ?{}( T & ); $\C{// default constructor}$
	440	void ?{}( T &, T ); $\C{// copy constructor}$
	441	void ?=?( T &, T ); $\C{// assignment operator}$
	442	void ^?{}( T & ); $\C{// destructor}$
	443	};
	444	\end{cfa}
[0f070a4]	445	These implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
[18a7dcf1]	446
[5a02308]	447	If the array type is not a builtin type, an extra type parameter and assertions are required, like subscripting.
	448	This case is generalized in polymorphic container-types, such as a list with @insert@ and @remove@ operations, and an element type with copy and assignment.
	449
[18a7dcf1]	450
	451	\subsection{Generic Types}
	452
	453	A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
	454	Broadly speaking, there are three approaches to implement abstract data structures in C.
[2980ccb8]	455	\begin{enumerate}[leftmargin=*]
	456	\item
[0f070a4]	457	Write bespoke data structures for each context.
[2980ccb8]	458	While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures.
	459	\item
	460	Use @void *@-based polymorphism, \eg the C standard library functions @bsearch@ and @qsort@, which allow for the reuse of code with common functionality.
	461	However, this approach eliminates the type checker's ability to ensure argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary.
	462	\item
	463	Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret.
[0f070a4]	464	Furthermore, writing and using complex preprocessor macros is difficult and inflexible.
[2980ccb8]	465	\end{enumerate}
	466
	467	\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
[18a7dcf1]	468	\CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
[0f070a4]	469	For concrete parameters, the generic-type definition can be inlined, like \CC templates, if its definition appears in a header file (\eg @static inline@).
[18a7dcf1]	470
	471	A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
	472	\begin{cquote}
[2980ccb8]	473	\begin{tabular}{@{}l\|@{\hspace{10pt}}l@{}}
[18a7dcf1]	474	\begin{cfa}
[2980ccb8]	475	@forall( F, S )@ struct pair {
	476	F first; S second;
[18a7dcf1]	477	};
[2980ccb8]	478	@forall( F, S )@ // object
	479	S second( pair( F, S ) p ) { return p.second; }
	480	@forall( F , S )@ // sized
	481	S * second( pair( F , S ) p ) { return p.second; }
[18a7dcf1]	482	\end{cfa}
	483	&
	484	\begin{cfa}
[2980ccb8]	485	pair( double, int ) dpr = { 3.5, 42 };
	486	int i = second( dpr );
	487	pair( void , int ) vipr = { 0p, &i };
	488	int * ip = second( vipr );
[18a7dcf1]	489	double d = 1.0;
[2980ccb8]	490	pair( int , double ) idpr = { &i, &d };
	491	double * dp = second( idpr );
[18a7dcf1]	492	\end{cfa}
	493	\end{tabular}
	494	\end{cquote}
[2980ccb8]	495	\CFA generic types are \newterm{fixed} or \newterm{dynamic} sized.
[0f070a4]	496	Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on the type parameters.
[2980ccb8]	497	For example, the type variable @T @ is fixed size and is represented by @void @ in code generation;
	498	whereas, the type variable @T@ is dynamic and set at the point of instantiation.
	499	The difference between fixed and dynamic is the complexity and cost of field access.
	500	For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
[0f070a4]	501	For dynamic, field offsets are compile-time computed at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
[2980ccb8]	502	See~\cite[\S~3.2]{Moss19} for complete implementation details.
	503
	504	Currently, \CFA generic types allow assertion.
	505	For example, the following declaration of a sorted set-type ensures the set key supports equality and relational comparison.
	506	\begin{cfa}
	507	forall( Elem, @Key@ \| { _Bool ?==?( Key, Key ); _Bool ?<?( Key, Key ); } )
	508	struct Sorted_Set { Elem elem; @Key@ key; ... };
	509	\end{cfa}
	510	However, the operations that insert/remove elements from the set should not appear as part of the generic-types assertions.
	511	\begin{cfa}
	512	forall( @Elem@ \| /* any assertions on element type */ ) {
	513	void insert( Sorted_Set set, @Elem@ elem ) { ... }
	514	bool remove( Sorted_Set set, @Elem@ elem ) { ... } // false => element not present
	515	... // more set operations
	516	} // distribution
	517	\end{cfa}
	518	(Note, the @forall@ clause can be distributed across multiple functions.)
	519	For software-engineering reasons, the set assertions would be refactored into a trait to allow alternative implementations, like a Java \lstinline[language=java]{interface}.
	520
	521	In summation, the \CFA type system inherits \newterm{nominal typing} for concrete types from C, and adds \newterm{structural typing} for polymorphic types.
	522	Traits are used like interfaces in Java or abstract base-classes in \CC, but without the nominal inheritance relationships.
	523	Instead, each polymorphic function or generic type defines the structural type needed for its execution, which is fulfilled at each call site from the lexical environment, like Go~\cite{Go} or Rust~\cite{Rust} interfaces.
	524	Hence, new lexical scopes and nested functions are used extensively to create local subtypes, as in the @qsort@ example, without having to manage a nominal inheritance hierarchy.
[cdbb909]	525
	526
	527	\section{Contributions}
[07dbcba]	528
[0f070a4]	529	The \CFA compiler performance and type capability have been greatly improved through my development work.
[2980ccb8]	530	\begin{enumerate}
[0f070a4]	531	\item
	532	The compilation time of various \CFA library units and test programs has been reduced by an order of magnitude, from minutes to seconds \see{\VRef[Table]{t:SelectedFileByCompilerBuild}}, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features.
	533	The details of compiler optimization work are covered in a previous technical report~\cite{Yu20}, which essentially forms part of this thesis.
	534	\item
	535	The thesis presents a systematic review of the new features added to the \CFA language and its type system.
	536	Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested during development due to the limitation of compiler performance.
	537	Several issues coming from the interactions of various language features are identified and discussed in this thesis;
	538	some of them I have resolved, while others are given temporary fixes and need to be reworked in the future.
	539	\item
	540	Finally, this thesis provides constructive ideas for fixing a number of high-level issues in the \CFA language design and implementation, and gives a path for future improvements to the language and compiler.
[2980ccb8]	541	\end{enumerate}
[07dbcba]	542
	543
	544	\begin{comment}
	545	From: Andrew James Beach <ajbeach@uwaterloo.ca>
	546	To: Peter Buhr <pabuhr@uwaterloo.ca>, Michael Leslie Brooks <mlbrooks@uwaterloo.ca>,
	547	Fangren Yu <f37yu@uwaterloo.ca>, Jiada Liang <j82liang@uwaterloo.ca>
	548	Subject: Re: Haskell
	549	Date: Fri, 30 Aug 2024 16:09:06 +0000
	550
	551	Do you mean:
	552
	553	one = 1
	554
	555	And then write a bunch of code that assumes it is an Int or Integer (which are roughly int and Int in Cforall) and then replace it with:
	556
	557	one = 1.0
	558
	559	And have that crash? That is actually enough, for some reason Haskell is happy to narrow the type of the first literal (Num a => a) down to Integer but will not do the same for (Fractional a => a) and Rational (which is roughly Integer for real numbers). Possibly a compatibility thing since before Haskell had polymorphic literals.
	560
	561	Now, writing even the first version will fire a -Wmissing-signatures warning, because it does appear to be understood that just from a documentation perspective, people want to know what types are being used. Now, if you have the original case and start updating the signatures (adding one :: Fractional a => a), you can eventually get into issues, for example:
	562
	563	import Data.Array (Array, Ix, (!))
	564	atOne :: (Ix a, Frational a) => Array a b -> b - - In CFA: forall(a \| Ix(a) \| Frational(a), b) b atOne(Array(a, b) const & array)
	565	atOne = (! one)
	566
	567	Which compiles and is fine except for the slightly awkward fact that I don't know of any types that are both Ix and Fractional types. So you might never be able to find a way to actually use that function. If that is good enough you can reduce that to three lines and use it.
	568
	569	Something that just occurred to me, after I did the above examples, is: Are there any classic examples in literature I could adapt to Haskell?
	570
	571	Andrew
	572
	573	PS, I think it is too obvious of a significant change to work as a good example but I did mock up the structure of what I am thinking you are thinking about with a function. If this helps here it is.
	574
	575	doubleInt :: Int -> Int
	576	doubleInt x = x * 2
	577
	578	doubleStr :: String -> String
	579	doubleStr x = x ++ x
	580
	581	-- Missing Signature
	582	action = doubleInt - replace with doubleStr
	583
	584	main :: IO ()
	585	main = print $ action 4
	586	\end{comment}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: