Context Navigation

source: doc/theses/jiada_liang_MMath/background.tex @ c141c09

Last change on this file since c141c09 was 21f4dff, checked in by JiadaL <j82liang@…>, 4 months ago
Add motivation for trait
Property mode set to `100644`
File size: 22.6 KB

Line
1	\chapter{Background}
2
3	\vspace*{-8pt}
4
5	\CFA is a backwards-compatible extension of the C programming language, therefore, it must support C-style enumerations.
6	The following discussion covers C enumerations.
7
8	As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
9	\begin{clang}
10	#define Mon 0
11	static const int Mon = 0;
12	enum { Mon };
13	\end{clang}
14	\begin{enumerate}[leftmargin=*]
15	\item
16	For @#define@, the programmer has to explicitly manage the constant name and value.
17	Furthermore, these C preprocessor macro names are outside of the C type-system and can incorrectly change random text in a program.
18	\item
19	The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
20	C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} immediate oper\-ands of assembler instructions, and occupy storage.
21	\begin{clang}
22	$\$$ nm test.o
23	0000000000000018 r Mon
24	\end{clang}
25	\item
26	Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
27	\end{enumerate}
28
29
30	\section{C \lstinline{const}}
31	\label{s:Cconst}
32
33	C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
34	\begin{cquote}
35	\begin{tabular}{@{}ll@{}}
36	\multicolumn{1}{@{}c}{\textbf{static initialization}} & \multicolumn{1}{c@{}}{\textbf{dynamic intialization}} \\
37	\begin{clang}
38	static const int one = 0 + 1;
39	static const void * NIL = NULL;
40	static const double PI = 3.14159;
41	static const char Plus = '+';
42	static const char * Fred = "Fred";
43	static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
44	Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
45	\end{clang}
46	&
47	\begin{clang}
48	void foo() {
49	// auto scope only
50	const int r = random() % 100;
51	int va[r];
52	}
53	\end{clang}
54	\end{tabular}
55	\end{cquote}
56	However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
57	Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
58	Again, this form of aliasing is not an enumeration.
59
60	\section{C Enumeration}
61	\label{s:CEnumeration}
62
63	The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
64	\begin{clang}[identifierstyle=\linespread{0.9}\it]
65	$\it enum$-specifier:
66	enum identifier$\(_{opt}\)$ { enumerator-list }
67	enum identifier$\(_{opt}\)$ { enumerator-list , }
68	enum identifier
69	enumerator-list:
70	enumerator
71	enumerator-list , enumerator
72	enumerator:
73	enumeration-constant
74	enumeration-constant = constant-expression
75	\end{clang}
76	The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
77	The C enumeration semantics are discussed using examples.
78
79
80	\subsection{Type Name}
81	\label{s:TypeName}
82
83	An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
84	However, it is restricted to integral values.
85	\begin{clang}
86	enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
87	\end{clang}
88	Here, the aliased constants are: 20, 10, 20, 21, and -7.
89	Direct initialization is by a compile-time expression generating a constant value.
90	Indirect initialization (without initialization, @Max10Plus1@) is \newterm{auto-initialized}: from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.
91	Because multiple independent enumerators can be combined, enumerators with the same values can occur.
92	The enumerators are rvalues, so assignment is disallowed.
93	Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
94	For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.
95
96	As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
97	While the semantics is misleading, this enumeration form matches with aggregate types:
98	\begin{cfa}
99	typedef struct @/* unnamed */@ { ... } S;
100	struct @/* unnamed */@ { ... } x, y, z; $\C{// questionable}$
101	struct S {
102	union @/* unnamed */@ { $\C{// unscoped fields}$
103	int i; double d ; char ch;
104	};
105	};
106	\end{cfa}
107	Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.
108
109	A \emph{named} enumeration is an enumeration:
110	\begin{clang}
111	enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
112	\end{clang}
113	and adopts the same semantics with respect to direct and auto intialization.
114	For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@.
115	As well, initialization may occur in any order.
116	\begin{clang}
117	enum Week {
118	Thu@ = 10@, Fri, Sat, Sun,
119	Mon@ = 0@, Tue, Wed@,@ $\C{// terminating comma}$
120	};
121	\end{clang}
122	Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
123	A terminating comma appears in other C syntax, \eg the initializer list.}
124	This feature allow enumerator lines to be interchanged without moving a comma.
125	Named enumerators are also unscoped.
126
127
128	\subsection{Implementation}
129	\label{s:CenumImplementation}
130
131	In theory, a C enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
132	In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the underlying type for enumeration variables, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
133	However, type @int@ is defined as:
134	\begin{quote}
135	A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
136	\end{quote}
137	Howeveer, @int@ means a 4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
138	Whereas, @long int@ means 4 bytes on a 32-bit and 8 bytes on 64-bit architectures, and @long long int@ means 8 bytes on both 32/64-bit architectures, where 64-bit operations are simulated on 32-bit architectures.
139	In reality, both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based its initialization.
140	\begin{cfa}
141	enum E { IMin = INT_MIN, IMax = INT_MAX,
142	ILMin = LONG_MIN, ILMax = LONG_MAX,
143	ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
144	int main() {
145	printf( "%zd %d %d\n%zd %ld %ld\n%zd %ld %ld\n",
146	sizeof(IMin), IMin, IMax,
147	sizeof(ILMin), ILMin, ILMax,
148	sizeof(ILLMin), ILLMin, ILLMax );
149	}
150	4 -2147483648 2147483647
151	8 -9223372036854775808 9223372036854775807
152	8 -9223372036854775808 9223372036854775807
153	\end{cfa}
154	Hence, initialization in the range @INT_MIN@..@INT_MAX@ is 4 bytes, and outside this range is 8 bytes.
155
156	\subsection{Usage}
157	\label{s:Usage}
158
159	C proves an implicit \emph{bidirectional} conversion between an enumeration and its integral type, and between two different enumeration.
160	\begin{clang}
161	enum Week week = Mon; $\C{// week == 0}$
162	week = Fri; $\C{// week == 11}$
163	int i = Sun; $\C{// implicit conversion to int, i == 13}$
164	@week = 10000;@ $\C{// UNDEFINED! implicit conversion to Week}$
165
166	enum Season {Spring, Summer, Fall, Winter };
167	@week = Winter;@ $\C{// UNDEFINED! implicit conversion to Week}$
168	\end{clang}
169	While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.
170
171	Enumerators can appear in @switch@ and looping statements.
172	\begin{cfa}
173	enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
174	switch ( week ) {
175	case Mon ... Fri: $\C{// gcc case range}$
176	printf( "weekday\n" );
177	case Sat: case Sun:
178	printf( "weekend\n" );
179	}
180	for ( enum Week day = Mon; day <= Sun; day += 1 ) { $\C{// step of 1}$
181	printf( "day %d\n", day ); // 0-6
182	}
183	\end{cfa}
184	For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
185	For example, a gap introduced by @Thu = 10@, results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
186	Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
187	For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
188
189	There is a C idiom to automatically compute the number of enumerators in an enumeration.
190	\begin{cfa}
191	enum E { A, B, C, D, @N@ }; // N == 4
192	for ( enum E e = A; e < @N@; e += 1 ) ...
193	\end{cfa}
194	Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
195	@N@ is often used as the dimension for an array assocated with the enumeration.
196	\begin{cfa}
197	E array[@N@];
198	for ( enum E e = A; e < N; e += 1 ) {
199	array[e] = e;
200	}
201	\end{cfa}
202	However, for non-integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.
203
204	This idiom is used in another C idiom for matching companion information.
205	For example, an enumeration is linked with a companion array of printable strings.
206	\begin{cfa}
207	enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
208	char * Integral_Name[@NO_OF_ITYPES@] = {
209	"char", "signed char", "unsigned char",
210	"signed short int", "unsigned short int",
211	"signed int", "unsigned int", ...
212	};
213	enum Integral_Type integral_type = ...
214	printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
215	\end{cfa}
216	However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
217	The need to harmonize is at best indicated by a comment before the enumeration.
218	This issue is exacerbated if enumeration and companion array are in different translation units.
219
220	\bigskip
221	While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features in other programming languages.
222
223	\section{\CFA Polymorphism}
224	\subsection{Function Overloading}
225	Function overloading is programming languages feature wherein functions may share the same name, but with different function signatures. In both C++ and \CFA, function names can be overloaded
226	with different entities as long as they are different in terms of the number and type of parameters.
227
228	\begin{cfa}
229	void f(); // (1)
230	void f(int); // (2); Overloaded on the number of parameters
231	void f(char); // (3); Overloaded on parameter type
232
233	f('A');
234	\end{cfa}
235	In this case, the name f is overloaded with a nullity function and two arity functions with different parameters types. Exactly which precedures being executed
236	is determined based on the passing arguments. The last expression of the preceding example calls f with one arguments, narrowing the possible candidates down to (2) and (3).
237	Between those, function argument 'A' is an exact match to the parameter expected by (3), while needing an @implicit conversion@ to call (2). The compiler determines (3) is the better candidates among
238	and procedure (3) is being executed.
239
240	\begin{cfa}
241	int f(int); // (4); Overloaded on return type
242	[int, int] f(int); // (5) Overloaded on the number of return value
243	\end{cfa}
244	The function declarations (4) and (5) show the ability of \CFA functions overloaded with different return value, a feature that is not shared by C++.
245
246
247	\subsection{Operator Overloading}
248	Operators in \CFA are specialized function and are overloadable by with specially-named functions represents the syntax used to call the operator.
249	% For example, @bool ?==?T(T lhs, T rhs)@ overloads equality operator for type T, where @?@ is the placeholders for operands for the operator.
250	\begin{cfa}
251	enum Weekday { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };
252	bool ?<?(const Weekday a, const Weekday b) {
253	return ((int)a + 1);
254	}
255	Monday < Sunday; // False
256	?<?( Monday, Sunday ); // Equivalent syntax
257	\end{cfa}
258	Unary operators are functions that takes one argument and have name @operator?@ or @?operator@, where @?@ is the placeholders for operands.
259	Binary operators are function with two parameters. They are overloadable with function name @?operator?@.
260
261	\subsection{Constructor and Destructor}
262	In \CFA, all objects are initialized by @constructors@ during its allocation, including basic types,
263	which are initialized by auto-generated basic type constructors.
264
265	Constructors are overloadable functions with name @?{}@, return @void@, and have at least one parameter, which is a reference
266	to the object being constructored (Colloquially referred to "this" or "self" in other language).
267
268	\begin{cfa}
269	struct Employee {
270	const char * name;
271	double salary;
272	};
273
274	void ?{}( Employee& this, const char * name, double salary ) {
275	this.name = name;
276	this.salary = salary;
277	}
278
279	Employee Sara { "Sara Schmidt", 20.5 };
280	\end{cfa}
281	Like Python, the "self" reference is implicitly passed to a constructor. The Employee constructors takes two additional arugments used in its
282	field initialization.
283
284	A destructor in \CFA is a function that has name @^?{}@. It returns void, and take only one arugment as its "self".
285	\begin{cfa}
286	void ^?{}( Employee& this ) {
287	free(this.name);
288	this.name = 0p;
289	this.salary = 0;
290	}
291	\end{cfa}
292	Destructor can be explicitly evoked as a function call, or implicitly called at the end of the block in which the object is delcared.
293	\begin{cfa}
294	{
295	^Sara{};
296	Sara{ "Sara Craft", 20 };
297	} // ^Sara{}
298	\end{cfa}
299
300	\subsection{Variable Overloading}
301	C and C++ disallow more than one variable declared in the same scope with the same name. When a variable declare in a inner scope has the same name as
302	a variable in an outer scope, the outer scope variable is "shadowed" by the inner scope variable and cannot be accessed directly.
303
304	\CFA has variable overloading: multiple variables can share the same name in the same scope, as long as they have different type. Name shadowing only
305	happens when the inner scope variable and the outer scope ones have the same type.
306	\begin{cfa}
307	double i = 6.0;
308	int i = 5;
309	void foo( double i ) { sout \| i; } // 6.0
310	\end{cfa}
311
312	\subsection{Special Literals}
313	Literal 0 has special meanings within different contexts: it can means "nothing" or "empty", an additive identity in arithmetic, a default value as in C (null pointer),
314	or an initial state.
315	Awaring of its significance, \CFA provides a special type for the 0 literal, @zero_t@, to define the logical @zero@ for custom types.
316	\begin{cfa}
317	struct S { int i, j; };
318	void ?{}( S & this, @zero_t@ ) { this.i = 0; this.j = 0; } // zero_t, no parameter name allowed
319	S s0 = @0@;
320	\end{cfa}
321	Overloading @zero_t@ for S provides new definition for @zero@ of type S.
322
323	According to the C standard, @0@ is the @only@ false value. Any values compares equals to @0@ is false, and not euqals @0@ is true. As a consequence, control structure
324	such as @if()@ and @while()@ only runs it true clause when its predicate @not equals@ to @0@.
325
326	\CFA generalizes this concept and allows to logically overloads the boolean value for any type by overloading @not equal@ comparison against @zero_t@.
327	\begin{cfa}
328	int ?@!=@?( S this, @zero_t@ ) { return this.i != 0 && this.j != 0; }
329	\end{cfa}
330
331	% In C, the literal 0 represents the Boolean value false. The expression such as @if (x)@ is equivalent to @if (x != 0)@ .
332	% \CFA allows user to define the logical zero for a custom type by overloading the @!=@ operation against a special type, @zero_t@,
333	% so that an expression with the custom type can be used as a predicate without the need of conversion to the literal 0.
334	% \begin{cfa}
335	% struct S s;
336	% int ?!=?(S, zero_t);
337	% if (s) {}
338	% \end{cfa}
339	Literal 1 is also special. Particularly in C, the pre-increment operator and post-increment operator can be interpreted in terms of @+= 1@.
340	The logical @1@ in \CFA is represented by special type @one_t@.
341	\begin{cfa}
342	void ?{}( S & this, one_t ) { this.i = 1; this.j = 1; } // one_t, no parameter name allowed
343	S & ?+=?( S & this, one_t ) { this.i += 1; this.j += 1; return op; }
344	\end{cfa}
345	Without explictly overloaded by a user, \CFA uses the user-defined @+=(S&, one_t)@ to interpret @?++@ and @++?@, as both are polymorphic functions in \CFA.
346
347	\subsection{Polymorphics Functions}
348	Parametric-Polymorphics functions are the functions that applied to all types. \CFA functions are parametric-polymorphics when
349	they are written with the @forall@ clause.
350
351	\begin{cfa}
352	forall(T)
353	T identity(T x) { return x; }
354	identity(42);
355	\end{cfa}
356	The identity function accepts a value from any type as an arugment, and the type parameter @T@ is bounded to @int@ when the function
357	is called with 42.
358
359	The forall clause can takes @type assertions@ that restricts the polymorphics type.
360	\begin{cfa}
361	forall( T \| { void foo(T); } )
362	void bar(T t) { foo(t); }
363
364	struct S {} s;
365	void foo(struct S);
366
367	bar(s);
368	\end{cfa}
369	The assertion on @T@ restricts the range of types for bar to only those implements foo with the matching a signature, so that bar()
370	can call @foo@ in its body with type safe.
371	Calling on type with no mathcing @foo()@ implemented, such as int, causes a compile time type assertion error.
372
373	\subsection{trait}
374	A @forall@ clause can asserts on multiple types and with multiple asserting functions. A common practice in \CFA is to group
375	the asserting functions in to a named \newterm{trait}.
376
377	\begin{cfa}
378	trait Bird(T) {
379	int days_can_fly(T i);
380	void fly(T t);
381	};
382
383	forall(B \| Bird(B)) {
384	void bird_fly(int days_since_born, B bird) {
385	if (days_since_born > days_can_fly(bird)) {
386	fly(bird);
387	}
388	}
389	}
390
391	struct Robin {} r;
392	int days_can_fly(Robin r) { return 23; }
393	void fly(Robin r) {}
394
395	bird_fly( r );
396	\end{cfa}
397
398	Grouping type assertions into named trait effectively create a reusable interface for parametrics polymorphics types.
399
400	\section{Expression Resolution}
401
402	The overloading feature poses a challenge in \CFA expression resolution. Overloadeded identifiers can refer multiple
403	candidates, with multiples being simultaneously valid. The main task of \CFA resolver is to identity a best candidate that
404	involes less implicit conversion and polymorphism.
405
406	\subsection{Conversion Cost}
407	\label{s:ConversionCost}
408	In C, function call arguments and function parameters do not need to be a exact match. When types mismatch, C performs an \newterm{implicit conversion}
409	on argument to parameter type. The process is trivial with the exception on binary operators; When types of operands are different,
410	C nees to decide which operands need implicit conversion. C defines the resolution pattern as "usual arithmetic conversion",
411	in which C looks for a \newterm{common type} between operands, and convert either one or both operands to the common type.
412	Loosely defined, a common type is a the smallest type in terms of size of representation that both operands can be converted into without losing their precision.
413	Such conversion is called "widening" or "safe conversion".
414
415	C binary operators can be restated as 2-arity functions that overloaded with different parameters. "Usual arithmetic conversion" is to find a overloaded
416	instance that for both arguments, the conversion to parameter type is a widening conversion to the smallest type.
417
418	\CFA generalizes "usual arithmetic conversion" to \newterm{conversion cost}. In the first design by Bilson, conversion cost is a 3-tuple,
419	@(unsafe, poly, safe)@, where @unsafe@ the number of unsafe (narrorow conversion) from argument to parameter,
420	@poly@ is the number of polymorphic function parameter,
421	and @safe@ is sum of degree of safe (widening) conversion.
422	Sum of degree is a method to quantify C's integer and floating-point rank.
423	Every pair of widening conversion types has been assigned with a \newterm{distance}, and distance between the two same type is 0.
424	For example, the distance from char to int is 2, distance from integer to long is 1, and distance from int to long long int is 2.
425	The distance does not mirror C's rank system. For example, the rank of char and signed char are the same in C, but the distance from char to signed char is assigned with 1.
426	@safe@ cost is summing all pair of argument to parameter safe conversion distance.
427	Among the three in Bilson's model, @unsafe@ is the most significant cost and @safe@ is the least significant one, with an implication that \CFA always choose
428	a candidate with the lowest @unsafe@ if possible.
429
430	Assume the overloaded function @foo@ is called with two @int@ parameter. The cost for every overloaded @foo@ has been list along:
431	\begin{cfa}
432	void foo(char, char); // (2, 0, 0)
433	void foo(char, int); // (1, 0, 0)
434	forall(T, V) void foo(T, V); // (0, 2, 0)
435	forall(T) void foo(T, T); // (0, 2, 0)
436	forall(T) void foo(T, int); // (0, 1, 0)
437	void foo(long long, long); // (0, 0, 3)
438	void foo(long, long); // (0, 0, 2)
439	void foo(int, long); // (0, 0, 1)
440
441	int i, j; foo(i, j);
442	\end{cfa}
443	The overloaded instances are ordered from the highest to the lowest cost, and \CFA select the last candidate.
444
445	In the later iteration of \CFA, Schluntz and Aaron expanded conversion cost to a 7-tuple with 4 additional categories,
446	@@(unsafe, poly, safe, sign, vars, specialization, reference)@@.
447	with interpretation listed below:
448	\begin{itemize}
449	\item Unsafe
450	\item Poly
451	\item Safe
452	\item Sign is the number of sign/unsign variable conversion.
453	\item Vars is the number of polymorphics type variable.
454	\item Specialization is negative value of the number of type assertion.
455	\item Reference is number of reference-to-rvalue conversion.
456	\end{itemize}
457	The extended conversion cost models looks for candidates that are more specific and less generic.
458	@Var@s was introduced by Aaron to disambugate @forall(T, V) void foo(T, V)@ and @forall(T) void foo(T, T)@. The extra type parameter @V@
459	makes it more generic and less specific. More generic type means less constraints on types of its parameters. \CFA is in favor of candidates with more
460	restrictions on polymorphism so @forall(T) void foo(T, T)@ has lower cost. @Specialization@ is a value that always less than or equal to zero. For every type assertion in @forall@ clause, \CFA subtracts one from
461	@specialization@, starting from zero. More type assertions often means more constraints on argument type, and making the function less generic.
462
463	\CFA defines two special cost value: @zero@ and @infinite@ A conversion cost is @zero@ when argument and parameter has exact match, and a conversion cost is @infinite@ when
464	there is no defined conversion between two types. For example, the conversion cost from int to a struct type S is @infinite@.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: