Context Navigation

source: doc/theses/jiada_liang_MMath/background.tex @ e561551

Last change on this file since e561551 was e561551, checked in by JiadaL <j82liang@…>, 6 weeks ago
Save current progress for pull
Property mode set to `100644`
File size: 18.5 KB

Line
1	\chapter{Background}
2
3	\vspace*{-8pt}
4
5	\CFA is a backwards-compatible extension of the C programming language, therefore, it must support C-style enumerations.
6	The following discussion covers C enumerations.
7
8	As discussed in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
9	\begin{clang}
10	#define Mon 0
11	static const int Mon = 0;
12	enum { Mon };
13	\end{clang}
14	\begin{enumerate}[leftmargin=*]
15	\item
16	For @#define@, the programmer has to explicitly manage the constant name and value.
17	Furthermore, these C preprocessor macro names are outside of the C type-system and can incorrectly change random text in a program.
18	\item
19	The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
20	C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} immediate oper\-ands of assembler instructions, and occupy storage.
21	\begin{clang}
22	$\$$ nm test.o
23	0000000000000018 r Mon
24	\end{clang}
25	\item
26	Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
27	\end{enumerate}
28
29
30	\section{C \lstinline{const}}
31	\label{s:Cconst}
32
33	C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
34	\begin{cquote}
35	\begin{tabular}{@{}l@{}l@{}}
36	\multicolumn{1}{@{}c@{}}{\textbf{static initialization}} & \multicolumn{1}{c@{}}{\textbf{dynamic intialization}} \\
37	\begin{clang}
38	static const int one = 0 + 1;
39	static const void * NIL = NULL;
40	static const double PI = 3.14159;
41	static const char Plus = '+';
42	static const char * Fred = "Fred";
43	static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
44	Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
45	\end{clang}
46	&
47	\begin{clang}
48	void foo() {
49	// auto scope only
50	const int r = random() % 100;
51	int va[r];
52	}
53
54
55	\end{clang}
56	\end{tabular}
57	\end{cquote}
58	However, statically initialized identifiers can not appear in constant-expression contexts, \eg @case@.
59	Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
60	Again, this form of aliasing is not an enumeration.
61
62
63	\section{C Enumeration}
64	\label{s:CEnumeration}
65
66	The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
67	\begin{clang}[identifierstyle=\linespread{0.9}\it]
68	$\it enum$-specifier:
69	enum identifier$\(_{opt}\)$ { enumerator-list }
70	enum identifier$\(_{opt}\)$ { enumerator-list , }
71	enum identifier
72	enumerator-list:
73	enumerator
74	enumerator-list , enumerator
75	enumerator:
76	enumeration-constant
77	enumeration-constant = constant-expression
78	\end{clang}
79	The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
80	The C enumeration semantics are discussed using examples.
81
82
83	\subsection{Type Name}
84	\label{s:TypeName}
85
86	An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
87	However, it is restricted to integral values.
88	\begin{clang}
89	enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
90	\end{clang}
91	Here, the aliased constants are: 20, 10, 20, 21, and -7.
92	Direct initialization is by a compile-time expression generating a constant value.
93	Indirect initialization (without initialization, @Max10Plus1@) is \newterm{auto-initialized}: from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.
94	Because multiple independent enumerators can be combined, enumerators with the same values can occur.
95	The enumerators are rvalues, so assignment is disallowed.
96	Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
97	For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.
98
99	As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
100	While the semantics is misleading, this enumeration form matches with aggregate types:
101	\begin{cfa}
102	typedef struct @/* unnamed */@ { ... } S;
103	struct @/* unnamed */@ { ... } x, y, z; $\C{// questionable}$
104	struct S {
105	union @/* unnamed */@ { $\C{// unscoped fields}$
106	int i; double d ; char ch;
107	};
108	};
109	\end{cfa}
110	Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.
111
112	A \emph{named} enumeration is an enumeration:
113	\begin{clang}
114	enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
115	\end{clang}
116	and adopts the same semantics with respect to direct and auto intialization.
117	For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@.
118	As well, initialization may occur in any order.
119	\begin{clang}
120	enum Week {
121	Thu@ = 10@, Fri, Sat, Sun,
122	Mon@ = 0@, Tue, Wed@,@ $\C{// terminating comma}$
123	};
124	\end{clang}
125	Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
126	A terminating comma appears in other C syntax, \eg the initializer list.}
127	This feature allow enumerator lines to be interchanged without moving a comma.
128	Named enumerators are also unscoped.
129
130
131	\subsection{Representation}
132
133	C standard specifies enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
134	In practice, C uses @int@ as the underlying type for enumeration variables, because of the restriction to integral constants, which have type @int@ (unless qualified with a size suffix).
135
136	\subsection{Usage}
137	\label{s:Usage}
138
139	C proves an implicit \emph{bidirectional} conversion between an enumeration and its integral type.
140	\begin{clang}
141	enum Week week = Mon; $\C{// week == 0}$
142	week = Fri; $\C{// week == 11}$
143	int i = Sun; $\C{// implicit conversion to int, i == 13}$
144	@week = 10000;@ $\C{// UNDEFINED! implicit conversion to Week}$
145	\end{clang}
146	While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.
147
148	Enumerators can appear in @switch@ and looping statements.
149	\begin{cfa}
150	enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
151	switch ( week ) {
152	case Mon: case Tue: case Wed: case Thu: case Fri:
153	printf( "weekday\n" );
154	case Sat: case Sun:
155	printf( "weekend\n" );
156	}
157	for ( enum Week day = Mon; day <= Sun; day += 1 ) { // step of 1
158	printf( "day %d\n", day ); // 0-6
159	}
160	\end{cfa}
161	For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
162	For example, a gap introduced by @Thu = 10@, results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
163	Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
164	For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
165
166	There is a C idiom to automatically compute the number of enumerators in an enumeration.
167	\begin{cfa}
168	enum E { A, B, C, D, @N@ }; // N == 4
169	for ( enum E e = A; e < @N@; e += 1 ) ...
170	\end{cfa}
171	Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
172	@N@ is often used as the dimension for an array assocated with the enumeration.
173	\begin{cfa}
174	E array[@N@];
175	for ( enum E e = A; e < N; e += 1 ) {
176	array[e] = e;
177	}
178	\end{cfa}
179	However, for typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.
180
181	This idiom leads to another C idiom using an enumeration with matching companion information.
182	For example, an enumeration is linked with a companion array of printable strings.
183	\begin{cfa}
184	enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
185	char * Integral_Name[@NO_OF_ITYPES@] = {
186	"char", "signed char", "unsigned char",
187	"signed short int", "unsigned short int",
188	"signed int", "unsigned int", ...
189	};
190	enum Integral_Type integral_type = ...
191	printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
192	\end{cfa}
193	However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
194	The need to harmonize is at best indicated by a comment before the enumeration.
195	This issue is exacerbated if enumeration and companion array are in different translation units.
196
197	\bigskip
198	While C provides a true enumeration, it is restricted, has unsafe semantics, and does provide useful enumeration features in other programming languages.
199
200	\section{\CFA Polymorphism}
201	\subsection{Function Overloading}
202	Function overloading is programming languages feature wherein functions may share the same name, but with different function signatures. In both C++ and \CFA, function names can be overloaded
203	with different entities as long as they are different in terms of the number and type of parameters.
204
205	\begin{cfa}
206	void f(); // (1)
207	void f(int); // (2); Overloaded on the number of parameters
208	void f(char); // (3); Overloaded on parameter type
209
210	f('A');
211	\end{cfa}
212	In this case, the name f is overloaded with a nullity function and two arity functions with different parameters types. Exactly which precedures being executed
213	is determined based on the passing arguments. The last expression of the preceding example calls f with one arguments, narrowing the possible candidates down to (2) and (3).
214	Between those, function argument 'A' is an exact match to the parameter expected by (3), while needing an @implicit conversion@ to call (2). The compiler determines (3) is the better candidates among
215	and procedure (3) is being executed.
216
217	\begin{cfa}
218	int f(int); // (4); Overloaded on return type
219	[int, int] f(int); // (5) Overloaded on the number of return value
220	\end{cfa}
221	The function declarations (4) and (5) show the ability of \CFA functions overloaded with different return value, a feature that is not shared by C++.
222
223
224	\subsection{Operator Overloading}
225	Operators in \CFA are specialized function and are overloadable by with specially-named functions represents the syntax used to call the operator.
226	% For example, @bool ?==?T(T lhs, T rhs)@ overloads equality operator for type T, where @?@ is the placeholders for operands for the operator.
227	\begin{cfa}
228	enum Weekday { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };
229	bool ?<?(const Weekday a, const Weekday b) {
230	return ((int)a + 1);
231	}
232	Monday < Sunday; // False
233	?<?( Monday, Sunday ); // Equivalent syntax
234	\end{cfa}
235	Unary operators are functions that takes one argument and have name @operator?@ or @?operator@, where @?@ is the placeholders for operands.
236	Binary operators are function with two parameters. They are overloadable with function name @?operator?@.
237
238	\subsection{Constructor and Destructor}
239	In \CFA, all objects are initialized by @constructors@ during its allocation, including basic types,
240	which are initialized by constructors default-generated by a compiler.
241
242	Constructors are overloadable functions with name @?{}@, return @void@, and have at least one parameter, which is a reference
243	to the object being constructored (Colloquially referred to "this" or "self" in other language).
244
245	\begin{cfa}
246	struct Employee {
247	const char * name;
248	double salary;
249	};
250
251	void ?{}( Employee& this, const char * name, double salary ) {
252	this.name = name;
253	this.salary = salary;
254	}
255
256	Employee Sara { "Sara Schmidt", 20.5 };
257	\end{cfa}
258	Like Python, the "self" reference is implicitly passed to a constructor. The Employee constructors takes two additional arugments used in its
259	field initialization.
260
261	A destructor in \CFA is a function that has name @^?{}@. It returns void, and take only one arugment as its "self".
262	\begin{cfa}
263	void ^?{}( Employee& this ) {
264	free(this.name);
265	this.name = 0p;
266	this.salary = 0;
267	}
268	\end{cfa}
269	Destructor can be explicitly evoked as a function call, or implicitly called at the end of the block in which the object is delcared.
270	\begin{cfa}
271	{
272	^Sara{};
273	Sara{ "Sara Craft", 20 };
274	} // ^Sara{}
275	\end{cfa}
276
277	\subsection{Variable Overloading}
278	C and C++ disallow more than one variable declared in the same scope with the same name. When a variable declare in a inner scope has the same name as
279	a variable in an outer scope, the outer scope variable is "shadowed" by the inner scope variable and cannot be accessed directly.
280
281	\CFA has variable overloading: multiple variables can share the same name in the same scope, as long as they have different type. Name shadowing only
282	happens when the inner scope variable and the outer scope ones have the same type.
283	\begin{cfa}
284	double i = 6.0;
285	int i = 5;
286	void foo( double i ) { sout \| i; } // 6.0
287	\end{cfa}
288
289	\subsection{Special Literals}
290	Literal 0 has special meanings within different contexts: it can means "nothing" or "empty", an additive identity in arithmetic, a default value as in C (null pointer),
291	or an initial state.
292	Awaring of its significance, \CFA provides a special type for the 0 literal, @zero_t@, to define the logical @zero@ for custom types.
293	\begin{cfa}
294	struct S { int i, j; };
295	void ?{}( S & this, @zero_t@ ) { this.i = 0; this.j = 0; } // zero_t, no parameter name allowed
296	S s0 = @0@;
297	\end{cfa}
298	Overloading @zero_t@ for S provides new definition for @zero@ of type S.
299
300	According to the C standard, @0@ is the @only@ false value. Any values compares equals to @0@ is false, and not euqals @0@ is true. As a consequence, control structure
301	such as @if()@ and @while()@ only runs it true clause when its predicate @not equals@ to @0@.
302
303	\CFA generalizes this concept and allows to logically overloads the boolean value for any type by overloading @not equal@ comparison against @zero_t@.
304	\begin{cfa}
305	int ?@!=@?( S this, @zero_t@ ) { return this.i != 0 && this.j != 0; }
306	\end{cfa}
307
308	% In C, the literal 0 represents the Boolean value false. The expression such as @if (x)@ is equivalent to @if (x != 0)@ .
309	% \CFA allows user to define the logical zero for a custom type by overloading the @!=@ operation against a special type, @zero_t@,
310	% so that an expression with the custom type can be used as a predicate without the need of conversion to the literal 0.
311	% \begin{cfa}
312	% struct S s;
313	% int ?!=?(S, zero_t);
314	% if (s) {}
315	% \end{cfa}
316	Literal 1 is also special. Particularly in C, the pre-increment operator and post-increment operator can be interpreted in terms of @+= 1@.
317	The logical @1@ in \CFA is represented by special type @one_t@.
318	\begin{cfa}
319	void ?{}( S & this, one_t ) { this.i = 1; this.j = 1; } // one_t, no parameter name allowed
320	S & ?+=?( S & this, one_t ) { this.i += 1; this.j += 1; return op; }
321	\end{cfa}
322	Without explictly overloaded by a user, \CFA uses the user-defined @+=(S&, one_t)@ to interpret @?++@ and @++?@, as both are polymorphic functions in \CFA.
323
324	\subsection{Polymorphics Functions}
325	Parametric-Polymorphics functions are the functions that applied to all types. \CFA functions are parametric-polymorphics when
326	they are written with the @forall@ clause.
327
328	\begin{cfa}
329	forall(T)
330	T identity(T x) { return x; }
331	identity(42);
332	\end{cfa}
333	The identity function accepts a value from any type as an arugment, and the type parameter @T@ is bounded to @int@ when the function
334	is called with 42.
335
336	The forall clause can takes @type assertions@ that restricts the polymorphics type.
337	\begin{cfa}
338	forall( T \| { void foo(T); } )
339	void bar(T t) { foo(t); }
340
341	struct S {} s;
342	void foo(struct S);
343
344	bar(s);
345	\end{cfa}
346	The assertion on @T@ restricts the range of types for bar to only those implements foo with the matching a signature, so that bar()
347	can call @foo@ in its body with type safe.
348	Calling on type with no mathcing @foo()@ implemented, such as int, causes a compile time type assertion error.
349
350	A @forall@ clause can asserts on multiple types and with multiple asserting functions. A common practice in \CFA is to group
351	the asserting functions in to a named @trait@ .
352
353	\begin{cfa}
354	trait Bird(T) {
355	int days_can_fly(T i);
356	void fly(T t);
357	};
358
359	forall(B \| Bird(B)) {
360	void bird_fly(int days_since_born, B bird) {
361	if (days_since_born > days_can_fly(bird)) {
362	fly(bird);
363	}
364	}
365	}
366
367	struct Robin {} r;
368	int days_can_fly(Robin r) { return 23; }
369	void fly(Robin r) {}
370
371	bird_fly( r );
372	\end{cfa}
373
374	Grouping type assertions into named trait effectively create a reusable interface for parametrics polymorphics types.
375
376	\section{Expression Resolution}
377
378	The overloading feature poses a challenge in \CFA expression resolution. Overloadeded identifiers can refer multiple
379	candidates, with multiples being simultaneously valid. The main task of \CFA resolver is to identity a best candidate that
380	involes less implicit conversion and polymorphism.
381
382	\subsection{Conversion Cost}
383	In C, functions argument and parameter type does not need to be exact match, and the compiler performs an @implicit conversion@ on argument.
384	\begin{cfa}
385	void foo(double i);
386	foo(42);
387	\end{cfa}
388	The implicit conversion in C is relatively simple because of the abscence of overloading, with the exception of binary operators, for which the
389	compiler needs to find a common type of both operands and the result. The pattern is known as "usual arithmetic conversions".
390
391	\CFA generalizes C implicit conversion to function overloading as a concept of @conversion cost@.
392	Initially designed by Bilson, conversion cost is a 3-tuple, @(unsafe, poly, safe)@, where unsafe is the number of narrowing conversion,
393	poly is the count of polymorphics type binding, and safe is the sum of the degree of widening conversion. Every
394	basic type in \CFA has been assigned with a @distance to Byte@, or @distance@, and the degree of widening conversion is the difference between two distances.
395
396	Aaron extends conversion cost to a 7-tuple,
397	@@(unsafe, poly, safe, sign, vars, specialization, reference)@@. The summary of Aaron's cost model is the following:
398	\begin{itemize}
399	\item Unsafe is the number of argument that implicitly convert to a type with high rank.
400	\item Poly accounts for number of polymorphics binding in the function declaration.
401	\item Safe is sum of distance (add reference/appendix later).
402	\item Sign is the number of sign/unsign variable conversion.
403	\item Vars is the number of polymorphics type declared in @forall@.
404	\item Specialization is opposite number of function declared in @forall@. More function declared implies more constraint on polymorphics type, and therefore has the lower cost.
405	\item Reference is number of lvalue-to-rvalue conversion.
406	\end{itemize}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: