1\chapter{Background}
2
3\vspace*{-8pt}
4
\CFA is a backwards-compatible extension of the C programming language; therefore, it must support C-style enumerations.
6The following discussion covers C enumerations.
7
8As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
9\begin{clang}
10#define Mon 0
11static const int Mon = 0;
12enum { Mon };
13\end{clang}
14\begin{enumerate}[leftmargin=*]
15\item
16For @#define@, the programmer has to explicitly manage the constant name and value.
17Furthermore, these C preprocessor macro names are outside of the C type-system and can incorrectly change random text in a program.
18\item
The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} and immediate oper\-ands of assembler instructions; it also occupies storage.
21\begin{clang}
22$\$$ nm test.o
230000000000000018 r Mon
24\end{clang}
25\item
26Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
27\end{enumerate}
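
The claims in the list above can be checked mechanically.
The following is a minimal sketch, assuming a C11 compiler (for @static_assert@); the names @Days@, @hours@, @is_weekend@, and @week_hours@ are hypothetical and only illustrate an enumerator used in constant-expression locations.
\begin{clang}
#include <assert.h>                     // static_assert (C11)
#include <stdbool.h>

enum { Mon, Tue, Wed, Thu, Fri, Sat, Sun, Days };   // Days == 7 by auto-initialization

static_assert( Days == 7, "an enumerator is a constant expression" );

static const int hours[Days] = { 8, 8, 8, 8, 8, 0, 0 };   // enumerator as file-scope array dimension

bool is_weekend( int day ) {
	switch ( day ) {
	  case Sat: case Sun:               // enumerators as case labels
		return true;
	  default:
		return false;
	}
}

int week_hours( void ) {
	int total = 0;
	for ( int d = Mon; d < Days; d += 1 ) total += hours[d];
	return total;
}
\end{clang}
Replacing the @enum@ with @static const int@ declarations is typically rejected by a C compiler at the @case@ labels and at the file-scope array dimension, which is the restriction described in the second item.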
28
29
30\section{C \lstinline{const}}
31\label{s:Cconst}
32
33C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
34\begin{cquote}
35\begin{tabular}{@{}ll@{}}
\multicolumn{1}{@{}c}{\textbf{static initialization}} &  \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\
37\begin{clang}
38static const int one = 0 + 1;
39static const void * NIL = NULL;
40static const double PI = 3.14159;
41static const char Plus = '+';
42static const char * Fred = "Fred";
43static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
44        Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
45\end{clang}
46&
47\begin{clang}
48void foo() {
49        // auto scope only
50        const int r = random() % 100;
51        int va[r];
52}
53\end{clang}
54\end{tabular}
55\end{cquote}
56However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
57Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
58Again, this form of aliasing is not an enumeration.
59
60\section{C Enumeration}
61\label{s:CEnumeration}
62
63The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
64\begin{clang}[identifierstyle=\linespread{0.9}\it]
65$\it enum$-specifier:
66        enum identifier$\(_{opt}\)$ { enumerator-list }
67        enum identifier$\(_{opt}\)$ { enumerator-list , }
68        enum identifier
69enumerator-list:
70        enumerator
71        enumerator-list , enumerator
72enumerator:
73        enumeration-constant
74        enumeration-constant = constant-expression
75\end{clang}
76The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
77The C enumeration semantics are discussed using examples.
78
79
80\subsection{Type Name}
81\label{s:TypeName}
82
83An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
84However, it is restricted to integral values.
85\begin{clang}
86enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
87\end{clang}
88Here, the aliased constants are: 20, 10, 20, 21, and -7.
89Direct initialization is by a compile-time expression generating a constant value.
90Indirect initialization (without initialization, @Max10Plus1@) is \newterm{auto-initialized}: from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.
91Because multiple independent enumerators can be combined, enumerators with the same values can occur.
92The enumerators are rvalues, so assignment is disallowed.
93Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
94For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.
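
The values stated above can be confirmed at compile time.
The following is a minimal sketch, assuming a C11 compiler; the array @buffer@ and the function @buffer_len@ are hypothetical additions that show an unscoped enumerator used outside the @enum@.
\begin{clang}
#include <assert.h>                     // static_assert (C11)

enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, Max10Plus1, Fred = -7 };

static_assert( Size == 20 && Max == 10, "direct initialization" );
static_assert( MaxPlus10 == 20, "constant expression using an earlier enumerator" );
static_assert( Max10Plus1 == 21, "auto-initialized: previous enumerator plus 1" );
static_assert( Size == MaxPlus10, "independent enumerators may share a value" );
static_assert( Fred == -7, "negative values are allowed" );

int buffer[Size];                       // unscoped: Size is visible in the enclosing scope

int buffer_len( void ) { return (int)( sizeof(buffer) / sizeof(buffer[0]) ); }
\end{clang}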
95
96As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
97While the semantics is misleading, this enumeration form matches with aggregate types:
98\begin{cfa}
99typedef struct @/* unnamed */@  { ... } S;
100struct @/* unnamed */@  { ... } x, y, z;        $\C{// questionable}$
101struct S {
102        union @/* unnamed */@ {                                 $\C{// unscoped fields}$
103                int i;  double d ;  char ch;
104        };
105};
106\end{cfa}
107Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.
108
109A \emph{named} enumeration is an enumeration:
110\begin{clang}
111enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
112\end{clang}
and adopts the same semantics with respect to direct and auto initialization.
114For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@.
115As well, initialization may occur in any order.
116\begin{clang}
117enum Week {
118        Thu@ = 10@, Fri, Sat, Sun,
119        Mon@ = 0@, Tue, Wed@,@                  $\C{// terminating comma}$
120};
121\end{clang}
122Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
123A terminating comma appears in other C syntax, \eg the initializer list.}
This feature allows enumerator lines to be interchanged without moving a comma.
125Named enumerators are also unscoped.
126
127
128\subsection{Implementation}
129\label{s:CenumImplementation}
130
131In theory, a C enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
132In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the underlying type for enumeration variables, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
133However, type @int@ is defined as:
134\begin{quote}
135A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
136\end{quote}
Yet, @int@ means 4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
In contrast, @long int@ means 4 bytes on 32-bit and 8 bytes on 64-bit architectures, and @long long int@ means 8 bytes on both 32/64-bit architectures, where 64-bit operations are simulated on 32-bit architectures.
In reality, both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based on its initialization.
140\begin{cfa}
141enum E { IMin = INT_MIN, IMax = INT_MAX,
142                         ILMin = LONG_MIN, ILMax = LONG_MAX,
143                         ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
144int main() {
        printf( "%zu %d %d\n%zu %ld %ld\n%zu %lld %lld\n",
146                         sizeof(IMin), IMin, IMax,
147                         sizeof(ILMin), ILMin, ILMax,
148                         sizeof(ILLMin), ILLMin, ILLMax );
149}
1504 -2147483648 2147483647
1518 -9223372036854775808 9223372036854775807
1528 -9223372036854775808 9223372036854775807
153\end{cfa}
154Hence, initialization in the range @INT_MIN@..@INT_MAX@ is 4 bytes, and outside this range is 8 bytes.
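
The observed split can be restated as a small model.
The following is a minimal sketch, assuming C11, where an enumeration constant whose value fits in @int@ has type @int@; the helper @enumerator_bytes@ and the enumeration @Small@ are hypothetical, and the helper only models the compiler behaviour reported above, not a rule from the C standard.
\begin{clang}
#include <limits.h>
#include <stddef.h>

enum Small { SMin = INT_MIN, SMax = INT_MAX };   // every value fits in int

// Hypothetical model of the observed behaviour: the size used for an enumerator
// initialized with value v (int inside the int range, a wider type outside it).
size_t enumerator_bytes( long long v ) {
	return ( v >= INT_MIN && v <= INT_MAX ) ? sizeof(int) : sizeof(long long);
}
\end{clang}
On the platforms discussed above, @sizeof(int)@ is 4 and @sizeof(long long)@ is 8, which matches the printed sizes.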
155
156\subsection{Usage}
157\label{s:Usage}
158
C provides an implicit \emph{bidirectional} conversion between an enumeration and its integral type.
160\begin{clang}
161enum Week week = Mon;                           $\C{// week == 0}$
162week = Fri;                                                     $\C{// week == 11}$
163int i = Sun;                                            $\C{// implicit conversion to int, i == 13}$
164@week = 10000;@                                         $\C{// UNDEFINED! implicit conversion to Week}$
165\end{clang}
166While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.
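
One way a C programmer can guard the unsafe direction is an explicit checked conversion.
The following is a minimal sketch, assuming the @Week@ declaration with @Thu = 10@; the helper @week_from_int@ is hypothetical and is not part of C or \CFA.
\begin{clang}
#include <stdbool.h>

enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

// Hypothetical helper: store v in *day only if v is the value of some enumerator.
bool week_from_int( int v, enum Week * day ) {
	switch ( v ) {
	  case Mon: case Tue: case Wed: case Thu: case Fri: case Sat: case Sun:
		*day = (enum Week)v;
		return true;
	  default:
		return false;                   // e.g., 5 or 10000: not a Week value
	}
}
\end{clang}
The check must be written and kept up to date by hand, because C itself accepts the assignment without a diagnostic.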
167
168Enumerators can appear in @switch@ and looping statements.
169\begin{cfa}
170enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
171switch ( week ) {
172        case Mon ... Fri:                               $\C{// gcc case range}$
                printf( "weekday\n" );
                break;                          // prevent fall through into weekend
        case Sat: case Sun:
                printf( "weekend\n" );
176}
177for ( enum Week day = Mon; day <= Sun; day += 1 ) { $\C{// step of 1}$
178        printf( "day %d\n", day ); // 0-6
179}
180\end{cfa}
181For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
182For example, a gap introduced by @Thu = 10@, results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
183Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
184For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
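
When the enumerator values have gaps, a common workaround is to iterate over positions in a table of enumerators rather than over the values themselves.
The following is a minimal sketch of that workaround; the table @week_days@, the constant @NoOfDays@, and the function @sum_of_day_values@ are hypothetical names.
\begin{clang}
enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

// Hypothetical companion table listing every enumerator once, in declaration order.
static const enum Week week_days[] = { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
enum { NoOfDays = sizeof(week_days) / sizeof(week_days[0]) };

// Visit only real Week values by stepping through positions, not values.
int sum_of_day_values( void ) {
	int sum = 0;
	for ( int i = 0; i < NoOfDays; i += 1 ) {
		enum Week day = week_days[i];   // never takes a value in 3--9
		sum += day;
	}
	return sum;
}
\end{clang}
The table must be updated by hand whenever the enumeration changes.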
185
186There is a C idiom to automatically compute the number of enumerators in an enumeration.
187\begin{cfa}
188enum E { A, B, C, D, @N@ };  // N == 4
189for ( enum E e = A; e < @N@; e += 1 ) ...
190\end{cfa}
191Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
@N@ is often used as the dimension for an array associated with the enumeration.
193\begin{cfa}
194E array[@N@];
195for ( enum E e = A; e < N; e += 1 ) {
196        array[e] = e;
197}
198\end{cfa}
199However, for non-integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.
200
201This idiom is used in another C idiom for matching companion information.
202For example, an enumeration is linked with a companion array of printable strings.
203\begin{cfa}
204enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
205char * Integral_Name[@NO_OF_ITYPES@] = {
206        "char", "signed char", "unsigned char",
207        "signed short int", "unsigned short int",
208        "signed int", "unsigned int", ...
209};
210enum Integral_Type integral_type = ...
211printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
212\end{cfa}
213However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
214The need to harmonize is at best indicated by a comment before the enumeration.
215This issue is exacerbated if enumeration and companion array are in different translation units.
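
A partial mitigation, when both declarations are visible in one translation unit, is a compile-time check that the two lengths agree.
The following is a minimal sketch, assuming a C11 compiler; the enumeration is a shortened, hypothetical version of @Integral_Type@ (the original list is elided), and the function @integral_name@ is a hypothetical accessor.
\begin{clang}
#include <assert.h>                     // static_assert (C11)

// Shortened, hypothetical version of the enumeration.
enum Integral_Type { chr, schar, uschar, NO_OF_ITYPES };

// No explicit dimension: the array length follows the number of initializers.
static const char * Integral_Name[] = {
	"char", "signed char", "unsigned char",
};

// Compilation fails if the number of enumerators and the number of names differ.
static_assert( sizeof(Integral_Name) / sizeof(Integral_Name[0]) == NO_OF_ITYPES,
	"Integral_Type and Integral_Name are out of step" );

const char * integral_name( enum Integral_Type t ) {
	return t < NO_OF_ITYPES ? Integral_Name[t] : "unknown";
}
\end{clang}
The check detects a missing or extra name, but not names listed in the wrong order, so the harmonizing problem is reduced rather than removed.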
216
217\bigskip
While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features found in other programming languages.
219
220\section{\CFA Polymorphism}
221\subsection{Function Overloading}
Function overloading is a programming-language feature wherein functions may share the same name but have different signatures. In both \CC and \CFA, a function name can be overloaded
with different entities as long as they differ in the number or type of parameters.
224
225\begin{cfa}
226void f(); // (1)
227void f(int); // (2); Overloaded on the number of parameters
228void f(char); // (3); Overloaded on parameter type
229
230f('A');
231\end{cfa}
In this case, the name @f@ is overloaded with a nullary function and two unary functions with different parameter types. Which procedure is executed
is determined by the arguments passed. The last expression of the preceding example calls @f@ with one argument, narrowing the possible candidates to (2) and (3).
Between those, the argument @'A'@ is an exact match for the parameter expected by (3), while calling (2) needs an \emph{implicit conversion}. The compiler determines that (3) is the better candidate,
and procedure (3) is executed.
236
237\begin{cfa}
238int f(int); // (4); Overloaded on return type
239[int, int] f(int); // (5) Overloaded on the number of return value
240\end{cfa}
The function declarations (4) and (5) show that \CFA functions can also be overloaded on the type and number of return values, a feature that is not shared by \CC.
242
243
244\subsection{Operator Overloading}
Operators in \CFA are specialized functions, and are overloadable by defining specially-named functions whose names represent the syntax used to call the operator.
246% For example, @bool ?==?T(T lhs, T rhs)@ overloads equality operator for type T, where @?@ is the placeholders for operands for the operator.
247\begin{cfa}
248enum Weekday { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };
249bool ?<?(const Weekday a, const Weekday b) {
        return (int)a < (int)b;
}
Monday < Sunday; // true
253?<?( Monday, Sunday ); // Equivalent syntax
254\end{cfa}
Unary operators are functions that take one argument and have the name @operator?@ or @?operator@, where @?@ is the placeholder for the operand.
Binary operators are functions with two parameters; they are overloadable with the function name @?operator?@.
257
258\subsection{Constructor and Destructor}
In \CFA, all objects are initialized by \newterm{constructors} during their allocation, including basic types,
which are initialized by auto-generated basic-type constructors.

Constructors are overloadable functions with name @?{}@ that return @void@ and have at least one parameter, which is a reference
to the object being constructed (colloquially referred to as ``this'' or ``self'' in other languages).
264
265\begin{cfa}
266struct Employee {
267        const char * name;
268        double salary;
269};
270
271void ?{}( Employee& this, const char * name, double salary ) {
    this.name = strdup( name );  // copy, so the destructor can free it
273    this.salary = salary;
274}
275
276Employee Sara { "Sara Schmidt", 20.5 };
277\end{cfa}
As in Python, the ``self'' reference is passed implicitly when a constructor is called. The @Employee@ constructor takes two additional arguments used to initialize its
fields.
280
A destructor in \CFA is a function with name @^?{}@. It returns @void@ and takes only one argument, its ``self''.
282\begin{cfa}
283void ^?{}( Employee& this ) {
284    free(this.name);
285    this.name = 0p;
286    this.salary = 0;
287}
288\end{cfa}
A destructor can be explicitly invoked as a function call, or is implicitly called at the end of the block in which the object is declared.
290\begin{cfa}
291{
292^Sara{};
293Sara{ "Sara Craft", 20 };
294} // ^Sara{}
295\end{cfa}
296
297\subsection{Variable Overloading}
C and \CC disallow more than one variable declared in the same scope with the same name. When a variable declared in an inner scope has the same name as
a variable in an outer scope, the outer-scope variable is ``shadowed'' by the inner-scope variable and cannot be accessed directly.

\CFA has variable overloading: multiple variables can share the same name in the same scope, as long as they have different types. Name shadowing only
happens when the inner-scope variable and the outer-scope one have the same type.
303\begin{cfa}
304double i = 6.0;
305int i = 5;
306void foo( double i ) { sout | i; } // 6.0
307\end{cfa}
308
309\subsection{Special Literals}
Literal 0 has special meanings in different contexts: it can mean ``nothing'' or ``empty'', the additive identity in arithmetic, a default value as in C (null pointer),
or an initial state.
Recognizing its significance, \CFA provides a special type for the 0 literal, @zero_t@, to define the logical zero for custom types.
313\begin{cfa}
314struct S { int i, j; };
315void ?{}( S & this, @zero_t@ ) { this.i = 0; this.j = 0; } // zero_t, no parameter name allowed
316S s0 = @0@;
317\end{cfa}
Overloading the constructor on @zero_t@ for @S@ provides a new definition of zero for type @S@.
319
According to the C standard, @0@ is the \emph{only} false value: any value that compares equal to @0@ is false, and any value that does not is true. As a consequence, control structures
such as @if()@ and @while()@ only run their true clause when the predicate is not equal to @0@.

\CFA generalizes this concept and allows the boolean value of any type to be logically overloaded by overloading the not-equal comparison against @zero_t@.
324\begin{cfa}
int ?@!=@?( S this, @zero_t@ ) { return this.i != 0 || this.j != 0; }
326\end{cfa}
327
328% In C, the literal 0 represents the Boolean value false. The expression such as @if (x)@ is equivalent to @if (x != 0)@ .
329% \CFA allows user to define the logical zero for a custom type by overloading the @!=@ operation against a special type, @zero_t@,
330% so that an expression with the custom type can be used as a predicate without the need of conversion to the literal 0.
331% \begin{cfa}
332% struct S s;
333% int ?!=?(S, zero_t);
334% if (s) {}
335% \end{cfa}
Literal 1 is also special. In particular, the C pre-increment and post-increment operators can be interpreted in terms of @+= 1@.
The logical one in \CFA is represented by the special type @one_t@.
338\begin{cfa}
339void ?{}( S & this, one_t ) { this.i = 1; this.j = 1; } // one_t, no parameter name allowed
S & ?+=?( S & this, one_t ) { this.i += 1; this.j += 1; return this; }
341\end{cfa}
Unless @?++@ and @++?@ are explicitly overloaded by the user, \CFA uses the user-defined @?+=?( S &, one_t )@ to interpret them, as both are polymorphic functions in \CFA.
343
\subsection{Polymorphic Functions}
Parametric-polymorphic functions are functions that apply to all types. \CFA functions are parametric polymorphic when
they are written with the @forall@ clause.
347
348\begin{cfa}
349forall(T)
350T identity(T x) { return x; }
351identity(42);
352\end{cfa}
The identity function accepts a value of any type as an argument, and the type parameter @T@ is bound to @int@ when the function
is called with 42.
355
The @forall@ clause can take \newterm{type assertions} that restrict the polymorphic type.
357\begin{cfa}
358forall( T | { void foo(T); } )
359void bar(T t) { foo(t); }
360
361struct S {} s;
362void foo(struct S);
363
364bar(s);
365\end{cfa}
The assertion on @T@ restricts the range of types for @bar@ to only those that implement @foo@ with a matching signature, so that @bar@
can call @foo@ in its body type safely.
Calling @bar@ with a type that has no matching @foo@, such as @int@, causes a compile-time type-assertion error.
369
A @forall@ clause can assert on multiple types and with multiple asserting functions. A common practice in \CFA is to group
the asserting functions into a named \newterm{trait}.
372
373\begin{cfa}
374trait Bird(T) {
375        int days_can_fly(T i);
376        void fly(T t);
377};
378
379forall(B | Bird(B)) {
380        void bird_fly(int days_since_born, B bird) {
381                if (days_since_born > days_can_fly(bird)) {
382                        fly(bird);
383                }
384        }
385}
386
387struct Robin {} r;
388int days_can_fly(Robin r) { return 23; }
389void fly(Robin r) {}
390
bird_fly( 30, r ); // days_since_born = 30
392\end{cfa}
393
Grouping type assertions into a named trait effectively creates a reusable interface for parametric-polymorphic types.
395
396\section{Expression Resolution}
397
The overloading feature poses a challenge for \CFA expression resolution. An overloaded identifier can refer to multiple
candidates, several of which may be simultaneously valid. The main task of the \CFA resolver is to identify the best candidate, the one that
involves the fewest implicit conversions and the least polymorphism.
401
402\subsection{Conversion Cost}
In C, a function's argument and parameter types do not need to match exactly, because the compiler performs an \emph{implicit conversion} on the argument.
404\begin{cfa}
405void foo(double i);
406foo(42);
407\end{cfa}
Implicit conversion in C is relatively simple because of the absence of overloading, with the exception of binary operators, for which the
compiler needs to find a common type for both operands and the result. This pattern is known as the ``usual arithmetic conversions''.
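
Both kinds of conversion can be seen in a few lines of C.
The following is a minimal sketch; the function names are hypothetical and only label the cases.
\begin{clang}
// The constant 2 is converted to double to match x (usual arithmetic conversions).
double half( double x ) { return x / 2; }

// The argument 42 has type int and is implicitly converted to double at the call.
double call_with_int( void ) { return half( 42 ); }

// Both operands are int, so the division is done in int before the result is converted.
double int_division( void ) { return 1 / 2; }

// One double operand is enough to convert the other operand to double.
double mixed_division( void ) { return 1 / 2.0; }
\end{clang}
Here, @call_with_int@ returns 21.0, @int_division@ returns 0.0, and @mixed_division@ returns 0.5.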
410
\CFA generalizes C implicit conversion to function overloading through the concept of \newterm{conversion cost}.
Initially designed by Bilson, conversion cost is a 3-tuple, @(unsafe, poly, safe)@, where @unsafe@ is the number of narrowing conversions,
@poly@ is the count of polymorphic type bindings, and @safe@ is the sum of the degrees of widening conversions. Every
basic type in \CFA is assigned a \emph{distance to Byte}, or \emph{distance}, and the degree of a widening conversion is the difference between two distances.
415
Aaron extends conversion cost to a 7-tuple,
@(unsafe, poly, safe, sign, vars, specialization, reference)@. Aaron's cost model is summarized as follows:
418\begin{itemize}
\item Unsafe is the number of arguments that implicitly convert to a type with high rank.
\item Poly accounts for the number of polymorphic bindings in the function declaration.
\item Safe is the sum of distances (add reference/appendix later).
\item Sign is the number of signed/unsigned variable conversions.
\item Vars is the number of polymorphic types declared in @forall@.
\item Specialization is the negative of the number of functions declared in @forall@. More declared functions imply more constraints on the polymorphic type, and therefore a lower cost.
\item Reference is the number of lvalue-to-rvalue conversions.
426\end{itemize}
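
Choosing the best candidate can be sketched as a comparison of cost tuples.
The following is a minimal sketch in C, assuming that costs are ordered lexicographically with the leftmost component most significant; this ordering, the type @Cost@, and the functions @cost_compare@ and @cheapest@ are assumptions for illustration and are not taken from the \CFA resolver.
\begin{clang}
// Hypothetical representation of the 7-tuple; field names follow the list above.
typedef struct {
	int unsafe, poly, safe, sign, vars, specialization, reference;
} Cost;

// Assumed ordering: lexicographic, leftmost component most significant.
// Returns negative, zero, or positive, like strcmp.
int cost_compare( const Cost * a, const Cost * b ) {
	const int lhs[] = { a->unsafe, a->poly, a->safe, a->sign,
						a->vars, a->specialization, a->reference };
	const int rhs[] = { b->unsafe, b->poly, b->safe, b->sign,
						b->vars, b->specialization, b->reference };
	for ( int i = 0; i < 7; i += 1 ) {
		if ( lhs[i] != rhs[i] ) return lhs[i] < rhs[i] ? -1 : 1;
	}
	return 0;
}

// Index of the cheapest candidate among n costs (n > 0); ties keep the first.
int cheapest( const Cost costs[], int n ) {
	int best = 0;
	for ( int i = 1; i < n; i += 1 ) {
		if ( cost_compare( &costs[i], &costs[best] ) < 0 ) best = i;
	}
	return best;
}
\end{clang}
Under this assumption, a candidate needing one unsafe conversion loses to any candidate needing none, regardless of the remaining components.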