source: doc/theses/jiada_liang_MMath/background.tex @ 91b9e10

Last change on this file since 91b9e10 was 41c4b5e, checked in by Peter A. Buhr <pabuhr@…>, 6 months ago

\chapter{Background}

\CFA is a backwards-compatible extension of the C programming language; therefore, it must support C-style enumerations.
The following covers C enumerations.

As discussed in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
\begin{clang}
#define Mon 0
static const int Mon = 0;
enum { Mon };
\end{clang}
\begin{enumerate}[leftmargin=*]
\item
For @#define@, the programmer has to explicitly manage the constant name and value.
Furthermore, these C preprocessor macro names are outside of the C type-system and can incorrectly change random text in a program.
\item
The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} and immediate operands of assembler instructions; furthermore, it occupies storage.
\begin{clang}
$\$$ nm test.o
0000000000000018 r Mon
\end{clang}
\item
Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and might not occupy storage.
\end{enumerate}
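The differences among the three forms can be sketched in plain C.
This is a minimal illustration, not code from the \CFA project: the names @MON_MACRO@, @mon_const@, @NumDays@, @schedule@, @day_kind@, @schedule_length@, and @runtime_sum@ are hypothetical and are distinct only so that all three forms can coexist in one file.

```c
#include <stddef.h>

#define MON_MACRO 0                  /* textual substitution: no type, no scope */
static const int mon_const = 0;      /* typed object; not a constant expression in ISO C */
enum { Mon, Tue, NumDays = 7 };      /* compiler-managed integer constants */

static int schedule[NumDays];        /* enumerator used as a file-scope array dimension */

/* Enumerators are valid case labels; in ISO C, mon_const is not. */
const char * day_kind( int day ) {
	switch ( day ) {
	  case Mon: return "first day";
	  case Tue: return "second day";
	  default:  return "other day";
	}
}

/* Element count of schedule, fixed at compile time by NumDays. */
size_t schedule_length( void ) {
	return sizeof( schedule ) / sizeof( schedule[0] );
}

/* The macro and the const object still work in ordinary run-time expressions. */
int runtime_sum( void ) {
	return MON_MACRO + mon_const + Mon;
}
```

Replacing @NumDays@ with a @static const int@ in the declaration of @schedule@ would be rejected by an ISO C compiler, because a file-scope array cannot be variable length.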


\section{C \lstinline{const}}
\label{s:Cconst}

C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
\begin{clang}
static const int one = 0 + 1;                   $\C{// static initialization}$
static const void * NIL = NULL;
static const double PI = 3.14159;
static const char Plus = '+';
static const char * Fred = "Fred";
static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1, Thu = Wed + 1, Fri = Thu + 1,
                                        Sat = Fri + 1, Sun = Sat + 1;
void foo() {
        const int r = random() % 100;           $\C{// dynamic initialization}$
        int va[r];                                                      $\C{// VLA, auto scope only}$
}
\end{clang}
In \CC, statically initialized identifiers may appear in any constant-expression context, \eg @case@; in ISO C, as noted above, a @const@ variable is not a constant expression, so such uses are not portable.
Dynamically initialized identifiers may appear as array dimensions in @g++@, which allows variable-sized arrays on the stack.
Again, this form of aliasing is not an enumeration.


\section{C Enumeration}
\label{s:CEnumeration}

The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
\begin{clang}[identifierstyle=\linespread{0.9}\it]
$\it enum$-specifier:
        enum identifier$\(_{opt}\)$ { enumerator-list }
        enum identifier$\(_{opt}\)$ { enumerator-list , }
        enum identifier
enumerator-list:
        enumerator
        enumerator-list , enumerator
enumerator:
        enumeration-constant
        enumeration-constant = constant-expression
\end{clang}
The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
The C enumeration semantics are discussed using examples.


\subsection{Type Name}
\label{s:TypeName}

An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
However, it is restricted to integral values.
\begin{clang}
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
\end{clang}
Here, the aliased constants are: 20, 10, 20, 21, and -7.
Direct initialization is by a compile-time expression generating a constant value.
An enumerator without an explicit initializer, \eg @Max10Plus1@, is \newterm{auto-initialized}: proceeding from left to right, its value is zero if it is the first enumerator, and otherwise the value of the preceding enumerator plus @1@.
Because multiple independent enumerators can be combined, enumerators with the same values can occur.
The enumerators are rvalues, so assignment is disallowed.
Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
For an unnamed enumeration, this semantics is required because there is no type name for scoped qualification.
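
The stated values can be checked by the compiler itself.
The following sketch assumes a C11 compiler for @_Static_assert@; the function @fred_value@ is a hypothetical helper added only to show unscoped access.

```c
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, Max10Plus1, Fred = -7 };

/* Enumerators are constant expressions, so their values can be checked at compile time. */
_Static_assert( MaxPlus10 == 20, "direct initialization from an earlier enumerator" );
_Static_assert( Max10Plus1 == 21, "auto-initialized: preceding value plus one" );
_Static_assert( Size == MaxPlus10, "distinct enumerators may share a value" );

/* Unscoped: Fred is visible in the enclosing scope without qualification. */
int fred_value( void ) {
	return Fred;
}
```

An attempted assignment such as @Max = 11;@ is rejected at compile time because an enumerator is not an lvalue.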

As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
While the semantics is misleading, this enumeration form matches with aggregate types:
\begin{cfa}
typedef struct /* unnamed */  { ... } S;
struct /* unnamed */  { ... } x, y, z;  $\C{// questionable}$
struct S {
        union /* unnamed */ {                           $\C{// unscoped fields}$
                int i;  double d ;  char ch;
        };
};
\end{cfa}
Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.

A \emph{named} enumeration is an enumeration:
\begin{clang}
enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
\end{clang}
and adopts the same semantics with respect to direct and auto initialization.
For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@.
As well, initialization may occur in any order.
\begin{clang}
enum Week {
        Thu@ = 10@, Fri, Sat, Sun,
        Mon@ = 0@, Tue, Wed@,@                  $\C{// terminating comma}$
};
\end{clang}
Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
A terminating comma appears in other C syntax, \eg the initializer list.}
This feature allows enumerator lines to be interchanged without moving a comma.
Named enumerators are also unscoped.
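
As a small check of the reordered declaration, the following sketch assumes a C11 compiler; @is_weekend@ is a hypothetical helper, not part of the thesis code.

```c
enum Week {
	Thu = 10, Fri, Sat, Sun,     /* 10, 11, 12, 13 */
	Mon = 0, Tue, Wed,           /* 0, 1, 2, with a terminating comma */
};

/* Declaration order does not affect the values produced by direct and auto initialization. */
_Static_assert( Wed == 2, "Mon to Wed are 0 to 2" );
_Static_assert( Sun == 13, "Fri to Sun are 11 to 13" );

/* Unscoped: Sat and Sun are used without any Week qualification. */
int is_weekend( enum Week day ) {
	return day == Sat || day == Sun;
}
```

Swapping the two enumerator lines leaves every value unchanged, because each line begins with an explicit initializer and ends with a comma.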


\subsection{Implementation}

In theory, a C enumeration \emph{variable} has an implementation-defined integral type large enough to hold all enumerator values.
In practice, C uses @int@ as the underlying type for enumeration variables, because of the restriction to integral constants, which have type @int@ (unless qualified with a size suffix).
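
The distinction between the type of an enumerator and the type of an enumeration variable can be probed directly.
This is a sketch assuming a C11 compiler; @week_size@ is a hypothetical helper, and its result is deliberately not asserted because the choice is left to the implementation.

```c
#include <stddef.h>

enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

/* In C11, an enumeration constant has type int. */
_Static_assert( sizeof( Mon ) == sizeof( int ), "enumeration constants have type int" );

/* The enumeration type is implementation-defined, so report its size rather than assume it. */
size_t week_size( void ) {
	return sizeof( enum Week );
}
```

On many common platforms @week_size()@ equals @sizeof( int )@, but compiler options that shrink enumerations can change the result.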


\subsection{Usage}
\label{s:Usage}

C provides an implicit \emph{bidirectional} conversion between an enumeration and its integral type.
\begin{clang}
enum Week week = Mon;                           $\C{// week == 0}$
week = Fri;                                                     $\C{// week == 11}$
int i = Sun;                                            $\C{// implicit conversion to int, i == 13}$
@week = 10000;@                                         $\C{// UNDEFINED! implicit conversion to Week}$
\end{clang}
While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.

Enumerators can appear in @switch@ and looping statements.
\begin{cfa}
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
switch ( week ) {
        case Mon: case Tue: case Wed: case Thu: case Fri:
                printf( "weekday\n" );
                break;
        case Sat: case Sun:
                printf( "weekend\n" );
}
for ( enum Week day = Mon; day <= Sun; day += 1 ) {
        printf( "day %d\n", day ); // 0-6
}
\end{cfa}
For iterating, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
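
The @switch@ and loop can be combined into a self-checking sketch in plain C.
The helpers @is_weekday@ and @count_weekdays@ are hypothetical names introduced for illustration, and the enumerators are assumed to be consecutive starting at zero.

```c
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

/* Classify a day; each group of case labels returns, so there is no fall through. */
int is_weekday( enum Week day ) {
	switch ( day ) {
	  case Mon: case Tue: case Wed: case Thu: case Fri:
		return 1;
	  case Sat: case Sun:
		return 0;
	}
	return 0;                        /* value that matches no enumerator */
}

/* Iterate over all enumerators using the implicit conversions to and from int. */
int count_weekdays( void ) {
	int count = 0;
	for ( enum Week day = Mon; day <= Sun; day += 1 ) {
		count += is_weekday( day );
	}
	return count;
}
```

Compiled as \CC, the loop increment is rejected and must be rewritten with the explicit cast shown above.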

There is a C idiom to automatically know the number of enumerators in an enumeration.
\begin{cfa}
enum E { A, B, C, D, @N@ };  // N == 4
for ( enum E e = A; e < @N@; e += 1 ) ...
\end{cfa}
Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
@N@ is often used as the dimension for an array associated with the enumeration.
\begin{cfa}
enum E array[@N@];
for ( enum E e = A; e < N; e += 1 ) {
        array[e] = e;
}
\end{cfa}
However, for typed enumerations \see{\VRef{f:EumeratorTyping}}, this idiom fails.
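
The counting idiom, and one way it breaks even in plain C, can be sketched as follows.
The enumeration @F@ and the function @fill_and_sum@ are hypothetical additions; the failure shown comes from explicit non-consecutive values and is only an analogy for the typed-enumeration case referenced above.

```c
enum E { A, B, C, D, N };                    /* N == 4, the number of preceding enumerators */
_Static_assert( N == 4, "auto-initialization counts the enumerators" );

static enum E array[N];                      /* one element per enumerator */

/* Fill the array by iterating over the enumerators, then sum the stored values. */
int fill_and_sum( void ) {
	int sum = 0;
	for ( enum E e = A; e < N; e += 1 ) {
		array[e] = e;
	}
	for ( int i = 0; i < N; i += 1 ) {
		sum += array[i];
	}
	return sum;                              /* 0 + 1 + 2 + 3 */
}

/* With explicit, non-consecutive values the last enumerator no longer counts anything. */
enum F { P = 1, Q = 5, R = 9, NF };          /* NF == 10, but there are only 3 enumerators */
_Static_assert( NF == 10, "NF follows R, it does not count P, Q, R" );
```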

This idiom leads to another C idiom using an enumeration with matching companion information.
For example, an enumeration is linked with a companion array of printable strings.
\begin{cfa}
enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
char * Integral_Name[@NO_OF_ITYPES@] = {
        "char", "signed char", "unsigned char",
        "signed short int", "unsigned short int",
        "signed int", "unsigned int", ...
};
enum Integral_Type integral_type = ...
printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
\end{cfa}
However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
The need to harmonize is at best indicated by a comment before the enumeration.
This issue is exacerbated if enumeration and companion array are in different translation units.
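
One partial mitigation available in C11, offered here as an assumption rather than as a technique from this work, is to leave the array dimension unspecified and check its length against the counting enumerator at compile time.
The sketch uses a shortened, hypothetical three-enumerator subset of @Integral_Type@, and @integral_name@ is a hypothetical accessor.

```c
#include <stddef.h>

/* Shortened, hypothetical subset of the enumeration in the text. */
enum Integral_Type { chr, schar, uschar, NO_OF_ITYPES };

/* Dimension deduced from the initializer so a missing or extra name changes the size. */
static const char * const Integral_Name[] = {
	"char", "signed char", "unsigned char",
};

/* Compilation fails if the enumeration and the array differ in length. */
_Static_assert( sizeof( Integral_Name ) / sizeof( Integral_Name[0] ) == NO_OF_ITYPES,
	"Integral_Name is out of step with Integral_Type" );

/* Look up the printable name, guarding against values that match no enumerator. */
const char * integral_name( enum Integral_Type t ) {
	return ( (unsigned)t < (unsigned)NO_OF_ITYPES ) ? Integral_Name[t] : "unknown";
}
```

The check catches only a difference in length; names listed in the wrong order, or an array defined in another translation unit without a visible initializer, still go undetected.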

\bigskip
While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide enumeration features found in other programming languages.