source: doc/theses/jiada_liang_MMath/background.tex @ d1276f8

Last change on this file since d1276f8 was 10a99d87, checked in by Peter A. Buhr <pabuhr@…>, 3 days ago

proofread last push of CFA enumerations

1\chapter{Background}
2
3\vspace*{-8pt}
4
\CFA is a backwards-compatible extension of the C programming language; therefore, it must support C-style enumerations.
The following discussion covers C enumerations.
7
8As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
9\begin{clang}
10#define Mon 0
11static const int Mon = 0;
12enum { Mon };
13\end{clang}
14\begin{enumerate}[leftmargin=*]
15\item
16For @#define@, the programmer has to explicitly manage the constant name and value.
17Furthermore, these C preprocessor macro names are outside of the C type-system and can incorrectly change random text in a program.
18\item
The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
C allows variable-length array declarations (VLAs), so this case does work, but it fails in \CC, which does not support VLAs, unless the compiler is \lstinline{g++}.} and immediate oper\-ands of assembler instructions; it also occupies storage.
21\begin{clang}
22$\$$ nm test.o
230000000000000018 r Mon
24\end{clang}
25\item
26Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
27\end{enumerate}
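
The difference among the three forms can be sketched in a small C fragment.
The names @MON_MACRO@, @MonConst@, @day_name@, @table_len@, and @mon_const@ are hypothetical and introduced only for illustration; the commented-out @case@ label marks what ISO C rejects, although some compilers accept it as an extension.

```c
#include <stddef.h>

#define MON_MACRO 0                   /* textual substitution, outside the type system */
static const int MonConst = 0;        /* typed object, normally occupies storage */
enum { Mon, Tue };                    /* compiler-managed integer constants */

/* Enumerators are integer constant expressions, so they work as case labels. */
const char * day_name( int d ) {
	switch ( d ) {
	  case Mon: return "Mon";
	  case Tue: return "Tue";
	  /* case MonConst: ...  not an integer constant expression in ISO C */
	  default: return "?";
	}
}

/* An enumerator can dimension a file-scope array; MonConst cannot in ISO C. */
static int table[Tue + 1];

size_t table_len( void ) { return sizeof( table ) / sizeof( table[0] ); }

/* The const object and the macro are read like any other int expression. */
int mon_const( void ) { return MonConst + MON_MACRO; }
```

Calling @day_name( Tue )@ selects the second @case@, and @table_len()@ reports the two elements implied by @Tue + 1@.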
28
29
30\section{C \lstinline{const}}
31\label{s:Cconst}
32
33C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
34\begin{cquote}
35\begin{tabular}{@{}ll@{}}
\multicolumn{1}{@{}c}{\textbf{static initialization}} &  \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\
37\begin{clang}
38static const int one = 0 + 1;
39static const void * NIL = NULL;
40static const double PI = 3.14159;
41static const char Plus = '+';
42static const char * Fred = "Fred";
43static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
44        Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
45\end{clang}
46&
47\begin{clang}
48void foo() {
49        // auto scope only
50        const int r = random() % 100;
51        int va[r];
52}
53
54
55\end{clang}
56\end{tabular}
57\end{cquote}
58However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
59Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
60Again, this form of aliasing is not an enumeration.
61
62
63\section{C Enumeration}
64\label{s:CEnumeration}
65
66The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
67\begin{clang}[identifierstyle=\linespread{0.9}\it]
68$\it enum$-specifier:
69        enum identifier$\(_{opt}\)$ { enumerator-list }
70        enum identifier$\(_{opt}\)$ { enumerator-list , }
71        enum identifier
72enumerator-list:
73        enumerator
74        enumerator-list , enumerator
75enumerator:
76        enumeration-constant
77        enumeration-constant = constant-expression
78\end{clang}
79The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
80The C enumeration semantics are discussed using examples.
81
82
83\subsection{Type Name}
84\label{s:TypeName}
85
86An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
87However, it is restricted to integral values.
88\begin{clang}
89enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
90\end{clang}
91Here, the aliased constants are: 20, 10, 20, 21, and -7.
92Direct initialization is by a compile-time expression generating a constant value.
Indirect initialization (no initializer, \eg @Max10Plus1@) is \newterm{auto-initialized}: from left to right, starting at zero or the most recent explicitly initialized enumerator, incrementing by @1@.
94Because multiple independent enumerators can be combined, enumerators with the same values can occur.
95The enumerators are rvalues, so assignment is disallowed.
96Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
97For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.
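
The auto-initialization rule can be checked at compile time with C11 @_Static_assert@.
The following is a minimal sketch; @buffer@, @buffer_len@, and @sum_aliases@ are hypothetical names added for illustration.

```c
#include <stddef.h>

enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, Max10Plus1, Fred = -7 };

_Static_assert( MaxPlus10 == 20, "direct initialization by a constant expression" );
_Static_assert( Max10Plus1 == 21, "auto-initialized from the preceding enumerator" );
_Static_assert( Size == MaxPlus10, "two enumerators may share a value" );

/* Unscoped: Size is visible in the enclosing scope and is a constant expression. */
static int buffer[Size];

size_t buffer_len( void ) { return sizeof( buffer ) / sizeof( buffer[0] ); }

/* 20 + 10 + 20 + 21 + (-7) */
int sum_aliases( void ) { return Size + Max + MaxPlus10 + Max10Plus1 + Fred; }
```

An assignment such as @Max = 11;@ is rejected by the compiler because an enumerator is not an lvalue.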
98
99As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
100While the semantics is misleading, this enumeration form matches with aggregate types:
101\begin{cfa}
102typedef struct @/* unnamed */@  { ... } S;
103struct @/* unnamed */@  { ... } x, y, z;        $\C{// questionable}$
104struct S {
105        union @/* unnamed */@ {                                 $\C{// unscoped fields}$
106                int i;  double d ;  char ch;
107        };
108};
109\end{cfa}
110Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.
111
112A \emph{named} enumeration is an enumeration:
113\begin{clang}
114enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
115\end{clang}
and adopts the same semantics with respect to direct and auto-initialization.
117For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@.
118As well, initialization may occur in any order.
119\begin{clang}
120enum Week {
121        Thu@ = 10@, Fri, Sat, Sun,
122        Mon@ = 0@, Tue, Wed@,@                  $\C{// terminating comma}$
123};
124\end{clang}
125Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
126A terminating comma appears in other C syntax, \eg the initializer list.}
This feature allows enumerator lines to be interchanged without moving a comma.
128Named enumerators are also unscoped.
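
As a small sketch of these rules, the out-of-order declaration can be verified with C11 @_Static_assert@; the helper @is_weekend@ is a hypothetical name used only to show the unscoped enumerators in use.

```c
enum Week {
	Thu = 10, Fri, Sat, Sun,        /* 10, 11, 12, 13 */
	Mon = 0, Tue, Wed,              /* 0, 1, 2; the terminating comma is legal */
};

_Static_assert( Wed == 2, "auto-initialization restarts after Mon = 0" );
_Static_assert( Sun == 13, "auto-initialization continues after Thu = 10" );

/* Enumerators are unscoped, so Sat and Sun need no Week qualification. */
int is_weekend( enum Week d ) { return d == Sat || d == Sun; }
```

Because the enumerators are projected into the enclosing scope, a second enumeration in the same scope cannot reuse a name such as @Mon@.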
129
130
131\subsection{Implementation}
132\label{s:CenumImplementation}
133
134In theory, a C enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
135In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the underlying type for enumeration variables, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
136However, type @int@ is defined as:
137\begin{quote}
138A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
139\end{quote}
However, @int@ means 4 bytes on both 32-bit and 64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
Whereas, @long int@ means 4 bytes on a 32-bit and 8 bytes on a 64-bit architecture, and @long long int@ means 8 bytes on both 32-bit and 64-bit architectures, where 64-bit operations are simulated on 32-bit architectures.
In reality, both @gcc@ and @clang@ partially ignore this specification and choose the integral size of an enumerator based on its initialization.
143\begin{cfa}
144enum E { IMin = INT_MIN, IMax = INT_MAX,
145                         ILMin = LONG_MIN, ILMax = LONG_MAX,
146                         ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
147int main() {
        printf( "%zu %d %d\n%zu %ld %ld\n%zu %ld %ld\n",
149                         sizeof(IMin), IMin, IMax,
150                         sizeof(ILMin), ILMin, ILMax,
151                         sizeof(ILLMin), ILLMin, ILLMax );
152}
1534 -2147483648 2147483647
1548 -9223372036854775808 9223372036854775807
1558 -9223372036854775808 9223372036854775807
156\end{cfa}
157Hence, initialization in the range @INT_MIN@..@INT_MAX@ is 4 bytes, and outside this range is 8 bytes.
158
159
160\subsection{Usage}
161\label{s:Usage}
162
C provides an implicit \emph{bidirectional} conversion between an enumeration and its integral type.
164\begin{clang}
165enum Week week = Mon;                           $\C{// week == 0}$
166week = Fri;                                                     $\C{// week == 11}$
167int i = Sun;                                            $\C{// implicit conversion to int, i == 13}$
168@week = 10000;@                                         $\C{// UNDEFINED! implicit conversion to Week}$
169\end{clang}
170While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.
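
One way a C programmer might guard the unsafe direction is a hand-written checked conversion.
The following is only a sketch under that assumption; @is_week@ and @to_week@ are hypothetical helpers, not part of C or \CFA.

```c
#include <stdbool.h>

enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

/* Hypothetical helper: true only if v equals one of the Week enumerators. */
bool is_week( int v ) {
	switch ( v ) {
	  case Mon: case Tue: case Wed: case Thu: case Fri: case Sat: case Sun:
		return true;
	  default:
		return false;
	}
}

/* Hypothetical checked conversion: *out is written only when v is valid. */
bool to_week( int v, enum Week * out ) {
	if ( ! is_week( v ) ) return false;
	*out = (enum Week)v;
	return true;
}
```

With this helper, @to_week( 10000, &week )@ reports failure and leaves @week@ unchanged, whereas the implicit conversion silently stores the value.
The check must be updated by hand whenever an enumerator is added.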
171
172Enumerators can appear in @switch@ and looping statements.
\begin{cfa}
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
switch ( week ) {
	case Mon ... Fri:				$\C{// gcc case range}$
		printf( "weekday\n" );
		break;					$\C{// prevent fall through to weekend}$
	case Sat: case Sun:
		printf( "weekend\n" );
}
for ( enum Week day = Mon; day <= Sun; day += 1 ) { $\C{// step of 1}$
	printf( "day %d\n", day ); // 0-6
}
\end{cfa}
185For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
186For example, a gap introduced by @Thu = 10@, results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
187Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
188For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
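
The effect of a gap can be sketched by comparing incrementing with iterating over an explicit table of enumerators.
The table @week_values@ and the two functions are hypothetical names for illustration, and the table is itself a companion structure that must be maintained by hand.

```c
enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

/* Hand-maintained list of the valid enumerators, in declaration order. */
static const enum Week week_values[] = { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
enum { WEEK_COUNT = sizeof( week_values ) / sizeof( week_values[0] ) };

/* Incrementing visits every integer from Mon to Sun, including the gap 3..9. */
int visit_by_increment( int out[], int cap ) {
	int n = 0;
	for ( int d = Mon; d <= Sun && n < cap; d += 1 ) {
		out[n] = d;
		n += 1;
	}
	return n;
}

/* Table iteration visits only the declared enumerators. */
int visit_by_table( enum Week out[], int cap ) {
	int n = 0;
	for ( int i = 0; i < WEEK_COUNT && n < cap; i += 1 ) {
		out[n] = week_values[i];
		n += 1;
	}
	return n;
}
```

With @Thu = 10@, incrementing visits 14 values, of which only 7 are @Week@ enumerators, while the table visits exactly those 7.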
189
190There is a C idiom to automatically compute the number of enumerators in an enumeration.
191\begin{cfa}
192enum E { A, B, C, D, @N@ };  // N == 4
193for ( enum E e = A; e < @N@; e += 1 ) ...
194\end{cfa}
195Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
@N@ is often used as the dimension for an array associated with the enumeration.
197\begin{cfa}
enum E array[@N@];
199for ( enum E e = A; e < N; e += 1 ) {
200        array[e] = e;
201}
202\end{cfa}
203However, for non-integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.
204
205This idiom is used in another C idiom for matching companion information.
206For example, an enumeration is linked with a companion array of printable strings.
207\begin{cfa}
208enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
209char * Integral_Name[@NO_OF_ITYPES@] = {
210        "char", "signed char", "unsigned char",
211        "signed short int", "unsigned short int",
212        "signed int", "unsigned int", ...
213};
214enum Integral_Type integral_type = ...
215printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
216\end{cfa}
217However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
218The need to harmonize is at best indicated by a comment before the enumeration.
This issue is exacerbated if the enumeration and companion array are in different translation units.
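
A common C workaround for harmonizing, which is not the approach discussed in this work, is the preprocessor \emph{X-macro} technique, where one list generates both the enumeration and the companion array.
The following is a minimal sketch with only the first three entries; the macro names @INTEGRAL_TYPES@, @AS_ENUMERATOR@, @AS_STRING@, and the function @integral_name@ are hypothetical.

```c
/* Single point of definition: each entry pairs an enumerator with its text. */
#define INTEGRAL_TYPES( X ) \
	X( chr,    "char" ) \
	X( schar,  "signed char" ) \
	X( uschar, "unsigned char" )

#define AS_ENUMERATOR( name, text ) name,
#define AS_STRING( name, text ) text,

/* Expands to: enum Integral_Type { chr, schar, uschar, NO_OF_ITYPES }; */
enum Integral_Type { INTEGRAL_TYPES( AS_ENUMERATOR ) NO_OF_ITYPES };

/* Expands to the matching strings, in the same order. */
static const char * const Integral_Name[NO_OF_ITYPES] = { INTEGRAL_TYPES( AS_STRING ) };

/* Bounds-checked lookup; out-of-range values yield "?". */
const char * integral_name( enum Integral_Type t ) {
	return (unsigned)t < (unsigned)NO_OF_ITYPES ? Integral_Name[t] : "?";
}
```

Adding an entry to @INTEGRAL_TYPES@ updates the enumeration, the count @NO_OF_ITYPES@, and the array together, at the cost of moving the declaration into the preprocessor, outside the C type-system.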
220
221\bigskip
While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features found in other programming languages.