% source: doc/theses/jiada_liang_MMath/background.tex @ caa3e2c; last change 10a99d87, checked in by Peter A. Buhr
\chapter{Background}

\vspace*{-8pt}

\CFA is a backwards-compatible extension of the C programming language; therefore, it must support C-style enumerations.
The following discussion covers C enumerations.

As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
\begin{clang}
#define Mon 0
static const int Mon = 0;
enum { Mon };
\end{clang}
\begin{enumerate}[leftmargin=*]
\item
For @#define@, the programmer has to explicitly manage the constant name and value.
Furthermore, these C preprocessor macro names are outside of the C type-system and can unintentionally replace matching text anywhere in a program.
\item
The same explicit management is true for the @const@ declaration; in addition, a @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless the compiler is \lstinline{g++}.} and immediate oper\-ands of assembler instructions, and it occupies storage.
\begin{clang}
$\$$ nm test.o
0000000000000018 r Mon
\end{clang}
\item
Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
\end{enumerate}
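
The third point can be illustrated with a minimal sketch, in which enumerators are used in two constant-expression locations: a file-scope array dimension and @case@ labels.
The array @visits@ and the function @record_visit@ are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

enum { Mon, Tue, Wed };                     // enumerators are integer constant expressions
static_assert( Wed == 2, "auto-initialized from zero" );

static int visits[Wed + 1];                 // file-scope array dimension requires a constant expression

// Hypothetical helper (not from the text): count a visit and report whether the day is known.
int record_visit( int day ) {
	switch ( day ) {
	  case Mon: case Tue: case Wed:         // case labels require constant expressions
		visits[day] += 1;
		return 1;
	  default:
		return 0;
	}
}
```

Replacing the enumeration with @static const int@ declarations of the same names is expected to be rejected by a C compiler at both the array dimension and the @case@ labels.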


\section{C \lstinline{const}}
\label{s:Cconst}

C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{@{}c}{\textbf{static initialization}} &  \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\
\begin{clang}
static const int one = 0 + 1;
static const void * NIL = NULL;
static const double PI = 3.14159;
static const char Plus = '+';
static const char * Fred = "Fred";
static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
	Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
\end{clang}
&
\begin{clang}
void foo() {
	// auto scope only
	const int r = random() % 100;
	int va[r];
}


\end{clang}
\end{tabular}
\end{cquote}
However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
Again, this form of aliasing is not an enumeration.


\section{C Enumeration}
\label{s:CEnumeration}

The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
\begin{clang}[identifierstyle=\linespread{0.9}\it]
$\it enum$-specifier:
	enum identifier$\(_{opt}\)$ { enumerator-list }
	enum identifier$\(_{opt}\)$ { enumerator-list , }
	enum identifier
enumerator-list:
	enumerator
	enumerator-list , enumerator
enumerator:
	enumeration-constant
	enumeration-constant = constant-expression
\end{clang}
The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
The C enumeration semantics are discussed using examples.


\subsection{Type Name}
\label{s:TypeName}

An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
However, it is restricted to integral values.
\begin{clang}
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
\end{clang}
Here, the aliased constants are: 20, 10, 20, 21, and -7.
Direct initialization is by a compile-time expression generating a constant value.
An enumerator without an initializer (@Max10Plus1@) is \newterm{auto-initialized}: proceeding from left to right, it is assigned zero if it is the first enumerator, otherwise the value of the preceding enumerator plus @1@.
Because multiple independent enumerators can be combined, enumerators with the same values can occur.
The enumerators are rvalues, so assignment is disallowed.
Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.
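
The values listed above can be checked at compile time.
The following sketch assumes a C11 compiler and uses @static_assert@ from @<assert.h>@; the function @buffer_slack@ is a hypothetical name that only shows the unscoped enumerators being used without qualification.

```c
#include <assert.h>

enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, Max10Plus1, Fred = -7 };

// Compile-time checks of the values listed in the text.
static_assert( Size == 20 && Max == 10, "direct initialization" );
static_assert( MaxPlus10 == 20, "initializer may use an earlier enumerator" );
static_assert( Max10Plus1 == 21, "auto-initialized: preceding enumerator plus 1" );
static_assert( Fred == -7, "negative values are allowed" );
static_assert( Size == MaxPlus10, "distinct enumerators may share a value" );

// Hypothetical helper: enumerators are visible unqualified in the enclosing scope.
int buffer_slack( void ) { return Size - Max; }
```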

As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
While the semantics are misleading, this enumeration form matches with aggregate types:
\begin{cfa}
typedef struct @/* unnamed */@  { ... } S;
struct @/* unnamed */@  { ... } x, y, z;	$\C{// questionable}$
struct S {
	union @/* unnamed */@ {					$\C{// unscoped fields}$
		int i;  double d ;  char ch;
	};
};
\end{cfa}
Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.

A \emph{named} enumeration is an enumeration:
\begin{clang}
enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
\end{clang}
and adopts the same semantics with respect to direct and auto initialization.
For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@.
As well, initialization may occur in any order.
\begin{clang}
enum Week {
	Thu@ = 10@, Fri, Sat, Sun,
	Mon@ = 0@, Tue, Wed@,@			$\C{// terminating comma}$
};
\end{clang}
Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
A terminating comma appears in other C syntax, \eg the initializer list.}
This feature allows enumerator lines to be interchanged without moving a comma.
Named enumerators are also unscoped.


\subsection{Implementation}
\label{s:CenumImplementation}

In theory, a C enumeration \emph{variable} has an implementation-defined integral type large enough to hold all enumerator values.
In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the underlying type for enumeration variables, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
However, type @int@ is defined as:
\begin{quote}
A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
\end{quote}
Yet, @int@ is typically 4 bytes on both 32-bit and 64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
In contrast, @long int@ is typically 4 bytes on 32-bit and 8 bytes on 64-bit architectures, and @long long int@ is 8 bytes on both, where 64-bit operations are simulated on 32-bit architectures.
In reality, both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based on its initialization.
\begin{cfa}
#include <stdio.h>
#include <limits.h>
enum E { IMin = INT_MIN, IMax = INT_MAX,
			 ILMin = LONG_MIN, ILMax = LONG_MAX,
			 ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
int main() {
	printf( "%zu %d %d\n%zu %ld %ld\n%zu %ld %ld\n",
			 sizeof(IMin), IMin, IMax,
			 sizeof(ILMin), ILMin, ILMax,
			 sizeof(ILLMin), ILLMin, ILLMax );
}
4 -2147483648 2147483647
8 -9223372036854775808 9223372036854775807
8 -9223372036854775808 9223372036854775807
\end{cfa}
Hence, an enumerator initialized in the range @INT_MIN@..@INT_MAX@ occupies 4 bytes, and one initialized outside this range occupies 8 bytes.
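
The portable part of this behaviour can be stated as a compile-time check.
The following sketch is restricted to values representable as @int@, because wider enumerator values rely on the compiler behaviour described above rather than on C11; the function @enum_variable_size@ is a hypothetical name, and the size it reports is chosen by the implementation.

```c
#include <assert.h>
#include <limits.h>

enum E { IMin = INT_MIN, IMax = INT_MAX };   // values representable as int

// An enumerator in this range has type int.
static_assert( sizeof(IMin) == sizeof(int), "enumerator has type int" );
static_assert( sizeof(IMax) == sizeof(int), "enumerator has type int" );

// Hypothetical helper: size the implementation chose for a variable of the enumeration type.
unsigned long enum_variable_size( void ) { return (unsigned long)sizeof(enum E); }
```

Because the enumeration type must hold both @INT_MIN@ and @INT_MAX@, the reported size is expected to be at least that of @int@ on conventional implementations.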


\subsection{Usage}
\label{s:Usage}

C provides an implicit \emph{bidirectional} conversion between an enumeration and its integral type.
\begin{clang}
enum Week week = Mon;				$\C{// week == 0}$
week = Fri;							$\C{// week == 11}$
int i = Sun;						$\C{// implicit conversion to int, i == 13}$
@week = 10000;@						$\C{// UNDEFINED! implicit conversion to Week}$
\end{clang}
While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.

Enumerators can appear in @switch@ and looping statements.
\begin{cfa}
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
switch ( week ) {
	case Mon ... Fri:				$\C{// gcc case range}$
		printf( "weekday\n" );
		break;						$\C{// prevent fall through into weekend}$
	case Sat: case Sun:
		printf( "weekend\n" );
}
for ( enum Week day = Mon; day <= Sun; day += 1 ) { $\C{// step of 1}$
	printf( "day %d\n", day ); // 0-6
}
\end{cfa}
For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
For example, a gap introduced by @Thu = 10@ results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
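
The effect of a gap on iteration can be made concrete with a small sketch that uses the @Week@ declaration containing @Thu = 10@.
The helper functions @is_week_value@, @count_iterations@ and @count_invalid_days@ are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };   // gap between Wed and Thu
static_assert( Sun == 13, "Fri to Sun continue from Thu" );

// Hypothetical helper: is v the value of some Week enumerator?
int is_week_value( int v ) {
	switch ( v ) {
	  case Mon: case Tue: case Wed: case Thu: case Fri: case Sat: case Sun:
		return 1;
	  default:
		return 0;
	}
}

// Hypothetical helper: number of iterations performed by the loop from the text.
int count_iterations( void ) {
	int n = 0;
	for ( enum Week day = Mon; day <= Sun; day += 1 ) n += 1;
	return n;
}

// Hypothetical helper: number of iterations whose value is not a Week enumerator.
int count_invalid_days( void ) {
	int invalid = 0;
	for ( enum Week day = Mon; day <= Sun; day += 1 ) {
		if ( ! is_week_value( day ) ) invalid += 1;
	}
	return invalid;
}
```

With this declaration the loop visits 14 values for 7 enumerators, and the 7 values 3--9 correspond to no enumerator.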

There is a C idiom to automatically compute the number of enumerators in an enumeration.
\begin{cfa}
enum E { A, B, C, D, @N@ };  // N == 4
for ( enum E e = A; e < @N@; e += 1 ) ...
\end{cfa}
Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
@N@ is often used as the dimension for an array associated with the enumeration.
\begin{cfa}
enum E array[@N@];
for ( enum E e = A; e < N; e += 1 ) {
	array[e] = e;
}
\end{cfa}
However, this idiom fails for non-integral typed enumerations \see{\VRef{f:EumeratorTyping}}.
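
The counting idiom also depends on every enumerator being auto-initialized from zero.
The following sketch checks the idiom for @E@ and contrasts it with a hypothetical enumeration @G@ (not from the text) containing an explicit initializer, where the final enumerator no longer equals the number of enumerators; @fill_and_sum@ is likewise a hypothetical name.

```c
#include <assert.h>

enum E { A, B, C, D, N };                  // N is auto-initialized to the number of enumerators
static_assert( N == 4, "count idiom holds" );

enum G { P, Q = 5, R, NG };                // hypothetical enumeration with an explicit initializer
static_assert( NG == 7, "not the number of enumerators, which is 3" );

// Hypothetical helper: fill an array dimensioned by N and return the sum of its elements.
int fill_and_sum( void ) {
	enum E array[N];
	int sum = 0;
	for ( enum E e = A; e < N; e += 1 ) {
		array[e] = e;
		sum += array[e];
	}
	return sum;
}
```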

This idiom is used in another C idiom for matching companion information.
For example, an enumeration is linked with a companion array of printable strings.
\begin{cfa}
enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
char * Integral_Name[@NO_OF_ITYPES@] = {
	"char", "signed char", "unsigned char",
	"signed short int", "unsigned short int",
	"signed int", "unsigned int", ...
};
enum Integral_Type integral_type = ...
printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
\end{cfa}
However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
The need to harmonize is at best indicated by a comment before the enumeration.
This issue is exacerbated if the enumeration and companion array are in different translation units.
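
One common mitigation, which is not part of the idiom described above, is a compile-time length check.
The following sketch assumes C11 @static_assert@, shortens the enumerator list to three entries for illustration, and omits the array dimension so that the array length follows its initializer list; the accessor @integral_name@ is a hypothetical name.

```c
#include <assert.h>

enum Integral_Type { chr, schar, uschar, NO_OF_ITYPES };   // shortened list for illustration

// No explicit dimension: the array length follows the initializer list.
static const char * Integral_Name[] = {
	"char", "signed char", "unsigned char",
};

// Compilation fails if the enumeration and the companion array differ in length.
static_assert( sizeof(Integral_Name) / sizeof(Integral_Name[0]) == NO_OF_ITYPES,
	"Integral_Name must have one entry per Integral_Type enumerator" );

// Hypothetical accessor with a bounds check for out-of-range values.
const char * integral_name( enum Integral_Type t ) {
	return (unsigned)t < (unsigned)NO_OF_ITYPES ? Integral_Name[t] : "unknown";
}
```

The check only detects a mismatch in the number of entries, not entries listed in the wrong order, and it requires the enumeration and the array to be visible in the same translation unit.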

\bigskip
While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features found in other programming languages.