% source: doc/theses/jiada_liang_MMath/background.tex @ 1725989
% Last change on this file since 1725989 was 736a38d, checked in by Peter A. Buhr <pabuhr@…>, 4 weeks ago:
% more proofreading of C background chapter

\chapter{Background}

\vspace*{-8pt}

\CFA is a backwards-compatible extension of the C programming language; therefore, it must support C-style enumerations.
The following discussion covers C enumerations.

As discussed in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
\begin{clang}
#define Mon 0
static const int Mon = 0;
enum { Mon };
\end{clang}
\begin{enumerate}[leftmargin=*]
\item
For @#define@, the programmer has to explicitly manage the constant name and value.
Furthermore, these C preprocessor macro names are outside of the C type-system and can incorrectly change unrelated text in a program.
\item
The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} and immediate oper\-ands of assembler instructions; it also occupies storage.
\begin{clang}
$\$$ nm test.o
0000000000000018 r Mon
\end{clang}
\item
Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
\end{enumerate}
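
The difference among the three forms can be sketched in C as follows.
This is a minimal illustration and not part of the original discussion: the names @FirstDay@, @hours@, @is_weekend@, @hours_length@, and @first_day_address@ are hypothetical.
It shows enumerators used as @case@ labels and as a file-scope array dimension, and a @const@ object whose address can be taken because it occupies storage.

```c
#include <stddef.h>

enum { Mon, Tue, Wed, Thu, Fri, Sat, Sun };     // enumerators: integer constant expressions
static const int FirstDay = 0;                  // hypothetical const object: occupies storage

static int hours[Sun + 1];                      // file-scope dimension must be a constant expression

// Returns 1 for a weekend day, 0 for a weekday, and -1 for any other value.
int is_weekend( int day ) {
	switch ( day ) {
	  case Sat: case Sun:                       // enumerators are valid case labels
		return 1;
	  case Mon: case Tue: case Wed: case Thu: case Fri:
		return 0;
	  // case FirstDay:                         // not an integer constant expression in C11
	  default:
		return -1;
	}
}

// Number of elements in hours, fixed at compile time by the enumerator Sun.
size_t hours_length( void ) { return sizeof( hours ) / sizeof( hours[0] ); }

// A const object has an address; an enumerator does not.
const int * first_day_address( void ) { return &FirstDay; }
```

Some compilers accept a @const@ object in a @case@ label as an extension, so the commented-out label may compile in practice even though C11 does not require it to.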


\section{C \lstinline{const}}
\label{s:Cconst}

C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
\begin{cquote}
\begin{tabular}{@{}l@{}l@{}}
\multicolumn{1}{@{}c@{}}{\textbf{static initialization}} &  \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\
\begin{clang}
static const int one = 0 + 1;
static const void * NIL = NULL;
static const double PI = 3.14159;
static const char Plus = '+';
static const char * Fred = "Fred";
static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
        Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
\end{clang}
&
\begin{clang}
void foo() {
        // auto scope only
        const int r = random() % 100;
        int va[r];
}


\end{clang}
\end{tabular}
\end{cquote}
However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
Again, this form of aliasing is not an enumeration.


\section{C Enumeration}
\label{s:CEnumeration}

The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
\begin{clang}[identifierstyle=\linespread{0.9}\it]
$\it enum$-specifier:
        enum identifier$\(_{opt}\)$ { enumerator-list }
        enum identifier$\(_{opt}\)$ { enumerator-list , }
        enum identifier
enumerator-list:
        enumerator
        enumerator-list , enumerator
enumerator:
        enumeration-constant
        enumeration-constant = constant-expression
\end{clang}
The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
The C enumeration semantics are discussed using examples.


\subsection{Type Name}
\label{s:TypeName}

An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
However, it is restricted to integral values.
\begin{clang}
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
\end{clang}
Here, the aliased constants are: 20, 10, 20, 21, and -7.
Direct initialization is by a compile-time expression generating a constant value.
Indirect initialization (no initializer, \eg @Max10Plus1@) is \newterm{auto-initialized}: from left to right, starting at zero or continuing from the most recent explicitly initialized constant, incrementing by @1@.
Because multiple independent enumerators can be combined, enumerators with the same values can occur.
The enumerators are rvalues, so assignment is disallowed.
Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.
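
The initialization rules can be checked at compile time with a small sketch.
The first declaration is the one from the text; the second enumeration (@P@, @Q@, @R@, @S@), the array @buffer@, and the function @buffer_length@ are hypothetical names added only for illustration, and @_Static_assert@ assumes a C11 compiler.

```c
// Enumeration from the text: values are 20, 10, 20, 21, and -7.
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, Max10Plus1, Fred = -7 };

// Hypothetical second enumeration: restarting at 1 makes Q and S both 2.
enum { P = 1, Q, R = 1, S };

_Static_assert( MaxPlus10 == 20, "direct initialization from a constant expression" );
_Static_assert( Max10Plus1 == 21, "auto-initialized from the preceding enumerator" );
_Static_assert( Q == 2 && S == 2, "distinct enumerators may share a value" );

int buffer[Size];                       // an enumerator is a valid file-scope array dimension

// Number of elements in buffer, as determined by the enumerator Size.
int buffer_length( void ) { return (int)( sizeof( buffer ) / sizeof( buffer[0] ) ); }

// Size = 30;                           // rejected: an enumerator is an rvalue, not assignable
```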

As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
While the semantics is misleading, this enumeration form matches with aggregate types:
\begin{cfa}
typedef struct @/* unnamed */@  { ... } S;
struct @/* unnamed */@  { ... } x, y, z;        $\C{// questionable}$
struct S {
        union @/* unnamed */@ {                                 $\C{// unscoped fields}$
                int i;  double d ;  char ch;
        };
};
\end{cfa}
Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.

A \emph{named} enumeration is an enumeration:
\begin{clang}
enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
\end{clang}
and adopts the same semantics with respect to direct and auto initialization.
For example, @Mon@ to @Wed@ are implicitly assigned constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned constants @11@--@13@.
As well, initialization may occur in any order.
\begin{clang}
enum Week {
        Thu@ = 10@, Fri, Sat, Sun,
        Mon@ = 0@, Tue, Wed@,@                  $\C{// terminating comma}$
};
\end{clang}
Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
A terminating comma appears in other C syntax, \eg the initializer list.}
This feature allows enumerator lines to be interchanged without moving a comma.
Named enumerators are also unscoped.


\subsection{Implementation}

In theory, a C enumeration \emph{variable} has an implementation-defined integral type large enough to hold all enumerator values.
In practice, C uses @int@ as the underlying type for enumeration variables, because of the restriction to integral constants, which have type @int@ (unless qualified with a size suffix).
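
The distinction between the type of an enumeration variable and the type of an enumerator can be observed with @sizeof@.
The following is a small sketch, assuming a C11 compiler; the function names @week_variable_size@ and @enumerator_size@ are hypothetical.
Only the enumerator size is fixed by the standard, so the variable size is reported rather than assumed.

```c
#include <stddef.h>

enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

// Size of an enumeration variable: implementation-defined, often the size of int.
size_t week_variable_size( void ) { return sizeof( enum Week ); }

// Size of an enumerator: C11 gives an enumeration constant the type int.
size_t enumerator_size( void ) { return sizeof( Sun ); }
```

Compiler options can change the first result; for example, @gcc@ provides @-fshort-enums@ to select the smallest integral type that holds all enumerator values.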


\subsection{Usage}
\label{s:Usage}

C provides an implicit \emph{bidirectional} conversion between an enumeration and its integral type.
\begin{clang}
enum Week week = Mon;                           $\C{// week == 0}$
week = Fri;                                                     $\C{// week == 11}$
int i = Sun;                                            $\C{// implicit conversion to int, i == 13}$
@week = 10000;@                                         $\C{// UNDEFINED! implicit conversion to Week}$
\end{clang}
While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.
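
One defensive pattern, which is not part of C itself, is to route every integer-to-enumeration conversion through a checking function.
The following is a minimal sketch under that assumption; the function @week_from_int@ is a hypothetical helper, and the @switch@ must be updated by hand whenever @Week@ changes.

```c
#include <stdbool.h>

enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

// Stores value in *out and returns true only if value matches a Week enumerator;
// otherwise leaves *out unchanged and returns false.
bool week_from_int( int value, enum Week * out ) {
	switch ( value ) {
	  case Mon: case Tue: case Wed: case Thu: case Fri: case Sat: case Sun:
		*out = (enum Week)value;        // explicit cast, known to be a valid enumerator
		return true;
	  default:
		return false;                   // e.g., 5 or 10000 is not a Week value
	}
}
```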

Enumerators can appear in @switch@ and looping statements.
\begin{cfa}
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
switch ( week ) {
        case Mon: case Tue: case Wed: case Thu: case Fri:
                printf( "weekday\n" );
                break;
        case Sat: case Sun:
                printf( "weekend\n" );
}
for ( enum Week day = Mon; day <= Sun; day += 1 ) { // step of 1
        printf( "day %d\n", day ); // 0-6
}
\end{cfa}
For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
For example, a gap introduced by @Thu = 10@ results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
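
The effect of a gap on iteration can be made concrete with a sketch that counts loop steps.
The table @week_days@, the constant @NoOfDays@, and the functions @range_visits@ and @table_visits@ are hypothetical names; iterating over a hand-written table is one common workaround, not a feature of C enumerations, and the table must be kept in step with the enumeration manually.

```c
enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

// Hypothetical table listing every enumerator exactly once.
static const enum Week week_days[] = { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
enum { NoOfDays = sizeof( week_days ) / sizeof( week_days[0] ) };

// Range loop: steps through every integer from Mon to Sun, including the gap 3..9.
int range_visits( void ) {
	int visits = 0;
	for ( int day = Mon; day <= Sun; day += 1 ) visits += 1;
	return visits;
}

// Table loop: steps through declared enumerators only; *last receives the final one.
int table_visits( enum Week * last ) {
	int visits = 0;
	for ( int i = 0; i < NoOfDays; i += 1 ) {
		*last = week_days[i];
		visits += 1;
	}
	return visits;
}
```

With the gap at @Thu = 10@, the range loop takes 14 steps while the table loop takes 7.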

There is a C idiom to automatically compute the number of enumerators in an enumeration.
\begin{cfa}
enum E { A, B, C, D, @N@ };  // N == 4
for ( enum E e = A; e < @N@; e += 1 ) ...
\end{cfa}
Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
@N@ is often used as the dimension for an array associated with the enumeration.
\begin{cfa}
enum E array[@N@];
for ( enum E e = A; e < N; e += 1 ) {
        array[e] = e;
}
\end{cfa}
However, for typed enumerations \see{\VRef{f:EumeratorTyping}}, this idiom fails.
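
The counting idiom depends on auto-initialization starting at zero with no explicit initializers.
The sketch below restates the idiom and adds a hypothetical enumeration @F@ (with enumerators @X@, @Y@, @M@) to show how an explicit initializer breaks the count; the function @fill@ is also an illustrative name.

```c
enum E { A, B, C, D, N };               // N == 4: the number of preceding enumerators
enum F { X = 5, Y, M };                 // hypothetical: M == 7, but only two enumerators precede it

_Static_assert( N == 4, "auto-initialization from zero counts the enumerators" );
_Static_assert( M == 7, "an explicit initializer invalidates the count" );

// Fills array[e] with e for each enumerator of E that fits in capacity elements;
// returns the number of elements written.
int fill( enum E array[], int capacity ) {
	int written = 0;
	for ( int e = A; e < N && e < capacity; e += 1 ) {
		array[e] = (enum E)e;
		written += 1;
	}
	return written;
}
```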

This idiom leads to another C idiom using an enumeration with matching companion information.
For example, an enumeration is linked with a companion array of printable strings.
\begin{cfa}
enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
char * Integral_Name[@NO_OF_ITYPES@] = {
        "char", "signed char", "unsigned char",
        "signed short int", "unsigned short int",
        "signed int", "unsigned int", ...
};
enum Integral_Type integral_type = ...
printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
\end{cfa}
However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
The need to harmonize is at best indicated by a comment before the enumeration.
This issue is exacerbated if the enumeration and companion array are in different translation units.
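
The harmonizing problem can be reduced, though not eliminated, with standard C11 features.
The following sketch is an assumption-laden illustration rather than the approach taken in this work: it abbreviates the enumeration to the seven enumerators named above, ties each string to its enumerator with a designated initializer, and uses @_Static_assert@ to detect a mismatch in length.
The function @integral_name@ is a hypothetical accessor.

```c
#include <stddef.h>

// Abbreviated to seven enumerators for illustration.
enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, NO_OF_ITYPES };

// Designated initializers bind each name to its enumerator, independent of order.
static const char * const Integral_Name[] = {
	[chr] = "char", [schar] = "signed char", [uschar] = "unsigned char",
	[sshort] = "signed short int", [ushort] = "unsigned short int",
	[sint] = "signed int", [usint] = "unsigned int",
};

// Fails to compile if an enumerator is appended without a matching name.
_Static_assert( sizeof( Integral_Name ) / sizeof( Integral_Name[0] ) == NO_OF_ITYPES,
	"Integral_Name is out of step with enum Integral_Type" );

// Returns the printable name, or "?" for an out-of-range value or an unnamed slot.
const char * integral_name( enum Integral_Type type ) {
	unsigned int index = (unsigned int)type;
	if ( index >= (unsigned int)NO_OF_ITYPES || Integral_Name[index] == NULL ) return "?";
	return Integral_Name[index];
}
```

The static assertion only catches a missing name for the last enumerators; an enumerator inserted in the middle without a name leaves a null slot, which the accessor reports as @"?"@ at run time.
The check also requires the enumeration and the array to be visible in the same translation unit.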

\bigskip
While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features found in other programming languages.