1 | \chapter{Introduction} |
---|
2 | |
---|
3 | All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @0xff@, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}}, \etc. |
---|
4 | Con\-stants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types. |
---|
5 | (In \CFA, the primary constants @0@ and @1@ can be overloaded for any type.) |
---|
6 | Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types. |
---|
7 | In theory, there are an infinite set of primary names per type. |
---|
8 | |
---|
9 | \Newterm{Secondary naming} is a common practice in mathematics, engineering and computer science, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MB (mega byte, 1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc. |
---|
10 | Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name. |
---|
11 | Its common purpose is to eliminate duplication of the primary constant throughout a program. |
---|
12 | For example, the secondary name replaces its primary name, thereafter changing the binding of the secondary to primary name automatically distributes the rebinding throughout the program. |
---|
13 | In some cases, secondary naming is \Newterm{opaque}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@. |
---|
14 | Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{ |
---|
15 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}. |
---|
16 | |
---|
17 | Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc. |
---|
18 | Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}. |
---|
19 | \begin{quote} |
---|
20 | enumerate (verb, transitive). |
---|
21 | To count, ascertain the number of; |
---|
22 | more usually, to mention (a number of things or persons) separately, as if for the purpose of counting; |
---|
23 | to specify as in a list or catalogue.~\cite{OEDenumerate} |
---|
24 | \end{quote} |
---|
25 | Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are \emph{often} restricted to hold only the secondary names. |
---|
26 | It is possible to enumerate among set names without having an ordering among the set elements. |
---|
27 | For example, the week, the weekdays, the weekend, and every second day of the week. |
---|
28 | \begin{cfa}[morekeywords={in}] |
---|
29 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$ |
---|
30 | for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$ |
---|
31 | for ( cursor in Sat, Sun } ... $\C{// weekend}$ |
---|
32 | for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$ |
---|
33 | \end{cfa} |
---|
34 | This independence from internal representation allows multiple names to have the same representation (eighth note, quaver), giving synonyms. |
---|
35 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after. |
---|
36 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg |
---|
37 | \begin{cfa} |
---|
38 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ... |
---|
39 | \end{cfa} |
---|
40 | Here the internal representations for the secondary names are logically \emph{generated} rather than listing a subset of names. |
---|
41 | |
---|
42 | Hence, the fundamental aspects of an enumeration are: |
---|
43 | \begin{enumerate} |
---|
44 | \item |
---|
45 | It defines a type from which instants can be generated. |
---|
46 | \item |
---|
47 | The type lists a finite set of secondary names, which become its primary constants. |
---|
48 | This differentiates an enumeration from general types with an infinite number of primary constants. |
---|
49 | \item |
---|
50 | An enumeration's secondary names represent constants, which follows from their binding (aliasing) to primary names, which are constants. |
---|
51 | \item |
---|
52 | For safety, an enumeration instance is restricted to hold only its type's secondary names. |
---|
53 | \item |
---|
54 | There is a mechanism for \emph{enumerating} over the secondary names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically. |
---|
55 | \end{enumerate} |
---|
56 | |
---|
57 | |
---|
58 | \section{Terminology} |
---|
59 | \label{s:Terminology} |
---|
60 | |
---|
61 | The term \Newterm{enumeration} defines a type with a set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name \see{\VRef{s:CEnumeration}}. |
---|
62 | As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}. |
---|
63 | \begin{cquote} |
---|
64 | \sf\setlength{\tabcolsep}{3pt} |
---|
65 | \begin{tabular}{rcccccccr} |
---|
66 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\ |
---|
67 | $\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\ |
---|
68 | @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun = 42 & \}; \\ |
---|
69 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\ |
---|
70 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\ |
---|
71 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 42 & |
---|
72 | \end{tabular} |
---|
73 | \end{cquote} |
---|
74 | Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@. |
---|
75 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values. |
---|
76 | The value is the constant represented by the secondary name, which can be implicitly or explicitly set. |
---|
77 | |
---|
78 | Specifying complex ordering is possible: |
---|
79 | \begin{cfa} |
---|
80 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$ |
---|
81 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$ |
---|
82 | \end{cfa} |
---|
83 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@. |
---|
84 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@. |
---|
85 | Only flat total-ordering among enumerators is considered in this work. |
---|
86 | |
---|
87 | |
---|
88 | \section{Motivation} |
---|
89 | |
---|
90 | Some programming languages only provide direct secondary renaming. |
---|
91 | \begin{cfa} |
---|
92 | const Size = 20, Pi = 3.14159, Name = "Jane"; |
---|
93 | \end{cfa} |
---|
94 | Here, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful. |
---|
95 | |
---|
96 | Secondary renaming can simulate an enumeration, but with extra effort. |
---|
97 | \begin{cfa} |
---|
98 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7; |
---|
99 | \end{cfa} |
---|
100 | Any reordering of the enumerators requires manual renumbering. |
---|
101 | \begin{cfa} |
---|
102 | const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7; |
---|
103 | \end{cfa} |
---|
104 | Finally, there is no type to create a type-checked instance or iterator cursor. |
---|
105 | Hence, there is only a weak equivalence between secondary naming and enumerations, justifying a seperate enumeration type in a programming language. |
---|
106 | |
---|
107 | A variant (algebraic) type is often promoted as a kind of enumeration, \ie a variant type can simulate an enumeration. |
---|
108 | Fundamentally, a variant type is a tagged-union, where the tag is normally opaque and the types are usually heterogeneous. |
---|
109 | \begin{cfa}[morekeywords={variant}] |
---|
110 | @variant@ Variant { |
---|
111 | @int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S |
---|
112 | @union {@ // implicit |
---|
113 | case int i; |
---|
114 | case double d; |
---|
115 | case struct S { int i, j; } s; |
---|
116 | @};@ |
---|
117 | }; |
---|
118 | \end{cfa} |
---|
119 | Crucially, the union implies instance storage is shared by all the variant types, and therefore, before a variant type can be used in a statically-typed expression, it must be dynamically discriminated to its current contained type. |
---|
120 | Hence, knowing which type is in a variant instance is crucial for correctness. |
---|
121 | Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary. |
---|
122 | Otherwise, a tag is required to denote the particular type in the variant, and the tag is discriminated at runtime using some form of type pattern-matching, after which the value can be used in a statically-typed expression. |
---|
123 | |
---|
124 | A less frequent variant case is multiple variants with the same type, which normally requires explicit naming of the tag to disambiguate among the common types. |
---|
125 | \begin{cquote} |
---|
126 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}} |
---|
127 | \begin{cfa}[morekeywords={variant}] |
---|
128 | variant VariantCT { |
---|
129 | case @car@: int i; // explicitly typed |
---|
130 | case @boat@: int i; |
---|
131 | case @bridge@: int i; |
---|
132 | }; |
---|
133 | \end{cfa} |
---|
134 | & |
---|
135 | \begin{cfa}[morekeywords={variant}] |
---|
136 | variant VariantCU { |
---|
137 | case @car@: ; // empty or unit type |
---|
138 | case @boat@: ; |
---|
139 | case @bridge@: ; |
---|
140 | }; |
---|
141 | \end{cfa} |
---|
142 | \end{tabular} |
---|
143 | \end{cquote} |
---|
144 | Here, the explicit tag name is used to give different meaning to the values in the common @int@ type, \eg the value 3 has different interpretations depending on the tag name. |
---|
145 | It is even possible to remove the type or use the empty @unit@ type (@struct unit {}@). |
---|
146 | It is this tag naming that is used to simulate an enumeration. |
---|
147 | |
---|
148 | Normally, the variant tag is implicitly set by the compiler based on type, but with common types, a tag name is required to resolve type ambiguity. |
---|
149 | \begin{cfa} |
---|
150 | Variant v = 3; $\C{// implicitly set tag to 0 based on type of 3}$ |
---|
151 | VariantCT ve = boats.3; $\C{// explicitly set tag to 1 using tag name}$ |
---|
152 | \end{cfa} |
---|
153 | Type pattern-matching is then used to dynamically test the tag and branch to a section of statically-typed code to safely manipulate the value, \eg: |
---|
154 | \begin{cquote} |
---|
155 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}} |
---|
156 | \begin{cfa}[morekeywords={match}] |
---|
157 | @match@( v ) { // know type implicitly or test tag |
---|
158 | case int { /* only access i field */ } |
---|
159 | case double { /* only access d field */ } |
---|
160 | case S { /* only access s field */ } |
---|
161 | } |
---|
162 | \end{cfa} |
---|
163 | & |
---|
164 | \begin{cfa}[morekeywords={match}] |
---|
165 | @match@( ve ) { |
---|
166 | case car: int { /* car interpretation */ } |
---|
167 | case boat: int { /* boat interpretation */ } |
---|
168 | case bridge: int { /* bridge interpretation */ } |
---|
169 | } |
---|
170 | \end{cfa} |
---|
171 | \end{tabular} |
---|
172 | \end{cquote} |
---|
173 | For safety, some languages require all variant types to be listed or a @default@ case with no field accesses. |
---|
174 | |
---|
175 | To further strengthen the simulate for an enumeration with different values, each variant type can be a @const@ type or the tag becomes non-opaque, possibly taking advantage of the opaque auto-numbering. |
---|
176 | \begin{cquote} |
---|
177 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}} |
---|
178 | \begin{cfa} |
---|
179 | variant Week { |
---|
180 | case Mon: const int = 0; |
---|
181 | ... |
---|
182 | case Sat: const int = 5; |
---|
183 | case Sun: const int = 10; |
---|
184 | }; |
---|
185 | \end{cfa} |
---|
186 | & |
---|
187 | \begin{cfa} |
---|
188 | variant Week { |
---|
189 | case Mon: ; // tag auto-numbering |
---|
190 | ... |
---|
191 | case Sat: ; |
---|
192 | case @Sun = 10@: ; // directly set tag value |
---|
193 | }; |
---|
194 | \end{cfa} |
---|
195 | \end{tabular} |
---|
196 | \end{cquote} |
---|
197 | Directly setting the tag implies restrictions, like unique values. |
---|
198 | In both cases, instances of @Week@ are @const@ (immutable). |
---|
199 | However, usage between these two types becomes complex. |
---|
200 | \begin{cfa} |
---|
201 | Week day = Week.Mon; // sets value or tag depending on type |
---|
202 | if ( day == Week.Mon ) // dereference value or tag ? |
---|
203 | \end{cfa} |
---|
204 | Here, the dereference of @day@ should return the value of the type stored in the variant, never the tag. |
---|
205 | If it does return the tag, some special meaning must be given to the empty/unit type, especially if a variant contains both regular and unit types. |
---|
206 | |
---|
207 | |
---|
208 | In general, the enumeration simulation and the variant extensions to support it, are deviating from the normal use of a variant (union) type. |
---|
209 | As well, the enumeration operations are limited to the available tag operations, \eg pattern matching. |
---|
210 | While enumerating among tag names is possible: |
---|
211 | \begin{cfa}[morekeywords={in}] |
---|
212 | for ( cursor in Week.Mon, Week.Wed, Week.Fri, Week.Sun ) ... |
---|
213 | \end{cfa} |
---|
214 | what is the type of @cursor@? |
---|
215 | If it the tag type (@int@), how is this value used? |
---|
216 | If it is the variant type, where is the instance variable, which only contains one value. |
---|
217 | Hence, either enumerating with a variant enumeration is disallowed or some unusual typing rule must be invented to make it work but only in restricted contexts. |
---|
218 | |
---|
219 | While functional programming systems regularly re-purposing variant types into enumeration types, this process seems contrived and confusing. |
---|
220 | A variant tag is not an enumeration, it is a discriminant among a restricted set of types stored in a storage block. |
---|
221 | Hence, there is only a weak equivalence between an enumeration and variant type, justifying a seperate enumeration type in a programming language. |
---|
222 | |
---|
223 | |
---|
224 | \section{Contributions} |
---|
225 | |
---|
226 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C. |
---|
227 | On the surface, enumerations seem like a simple type. |
---|
228 | However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation. |
---|
229 | |
---|
230 | \begin{enumerate} |
---|
231 | \item |
---|
232 | overloading |
---|
233 | \item |
---|
234 | scoping |
---|
235 | \item |
---|
236 | typing |
---|
237 | \item |
---|
238 | subset |
---|
239 | \item |
---|
240 | inheritance |
---|
241 | \end{enumerate} |
---|