1 | \chapter{Introduction} |
---|
2 | |
---|
3 | All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @12345@, \etc. |
---|
4 | Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types. |
---|
5 | Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types. |
---|
6 | In theory, there are an infinite set of primary names per type. |
---|
7 | |
---|
8 | \Newterm{Secondary naming} is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc. |
---|
9 | Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name. |
---|
10 | In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@. |
---|
11 | (The names the thing.) |
---|
12 | Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{ |
---|
13 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}. |
---|
14 | |
---|
15 | Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc. |
---|
16 | Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}. |
---|
17 | \begin{quote} |
---|
18 | enumerate (verb, transitive). |
---|
19 | To count, ascertain the number of; |
---|
20 | \emph{more |
---|
21 | usually, to mention (a number of things or persons) separately, as if for the |
---|
22 | purpose of counting}; |
---|
23 | to specify as in a list or catalogue.~\cite{OED} |
---|
24 | \end{quote} |
---|
25 | Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to hold only the secondary names. |
---|
26 | It is possible to enumerate among set names without having an ordering among the set elements. |
---|
27 | For example, the week, the weekdays, the weekend, and every second day of the week. |
---|
28 | \begin{cfa}[morekeywords={in}] |
---|
29 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$ |
---|
30 | for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$ |
---|
31 | for ( cursor in Thu, Fri } ... $\C{// weekend}$ |
---|
32 | for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$ |
---|
33 | \end{cfa} |
---|
34 | This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms. |
---|
35 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after. |
---|
36 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg |
---|
37 | \begin{cfa} |
---|
38 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ... |
---|
39 | \end{cfa} |
---|
40 | Here the internal representations for the secondary names are \emph{generated} rather than listing a subset of names. |
---|
41 | |
---|
42 | |
---|
43 | \section{Terminology} |
---|
44 | |
---|
45 | The term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name. |
---|
46 | As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}. |
---|
47 | \begin{cquote} |
---|
48 | \sf\setlength{\tabcolsep}{3pt} |
---|
49 | \begin{tabular}{rcccccccr} |
---|
50 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\ |
---|
51 | $\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\ |
---|
52 | @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun = 42 & \}; \\ |
---|
53 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\ |
---|
54 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\ |
---|
55 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 42 & |
---|
56 | \end{tabular} |
---|
57 | \end{cquote} |
---|
58 | Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@. |
---|
59 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values. |
---|
60 | The value is the constant represented by the secondary name, which can be implicitly or explicitly set. |
---|
61 | |
---|
62 | Specifying complex ordering is possible: |
---|
63 | \begin{cfa} |
---|
64 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$ |
---|
65 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$ |
---|
66 | \end{cfa} |
---|
67 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@. |
---|
68 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@. |
---|
69 | Only flat total-ordering among enumerators is considered in this work. |
---|
70 | |
---|
71 | |
---|
72 | \section{Motivation} |
---|
73 | |
---|
74 | Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering. |
---|
75 | \begin{cfa} |
---|
76 | const Size = 20, Pi = 3.14159; |
---|
77 | enum { Size = 20, Pi = 3.14159 }; // unnamed enumeration $\(\Rightarrow\)$ no ordering |
---|
78 | \end{cfa} |
---|
79 | In both cases, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful; |
---|
80 | however, without an enumeration type-name, it is impossible to create an iterator cursor. |
---|
81 | |
---|
82 | Secondary renaming can similate an enumeration, but with extra effort. |
---|
83 | \begin{cfa} |
---|
84 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7; |
---|
85 | \end{cfa} |
---|
86 | Furthermore, reordering the enumerators requires manual renumbering. |
---|
87 | \begin{cfa} |
---|
88 | const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7; |
---|
89 | \end{cfa} |
---|
90 | Finally, there is no common type to create a type-checked instance or iterator cursor. |
---|
91 | Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language. |
---|
92 | |
---|
93 | A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration. |
---|
94 | A variant type is a tagged-union, where the possible types may be heterogeneous. |
---|
95 | \begin{cfa} |
---|
96 | @variant@ Variant { |
---|
97 | @int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S |
---|
98 | @union {@ // implicit |
---|
99 | case int i; |
---|
100 | case double d; |
---|
101 | case struct S { int i, j; } s; |
---|
102 | @};@ |
---|
103 | }; |
---|
104 | \end{cfa} |
---|
105 | Crucially, the union implies instance storage is shared by all of the variant types. |
---|
106 | Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}. |
---|
107 | Knowing which type is in a variant instance is crucial for correctness. |
---|
108 | Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary; |
---|
109 | otherwise, a tag is required to denote the particular type in the variant and the tag checked at runtime using some form of type pattern-matching. |
---|
110 | |
---|
111 | The tag can be implicitly set by the compiler on assignment, or explicitly set by the program\-mer. |
---|
112 | Type pattern-matching is then used to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg: |
---|
113 | \begin{cfa}[morekeywords={match}] |
---|
114 | Variant v = 3; // implicitly set tag to 0 |
---|
115 | @match@( v ) { // know the type or test the tag |
---|
116 | case int { /* only access i field in v */ } |
---|
117 | case double { /* only access d field in v */ } |
---|
118 | case S { /* only access s field in v */ } |
---|
119 | } |
---|
120 | \end{cfa} |
---|
121 | For safety, either all variant types must be listed or a @default@ case must exist with no field accesses. |
---|
122 | |
---|
123 | To simulate an enumeration with a variant, the tag is \emph{re-purposed} for either ordering or value and the variant types are omitted. |
---|
124 | \begin{cfa} |
---|
125 | variant Weekday { |
---|
126 | int tag; // implicit 0 => Mon, ..., 6 => Sun |
---|
127 | @case Mon;@ // no type |
---|
128 | ... |
---|
129 | @case Sun;@ |
---|
130 | }; |
---|
131 | \end{cfa} |
---|
132 | The type system ensures tag setting and testing are correctly done. |
---|
133 | However, the enumeration operations are limited to the available tag operations, \eg pattern matching. |
---|
134 | \begin{cfa} |
---|
135 | Week week = Mon; |
---|
136 | if ( @dynamic_cast(Mon)@week ) ... // test tag == Mon |
---|
137 | \end{cfa} |
---|
138 | While enumerating among tag names is possible: |
---|
139 | \begin{cfa}[morekeywords={in}] |
---|
140 | for ( cursor in Mon, Wed, Fri, Sun ) ... |
---|
141 | \end{cfa} |
---|
142 | ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@. |
---|
143 | |
---|
144 | However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values. |
---|
145 | Iterating using tag ordering and heterogeneous types, also requires pattern matching. |
---|
146 | \begin{cfa}[morekeywords={match}] |
---|
147 | for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) { |
---|
148 | match( cursor ) { |
---|
149 | case Mon { /* access special type for Mon */ } |
---|
150 | ... |
---|
151 | case Fri { /* access special type for Fri */ } |
---|
152 | default |
---|
153 | } |
---|
154 | } |
---|
155 | \end{cfa} |
---|
156 | If the variant type is changed by adding/removing types or the loop range changes, the pattern matching must be adjusted. |
---|
157 | As well, if the start/stop values are dynamic, it may be impossible to statically determine if all variant types are listed. |
---|
158 | |
---|
159 | Re-purposing the notion of enumerating into variant types is ill formed and confusing. |
---|
160 | Hence, there is only a weak equivalence between an enumeration and variant type, justifying the enumeration type in a programming language. |
---|
161 | |
---|
162 | |
---|
163 | \section{Contributions} |
---|
164 | |
---|
165 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C. |
---|
166 | On the surface, enumerations seem like a simple type. |
---|
167 | However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation. |
---|
168 | |
---|
169 | \begin{enumerate} |
---|
170 | \item |
---|
171 | overloading |
---|
172 | \item |
---|
173 | scoping |
---|
174 | \item |
---|
175 | typing |
---|
176 | \item |
---|
177 | subset |
---|
178 | \item |
---|
179 | inheritance |
---|
180 | \end{enumerate} |
---|