| 1 | \chapter{Introduction}
|
|---|
| 2 |
|
|---|
| 3 | All types in a programming language must have a set of constants, and these constants have primary names, \eg integral types have constants @-1@, @17@, @12345@, \etc.
|
|---|
| 4 | Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
|
|---|
| 5 | Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
|
|---|
| 6 | In theory, there are an infinite set of primary names per type.
|
|---|
| 7 |
|
|---|
| 8 | Secondary naming is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
|
|---|
| 9 | Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
|
|---|
| 10 | In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
|
|---|
| 11 | Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
|
|---|
| 12 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
|
|---|
| 13 |
|
|---|
| 14 | Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
|
|---|
| 15 | Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
|
|---|
| 16 | \begin{quote}
|
|---|
| 17 | enumerate (verb, transitive).
|
|---|
| 18 | To count, ascertain the number of;
|
|---|
| 19 | \emph{more
|
|---|
| 20 | usually, to mention (a number of things or persons) separately, as if for the
|
|---|
| 21 | purpose of counting};
|
|---|
| 22 | to specify as in a list or catalogue.~\cite{OED}
|
|---|
| 23 | \end{quote}
|
|---|
| 24 | Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to its secondary names.
|
|---|
| 25 | It is possible to enumerate among set names without having an ordering among the set elements.
|
|---|
| 26 | For example, the week, the weekdays, the weekend, and every second day of the week.
|
|---|
| 27 | \begin{cfa}[morekeywords={in}]
|
|---|
| 28 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
|
|---|
| 29 | for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$
|
|---|
| 30 | for ( cursor in Thu, Fri } ... $\C{// weekend}$
|
|---|
| 31 | for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$
|
|---|
| 32 | \end{cfa}
|
|---|
| 33 | This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms.
|
|---|
| 34 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
|
|---|
| 35 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
|
|---|
| 36 | \begin{cfa}
|
|---|
| 37 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
|
|---|
| 38 | \end{cfa}
|
|---|
| 39 | Here the internal representations for the secondary names are \emph{generated} rather than listing a subset of names.
|
|---|
| 40 |
|
|---|
| 41 |
|
|---|
| 42 | \section{Terminology}
|
|---|
| 43 |
|
|---|
| 44 | The term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name.
|
|---|
| 45 | As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
|
|---|
| 46 | \begin{cquote}
|
|---|
| 47 | \sf\setlength{\tabcolsep}{3pt}
|
|---|
| 48 | \begin{tabular}{rcccccccr}
|
|---|
| 49 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
|
|---|
| 50 | $\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\
|
|---|
| 51 | @enum@ Weekday \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun = 42 & \}; \\
|
|---|
| 52 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
|
|---|
| 53 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
|
|---|
| 54 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 42 &
|
|---|
| 55 | \end{tabular}
|
|---|
| 56 | \end{cquote}
|
|---|
| 57 | Here, the enumeration @Weekday@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
|
|---|
| 58 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
|
|---|
| 59 | The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
|
|---|
| 60 |
|
|---|
| 61 | Specifying complex ordering is possible:
|
|---|
| 62 | \begin{cfa}
|
|---|
| 63 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
|
|---|
| 64 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$
|
|---|
| 65 | \end{cfa}
|
|---|
| 66 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
|
|---|
| 67 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
|
|---|
| 68 | Only flat total-ordering among enumerators is considered in this work.
|
|---|
| 69 |
|
|---|
| 70 |
|
|---|
| 71 | \section{Motivation}
|
|---|
| 72 |
|
|---|
| 73 | Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering.
|
|---|
| 74 | \begin{cfa}
|
|---|
| 75 | const Size = 20, Pi = 3.14159;
|
|---|
| 76 | enum { Size = 20, Pi = 3.14159 }; // unnamed enumeration $\(\Rightarrow\)$ no ordering
|
|---|
| 77 | \end{cfa}
|
|---|
| 78 | In both cases, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful;
|
|---|
| 79 | however, without an enumeration type-name, it is impossible to create an iterator cursor.
|
|---|
| 80 |
|
|---|
| 81 | Secondary renaming can similate an enumeration, but with extra effort.
|
|---|
| 82 | \begin{cfa}
|
|---|
| 83 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
|
|---|
| 84 | \end{cfa}
|
|---|
| 85 | Furthermore, reordering the enumerators requires manual renumbering.
|
|---|
| 86 | \begin{cfa}
|
|---|
| 87 | const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
|
|---|
| 88 | \end{cfa}
|
|---|
| 89 | Finally, there is no common type to create a type-checked instance or iterator cursor.
|
|---|
| 90 | Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language.
|
|---|
| 91 |
|
|---|
| 92 | A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration.
|
|---|
| 93 | A variant type is a tagged-union, where the possible types may be heterogeneous.
|
|---|
| 94 | \begin{cfa}
|
|---|
| 95 | @variant@ Variant {
|
|---|
| 96 | @int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S
|
|---|
| 97 | @union {@ // implicit
|
|---|
| 98 | case int i;
|
|---|
| 99 | case double d;
|
|---|
| 100 | case struct S { int i, j; } s;
|
|---|
| 101 | @};@
|
|---|
| 102 | };
|
|---|
| 103 | \end{cfa}
|
|---|
| 104 | Crucially, the union implies instance storage is shared by all of the variant types.
|
|---|
| 105 | Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}.
|
|---|
| 106 | Knowing which type is in a variant instance is crucial for correctness.
|
|---|
| 107 | Occasionally, it is possible to statically determine, all regions where each variant type is used, so a tag and runtime checking is unnecessary;
|
|---|
| 108 | otherwise, a tag is required to denote the particular type in the variant and the tag checked at runtime using some form of type pattern-matching.
|
|---|
| 109 |
|
|---|
| 110 | The tag can be implicitly set by the compiler on assignment, or explicitly set by the program\-mer.
|
|---|
| 111 | Type pattern-matching is then used to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg:
|
|---|
| 112 | \begin{cfa}[morekeywords={match}]
|
|---|
| 113 | Variant v = 3; // implicitly set tag to 0
|
|---|
| 114 | @match@( v ) { // know the type or test the tag
|
|---|
| 115 | case int { /* only access i field in v */ }
|
|---|
| 116 | case double { /* only access d field in v */ }
|
|---|
| 117 | case S { /* only access s field in v */ }
|
|---|
| 118 | }
|
|---|
| 119 | \end{cfa}
|
|---|
| 120 | For safety, either all variant types must be listed or a @default@ case must exist with no field accesses.
|
|---|
| 121 |
|
|---|
| 122 | To simulate an enumeration with a variant, the tag is re-purposed for either ordering or value and the variant types are omitted.
|
|---|
| 123 | \begin{cfa}
|
|---|
| 124 | variant Weekday {
|
|---|
| 125 | int tag; // implicit 0 => Mon, ..., 6 => Sun
|
|---|
| 126 | @case Mon;@ // no type
|
|---|
| 127 | ...
|
|---|
| 128 | @case Sun;@
|
|---|
| 129 | };
|
|---|
| 130 | \end{cfa}
|
|---|
| 131 | The type system ensures tag setting and testing are correct.
|
|---|
| 132 | However, the enumeration operations are limited to the available tag operations, \eg pattern matching.
|
|---|
| 133 | \begin{cfa}
|
|---|
| 134 | Weekday weekday = Mon;
|
|---|
| 135 | if ( @dynamic_cast(Mon)@weekday ) ... // test tag == Mon
|
|---|
| 136 | \end{cfa}
|
|---|
| 137 | While enumerating among tag names is possible:
|
|---|
| 138 | \begin{cfa}[morekeywords={in}]
|
|---|
| 139 | for ( cursor in Mon, Wed, Fri, Sun ) ...
|
|---|
| 140 | \end{cfa}
|
|---|
| 141 | ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@.
|
|---|
| 142 |
|
|---|
| 143 | However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values.
|
|---|
| 144 | Iterating using tag ordering and heterogeneous types, also requires pattern matching.
|
|---|
| 145 | \begin{cfa}
|
|---|
| 146 | for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) {
|
|---|
| 147 | switch( cursor ) {
|
|---|
| 148 | case Mon { /* access special type for Mon */ }
|
|---|
| 149 | ...
|
|---|
| 150 | case Fri { /* access special type for Fri */ }
|
|---|
| 151 | }
|
|---|
| 152 | }
|
|---|
| 153 | \end{cfa}
|
|---|
| 154 | If the variant type adds/removes types or the loop range changes, the pattern matching must be adjusted.
|
|---|
| 155 | As well, if the start/stop values are dynamic, it is impossible to statically determine if all variant types are listed.
|
|---|
| 156 |
|
|---|
| 157 | Forcing the notion of enumerating into variant types is ill formed and confusing.
|
|---|
| 158 | Hence, there is only a weak equivalence between an enumeration and variant type, justifying the enumeration type in a programming language.
|
|---|
| 159 |
|
|---|
| 160 |
|
|---|
| 161 | \section{Contributions}
|
|---|
| 162 |
|
|---|
| 163 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
|
|---|
| 164 | On the surface, enumerations seem like a simple type.
|
|---|
| 165 | However, when extended with advanced features, enumerations become complex for both the type system and the implementation.
|
|---|
| 166 |
|
|---|
| 167 | \begin{enumerate}
|
|---|
| 168 | \item
|
|---|
| 169 | overloading
|
|---|
| 170 | \item
|
|---|
| 171 | scoping
|
|---|
| 172 | \item
|
|---|
| 173 | typing
|
|---|
| 174 | \item
|
|---|
| 175 | subset
|
|---|
| 176 | \item
|
|---|
| 177 | inheritance
|
|---|
| 178 | \end{enumerate}
|
|---|