[18ebc28] | 1 | \chapter{Introduction}
|
---|
| 2 |
|
---|
[41fb996] | 3 | All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @12345@, \etc.
|
---|
[48b76d03] | 4 | Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
|
---|
[caf2cba] | 5 | Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
|
---|
| 6 | In theory, there are an infinite set of primary names per type.
|
---|
[18ebc28] | 7 |
|
---|
[41fb996] | 8 | \Newterm{Secondary naming} is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
|
---|
[caf2cba] | 9 | Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
|
---|
| 10 | In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
|
---|
[41fb996] | 11 | (The names the thing.)
|
---|
[caf2cba] | 12 | Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
|
---|
| 13 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
|
---|
| 14 |
|
---|
[48b76d03] | 15 | Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
|
---|
[caf2cba] | 16 | Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
|
---|
| 17 | \begin{quote}
|
---|
| 18 | enumerate (verb, transitive).
|
---|
| 19 | To count, ascertain the number of;
|
---|
| 20 | \emph{more
|
---|
| 21 | usually, to mention (a number of things or persons) separately, as if for the
|
---|
| 22 | purpose of counting};
|
---|
| 23 | to specify as in a list or catalogue.~\cite{OED}
|
---|
| 24 | \end{quote}
|
---|
[41fb996] | 25 | Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to hold only the secondary names.
|
---|
[caf2cba] | 26 | It is possible to enumerate among set names without having an ordering among the set elements.
|
---|
[48b76d03] | 27 | For example, the week, the weekdays, the weekend, and every second day of the week.
|
---|
| 28 | \begin{cfa}[morekeywords={in}]
|
---|
| 29 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
|
---|
| 30 | for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$
|
---|
| 31 | for ( cursor in Thu, Fri } ... $\C{// weekend}$
|
---|
| 32 | for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$
|
---|
| 33 | \end{cfa}
|
---|
[caf2cba] | 34 | This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms.
|
---|
[48b76d03] | 35 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
|
---|
[caf2cba] | 36 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
|
---|
| 37 | \begin{cfa}
|
---|
| 38 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
|
---|
| 39 | \end{cfa}
|
---|
[48b76d03] | 40 | Here the internal representations for the secondary names are \emph{generated} rather than listing a subset of names.
|
---|
| 41 |
|
---|
[956299b] | 42 |
|
---|
[48b76d03] | 43 | \section{Terminology}
|
---|
| 44 |
|
---|
| 45 | The term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name.
|
---|
| 46 | As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
|
---|
[956299b] | 47 | \begin{cquote}
|
---|
[48b76d03] | 48 | \sf\setlength{\tabcolsep}{3pt}
|
---|
[caf2cba] | 49 | \begin{tabular}{rcccccccr}
|
---|
| 50 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
|
---|
| 51 | $\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\
|
---|
[41fb996] | 52 | @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun = 42 & \}; \\
|
---|
[48b76d03] | 53 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
|
---|
| 54 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
|
---|
| 55 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 42 &
|
---|
[956299b] | 56 | \end{tabular}
|
---|
| 57 | \end{cquote}
|
---|
[41fb996] | 58 | Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
|
---|
[caf2cba] | 59 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
|
---|
[48b76d03] | 60 | The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
|
---|
| 61 |
|
---|
[caf2cba] | 62 | Specifying complex ordering is possible:
|
---|
| 63 | \begin{cfa}
|
---|
| 64 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
|
---|
| 65 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$
|
---|
| 66 | \end{cfa}
|
---|
| 67 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
|
---|
| 68 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
|
---|
| 69 | Only flat total-ordering among enumerators is considered in this work.
|
---|
[7d9a805b] | 70 |
|
---|
| 71 |
|
---|
[48b76d03] | 72 | \section{Motivation}
|
---|
[caf2cba] | 73 |
|
---|
| 74 | Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering.
|
---|
| 75 | \begin{cfa}
|
---|
| 76 | const Size = 20, Pi = 3.14159;
|
---|
| 77 | enum { Size = 20, Pi = 3.14159 }; // unnamed enumeration $\(\Rightarrow\)$ no ordering
|
---|
| 78 | \end{cfa}
|
---|
[48b76d03] | 79 | In both cases, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful;
|
---|
| 80 | however, without an enumeration type-name, it is impossible to create an iterator cursor.
|
---|
[caf2cba] | 81 |
|
---|
| 82 | Secondary renaming can similate an enumeration, but with extra effort.
|
---|
| 83 | \begin{cfa}
|
---|
| 84 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
|
---|
| 85 | \end{cfa}
|
---|
| 86 | Furthermore, reordering the enumerators requires manual renumbering.
|
---|
| 87 | \begin{cfa}
|
---|
| 88 | const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
|
---|
| 89 | \end{cfa}
|
---|
[48b76d03] | 90 | Finally, there is no common type to create a type-checked instance or iterator cursor.
|
---|
[caf2cba] | 91 | Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language.
|
---|
| 92 |
|
---|
| 93 | A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration.
|
---|
| 94 | A variant type is a tagged-union, where the possible types may be heterogeneous.
|
---|
| 95 | \begin{cfa}
|
---|
| 96 | @variant@ Variant {
|
---|
| 97 | @int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S
|
---|
| 98 | @union {@ // implicit
|
---|
| 99 | case int i;
|
---|
| 100 | case double d;
|
---|
| 101 | case struct S { int i, j; } s;
|
---|
| 102 | @};@
|
---|
| 103 | };
|
---|
| 104 | \end{cfa}
|
---|
| 105 | Crucially, the union implies instance storage is shared by all of the variant types.
|
---|
| 106 | Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}.
|
---|
| 107 | Knowing which type is in a variant instance is crucial for correctness.
|
---|
[41fb996] | 108 | Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary;
|
---|
[48b76d03] | 109 | otherwise, a tag is required to denote the particular type in the variant and the tag checked at runtime using some form of type pattern-matching.
|
---|
| 110 |
|
---|
| 111 | The tag can be implicitly set by the compiler on assignment, or explicitly set by the program\-mer.
|
---|
| 112 | Type pattern-matching is then used to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg:
|
---|
| 113 | \begin{cfa}[morekeywords={match}]
|
---|
| 114 | Variant v = 3; // implicitly set tag to 0
|
---|
| 115 | @match@( v ) { // know the type or test the tag
|
---|
[caf2cba] | 116 | case int { /* only access i field in v */ }
|
---|
| 117 | case double { /* only access d field in v */ }
|
---|
| 118 | case S { /* only access s field in v */ }
|
---|
| 119 | }
|
---|
| 120 | \end{cfa}
|
---|
| 121 | For safety, either all variant types must be listed or a @default@ case must exist with no field accesses.
|
---|
| 122 |
|
---|
[41fb996] | 123 | To simulate an enumeration with a variant, the tag is \emph{re-purposed} for either ordering or value and the variant types are omitted.
|
---|
[caf2cba] | 124 | \begin{cfa}
|
---|
| 125 | variant Weekday {
|
---|
| 126 | int tag; // implicit 0 => Mon, ..., 6 => Sun
|
---|
| 127 | @case Mon;@ // no type
|
---|
| 128 | ...
|
---|
| 129 | @case Sun;@
|
---|
| 130 | };
|
---|
| 131 | \end{cfa}
|
---|
[41fb996] | 132 | The type system ensures tag setting and testing are correctly done.
|
---|
[caf2cba] | 133 | However, the enumeration operations are limited to the available tag operations, \eg pattern matching.
|
---|
| 134 | \begin{cfa}
|
---|
[41fb996] | 135 | Week week = Mon;
|
---|
| 136 | if ( @dynamic_cast(Mon)@week ) ... // test tag == Mon
|
---|
[caf2cba] | 137 | \end{cfa}
|
---|
| 138 | While enumerating among tag names is possible:
|
---|
[48b76d03] | 139 | \begin{cfa}[morekeywords={in}]
|
---|
| 140 | for ( cursor in Mon, Wed, Fri, Sun ) ...
|
---|
[caf2cba] | 141 | \end{cfa}
|
---|
| 142 | ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@.
|
---|
| 143 |
|
---|
| 144 | However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values.
|
---|
[48b76d03] | 145 | Iterating using tag ordering and heterogeneous types, also requires pattern matching.
|
---|
[41fb996] | 146 | \begin{cfa}[morekeywords={match}]
|
---|
[caf2cba] | 147 | for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) {
|
---|
[41fb996] | 148 | match( cursor ) {
|
---|
[caf2cba] | 149 | case Mon { /* access special type for Mon */ }
|
---|
| 150 | ...
|
---|
| 151 | case Fri { /* access special type for Fri */ }
|
---|
[41fb996] | 152 | default
|
---|
[caf2cba] | 153 | }
|
---|
| 154 | }
|
---|
| 155 | \end{cfa}
|
---|
[41fb996] | 156 | If the variant type is changed by adding/removing types or the loop range changes, the pattern matching must be adjusted.
|
---|
| 157 | As well, if the start/stop values are dynamic, it may be impossible to statically determine if all variant types are listed.
|
---|
[caf2cba] | 158 |
|
---|
[41fb996] | 159 | Re-purposing the notion of enumerating into variant types is ill formed and confusing.
|
---|
[48b76d03] | 160 | Hence, there is only a weak equivalence between an enumeration and variant type, justifying the enumeration type in a programming language.
|
---|
[caf2cba] | 161 |
|
---|
[f9da761] | 162 |
|
---|
| 163 | \section{Contributions}
|
---|
[7d9a805b] | 164 |
|
---|
[caf2cba] | 165 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
|
---|
| 166 | On the surface, enumerations seem like a simple type.
|
---|
[41fb996] | 167 | However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
|
---|
[48b76d03] | 168 |
|
---|
| 169 | \begin{enumerate}
|
---|
| 170 | \item
|
---|
| 171 | overloading
|
---|
| 172 | \item
|
---|
| 173 | scoping
|
---|
| 174 | \item
|
---|
| 175 | typing
|
---|
| 176 | \item
|
---|
| 177 | subset
|
---|
| 178 | \item
|
---|
| 179 | inheritance
|
---|
| 180 | \end{enumerate}
|
---|