| 1 | \chapter{Introduction}
|
|---|
| 2 |
|
|---|
| 3 | All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @0xff@, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}}, \etc.
|
|---|
| 4 | Con\-stants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
|
|---|
| 5 | (In \CFA, the primary constants @0@ and @1@ can be overloaded for any type.)
|
|---|
| 6 | Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
|
|---|
| 7 | In theory, there are an infinite set of primary names per type.
|
|---|
| 8 |
|
|---|
| 9 | \Newterm{Secondary naming} is a common practice in mathematics, engineering and computer science, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MB (mega byte, 1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
|
|---|
| 10 | Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
|
|---|
| 11 | Its common purpose is to eliminate duplication of the primary constant throughout a program.
|
|---|
| 12 | For example, the secondary name replaces its primary name, thereafter changing the binding of the secondary to primary name automatically distributes the rebinding throughout the program.
|
|---|
| 13 | In some cases, secondary naming is \Newterm{opaque}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
|
|---|
| 14 | Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
|
|---|
| 15 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
|
|---|
| 16 |
|
|---|
| 17 | Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
|
|---|
| 18 | Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
|
|---|
| 19 | \begin{quote}
|
|---|
| 20 | enumerate (verb, transitive).
|
|---|
| 21 | To count, ascertain the number of;
|
|---|
| 22 | more usually, to mention (a number of things or persons) separately, as if for the purpose of counting;
|
|---|
| 23 | to specify as in a list or catalogue.~\cite{OEDenumerate}
|
|---|
| 24 | \end{quote}
|
|---|
| 25 | Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are \emph{often} restricted to hold only the secondary names.
|
|---|
| 26 | It is possible to enumerate among set names without having an ordering among the set elements.
|
|---|
| 27 | For example, the week, the weekdays, the weekend, and every second day of the week.
|
|---|
| 28 | \begin{cfa}[morekeywords={in}]
|
|---|
| 29 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
|
|---|
| 30 | for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$
|
|---|
| 31 | for ( cursor in Sat, Sun } ... $\C{// weekend}$
|
|---|
| 32 | for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$
|
|---|
| 33 | \end{cfa}
|
|---|
| 34 | This independence from internal representation allows multiple names to have the same representation (eighth note, quaver), giving synonyms.
|
|---|
| 35 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
|
|---|
| 36 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
|
|---|
| 37 | \begin{cfa}
|
|---|
| 38 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
|
|---|
| 39 | \end{cfa}
|
|---|
| 40 | Here the internal representations for the secondary names are logically \emph{generated} rather than listing a subset of names.
|
|---|
| 41 |
|
|---|
| 42 | Hence, the fundamental aspects of an enumeration are:
|
|---|
| 43 | \begin{enumerate}
|
|---|
| 44 | \item
|
|---|
| 45 | It defines a type from which instants can be generated.
|
|---|
| 46 | \item
|
|---|
| 47 | The type lists a finite set of secondary names, which become its primary constants.
|
|---|
| 48 | This differentiates an enumeration from general types with an infinite number of primary constants.
|
|---|
| 49 | \item
|
|---|
| 50 | An enumeration's secondary names represent constants, which follows from their binding (aliasing) to primary names, which are constants.
|
|---|
| 51 | \item
|
|---|
| 52 | For safety, an enumeration instance is restricted to hold only its type's secondary names.
|
|---|
| 53 | \item
|
|---|
| 54 | There is a mechanism for \emph{enumerating} over the secondary names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically.
|
|---|
| 55 | \end{enumerate}
|
|---|
| 56 |
|
|---|
| 57 |
|
|---|
| 58 | \section{Terminology}
|
|---|
| 59 | \label{s:Terminology}
|
|---|
| 60 |
|
|---|
| 61 | The term \Newterm{enumeration} defines a type with a set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name \see{\VRef{s:CEnumeration}}.
|
|---|
| 62 | As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
|
|---|
| 63 | \begin{cquote}
|
|---|
| 64 | \sf\setlength{\tabcolsep}{3pt}
|
|---|
| 65 | \begin{tabular}{rcccccccr}
|
|---|
| 66 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
|
|---|
| 67 | $\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\
|
|---|
| 68 | @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun = 42 & \}; \\
|
|---|
| 69 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
|
|---|
| 70 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
|
|---|
| 71 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 42 &
|
|---|
| 72 | \end{tabular}
|
|---|
| 73 | \end{cquote}
|
|---|
| 74 | Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
|
|---|
| 75 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
|
|---|
| 76 | The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
|
|---|
| 77 |
|
|---|
| 78 | Specifying complex ordering is possible:
|
|---|
| 79 | \begin{cfa}
|
|---|
| 80 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
|
|---|
| 81 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$
|
|---|
| 82 | \end{cfa}
|
|---|
| 83 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
|
|---|
| 84 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
|
|---|
| 85 | Only flat total-ordering among enumerators is considered in this work.
|
|---|
| 86 |
|
|---|
| 87 |
|
|---|
| 88 | \section{Motivation}
|
|---|
| 89 |
|
|---|
| 90 | Some programming languages only provide direct secondary renaming.
|
|---|
| 91 | \begin{cfa}
|
|---|
| 92 | const Size = 20, Pi = 3.14159, Name = "Jane";
|
|---|
| 93 | \end{cfa}
|
|---|
| 94 | Here, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful.
|
|---|
| 95 |
|
|---|
| 96 | Secondary renaming can simulate an enumeration, but with extra effort.
|
|---|
| 97 | \begin{cfa}
|
|---|
| 98 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
|
|---|
| 99 | \end{cfa}
|
|---|
| 100 | Any reordering of the enumerators requires manual renumbering.
|
|---|
| 101 | \begin{cfa}
|
|---|
| 102 | const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
|
|---|
| 103 | \end{cfa}
|
|---|
| 104 | Finally, there is no type to create a type-checked instance or iterator cursor.
|
|---|
| 105 | Hence, there is only a weak equivalence between secondary naming and enumerations, justifying a seperate enumeration type in a programming language.
|
|---|
| 106 |
|
|---|
| 107 | A variant (algebraic) type is often promoted as a kind of enumeration, \ie a variant type can simulate an enumeration.
|
|---|
| 108 | Fundamentally, a variant type is a tagged-union, where the tag is normally opaque and the types are usually heterogeneous.
|
|---|
| 109 | \begin{cfa}[morekeywords={variant}]
|
|---|
| 110 | @variant@ Variant {
|
|---|
| 111 | @int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S
|
|---|
| 112 | @union {@ // implicit
|
|---|
| 113 | case int i;
|
|---|
| 114 | case double d;
|
|---|
| 115 | case struct S { int i, j; } s;
|
|---|
| 116 | @};@
|
|---|
| 117 | };
|
|---|
| 118 | \end{cfa}
|
|---|
| 119 | Crucially, the union implies instance storage is shared by all the variant types, and therefore, before a variant type can be used in a statically-typed expression, it must be dynamically discriminated to its current contained type.
|
|---|
| 120 | Hence, knowing which type is in a variant instance is crucial for correctness.
|
|---|
| 121 | Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary.
|
|---|
| 122 | Otherwise, a tag is required to denote the particular type in the variant, and the tag is discriminated at runtime using some form of type pattern-matching, after which the value can be used in a statically-typed expression.
|
|---|
| 123 |
|
|---|
| 124 | A less frequent variant case is multiple variants with the same type, which normally requires explicit naming of the tag to disambiguate among the common types.
|
|---|
| 125 | \begin{cquote}
|
|---|
| 126 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
|
|---|
| 127 | \begin{cfa}[morekeywords={variant}]
|
|---|
| 128 | variant VariantCT {
|
|---|
| 129 | case @car@: int i; // explicitly typed
|
|---|
| 130 | case @boat@: int i;
|
|---|
| 131 | case @bridge@: int i;
|
|---|
| 132 | };
|
|---|
| 133 | \end{cfa}
|
|---|
| 134 | &
|
|---|
| 135 | \begin{cfa}[morekeywords={variant}]
|
|---|
| 136 | variant VariantCU {
|
|---|
| 137 | case @car@: ; // empty or unit type
|
|---|
| 138 | case @boat@: ;
|
|---|
| 139 | case @bridge@: ;
|
|---|
| 140 | };
|
|---|
| 141 | \end{cfa}
|
|---|
| 142 | \end{tabular}
|
|---|
| 143 | \end{cquote}
|
|---|
| 144 | Here, the explicit tag name is used to give different meaning to the values in the common @int@ type, \eg the value 3 has different interpretations depending on the tag name.
|
|---|
| 145 | It is even possible to remove the type or use the empty @unit@ type (@struct unit {}@).
|
|---|
| 146 | It is this tag naming that is used to simulate an enumeration.
|
|---|
| 147 |
|
|---|
| 148 | Normally, the variant tag is implicitly set by the compiler based on type, but with common types, a tag name is required to resolve type ambiguity.
|
|---|
| 149 | \begin{cfa}
|
|---|
| 150 | Variant v = 3; $\C{// implicitly set tag to 0 based on type of 3}$
|
|---|
| 151 | VariantCT ve = boats.3; $\C{// explicitly set tag to 1 using tag name}$
|
|---|
| 152 | \end{cfa}
|
|---|
| 153 | Type pattern-matching is then used to dynamically test the tag and branch to a section of statically-typed code to safely manipulate the value, \eg:
|
|---|
| 154 | \begin{cquote}
|
|---|
| 155 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
|
|---|
| 156 | \begin{cfa}[morekeywords={match}]
|
|---|
| 157 | @match@( v ) { // know type implicitly or test tag
|
|---|
| 158 | case int { /* only access i field */ }
|
|---|
| 159 | case double { /* only access d field */ }
|
|---|
| 160 | case S { /* only access s field */ }
|
|---|
| 161 | }
|
|---|
| 162 | \end{cfa}
|
|---|
| 163 | &
|
|---|
| 164 | \begin{cfa}[morekeywords={match}]
|
|---|
| 165 | @match@( ve ) {
|
|---|
| 166 | case car: int { /* car interpretation */ }
|
|---|
| 167 | case boat: int { /* boat interpretation */ }
|
|---|
| 168 | case bridge: int { /* bridge interpretation */ }
|
|---|
| 169 | }
|
|---|
| 170 | \end{cfa}
|
|---|
| 171 | \end{tabular}
|
|---|
| 172 | \end{cquote}
|
|---|
| 173 | For safety, some languages require all variant types to be listed or a @default@ case with no field accesses.
|
|---|
| 174 |
|
|---|
| 175 | To further strengthen the simulate for an enumeration with different values, each variant type can be a @const@ type or the tag becomes non-opaque, possibly taking advantage of the opaque auto-numbering.
|
|---|
| 176 | \begin{cquote}
|
|---|
| 177 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
|
|---|
| 178 | \begin{cfa}
|
|---|
| 179 | variant Week {
|
|---|
| 180 | case Mon: const int = 0;
|
|---|
| 181 | ...
|
|---|
| 182 | case Sat: const int = 5;
|
|---|
| 183 | case Sun: const int = 10;
|
|---|
| 184 | };
|
|---|
| 185 | \end{cfa}
|
|---|
| 186 | &
|
|---|
| 187 | \begin{cfa}
|
|---|
| 188 | variant Week {
|
|---|
| 189 | case Mon: ; // tag auto-numbering
|
|---|
| 190 | ...
|
|---|
| 191 | case Sat: ;
|
|---|
| 192 | case @Sun = 10@: ; // directly set tag value
|
|---|
| 193 | };
|
|---|
| 194 | \end{cfa}
|
|---|
| 195 | \end{tabular}
|
|---|
| 196 | \end{cquote}
|
|---|
| 197 | Directly setting the tag implies restrictions, like unique values.
|
|---|
| 198 | In both cases, instances of @Week@ are @const@ (immutable).
|
|---|
| 199 | However, usage between these two types becomes complex.
|
|---|
| 200 | \begin{cfa}
|
|---|
| 201 | Week day = Week.Mon; // sets value or tag depending on type
|
|---|
| 202 | if ( day == Week.Mon ) // dereference value or tag ?
|
|---|
| 203 | \end{cfa}
|
|---|
| 204 | Here, the dereference of @day@ should return the value of the type stored in the variant, never the tag.
|
|---|
| 205 | If it does return the tag, some special meaning must be given to the empty/unit type, especially if a variant contains both regular and unit types.
|
|---|
| 206 |
|
|---|
| 207 |
|
|---|
| 208 | In general, the enumeration simulation and the variant extensions to support it, are deviating from the normal use of a variant (union) type.
|
|---|
| 209 | As well, the enumeration operations are limited to the available tag operations, \eg pattern matching.
|
|---|
| 210 | While enumerating among tag names is possible:
|
|---|
| 211 | \begin{cfa}[morekeywords={in}]
|
|---|
| 212 | for ( cursor in Week.Mon, Week.Wed, Week.Fri, Week.Sun ) ...
|
|---|
| 213 | \end{cfa}
|
|---|
| 214 | what is the type of @cursor@?
|
|---|
| 215 | If it the tag type (@int@), how is this value used?
|
|---|
| 216 | If it is the variant type, where is the instance variable, which only contains one value.
|
|---|
| 217 | Hence, either enumerating with a variant enumeration is disallowed or some unusual typing rule must be invented to make it work but only in restricted contexts.
|
|---|
| 218 |
|
|---|
| 219 | While functional programming systems regularly re-purposing variant types into enumeration types, this process seems contrived and confusing.
|
|---|
| 220 | A variant tag is not an enumeration, it is a discriminant among a restricted set of types stored in a storage block.
|
|---|
| 221 | Hence, there is only a weak equivalence between an enumeration and variant type, justifying a seperate enumeration type in a programming language.
|
|---|
| 222 |
|
|---|
| 223 |
|
|---|
| 224 | \section{Contributions}
|
|---|
| 225 |
|
|---|
| 226 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
|
|---|
| 227 | On the surface, enumerations seem like a simple type.
|
|---|
| 228 | However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
|
|---|
| 229 |
|
|---|
| 230 | \begin{enumerate}
|
|---|
| 231 | \item
|
|---|
| 232 | overloading
|
|---|
| 233 | \item
|
|---|
| 234 | scoping
|
|---|
| 235 | \item
|
|---|
| 236 | typing
|
|---|
| 237 | \item
|
|---|
| 238 | subset
|
|---|
| 239 | \item
|
|---|
| 240 | inheritance
|
|---|
| 241 | \end{enumerate}
|
|---|