| [18ebc28] | 1 | \chapter{Introduction}
|
|---|
| 2 |
|
|---|
| [caaf424] | 3 | All types in a programming language must have a set of constants, and these constants have \newterm{primary names}, \eg integral types have constants @-1@, @17@, @0xff@, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}}, \etc.
|
|---|
| [4da9142] | 4 | Con\-stants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
|
|---|
| 5 | (In \CFA, the primary constants @0@ and @1@ can be overloaded for any type.)
|
|---|
| [caf2cba] | 6 | Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
|
|---|
| [314c9d8] | 7 | In theory, there are an infinite set of primary constant names per type.
|
|---|
| [18ebc28] | 8 |
|
|---|
| [caaf424] | 9 | \newterm{Secondary naming} is a common practice in mathematics, engineering and computer science, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MB (megabyte, 1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
|
|---|
| 10 | Many programming languages capture this important software-engineering capability through a mechanism called \newterm{constant} or \newterm{literal} naming, where a secondary name is aliased to a primary name.
|
|---|
| [314c9d8] | 11 | Its purpose is for readability and to eliminate duplication of the primary constant throughout a program.
|
|---|
| 12 | For example, a meaningful secondary name replaces a primary name throughout a program;
|
|---|
| 13 | thereafter, changing the binding of the secondary to primary name automatically distributes the rebinding, preventing errors.
|
|---|
| [caaf424] | 14 | In some cases, secondary naming is \newterm{opaque}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
|
|---|
| 15 | Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \newterm{rvalue}\footnote{
|
|---|
| [caf2cba] | 16 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
|
|---|
| 17 |
|
|---|
| [314c9d8] | 18 | Secondary names can form an (ordered) set, \eg days of a week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
|
|---|
| [caaf424] | 19 | Many programming languages capture these groupings through a mechanism called an \newterm{enumeration}.
|
|---|
| [caf2cba] | 20 | \begin{quote}
|
|---|
| 21 | enumerate (verb, transitive).
|
|---|
| 22 | To count, ascertain the number of;
|
|---|
| [4da9142] | 23 | more usually, to mention (a number of things or persons) separately, as if for the purpose of counting;
|
|---|
| 24 | to specify as in a list or catalogue.~\cite{OEDenumerate}
|
|---|
| [caf2cba] | 25 | \end{quote}
|
|---|
| [4da9142] | 26 | Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are \emph{often} restricted to hold only the secondary names.
|
|---|
| [caf2cba] | 27 | It is possible to enumerate among set names without having an ordering among the set elements.
|
|---|
| [48b76d03] | 28 | For example, the week, the weekdays, the weekend, and every second day of the week.
|
|---|
| 29 | \begin{cfa}[morekeywords={in}]
|
|---|
| 30 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
|
|---|
| 31 | for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$
|
|---|
| [4da9142] | 32 | for ( cursor in Sat, Sun } ... $\C{// weekend}$
|
|---|
| [48b76d03] | 33 | for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$
|
|---|
| 34 | \end{cfa}
|
|---|
| [4da9142] | 35 | This independence from internal representation allows multiple names to have the same representation (eighth note, quaver), giving synonyms.
|
|---|
| [48b76d03] | 36 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
|
|---|
| [314c9d8] | 37 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg:
|
|---|
| [caf2cba] | 38 | \begin{cfa}
|
|---|
| 39 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
|
|---|
| 40 | \end{cfa}
|
|---|
| [314c9d8] | 41 | Here the internal representation for the secondary names are logically \emph{generated} rather than listing a subset of names.
|
|---|
| [4da9142] | 42 |
|
|---|
| 43 | Hence, the fundamental aspects of an enumeration are:
|
|---|
| 44 | \begin{enumerate}
|
|---|
| 45 | \item
|
|---|
| [314c9d8] | 46 | \begin{sloppypar}
|
|---|
| 47 | It provides a finite set of secondary names, which become its primary constants.
|
|---|
| 48 | This differentiates an enumeration from general types with an infinite set
|
|---|
| 49 | of primary constants.
|
|---|
| 50 | \end{sloppypar}
|
|---|
| [4da9142] | 51 | \item
|
|---|
| [314c9d8] | 52 | The secondary names are constants, which follows transitively from their binding (aliasing) to primary names, which are constants.
|
|---|
| [4da9142] | 53 | \item
|
|---|
| [314c9d8] | 54 | Defines a type for generating instants (variables).
|
|---|
| [4da9142] | 55 | \item
|
|---|
| [314c9d8] | 56 | For safety, an enumeration instance should be restricted to hold only its type's secondary names.
|
|---|
| [4da9142] | 57 | \item
|
|---|
| 58 | There is a mechanism for \emph{enumerating} over the secondary names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically.
|
|---|
| 59 | \end{enumerate}
|
|---|
| [48b76d03] | 60 |
|
|---|
| [956299b] | 61 |
|
|---|
| [48b76d03] | 62 | \section{Terminology}
|
|---|
| [4da9142] | 63 | \label{s:Terminology}
|
|---|
| [48b76d03] | 64 |
|
|---|
| [caaf424] | 65 | The term \newterm{enumeration} defines a type with a set of secondary names, and the term \newterm{enumerator} represents an arbitrary secondary name \see{\VRef{s:CEnumeration} for the name derivation}.
|
|---|
| 66 | As well, an enumerated type can have three fundamental properties, \newterm{label}, \newterm{order}, and \newterm{value}.
|
|---|
| [956299b] | 67 | \begin{cquote}
|
|---|
| [48b76d03] | 68 | \sf\setlength{\tabcolsep}{3pt}
|
|---|
| [caf2cba] | 69 | \begin{tabular}{rcccccccr}
|
|---|
| 70 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
|
|---|
| [314c9d8] | 71 | $\downarrow$\hspace*{15pt} & \multicolumn{8}{c}{$\downarrow$} \\
|
|---|
| 72 | @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun {\color{red}= 42} & \}; \\
|
|---|
| [48b76d03] | 73 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
|
|---|
| 74 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
|
|---|
| [314c9d8] | 75 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & {\color{red}42} &
|
|---|
| [956299b] | 76 | \end{tabular}
|
|---|
| 77 | \end{cquote}
|
|---|
| [41fb996] | 78 | Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
|
|---|
| [caf2cba] | 79 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
|
|---|
| [48b76d03] | 80 | The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
|
|---|
| 81 |
|
|---|
| [caf2cba] | 82 | Specifying complex ordering is possible:
|
|---|
| 83 | \begin{cfa}
|
|---|
| 84 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
|
|---|
| 85 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$
|
|---|
| 86 | \end{cfa}
|
|---|
| 87 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
|
|---|
| 88 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
|
|---|
| 89 | Only flat total-ordering among enumerators is considered in this work.
|
|---|
| [7d9a805b] | 90 |
|
|---|
| 91 |
|
|---|
| [48b76d03] | 92 | \section{Motivation}
|
|---|
| [caf2cba] | 93 |
|
|---|
| [314c9d8] | 94 | Many programming languages provide an enumeration-like mechanism, which may or may not cover the previous five fundamental enumeration aspects.
|
|---|
| 95 | Hence, the term \emph{enumeration} can be confusing and misunderstood.
|
|---|
| 96 | Furthermore, some languages conjoin the enumeration with other type features, making it difficult to tease apart which featuring is being used.
|
|---|
| 97 | This section discusses some language features that are sometimes called an enumeration but do not provide all enumeration aspects.
|
|---|
| 98 |
|
|---|
| 99 |
|
|---|
| 100 | \subsection{Aliasing}
|
|---|
| [f6321173] | 101 | \label{s:Aliasing}
|
|---|
| [314c9d8] | 102 |
|
|---|
| 103 | Some languages provide simple secondary aliasing (renaming), \eg:
|
|---|
| [caf2cba] | 104 | \begin{cfa}
|
|---|
| [4da9142] | 105 | const Size = 20, Pi = 3.14159, Name = "Jane";
|
|---|
| [caf2cba] | 106 | \end{cfa}
|
|---|
| [314c9d8] | 107 | The secondary name is logically replaced in the program text by its corresponding primary name.
|
|---|
| 108 | Therefore, it is possible to compare the secondary names, \eg @Size < Pi@, only because the primary constants allow it, whereas \eg @Pi < Name@ might be disallowed depending on the language.
|
|---|
| [caf2cba] | 109 |
|
|---|
| [314c9d8] | 110 | Aliasing is not macro substitution, \eg @#define Size 20@, where a name is replaced by its value \emph{before} compilation, so the name is invisible to the programming language.
|
|---|
| 111 | With aliasing, each secondary name is part of the language, and hence, participates fully, such as name overloading in the type system.
|
|---|
| 112 | Aliasing is not an immutable variable, \eg:
|
|---|
| 113 | \begin{cfa}
|
|---|
| 114 | extern @const@ int Size = 20;
|
|---|
| 115 | extern void foo( @const@ int @&@ size );
|
|---|
| 116 | foo( Size ); // take the address of (reference) Size
|
|---|
| 117 | \end{cfa}
|
|---|
| [caaf424] | 118 | Taking the address of an immutable variable makes it an \newterm{lvalue}, which implies it has storage.
|
|---|
| [314c9d8] | 119 | With separate compilation, it is necessary to choose one translation unit to perform the initialization.
|
|---|
| 120 | If aliasing does require storage, its address and initialization are opaque (compiler only), similar to \CC rvalue reference @&&@.
|
|---|
| 121 |
|
|---|
| 122 | Aliasing does provide readability and automatic resubstitution.
|
|---|
| 123 | It also provides simple enumeration properties, but with extra effort.
|
|---|
| [caf2cba] | 124 | \begin{cfa}
|
|---|
| 125 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
|
|---|
| 126 | \end{cfa}
|
|---|
| [4da9142] | 127 | Any reordering of the enumerators requires manual renumbering.
|
|---|
| [caf2cba] | 128 | \begin{cfa}
|
|---|
| 129 | const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
|
|---|
| 130 | \end{cfa}
|
|---|
| [314c9d8] | 131 | For these reasons, aliasing is sometimes called an enumeration.
|
|---|
| 132 | However, there is no type to create a type-checked instance or iterator cursor, so there is no ability for enumerating.
|
|---|
| 133 | Hence, there are multiple enumeration aspects not provided by aliasing, justifying a separate enumeration type in a programming language.
|
|---|
| 134 |
|
|---|
| 135 |
|
|---|
| 136 | \subsection{Algebraic Data Type}
|
|---|
| [ec20ab9] | 137 | \label{s:AlgebraicDataType}
|
|---|
| [4da9142] | 138 |
|
|---|
| [314c9d8] | 139 | An algebraic data type (ADT)\footnote{ADT is overloaded with abstract data type.} is another language feature often linked with enumeration, where an ADT conjoins an arbitrary type, possibly a \lstinline[language=C++]{class} or @union@, and a named constructor.
|
|---|
| 140 | For example, in Haskell:
|
|---|
| 141 | \begin{haskell}
|
|---|
| 142 | data S = S { i::Int, d::Double } $\C{// structure}$
|
|---|
| 143 | data @Foo@ = A Int | B Double | C S $\C{// ADT, composed of three types}$
|
|---|
| 144 | foo = A 3; $\C{// type Foo is inferred}$
|
|---|
| 145 | bar = B 3.5
|
|---|
| 146 | baz = C S{ i = 7, d = 7.5 }
|
|---|
| 147 | \end{haskell}
|
|---|
| 148 | the ADT has three variants (constructors), @A@, @B@, @C@ with associated types @Int@, @Double@, and @S@.
|
|---|
| 149 | The constructors create an initialized value of the specific type that is bound to the immutable variables @foo@, @bar@, and @baz@.
|
|---|
| 150 | Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to access the value using dynamic pattern-matching.
|
|---|
| [4da9142] | 151 | \begin{cquote}
|
|---|
| [314c9d8] | 152 | \setlength{\tabcolsep}{15pt}
|
|---|
| 153 | \begin{tabular}{@{}ll@{}}
|
|---|
| 154 | \begin{haskell}
|
|---|
| 155 | prtfoo val = -- function
|
|---|
| 156 | -- pattern match on constructor
|
|---|
| 157 | case val of
|
|---|
| 158 | @A@ a -> print a
|
|---|
| 159 | @B@ b -> print b
|
|---|
| 160 | @C@ (S i d) -> do
|
|---|
| 161 | print i
|
|---|
| 162 | print d
|
|---|
| 163 | \end{haskell}
|
|---|
| [4da9142] | 164 | &
|
|---|
| [314c9d8] | 165 | \begin{haskell}
|
|---|
| 166 | main = do
|
|---|
| 167 | prtfoo foo
|
|---|
| 168 | prtfoo bar
|
|---|
| 169 | prtfoo baz
|
|---|
| 170 | 3
|
|---|
| 171 | 3.5
|
|---|
| 172 | 7
|
|---|
| 173 | 7.5
|
|---|
| 174 | \end{haskell}
|
|---|
| [4da9142] | 175 | \end{tabular}
|
|---|
| 176 | \end{cquote}
|
|---|
| [314c9d8] | 177 | For safety, most languages require all assocaited types to be listed or a default case with no field accesses.
|
|---|
| 178 |
|
|---|
| 179 | A less frequent case is multiple constructors with the same type.
|
|---|
| 180 | \begin{haskell}
|
|---|
| 181 | data Bar = X Int | Y Int | Z Int;
|
|---|
| 182 | foo = X 3;
|
|---|
| 183 | bar = Y 3;
|
|---|
| 184 | baz = Z 5;
|
|---|
| 185 | \end{haskell}
|
|---|
| 186 | Here, the constructor name gives different meaning to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching.
|
|---|
| 187 |
|
|---|
| [caaf424] | 188 | Note, the term \newterm{variant} is often associated with ADTs.
|
|---|
| [314c9d8] | 189 | However, there are multiple languages with a @variant@ type that is not an ADT \see{Algol68~\cite{Algol68} or \CC \lstinline{variant}}.
|
|---|
| 190 | In these languages, the variant is often a union using RTTI tags, which cannot be used to simulate an enumeration.
|
|---|
| 191 | Hence, in this work the term variant is not a synonym for ADT.
|
|---|
| 192 |
|
|---|
| 193 | % https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.19.1.0-179c/GHC-Enum.html
|
|---|
| 194 | % https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-Enum.html
|
|---|
| [48b76d03] | 195 |
|
|---|
| [314c9d8] | 196 | The association between ADT and enumeration occurs if all the constructors have a unit (empty) type, \eg @struct unit {}@.
|
|---|
| 197 | Note, the unit type is not the same as \lstinline{void}, \eg:
|
|---|
| [4da9142] | 198 | \begin{cfa}
|
|---|
| [314c9d8] | 199 | void foo( void );
|
|---|
| 200 | struct unit {} u; // empty type
|
|---|
| 201 | unit bar( unit );
|
|---|
| 202 | foo( foo() ); // void argument does not match with void parameter
|
|---|
| 203 | bar( bar( u ) ); // unit argument does match with unit parameter
|
|---|
| [4da9142] | 204 | \end{cfa}
|
|---|
| [caf2cba] | 205 |
|
|---|
| [314c9d8] | 206 | For example, in the Haskell ADT:
|
|---|
| 207 | \begin{haskell}
|
|---|
| 208 | data Week = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving(Enum, Eq, Show)
|
|---|
| 209 | \end{haskell}
|
|---|
| 210 | the default type for each constructor is the unit type, and deriving from @Enum@ enforces no other type, @Eq@ allows equality comparison, and @Show@ is for printing.
|
|---|
| 211 | The nullary constructors for the unit types are numbered left-to-right from $0$ to @maxBound@$- 1$, and provides enumerating operations @succ@, @pred@, @enumFrom@ @enumFromTo@.
|
|---|
| 212 | \VRef[Figure]{f:HaskellEnumeration} shows enumeration comparison and iterating (enumerating).
|
|---|
| 213 |
|
|---|
| 214 | \begin{figure}
|
|---|
| [4da9142] | 215 | \begin{cquote}
|
|---|
| [314c9d8] | 216 | \setlength{\tabcolsep}{15pt}
|
|---|
| 217 | \begin{tabular}{@{}ll@{}}
|
|---|
| 218 | \begin{haskell}
|
|---|
| 219 | day = Tue
|
|---|
| 220 | main = do
|
|---|
| 221 | if day == Tue then
|
|---|
| 222 | print day
|
|---|
| 223 | else
|
|---|
| 224 | putStr "not Tue"
|
|---|
| 225 | print (enumFrom Mon) -- week
|
|---|
| 226 | print (enumFromTo Mon Fri) -- weekday
|
|---|
| 227 | print (enumFromTo Sat Sun) -- weekend
|
|---|
| 228 | \end{haskell}
|
|---|
| [4da9142] | 229 | &
|
|---|
| [314c9d8] | 230 | \begin{haskell}
|
|---|
| 231 | Tue
|
|---|
| 232 | [Mon,Tue,Wed,Thu,Fri,Sat,Sun]
|
|---|
| 233 | [Mon,Tue,Wed,Thu,Fri]
|
|---|
| 234 | [Sat,Sun]
|
|---|
| [caf2cba] | 235 |
|
|---|
| 236 |
|
|---|
| [4da9142] | 237 |
|
|---|
| [314c9d8] | 238 |
|
|---|
| 239 |
|
|---|
| 240 | \end{haskell}
|
|---|
| 241 | \end{tabular}
|
|---|
| 242 | \end{cquote}
|
|---|
| 243 | \caption{Haskell Enumeration}
|
|---|
| 244 | \label{f:HaskellEnumeration}
|
|---|
| 245 | \end{figure}
|
|---|
| 246 |
|
|---|
| 247 | The key observation is the dichotomy between an ADT and enumeration: the ADT uses the associated type resulting in a union-like data structure, and the enumeration does not use the associated type, and hence, is not a union.
|
|---|
| 248 | While the enumeration is constructed using the ADT mechanism, it is so restricted it is not really an ADT.
|
|---|
| 249 | Furthermore, a general ADT cannot be an enumeration because the constructors generate different values making enumerating meaningless.
|
|---|
| 250 | While functional programming languages regularly repurpose the ADT type into an enumeration type, this process seems contrived and confusing.
|
|---|
| 251 | Hence, there is only a weak equivalence between an enumeration and ADT, justifying a separate enumeration type in a programming language.
|
|---|
| [caf2cba] | 252 |
|
|---|
| [f9da761] | 253 |
|
|---|
| 254 | \section{Contributions}
|
|---|
| [7d9a805b] | 255 |
|
|---|
| [314c9d8] | 256 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a complex and safe enumeration type in the \CFA programming-language, while maintaining backwards compatibility with C.
|
|---|
| [caf2cba] | 257 | On the surface, enumerations seem like a simple type.
|
|---|
| [41fb996] | 258 | However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
|
|---|
| [48b76d03] | 259 |
|
|---|
| [314c9d8] | 260 | The contribution of this work are:
|
|---|
| [48b76d03] | 261 | \begin{enumerate}
|
|---|
| 262 | \item
|
|---|
| 263 | overloading
|
|---|
| 264 | \item
|
|---|
| 265 | scoping
|
|---|
| 266 | \item
|
|---|
| 267 | typing
|
|---|
| 268 | \item
|
|---|
| [314c9d8] | 269 | subseting
|
|---|
| [48b76d03] | 270 | \item
|
|---|
| 271 | inheritance
|
|---|
| 272 | \end{enumerate}
|
|---|