[18ebc28] | 1 | \chapter{Introduction} |
---|
| 2 | |
---|
[ccfbfd9] | 3 | All types in a programming language have a set of constants (symbols), and these constants represent values, \eg integer types have constants @-1@, @17@, @0xff@ representing whole numbers, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@ representing real numbers, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}} representing (human readable) text, \etc. |
---|
| 4 | Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integer and floating-point types. |
---|
| 5 | (In \CFA, the constants @0@ and @1@ can be overloaded for any type.) |
---|
| 6 | A constant's symbolic name is dictated by language syntax related to types. |
---|
| 7 | In general, the representation of a constant's value is \newterm{opaque}, so the internal representation can be chosen arbitrarily. |
---|
| 8 | In theory, there are an infinite set of constant names per type representing an infinite set of values. |
---|
| 9 | |
---|
| 10 | It is common in mathematics, engineering and computer science to alias new constants to existing constants so they have the same value, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), K(k), M, G, T for powers of 2\footnote{Overloaded with SI powers of 10.} often prefixing bits (b) or bytes (B), \eg Gb, MB, and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc. |
---|
| 11 | An alias can bind to another alias, which transitively binds it to the specified constant. |
---|
| 12 | Multiple aliases can represent the same value, \eg eighth note and quaver, giving synonyms. |
---|
| 13 | |
---|
| 14 | Many programming languages capture this important software-engineering capability through a mechanism called \newterm{constant} or \newterm{literal} naming, where a new constant is aliased to an existing constant. |
---|
| 15 | Its purpose is for readability, replacing a constant name that directly represents a value with a name that is more symbolic and meaningful in the context of the program. |
---|
| 16 | Thereafter, changing the aliasing of the new constant to another constant automatically distributes the rebinding, preventing errors. |
---|
| 17 | % and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@. |
---|
| 18 | Because an aliased name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \newterm{rvalue}\footnote{ |
---|
[caf2cba] | 19 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}. |
---|
[ccfbfd9] | 20 | In theory, there are an infinite set of possible aliasing, in practice, the number of aliasing per program is finite and small. |
---|
[caf2cba] | 21 | |
---|
[ccfbfd9] | 22 | Aliased constants can form an (ordered) set, \eg days of a week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc. |
---|
| 23 | In this case, the binding between a constant name and value can be implicit, where the values are chosen to support any set operations. |
---|
| 24 | Many programming languages capture the aliasing and ordering through a mechanism called an \newterm{enumeration}. |
---|
[caf2cba] | 25 | \begin{quote} |
---|
| 26 | enumerate (verb, transitive). |
---|
| 27 | To count, ascertain the number of; |
---|
[4da9142] | 28 | more usually, to mention (a number of things or persons) separately, as if for the purpose of counting; |
---|
| 29 | to specify as in a list or catalogue.~\cite{OEDenumerate} |
---|
[caf2cba] | 30 | \end{quote} |
---|
[ccfbfd9] | 31 | Within an enumeration set, the enumeration names (aliases) must be unique, and instances of an enumerated type are \emph{often} restricted to hold only these names. |
---|
| 32 | |
---|
| 33 | It is possible to enumerate among set names without having an ordering among the set values. |
---|
[48b76d03] | 34 | For example, the week, the weekdays, the weekend, and every second day of the week. |
---|
| 35 | \begin{cfa}[morekeywords={in}] |
---|
| 36 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$ |
---|
| 37 | for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$ |
---|
[4da9142] | 38 | for ( cursor in Sat, Sun } ... $\C{// weekend}$ |
---|
[48b76d03] | 39 | for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$ |
---|
| 40 | \end{cfa} |
---|
| 41 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after. |
---|
[314c9d8] | 42 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg: |
---|
[caf2cba] | 43 | \begin{cfa} |
---|
| 44 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ... |
---|
| 45 | \end{cfa} |
---|
[ccfbfd9] | 46 | Here the values for the set names are logically \emph{generated} rather than listing a subset of names. |
---|
[4da9142] | 47 | |
---|
| 48 | Hence, the fundamental aspects of an enumeration are: |
---|
| 49 | \begin{enumerate} |
---|
| 50 | \item |
---|
[314c9d8] | 51 | \begin{sloppypar} |
---|
[ccfbfd9] | 52 | It provides a finite set of new constants, which are implicitly or explicitly assigned values that must be appropriate for any set operations. |
---|
| 53 | This aspect differentiates an enumeration from general types with an infinite set of constants. |
---|
[314c9d8] | 54 | \end{sloppypar} |
---|
[4da9142] | 55 | \item |
---|
[ccfbfd9] | 56 | The alias names are constants, which follows transitively from their binding to other constants. |
---|
[4da9142] | 57 | \item |
---|
[314c9d8] | 58 | Defines a type for generating instants (variables). |
---|
[4da9142] | 59 | \item |
---|
[ccfbfd9] | 60 | For safety, an enumeration instance should be restricted to hold only its constant names. |
---|
[4da9142] | 61 | \item |
---|
[ccfbfd9] | 62 | There is a mechanism for \emph{enumerating} over the enumeration names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically. |
---|
[4da9142] | 63 | \end{enumerate} |
---|
[48b76d03] | 64 | |
---|
[956299b] | 65 | |
---|
[48b76d03] | 66 | \section{Terminology} |
---|
[4da9142] | 67 | \label{s:Terminology} |
---|
[48b76d03] | 68 | |
---|
[ccfbfd9] | 69 | The term \newterm{enumeration} defines a type with a set of new constants, and the term \newterm{enumerator} represents an arbitrary alias name \see{\VRef{s:CEnumeration} for the name derivation}. |
---|
[caaf424] | 70 | As well, an enumerated type can have three fundamental properties, \newterm{label}, \newterm{order}, and \newterm{value}. |
---|
[956299b] | 71 | \begin{cquote} |
---|
[48b76d03] | 72 | \sf\setlength{\tabcolsep}{3pt} |
---|
[caf2cba] | 73 | \begin{tabular}{rcccccccr} |
---|
| 74 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\ |
---|
[314c9d8] | 75 | $\downarrow$\hspace*{15pt} & \multicolumn{8}{c}{$\downarrow$} \\ |
---|
| 76 | @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun {\color{red}= 42} & \}; \\ |
---|
[48b76d03] | 77 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\ |
---|
| 78 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\ |
---|
[314c9d8] | 79 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & {\color{red}42} & |
---|
[956299b] | 80 | \end{tabular} |
---|
| 81 | \end{cquote} |
---|
[ccfbfd9] | 82 | Here, the enumeration @Week@ defines the enumerator constant @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@. |
---|
[caf2cba] | 83 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values. |
---|
[ccfbfd9] | 84 | The value is the implicitly/explicitly assigned constant to support any enumeration operations; |
---|
| 85 | the value may be hidden (opaque) or visible. |
---|
[48b76d03] | 86 | |
---|
[caf2cba] | 87 | Specifying complex ordering is possible: |
---|
| 88 | \begin{cfa} |
---|
| 89 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$ |
---|
| 90 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$ |
---|
| 91 | \end{cfa} |
---|
| 92 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@. |
---|
| 93 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@. |
---|
| 94 | Only flat total-ordering among enumerators is considered in this work. |
---|
[7d9a805b] | 95 | |
---|
| 96 | |
---|
[48b76d03] | 97 | \section{Motivation} |
---|
[caf2cba] | 98 | |
---|
[314c9d8] | 99 | Many programming languages provide an enumeration-like mechanism, which may or may not cover the previous five fundamental enumeration aspects. |
---|
| 100 | Hence, the term \emph{enumeration} can be confusing and misunderstood. |
---|
[ccfbfd9] | 101 | Furthermore, some languages conjoin the enumeration with other type features, making it difficult to tease apart which feature is being used. |
---|
[314c9d8] | 102 | This section discusses some language features that are sometimes called an enumeration but do not provide all enumeration aspects. |
---|
| 103 | |
---|
| 104 | |
---|
| 105 | \subsection{Aliasing} |
---|
[f632117] | 106 | \label{s:Aliasing} |
---|
[314c9d8] | 107 | |
---|
[ccfbfd9] | 108 | Some languages provide simple aliasing (renaming), \eg: |
---|
[caf2cba] | 109 | \begin{cfa} |
---|
[4da9142] | 110 | const Size = 20, Pi = 3.14159, Name = "Jane"; |
---|
[caf2cba] | 111 | \end{cfa} |
---|
[ccfbfd9] | 112 | The alias name is logically replaced in the program text by its matching constant. |
---|
| 113 | It is possible to compare aliases, if the constants allow it, \eg @Size < Pi@; |
---|
| 114 | whereas \eg @Pi < Name@ might be disallowed depending on the language. |
---|
[caf2cba] | 115 | |
---|
[314c9d8] | 116 | Aliasing is not macro substitution, \eg @#define Size 20@, where a name is replaced by its value \emph{before} compilation, so the name is invisible to the programming language. |
---|
[ccfbfd9] | 117 | With aliasing, each new name is part of the language, and hence, participates fully, such as name overloading in the type system. |
---|
[314c9d8] | 118 | Aliasing is not an immutable variable, \eg: |
---|
| 119 | \begin{cfa} |
---|
| 120 | extern @const@ int Size = 20; |
---|
| 121 | extern void foo( @const@ int @&@ size ); |
---|
| 122 | foo( Size ); // take the address of (reference) Size |
---|
| 123 | \end{cfa} |
---|
[caaf424] | 124 | Taking the address of an immutable variable makes it an \newterm{lvalue}, which implies it has storage. |
---|
[314c9d8] | 125 | With separate compilation, it is necessary to choose one translation unit to perform the initialization. |
---|
| 126 | If aliasing does require storage, its address and initialization are opaque (compiler only), similar to \CC rvalue reference @&&@. |
---|
| 127 | |
---|
| 128 | Aliasing does provide readability and automatic resubstitution. |
---|
[ccfbfd9] | 129 | It also provides simple enumeration properties, but with effort. |
---|
[caf2cba] | 130 | \begin{cfa} |
---|
| 131 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7; |
---|
| 132 | \end{cfa} |
---|
[4da9142] | 133 | Any reordering of the enumerators requires manual renumbering. |
---|
[caf2cba] | 134 | \begin{cfa} |
---|
| 135 | const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7; |
---|
| 136 | \end{cfa} |
---|
[314c9d8] | 137 | For these reasons, aliasing is sometimes called an enumeration. |
---|
| 138 | However, there is no type to create a type-checked instance or iterator cursor, so there is no ability for enumerating. |
---|
| 139 | Hence, there are multiple enumeration aspects not provided by aliasing, justifying a separate enumeration type in a programming language. |
---|
| 140 | |
---|
| 141 | |
---|
| 142 | \subsection{Algebraic Data Type} |
---|
[ec20ab9] | 143 | \label{s:AlgebraicDataType} |
---|
[4da9142] | 144 | |
---|
[314c9d8] | 145 | An algebraic data type (ADT)\footnote{ADT is overloaded with abstract data type.} is another language feature often linked with enumeration, where an ADT conjoins an arbitrary type, possibly a \lstinline[language=C++]{class} or @union@, and a named constructor. |
---|
| 146 | For example, in Haskell: |
---|
| 147 | \begin{haskell} |
---|
| 148 | data S = S { i::Int, d::Double } $\C{// structure}$ |
---|
| 149 | data @Foo@ = A Int | B Double | C S $\C{// ADT, composed of three types}$ |
---|
| 150 | foo = A 3; $\C{// type Foo is inferred}$ |
---|
| 151 | bar = B 3.5 |
---|
| 152 | baz = C S{ i = 7, d = 7.5 } |
---|
| 153 | \end{haskell} |
---|
| 154 | the ADT has three variants (constructors), @A@, @B@, @C@ with associated types @Int@, @Double@, and @S@. |
---|
| 155 | The constructors create an initialized value of the specific type that is bound to the immutable variables @foo@, @bar@, and @baz@. |
---|
[ccfbfd9] | 156 | Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to intialize and access the value using dynamic pattern-matching. |
---|
[4da9142] | 157 | \begin{cquote} |
---|
[314c9d8] | 158 | \setlength{\tabcolsep}{15pt} |
---|
| 159 | \begin{tabular}{@{}ll@{}} |
---|
| 160 | \begin{haskell} |
---|
| 161 | prtfoo val = -- function |
---|
| 162 | -- pattern match on constructor |
---|
| 163 | case val of |
---|
| 164 | @A@ a -> print a |
---|
| 165 | @B@ b -> print b |
---|
| 166 | @C@ (S i d) -> do |
---|
| 167 | print i |
---|
| 168 | print d |
---|
| 169 | \end{haskell} |
---|
[4da9142] | 170 | & |
---|
[314c9d8] | 171 | \begin{haskell} |
---|
| 172 | main = do |
---|
| 173 | prtfoo foo |
---|
| 174 | prtfoo bar |
---|
| 175 | prtfoo baz |
---|
| 176 | 3 |
---|
| 177 | 3.5 |
---|
| 178 | 7 |
---|
| 179 | 7.5 |
---|
| 180 | \end{haskell} |
---|
[4da9142] | 181 | \end{tabular} |
---|
| 182 | \end{cquote} |
---|
[ccfbfd9] | 183 | For safety, most languages require all associated types to be listed or a default case with no field accesses. |
---|
[314c9d8] | 184 | |
---|
| 185 | A less frequent case is multiple constructors with the same type. |
---|
| 186 | \begin{haskell} |
---|
| 187 | data Bar = X Int | Y Int | Z Int; |
---|
| 188 | foo = X 3; |
---|
| 189 | bar = Y 3; |
---|
| 190 | baz = Z 5; |
---|
| 191 | \end{haskell} |
---|
| 192 | Here, the constructor name gives different meaning to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching. |
---|
| 193 | |
---|
[caaf424] | 194 | Note, the term \newterm{variant} is often associated with ADTs. |
---|
[314c9d8] | 195 | However, there are multiple languages with a @variant@ type that is not an ADT \see{Algol68~\cite{Algol68} or \CC \lstinline{variant}}. |
---|
[ccfbfd9] | 196 | In these languages, the variant is often a union using RTTI tags for discrimination, which cannot be used to simulate an enumeration. |
---|
[314c9d8] | 197 | Hence, in this work the term variant is not a synonym for ADT. |
---|
| 198 | |
---|
| 199 | % https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.19.1.0-179c/GHC-Enum.html |
---|
| 200 | % https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-Enum.html |
---|
[48b76d03] | 201 | |
---|
[314c9d8] | 202 | The association between ADT and enumeration occurs if all the constructors have a unit (empty) type, \eg @struct unit {}@. |
---|
| 203 | Note, the unit type is not the same as \lstinline{void}, \eg: |
---|
[4da9142] | 204 | \begin{cfa} |
---|
[314c9d8] | 205 | void foo( void ); |
---|
| 206 | struct unit {} u; // empty type |
---|
| 207 | unit bar( unit ); |
---|
| 208 | foo( foo() ); // void argument does not match with void parameter |
---|
| 209 | bar( bar( u ) ); // unit argument does match with unit parameter |
---|
[4da9142] | 210 | \end{cfa} |
---|
[caf2cba] | 211 | |
---|
[314c9d8] | 212 | For example, in the Haskell ADT: |
---|
| 213 | \begin{haskell} |
---|
| 214 | data Week = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving(Enum, Eq, Show) |
---|
| 215 | \end{haskell} |
---|
[ccfbfd9] | 216 | the default type for each constructor is the unit type, and deriving from @Enum@ enforces no other associated types, @Eq@ allows equality comparison, and @Show@ is for printing. |
---|
[314c9d8] | 217 | The nullary constructors for the unit types are numbered left-to-right from $0$ to @maxBound@$- 1$, and provides enumerating operations @succ@, @pred@, @enumFrom@ @enumFromTo@. |
---|
| 218 | \VRef[Figure]{f:HaskellEnumeration} shows enumeration comparison and iterating (enumerating). |
---|
| 219 | |
---|
| 220 | \begin{figure} |
---|
[4da9142] | 221 | \begin{cquote} |
---|
[314c9d8] | 222 | \setlength{\tabcolsep}{15pt} |
---|
| 223 | \begin{tabular}{@{}ll@{}} |
---|
| 224 | \begin{haskell} |
---|
| 225 | day = Tue |
---|
| 226 | main = do |
---|
| 227 | if day == Tue then |
---|
| 228 | print day |
---|
| 229 | else |
---|
| 230 | putStr "not Tue" |
---|
| 231 | print (enumFrom Mon) -- week |
---|
| 232 | print (enumFromTo Mon Fri) -- weekday |
---|
| 233 | print (enumFromTo Sat Sun) -- weekend |
---|
| 234 | \end{haskell} |
---|
[4da9142] | 235 | & |
---|
[314c9d8] | 236 | \begin{haskell} |
---|
| 237 | Tue |
---|
| 238 | [Mon,Tue,Wed,Thu,Fri,Sat,Sun] |
---|
| 239 | [Mon,Tue,Wed,Thu,Fri] |
---|
| 240 | [Sat,Sun] |
---|
[caf2cba] | 241 | |
---|
| 242 | |
---|
[4da9142] | 243 | |
---|
[314c9d8] | 244 | |
---|
| 245 | |
---|
| 246 | \end{haskell} |
---|
| 247 | \end{tabular} |
---|
| 248 | \end{cquote} |
---|
| 249 | \caption{Haskell Enumeration} |
---|
| 250 | \label{f:HaskellEnumeration} |
---|
| 251 | \end{figure} |
---|
| 252 | |
---|
| 253 | The key observation is the dichotomy between an ADT and enumeration: the ADT uses the associated type resulting in a union-like data structure, and the enumeration does not use the associated type, and hence, is not a union. |
---|
[ccfbfd9] | 254 | While the enumeration is constructed using the ADT mechanism, it is so restricted it is not an ADT. |
---|
[314c9d8] | 255 | Furthermore, a general ADT cannot be an enumeration because the constructors generate different values making enumerating meaningless. |
---|
| 256 | While functional programming languages regularly repurpose the ADT type into an enumeration type, this process seems contrived and confusing. |
---|
| 257 | Hence, there is only a weak equivalence between an enumeration and ADT, justifying a separate enumeration type in a programming language. |
---|
[caf2cba] | 258 | |
---|
[f9da761] | 259 | |
---|
| 260 | \section{Contributions} |
---|
[7d9a805b] | 261 | |
---|
[314c9d8] | 262 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a complex and safe enumeration type in the \CFA programming-language, while maintaining backwards compatibility with C. |
---|
[caf2cba] | 263 | On the surface, enumerations seem like a simple type. |
---|
[41fb996] | 264 | However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation. |
---|
[48b76d03] | 265 | |
---|
[314c9d8] | 266 | The contribution of this work are: |
---|
[48b76d03] | 267 | \begin{enumerate} |
---|
| 268 | \item |
---|
| 269 | overloading |
---|
| 270 | \item |
---|
| 271 | scoping |
---|
| 272 | \item |
---|
| 273 | typing |
---|
| 274 | \item |
---|
[314c9d8] | 275 | subseting |
---|
[48b76d03] | 276 | \item |
---|
| 277 | inheritance |
---|
| 278 | \end{enumerate} |
---|