[18ebc28] | 1 | \chapter{Introduction}
|
---|
| 2 |
|
---|
[ab11ab1] | 3 | All basic types in a programming language have a set of constants (symbols), and these constants represent computable values, \eg integer types have constants @-1@, @17@, @0xff@ representing whole numbers, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@ representing real numbers, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}} representing (human readable) text, \etc.
|
---|
[ccfbfd9] | 4 | Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integer and floating-point types.
|
---|
| 5 | (In \CFA, the constants @0@ and @1@ can be overloaded for any type.)
|
---|
[433e2c3] | 6 | Higher-level types compose constants from the basic constants.
|
---|
| 7 | \begin{cfa}
|
---|
| 8 | struct S { int i, j, k; } s;
|
---|
| 9 | s = (S){ 1, 2, 3 }; $\C[2in]{// structure constant}$
|
---|
| 10 | int x[5] = { 1, 2, 3, 4, 5 }; $\C{// array constant}\CRT$
|
---|
| 11 | \end{cfa}
|
---|
[46651fb] | 12 | A constant's symbolic name is dictated by language syntax related to types, \eg @5.@ (double), @5.0f@ (float), @5l@ (long double).
|
---|
[433e2c3] | 13 | In general, the representation of a constant's value is \newterm{opaque}, so the internal representation can be chosen arbitrarily, \eg two's complement, IEEE floating-point.
|
---|
[ab11ab1] | 14 | In theory, there is an infinite set of constant names per type representing an infinite set of values.
|
---|
[ccfbfd9] | 15 |
|
---|
[4c8f29ff] | 16 | It is common in mathematics, engineering, and computer science to alias new constants to existing constants so they have the same value, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), K(k), M, G, T for powers of 2\footnote{Overloaded with SI powers of 10.} often prefixing bits (b) or bytes (B), \eg Gb, MB, and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
|
---|
[ccfbfd9] | 17 | An alias can bind to another alias, which transitively binds it to the specified constant.
|
---|
| 18 | Multiple aliases can represent the same value, \eg eighth note and quaver, giving synonyms.
|
---|
| 19 |
|
---|
| 20 | Many programming languages capture this important software-engineering capability through a mechanism called \newterm{constant} or \newterm{literal} naming, where a new constant is aliased to an existing constant.
|
---|
[433e2c3] | 21 | Its purpose is for readability: replacing constant values in a program with symbolic names that are more meaningful to programmers in the context of the application.
|
---|
| 22 | Thereafter, associating a name to a different value automatically distributes this rebinding, preventing errors.
|
---|
[ccfbfd9] | 23 | Because an aliased name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \newterm{rvalue}\footnote{
|
---|
[caf2cba] | 24 | The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
|
---|
[ab11ab1] | 25 | In theory, there is an infinite set of possible aliasing; in practice, the number of aliasing per program is finite and small.
|
---|
[caf2cba] | 26 |
|
---|
[ccfbfd9] | 27 | Aliased constants can form an (ordered) set, \eg days of a week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
|
---|
| 28 | In this case, the binding between a constant name and value can be implicit, where the values are chosen to support any set operations.
|
---|
| 29 | Many programming languages capture the aliasing and ordering through a mechanism called an \newterm{enumeration}.
|
---|
[caf2cba] | 30 | \begin{quote}
|
---|
| 31 | enumerate (verb, transitive).
|
---|
| 32 | To count, ascertain the number of;
|
---|
[4da9142] | 33 | more usually, to mention (a number of things or persons) separately, as if for the purpose of counting;
|
---|
| 34 | to specify as in a list or catalogue.~\cite{OEDenumerate}
|
---|
[caf2cba] | 35 | \end{quote}
|
---|
[ccfbfd9] | 36 | Within an enumeration set, the enumeration names (aliases) must be unique, and instances of an enumerated type are \emph{often} restricted to hold only these names.
|
---|
| 37 |
|
---|
| 38 | It is possible to enumerate among set names without having an ordering among the set values.
|
---|
[48b76d03] | 39 | For example, the week, the weekdays, the weekend, and every second day of the week.
|
---|
| 40 | \begin{cfa}[morekeywords={in}]
|
---|
[dcfcf368] | 41 | for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun ) ... $\C[3.75in]{// week}$
|
---|
| 42 | for ( cursor in Mon, Tue, Wed, Thu, Fri ) ... $\C{// weekday}$
|
---|
| 43 | for ( cursor in Sat, Sun ) ... $\C{// weekend}$
|
---|
| 44 | for ( cursor in Mon, Wed, Fri, Sun ) ... $\C{// every second day of week}\CRT$
|
---|
[48b76d03] | 45 | \end{cfa}
|
---|
[433e2c3] | 46 | A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Tuesday and Tuesday is after.
|
---|
[314c9d8] | 47 | Ordering allows iterating among the enumeration set using relational operators and advancement, \eg:
|
---|
[caf2cba] | 48 | \begin{cfa}
|
---|
[433e2c3] | 49 | for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ... // weekdays
|
---|
[caf2cba] | 50 | \end{cfa}
|
---|
[ccfbfd9] | 51 | Here the values for the set names are logically \emph{generated} rather than listing a subset of names.
|
---|
[4da9142] | 52 |
|
---|
| 53 | Hence, the fundamental aspects of an enumeration are:
|
---|
| 54 | \begin{enumerate}
|
---|
| 55 | \item
|
---|
[314c9d8] | 56 | \begin{sloppypar}
|
---|
[433e2c3] | 57 | It provides a finite set of new constants, which are implicitly or explicitly assigned values that must be appropriate for any set operations, \eg increasing order.
|
---|
[ccfbfd9] | 58 | This aspect differentiates an enumeration from general types with an infinite set of constants.
|
---|
[314c9d8] | 59 | \end{sloppypar}
|
---|
[4da9142] | 60 | \item
|
---|
[ab11ab1] | 61 | The alias names are constants, which follow transitively from their binding to other constants.
|
---|
[4da9142] | 62 | \item
|
---|
[29c8675] | 63 | Defines a type for generating instances (variables).
|
---|
[4da9142] | 64 | \item
|
---|
[ccfbfd9] | 65 | For safety, an enumeration instance should be restricted to hold only its constant names.
|
---|
[4da9142] | 66 | \item
|
---|
[ccfbfd9] | 67 | There is a mechanism for \emph{enumerating} over the enumeration names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically.
|
---|
[4da9142] | 68 | \end{enumerate}
|
---|
[48b76d03] | 69 |
|
---|
[956299b] | 70 |
|
---|
[48b76d03] | 71 | \section{Terminology}
|
---|
[4da9142] | 72 | \label{s:Terminology}
|
---|
[48b76d03] | 73 |
|
---|
[46651fb] | 74 | The term \newterm{enumeration} defines a type with a set of new constants, and the term \newterm{enumerator} represents an arbitrary alias name \see{\VRef{s:CEnumeration} for the name derivations}.
|
---|
[dcfcf368] | 75 | An enumerated type can have the following properties: \newterm{label} (name), \newterm{order} (position), and \newterm{value} (payload).
|
---|
[956299b] | 76 | \begin{cquote}
|
---|
[48b76d03] | 77 | \sf\setlength{\tabcolsep}{3pt}
|
---|
[caf2cba] | 78 | \begin{tabular}{rcccccccr}
|
---|
| 79 | \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
|
---|
[314c9d8] | 80 | $\downarrow$\hspace*{15pt} & \multicolumn{8}{c}{$\downarrow$} \\
|
---|
| 81 | @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun {\color{red}= 42} & \}; \\
|
---|
[48b76d03] | 82 | \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
|
---|
| 83 | \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
|
---|
[314c9d8] | 84 | \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & {\color{red}42} &
|
---|
[956299b] | 85 | \end{tabular}
|
---|
| 86 | \end{cquote}
|
---|
[4c8f29ff] | 87 | Here, the enumeration @Week@ defines the enumerator constants @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@, and @Sun@.
|
---|
[caf2cba] | 88 | The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
|
---|
[ccfbfd9] | 89 | The value is the implicitly/explicitly assigned constant to support any enumeration operations;
|
---|
| 90 | the value may be hidden (opaque) or visible.
|
---|
[48b76d03] | 91 |
|
---|
[caf2cba] | 92 | Specifying complex ordering is possible:
|
---|
| 93 | \begin{cfa}
|
---|
| 94 | enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
|
---|
| 95 | enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$
|
---|
| 96 | \end{cfa}
|
---|
| 97 | For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
|
---|
| 98 | For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
|
---|
| 99 | Only flat total-ordering among enumerators is considered in this work.
|
---|
[7d9a805b] | 100 |
|
---|
| 101 |
|
---|
[48b76d03] | 102 | \section{Motivation}
|
---|
[caf2cba] | 103 |
|
---|
[314c9d8] | 104 | Many programming languages provide an enumeration-like mechanism, which may or may not cover the previous five fundamental enumeration aspects.
|
---|
| 105 | Hence, the term \emph{enumeration} can be confusing and misunderstood.
|
---|
[ccfbfd9] | 106 | Furthermore, some languages conjoin the enumeration with other type features, making it difficult to tease apart which feature is being used.
|
---|
[c1c0efdb] | 107 | This section discusses some language features that are sometimes called an enumeration but do not provide all enumeration aspects.
|
---|
[314c9d8] | 108 |
|
---|
| 109 |
|
---|
| 110 | \subsection{Aliasing}
|
---|
[f6321173] | 111 | \label{s:Aliasing}
|
---|
[314c9d8] | 112 |
|
---|
[433e2c3] | 113 | Some languages provide simple aliasing (renaming).
|
---|
[caf2cba] | 114 | \begin{cfa}
|
---|
[4da9142] | 115 | const Size = 20, Pi = 3.14159, Name = "Jane";
|
---|
[caf2cba] | 116 | \end{cfa}
|
---|
[ccfbfd9] | 117 | The alias name is logically replaced in the program text by its matching constant.
|
---|
[46651fb] | 118 | It is possible to compare aliases, if the constants allow it, \eg @Size < Pi@, whereas @Pi < Name@ might be disallowed depending on the language.
|
---|
[caf2cba] | 119 |
|
---|
[433e2c3] | 120 | Aliasing is \emph{not} macro substitution, \eg @#define Size 20@, where a name is replaced by its value \emph{before} compilation, so the name is invisible to the programming language.
|
---|
[ccfbfd9] | 121 | With aliasing, each new name is part of the language, and hence, participates fully, such as name overloading in the type system.
|
---|
[433e2c3] | 122 | Aliasing is not an immutable variable.
|
---|
[314c9d8] | 123 | \begin{cfa}
|
---|
| 124 | extern @const@ int Size = 20;
|
---|
| 125 | extern void foo( @const@ int @&@ size );
|
---|
| 126 | foo( Size ); // take the address of (reference) Size
|
---|
| 127 | \end{cfa}
|
---|
[caaf424] | 128 | Taking the address of an immutable variable makes it an \newterm{lvalue}, which implies it has storage.
|
---|
[314c9d8] | 129 | With separate compilation, it is necessary to choose one translation unit to perform the initialization.
|
---|
[433e2c3] | 130 | If aliasing requires storage, its address and initialization are opaque (compiler only), similar to \CC rvalue reference @&&@.
|
---|
[314c9d8] | 131 |
|
---|
| 132 | Aliasing does provide readability and automatic resubstitution.
|
---|
[ccfbfd9] | 133 | It also provides simple enumeration properties, but with effort.
|
---|
[caf2cba] | 134 | \begin{cfa}
|
---|
| 135 | const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
|
---|
| 136 | \end{cfa}
|
---|
[4da9142] | 137 | Any reordering of the enumerators requires manual renumbering.
|
---|
[caf2cba] | 138 | \begin{cfa}
|
---|
[46651fb] | 139 | const @Sun = 1@, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
|
---|
[caf2cba] | 140 | \end{cfa}
|
---|
[314c9d8] | 141 | For these reasons, aliasing is sometimes called an enumeration.
|
---|
[ab11ab1] | 142 | However, there is no type to create a type-checked instance or iterator cursor, so there is no ability to enumerate.
|
---|
[314c9d8] | 143 | Hence, there are multiple enumeration aspects not provided by aliasing, justifying a separate enumeration type in a programming language.
|
---|
| 144 |
|
---|
| 145 |
|
---|
| 146 | \subsection{Algebraic Data Type}
|
---|
[ec20ab9] | 147 | \label{s:AlgebraicDataType}
|
---|
[4da9142] | 148 |
|
---|
[314c9d8] | 149 | An algebraic data type (ADT)\footnote{ADT is overloaded with abstract data type.} is another language feature often linked with enumeration, where an ADT conjoins an arbitrary type, possibly a \lstinline[language=C++]{class} or @union@, and a named constructor.
|
---|
| 150 | For example, in Haskell:
|
---|
| 151 | \begin{haskell}
|
---|
| 152 | data S = S { i::Int, d::Double } $\C{// structure}$
|
---|
| 153 | data @Foo@ = A Int | B Double | C S $\C{// ADT, composed of three types}$
|
---|
| 154 | foo = A 3; $\C{// type Foo is inferred}$
|
---|
| 155 | bar = B 3.5
|
---|
| 156 | baz = C S{ i = 7, d = 7.5 }
|
---|
| 157 | \end{haskell}
|
---|
[433e2c3] | 158 | the ADT has three variants (constructors), @A@, @B@, @C@, with associated types @Int@, @Double@, and @S@.
|
---|
[314c9d8] | 159 | The constructors create an initialized value of the specific type that is bound to the immutable variables @foo@, @bar@, and @baz@.
|
---|
[ab11ab1] | 160 | Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to initialize and access the value using dynamic pattern-matching.
|
---|
[4da9142] | 161 | \begin{cquote}
|
---|
[433e2c3] | 162 | \setlength{\tabcolsep}{20pt}
|
---|
[314c9d8] | 163 | \begin{tabular}{@{}ll@{}}
|
---|
| 164 | \begin{haskell}
|
---|
| 165 | prtfoo val = -- function
|
---|
[433e2c3] | 166 | -- pattern match on constructor
|
---|
| 167 | case val of
|
---|
| 168 | @A@ a -> print a
|
---|
| 169 | @B@ b -> print b
|
---|
| 170 | @C@ (S i d) -> do
|
---|
| 171 | print i
|
---|
| 172 | print d
|
---|
[314c9d8] | 173 | \end{haskell}
|
---|
[4da9142] | 174 | &
|
---|
[314c9d8] | 175 | \begin{haskell}
|
---|
| 176 | main = do
|
---|
[433e2c3] | 177 | prtfoo foo
|
---|
| 178 | prtfoo bar
|
---|
| 179 | prtfoo baz
|
---|
[314c9d8] | 180 | 3
|
---|
| 181 | 3.5
|
---|
| 182 | 7
|
---|
| 183 | 7.5
|
---|
| 184 | \end{haskell}
|
---|
[4da9142] | 185 | \end{tabular}
|
---|
| 186 | \end{cquote}
|
---|
[ccfbfd9] | 187 | For safety, most languages require all associated types to be listed or a default case with no field accesses.
|
---|
[314c9d8] | 188 |
|
---|
| 189 | A less frequent case is multiple constructors with the same type.
|
---|
| 190 | \begin{haskell}
|
---|
| 191 | data Bar = X Int | Y Int | Z Int;
|
---|
| 192 | foo = X 3;
|
---|
| 193 | bar = Y 3;
|
---|
| 194 | baz = Z 5;
|
---|
| 195 | \end{haskell}
|
---|
[ab11ab1] | 196 | Here, the constructor name gives different meanings to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching.
|
---|
[314c9d8] | 197 |
|
---|
[caaf424] | 198 | Note, the term \newterm{variant} is often associated with ADTs.
|
---|
[314c9d8] | 199 | However, there are multiple languages with a @variant@ type that is not an ADT \see{Algol68~\cite{Algol68} or \CC \lstinline{variant}}.
|
---|
[ab11ab1] | 200 | Here, the type (and possibly the position for equivalent types) is used to discriminate the specific \emph{variant} within the variant instance.
|
---|
[433e2c3] | 201 | For example, \VRef[Figure]{f:C++variant} shows the \CC equivalent of the two Haskell ADT types using variant types.
|
---|
| 202 | In these languages, the variant cannot be used to simulate an enumeration.
|
---|
[314c9d8] | 203 | Hence, in this work the term variant is not a synonym for ADT.
|
---|
| 204 |
|
---|
[433e2c3] | 205 | \begin{figure}
|
---|
| 206 | \begin{c++}
|
---|
| 207 | struct S { char s[32]; };
|
---|
| 208 | variant< int, double, S > vd;
|
---|
| 209 | variant< int, int, int > vs;
|
---|
| 210 |
|
---|
| 211 | // discrimination based on type
|
---|
| 212 | vd = 3;
|
---|
| 213 | if ( holds_alternative<int>(vd) ) cout << "int " << get<int>(vd ) << endl;
|
---|
| 214 | vd = 3.5;
|
---|
| 215 | if ( holds_alternative<double>(vd) ) cout << "double " << get<double>(vd) << endl;
|
---|
| 216 | vd = (S){ "abc" };
|
---|
| 217 | if ( holds_alternative<S>(vd) ) cout << "S.s " << get<S>(vd).s << endl;
|
---|
| 218 |
|
---|
| 219 | // discrimination based on type and position within type
|
---|
| 220 | vs = (variant<int,int,int>){ in_place_index<0>, 12 };
|
---|
| 221 | if ( vs.index() == 0 ) cout << "posn 0 " << get<0>(vs) << endl;
|
---|
| 222 | vs = (variant<int,int,int>){ in_place_index<1>, 4 };
|
---|
| 223 | if ( vs.index() == 1 ) cout << "posn 1 " << get<1>(vs) << endl;
|
---|
| 224 | vs = (variant<int,int,int>){ in_place_index<2>, 5 };
|
---|
| 225 | if ( vs.index() == 2 ) cout << "posn 2 " << get<2>(vs) << endl;
|
---|
| 226 | \end{c++}
|
---|
| 227 | \caption{\CC \lstinline[language=C++]{variant} Discrimination Using RTTI/Position}
|
---|
| 228 | \label{f:C++variant}
|
---|
| 229 | \end{figure}
|
---|
| 230 |
|
---|
[314c9d8] | 231 | % https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.19.1.0-179c/GHC-Enum.html
|
---|
| 232 | % https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-Enum.html
|
---|
[48b76d03] | 233 |
|
---|
[29c8675] | 234 | % The association between ADT and enumeration occurs if all the constructors have a unit (empty) type, \eg @struct unit {}@.
|
---|
| 235 | % Note, the unit type is not the same as \lstinline{void}.
|
---|
[dcfcf368] | 236 | In terms of functional programming linguistics, enumerations often refer to a @unit type@ ADT, which is a set with the @nil@ value carrying no information.
|
---|
| 237 | The unit type is different from type @void@ in C, because @void@ has no value, which is an empty set.
|
---|
| 238 | Hence, @void@ is a C annotation that nothing is expected in this place.
|
---|
| 239 | For example, a function that takes a @void@ parameter and returns a @void@ is a function that expects no parameters and returns nothing.
|
---|
| 240 | \begin{cfa}
|
---|
| 241 | void foo( void );
|
---|
| 242 | foo(); $\C{// no arguments and no result}$
|
---|
| 243 | \end{cfa}
|
---|
| 244 | Because of this distinction, it is impossible to have a variable of type @void@, to assign a @void@ value, or have a function taking and returning multiple @void@s.
|
---|
| 245 | \begin{cfa}
|
---|
| 246 | void v; $\C{// disallowed}$
|
---|
| 247 | v = void;
|
---|
| 248 | [ void, void ] bar( void, void );
|
---|
| 249 | \end{cfa}
|
---|
| 250 | Programming languages often use an empty parameter list to imply no value and no return type for empty return.
|
---|
| 251 | \begin{cfa}
|
---|
| 252 | [] bar(); $\C{// \CFA empty/empty prototype}$
|
---|
| 253 | \end{cfa}
|
---|
| 254 | However, C is saddled with an empty parameter list meaning a list of unknown type parameters, \ie @var_arg@, which is changed to @void@ in \CC/\CFA.
|
---|
| 255 | As a result, a function that returns @void@ cannot be used as a parameter of a function that expects no parameter.
|
---|
[a35e342] | 256 | \begin{cfa}
|
---|
| 257 | void foo( void );
|
---|
| 258 | foo( @foo()@ ); $\C{// void argument does not match with void parameter}$
|
---|
| 259 | \end{cfa}
|
---|
| 260 |
|
---|
[dcfcf368] | 261 | This issue arose when simulating an ADT using a \CC @variant@: @void@ cannot be used as an empty variant.
|
---|
| 262 | To solve this problem, \CC introduced @std::monstate@~\cite{C++monstate}, a type that can be instantiated as a value but holds no information.
|
---|
| 263 | A similar approximation in C is to define a @struct@ type with no fields.
|
---|
[a35e342] | 264 | \begin{cfa}
|
---|
[dcfcf368] | 265 | struct Unit {} e; $\C{// empty type}$
|
---|
| 266 | Unit bar( Unit );
|
---|
| 267 | bar( @bar( e )@ );
|
---|
[a35e342] | 268 | \end{cfa}
|
---|
[dcfcf368] | 269 | Because @std::monostate@ and @Unit@ are user-defined types versus part of the type system, they are only an approximation to @unit@ because other @unit@ types can be defined.
|
---|
[caf2cba] | 270 |
|
---|
[a35e342] | 271 | In the Haskell ADT:
|
---|
[314c9d8] | 272 | \begin{haskell}
|
---|
| 273 | data Week = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving(Enum, Eq, Show)
|
---|
| 274 | \end{haskell}
|
---|
[ab11ab1] | 275 | the default type for each constructor is the unit type, and deriving from @Enum@ enforces no other associated types. The @Eq@ allows equality comparison, and @Show@ is for printing.
|
---|
| 276 | The nullary constructors for the unit types are numbered left-to-right from $0$ to @maxBound@$- 1$, and provide enumerating operations @succ@, @pred@, @enumFrom@, @enumFromTo@.
|
---|
[314c9d8] | 277 | \VRef[Figure]{f:HaskellEnumeration} shows enumeration comparison and iterating (enumerating).
|
---|
| 278 |
|
---|
| 279 | \begin{figure}
|
---|
[4da9142] | 280 | \begin{cquote}
|
---|
[433e2c3] | 281 | \setlength{\tabcolsep}{40pt}
|
---|
[314c9d8] | 282 | \begin{tabular}{@{}ll@{}}
|
---|
| 283 | \begin{haskell}
|
---|
| 284 | day = Tue
|
---|
| 285 | main = do
|
---|
[433e2c3] | 286 | if day == Tue then
|
---|
| 287 | print day
|
---|
| 288 | else
|
---|
| 289 | putStr "not Tue"
|
---|
| 290 | print (enumFrom Mon) $\C[2.25in]{-- week}$
|
---|
| 291 | print (enumFromTo Mon Fri) $\C{-- weekday}$
|
---|
| 292 | print (enumFromTo Sat Sun) $\C{-- weekend}\CRT$
|
---|
[314c9d8] | 293 | \end{haskell}
|
---|
[4da9142] | 294 | &
|
---|
[314c9d8] | 295 | \begin{haskell}
|
---|
| 296 | Tue
|
---|
| 297 | [Mon,Tue,Wed,Thu,Fri,Sat,Sun]
|
---|
| 298 | [Mon,Tue,Wed,Thu,Fri]
|
---|
| 299 | [Sat,Sun]
|
---|
[caf2cba] | 300 |
|
---|
| 301 |
|
---|
[4da9142] | 302 |
|
---|
[314c9d8] | 303 |
|
---|
| 304 |
|
---|
| 305 | \end{haskell}
|
---|
| 306 | \end{tabular}
|
---|
| 307 | \end{cquote}
|
---|
| 308 | \caption{Haskell Enumeration}
|
---|
| 309 | \label{f:HaskellEnumeration}
|
---|
| 310 | \end{figure}
|
---|
| 311 |
|
---|
| 312 | The key observation is the dichotomy between an ADT and enumeration: the ADT uses the associated type resulting in a union-like data structure, and the enumeration does not use the associated type, and hence, is not a union.
|
---|
[433e2c3] | 313 | In contrast, an enumeration may be constructed using the ADT mechanism, but it is so restricted it is not an ADT.
|
---|
[314c9d8] | 314 | Furthermore, a general ADT cannot be an enumeration because the constructors generate different values making enumerating meaningless.
|
---|
| 315 | While functional programming languages regularly repurpose the ADT type into an enumeration type, this process seems contrived and confusing.
|
---|
| 316 | Hence, there is only a weak equivalence between an enumeration and ADT, justifying a separate enumeration type in a programming language.
|
---|
[caf2cba] | 317 |
|
---|
[f9da761] | 318 |
|
---|
| 319 | \section{Contributions}
|
---|
[7d9a805b] | 320 |
|
---|
[314c9d8] | 321 | The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a complex and safe enumeration type in the \CFA programming-language, while maintaining backwards compatibility with C.
|
---|
[caf2cba] | 322 | On the surface, enumerations seem like a simple type.
|
---|
[41fb996] | 323 | However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
|
---|
[48b76d03] | 324 |
|
---|
[ab11ab1] | 325 | The contributions of this work are:
|
---|
[48b76d03] | 326 | \begin{enumerate}
|
---|
| 327 | \item
|
---|
[0c51c8b4] | 328 | safety: Define a safe enumeration conversion scheme, both for C and \CFA, and replace ad-hoc C idioms with safer software-engineering approaches.
|
---|
[48b76d03] | 329 | \item
|
---|
[0c51c8b4] | 330 | overloading: Provide a pattern to overload functions, literals, and variables for polymorphic enumerations using the \CFA type system.
|
---|
[48b76d03] | 331 | \item
|
---|
[ab11ab1] | 332 | scoping: Add a namespace for enumerations and qualified access into the namespace to deal with the naming problem.
|
---|
[48b76d03] | 333 | \item
|
---|
[0c51c8b4] | 334 | generalization: Support all language types for enumerators with associated values providing enumeration constants for any type.
|
---|
[48b76d03] | 335 | \item
|
---|
[3b10778] | 336 | reuse: Implement subset and containment inheritance for enumerations.
|
---|
[0c51c8b4] | 337 | \item
|
---|
| 338 | control flow: Extend control-flow structures making it safer and easier to enumerate over an enumeration.
|
---|
| 339 | \item
|
---|
| 340 | I/O: Provide input and output of enumerations based on enumerator names.
|
---|
[48b76d03] | 341 | \end{enumerate}
|
---|
[4c8f29ff] | 342 |
|
---|
| 343 |
|
---|
| 344 | \begin{comment}
|
---|
| 345 | Date: Wed, 1 May 2024 13:41:58 -0400
|
---|
| 346 | Subject: Re: Enumeration
|
---|
| 347 | To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
|
---|
| 348 | From: Gregor Richards <gregor.richards@uwaterloo.ca>
|
---|
| 349 |
|
---|
| 350 | I think I have only one comment and one philosophical quibble to make:
|
---|
| 351 |
|
---|
| 352 | Comment: I really can't agree with putting MB in the same category as the
|
---|
| 353 | others. MB is both a quantity and a unit, and the suggestion that MB *is* one
|
---|
| 354 | million evokes the rather disgusting comparison 1MB = 1000km. Unit types are
|
---|
| 355 | not in the scope of this work.
|
---|
| 356 |
|
---|
| 357 | Philosophical quibble: Pi *is* 3.14159...etc. Monday is not 0; associating
|
---|
| 358 | Monday with 0 is just a consequence of the language. The way this is written
|
---|
| 359 | suggests that the intentional part is subordinate to the implementation detail,
|
---|
| 360 | which seems backwards to me. Calling the number "primary" and the name
|
---|
| 361 | "secondary" feels like you're looking out from inside of the compiler, instead
|
---|
| 362 | of looking at the language from the outside. And, calling secondary values
|
---|
| 363 | without visible primary values "opaque"-which yes, I realize is my own term
|
---|
| 364 | ;)-suggests that you insist that the primary value is a part of the design, or
|
---|
| 365 | at least mental model, of the program. Although as a practical matter there is
|
---|
| 366 | some system value associated with the constructor/tag of an ADT, that value is
|
---|
| 367 | not part of the mental model, and so calling it "primary" and calling the name
|
---|
| 368 | "secondary" and "opaque" seems either (a) very odd or (b) very C-biased. Or
|
---|
| 369 | both.
|
---|
| 370 |
|
---|
| 371 | With valediction,
|
---|
| 372 | - Gregor Richards
|
---|
| 373 |
|
---|
| 374 |
|
---|
| 375 | Date: Thu, 30 May 2024 23:15:23 -0400
|
---|
| 376 | Subject: Re: Meaning?
|
---|
| 377 | To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
|
---|
| 378 | CC: <ajbeach@uwaterloo.ca>, <j82liang@uwaterloo.ca>
|
---|
| 379 | From: Gregor Richards <gregor.richards@uwaterloo.ca>
|
---|
| 380 |
|
---|
| 381 | I have to disagree with this being agreeing to disagree, since we agree
|
---|
| 382 | here. My core point was that it doesn't matter whether you enumerate over the
|
---|
| 383 | names or the values. This is a distinction without a difference in any case
|
---|
| 384 | that matters. If any of the various ways of looking at it are actually
|
---|
| 385 | different from each other, then that's because the enumeration has failed to be
|
---|
| 386 | an enumeration in some other way, not because of the actual process of
|
---|
| 387 | enumeration. Your flag enum is a 1-to-1 map of names and values, so whether you
|
---|
| 388 | walk through names or walk through values is not an actual distinction. It
|
---|
| 389 | could be distinct in the *order* that it walks through, but that doesn't
|
---|
| 390 | actually matter, it's just a choice that has to be made. Walking through entire
|
---|
| 391 | range of machine values, including ones that aren't part of the enumeration,
|
---|
| 392 | would be bizarre in any case.
|
---|
| 393 |
|
---|
| 394 | Writing these out has crystallized some thoughts, albeit perhaps not in a way
|
---|
| 395 | that's any help to y'all. An enumeration is a set of names; ideally an ordered
|
---|
| 396 | set of names. The state of enumerations in programming languages muddies things
|
---|
| 397 | because they often expose the machine value underlying those names, resulting
|
---|
| 398 | in a possibly ordered set of names and a definitely ordered set of values. And,
|
---|
| 399 | muddying things further, because those underlying values are exposed, enums are
|
---|
| 400 | used in ways that *depend* on the underlying values being exposed, making that
|
---|
| 401 | a part of the definition. But, an enumeration is conceptually just *one* set,
|
---|
| 402 | not both. So much of the difficulty is that you're trying to find a way to make
|
---|
| 403 | a concept that should be a single set agree with an implementation that's two
|
---|
| 404 | sets. If those sets have a 1-to-1 mapping, then who cares, they're just
|
---|
| 405 | aliases. It's the possibility of the map being surjective (having multiple
|
---|
| 406 | names for the same underlying values) that breaks everything. Personally, I
|
---|
| 407 | think that an enum with aliases isn't an enumeration anyway, so who cares about
|
---|
| 408 | the rest; if you're not wearing the gourd as a shoe, then it's not an
|
---|
| 409 | enumeration.
|
---|
| 410 |
|
---|
| 411 | With valediction,
|
---|
| 412 | - Gregor Richards
|
---|
| 413 | \end{comment}
|
---|