Changeset caf2cba


Ignore:
Timestamp:
Mar 23, 2024, 7:02:11 PM (9 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
48b76d03
Parents:
e00b10d
Message:

justification for enumerations

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/jiada_liang_MMath/intro.tex

    re00b10d rcaf2cba  
    11\chapter{Introduction}
    22
    3 Naming values is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), etc.
    4 Naming is also commonly used to represent many other numerical phenomenon, such as days of the week, months of a year, floors of a building (basement), specific times (noon, New Years).
    5 Many programming languages capture this important software engineering capability through a mechanism called an \Newterm{enumeration}.
    6 An enumeration is similar to other programming-language types by providing a set of constrained values, but adds the ability to name \emph{all} the values in the set.
    7 Note, all enumeration names must be unique but different names can represent the same value (eight note, quaver), which are synonyms.
     3All types in a programming language must have a set of constants, and these constants have primary names, \eg integral types have constants @-1@, @17@, @12345@, \etc.
     4Constants can be overloaded among types, \eg @0@ is a null pointer, and zero for integral and floating-point types.
     5Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
     6In theory, there are an infinite set of primary names per type.
    87
    9 Specifically, an enumerated type restricts its values to a fixed set of named constants.
    10 While all types are restricted to a fixed set of values because of the underlying von Neumann architecture, and hence, to a corresponding set of constants, \eg @3@, @3.5@, @3.5+2.1i@, @'c'@, @"abc"@, etc., these values are not named, other than the programming-language supplied constant names.
     8Secondary naming is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
     9Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
     10In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
     11Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
     12The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
    1113
    12 Fundamentally, all enumeration systems have an \Newterm{enumeration} type with an associated set of \Newterm{enumerator} names.
    13 An enumeration has three universal attributes, \Newterm{label}, \Newterm{position}, and \Newterm{value}, as shown by this representative enumeration, where position and value can be different.
     14Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a fan deck (gator-belly blue, sky-blue pink), \etc.
     15Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
     16\begin{quote}
     17enumerate (verb, transitive).
     18To count, ascertain the number of;
     19\emph{more
     20usually, to mention (a number of things or persons) separately, as if for the
     21purpose of counting};
     22to specify as in a list or catalogue.~\cite{OED}
     23\end{quote}
     24Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to its secondary names.
     25It is possible to enumerate among set names without having an ordering among the set elements.
     26For example, the days of the week are enumerated Monday to Sunday, the weekdays are Monday to Friday, the weekend is Saturday and Sunday, and every second weekday is Monday, Wednesday, Friday, and Sunday.
     27This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms.
     28A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday comes before Friday and Friday comes after.
     29Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
     30\begin{cfa}
     31for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
     32\end{cfa}
     33rather than listing a proper subset of enumeration names.
     34
     35In this work, the term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name.
     36An enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}, as shown by this representative enumeration.
    1437\begin{cquote}
    1538\small\sf\setlength{\tabcolsep}{3pt}
    16 \begin{tabular}{rccccccccccc}
    17 \it\color{red}enumeration & \multicolumn{7}{c}{\it\color{red}enumerators}       \\
    18 $\downarrow$\hspace*{25pt} & \multicolumn{7}{c}{$\downarrow$}                           \\
    19 @enum@ Weekday \{                               & Mon,  & Tue,  & Wed,  & Thu,  & Fri,  & Sat,  & Sun \};       \\
    20 \it\color{red}label                             & Mon   & Tue   & Wed   & Thu   & Fri   & Sat   & Sun           \\
    21 \it\color{red}position                  & 0             & 1             & 2             & 3             & 4             & 5             & 6                     \\
    22 \it\color{red}value                             & 0             & 1             & 2             & 3             & 4             & 5             & 6
     39\begin{tabular}{rcccccccr}
     40\it\color{red}enumeration       & \multicolumn{8}{c}{\it\color{red}enumerators} \\
     41$\downarrow$\hspace*{25pt}      & \multicolumn{8}{c}{$\downarrow$}                              \\
     42@enum@ Weekday \{                       & Mon,  & Tue,  & Wed,  & Thu,  & Fri,  & Sat,  & Sun   & \};   \\
     43\it\color{red}label                     & Mon   & Tue   & Wed   & Thu   & Fri   & Sat   & Sun   &               \\
     44\it\color{red}order                     & 0             & 1             & 2             & 3             & 4             & 5             & 6             &               \\
     45\it\color{red}value                     & 0             & 1             & 2             & 3             & 4             & 5             & 6             &
    2346\end{tabular}
    2447\end{cquote}
    25 Here, the \Newterm{enumeration} @Weekday@ defines the ordered \Newterm{enumerator}s @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
    26 By convention, the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of the associated enumerator constant values, implying an ordering among the enumerators.
    27 As well, the value can be explicitly set so it is different from the position.
    28 Because an enumerator is a constant, it cannot appear in a mutable context, \eg @Mon = Sun@ is meaningless, and an enumerator has no address, \ie it is an \Newterm{rvalue}\footnote{
    29 The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
     48Here, the enumeration @Weekday@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
     49The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
     50Specifying complex ordering is possible:
     51\begin{cfa}
     52enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
     53enum E2 { {A, {B, C} }, { {D, E}, F };  $\C{// nesting}$
     54\end{cfa}
     55For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
     56For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
     57Only flat total-ordering among enumerators is considered in this work.
     58Finally, the values are the same as the ordering, but they can be set independently and be of any type.
    3059
    31 On the surface, enumerations seem like a simple type.
    32 However, when extended with features available in other language types, enumerations become a complex.
    3360
    34 The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
     61\section{Enumeration Motivation}
     62
     63Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering.
     64\begin{cfa}
     65const Size = 20, Pi = 3.14159;
     66enum { Size = 20, Pi = 3.14159 };   // unnamed enumeration $\(\Rightarrow\)$ no ordering
     67\end{cfa}
     68Without a type name, it is impossible to create an iterating cursor.
     69It is still possible to compare the internal representations, if that is meaningful, \eg @Size < Pi@.
     70
     71Secondary renaming can similate an enumeration, but with extra effort.
     72\begin{cfa}
     73const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
     74\end{cfa}
     75Furthermore, reordering the enumerators requires manual renumbering.
     76\begin{cfa}
     77const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
     78\end{cfa}
     79Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language.
     80
     81A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration.
     82A variant type is a tagged-union, where the possible types may be heterogeneous.
     83\begin{cfa}
     84@variant@ Variant {
     85        @int tag;@  // optional/implicit: 0 => int, 1 => double, 2 => S
     86        @union {@ // implicit
     87                case int i;
     88                case double d;
     89                case struct S { int i, j; } s;
     90        @};@
     91};
     92\end{cfa}
     93Crucially, the union implies instance storage is shared by all of the variant types.
     94Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}.
     95Knowing which type is in a variant instance is crucial for correctness.
     96Occasionally, it is possible to statically determine, all regions where each variant type is used, so no runtime checking is necessary.
     97Otherwise, a tag is required to denote the particular type in the variant.
     98The tag can be implicitly set by the compiler on assignment, or explicitly set by the programmer.
     99Finally, some mechanism is necessary to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg type pattern-matching.
     100\begin{cfa}
     101Variant v = 3; // implicitly set tag to 0
     102switch( v ) { // know the type or test the tag
     103        case int { /* only access i field in v */ }
     104        case double { /* only access d field in v */ }
     105        case S { /* only access s field in v */ }
     106}
     107\end{cfa}
     108For safety, either all variant types must be listed or a @default@ case must exist with no field accesses.
     109
     110To simulate an enumeration with a variant, the tag is re-purposed for either ordering or value and the types are omitted.
     111\begin{cfa}
     112variant Weekday {
     113        int tag; // implicit 0 => Mon, ..., 6 => Sun
     114        @case Mon;@ // no type
     115        ...
     116        @case Sun;@
     117};
     118\end{cfa}
     119The type system ensures tag setting and testing are correct.
     120However, the enumeration operations are limited to the available tag operations, \eg pattern matching.
     121\begin{cfa}
     122Weekday weekday = Mon;
     123if ( @dynamic_cast(Mon)@weekday ) ... // test tag == Mon
     124\end{cfa}
     125While enumerating among tag names is possible:
     126\begin{cfa}
     127for ( Mon, Wed, Fri, Sun ) ...
     128\end{cfa}
     129ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@.
     130
     131However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values.
     132Iterating using tag ordering and heterogeneous types, requires pattern matching.
     133\begin{cfa}
     134for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) {
     135        switch( cursor ) {
     136                case Mon { /* access special type for Mon */ }
     137                ...
     138                case Fri { /* access special type for Fri */ }
     139        }
     140}
     141\end{cfa}
     142If the variant type or number of loop steps changes, the pattern must be adjusted.
     143If the start/stop values are dynamic, it is impossible to statically determine if all variant types are listed.
     144
     145Forcing the notion of enumerating into variant types is ill formed and confusing.
     146Hence, there is only a weak equivalence between enumerations and variant type, justifying the enumeration type in a programming language.
     147
    35148
    36149\section{Contributions}
    37150
     151The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
     152On the surface, enumerations seem like a simple type.
     153However, when extended with advanced features, enumerations become complex for both the type system and the implementation.
Note: See TracChangeset for help on using the changeset viewer.