Changeset 314c9d8 for doc


Ignore:
Timestamp:
Apr 25, 2024, 3:43:35 PM (2 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
55c97e4
Parents:
566cc33
Message:

more proofreading on introduction chapter (discussion of ADT)

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/jiada_liang_MMath/intro.tex

    r566cc33 r314c9d8  
    55(In \CFA, the primary constants @0@ and @1@ can be overloaded for any type.)
    66Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
    7 In theory, there are an infinite set of primary names per type.
    8 
    9 \Newterm{Secondary naming} is a common practice in mathematics, engineering and computer science, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MB (mega byte, 1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
     7In theory, there are an infinite set of primary constant names per type.
     8
     9\Newterm{Secondary naming} is a common practice in mathematics, engineering and computer science, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MB (megabyte, 1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
    1010Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
    11 Its common purpose is to eliminate duplication of the primary constant throughout a program.
    12 For example, the secondary name replaces its primary name, thereafter changing the binding of the secondary to primary name automatically distributes the rebinding throughout the program.
     11Its purpose is for readability and to eliminate duplication of the primary constant throughout a program.
     12For example, a meaningful secondary name replaces a primary name throughout a program;
     13thereafter, changing the binding of the secondary to primary name automatically distributes the rebinding, preventing errors.
    1314In some cases, secondary naming is \Newterm{opaque}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
    1415Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
    1516The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
    1617
    17 Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
     18Secondary names can form an (ordered) set, \eg days of a week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
    1819Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
    1920\begin{quote}
     
    3435This independence from internal representation allows multiple names to have the same representation (eighth note, quaver), giving synonyms.
    3536A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
    36 Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
     37Ordering allows iterating among the enumeration set using relational operators and advancement, \eg:
    3738\begin{cfa}
    3839for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
    3940\end{cfa}
    40 Here the internal representations for the secondary names are logically \emph{generated} rather than listing a subset of names.
     41Here the internal representation for the secondary names are logically \emph{generated} rather than listing a subset of names.
    4142
    4243Hence, the fundamental aspects of an enumeration are:
    4344\begin{enumerate}
    4445\item
    45 It defines a type from which instants can be generated.
    46 \item
    47 The type lists a finite set of secondary names, which become its primary constants.
    48 This differentiates an enumeration from general types with an infinite number of primary constants.
    49 \item
    50 An enumeration's secondary names represent constants, which follows from their binding (aliasing) to primary names, which are constants.
    51 \item
    52 For safety, an enumeration instance is restricted to hold only its type's secondary names.
     46\begin{sloppypar}
     47It provides a finite set of secondary names, which become its primary constants.
     48This differentiates an enumeration from general types with an infinite set
     49of primary constants.
     50\end{sloppypar}
     51\item
     52The secondary names are constants, which follows transitively from their binding (aliasing) to primary names, which are constants.
     53\item
     54Defines a type for generating instants (variables).
     55\item
     56For safety, an enumeration instance should be restricted to hold only its type's secondary names.
    5357\item
    5458There is a mechanism for \emph{enumerating} over the secondary names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically.
     
    5963\label{s:Terminology}
    6064
    61 The term \Newterm{enumeration} defines a type with a set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name \see{\VRef{s:CEnumeration}}.
    62 As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
     65The term \Newterm{enumeration} defines a type with a set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name \see{\VRef{s:CEnumeration} for the name derivation}.
     66As well, an enumerated type can have three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
    6367\begin{cquote}
    6468\sf\setlength{\tabcolsep}{3pt}
    6569\begin{tabular}{rcccccccr}
    6670\it\color{red}enumeration       & \multicolumn{8}{c}{\it\color{red}enumerators} \\
    67 $\downarrow$\hspace*{25pt}      & \multicolumn{8}{c}{$\downarrow$}                              \\
    68 @enum@ Week \{                          & Mon,  & Tue,  & Wed,  & Thu,  & Fri,  & Sat,  & Sun = 42      & \};   \\
     71$\downarrow$\hspace*{15pt}      & \multicolumn{8}{c}{$\downarrow$}                              \\
     72@enum@ Week \{                          & Mon,  & Tue,  & Wed,  & Thu,  & Fri,  & Sat,  & Sun {\color{red}= 42} & \};   \\
    6973\it\color{red}label                     & Mon   & Tue   & Wed   & Thu   & Fri   & Sat   & Sun           &               \\
    7074\it\color{red}order                     & 0             & 1             & 2             & 3             & 4             & 5             & 6                     &               \\
    71 \it\color{red}value                     & 0             & 1             & 2             & 3             & 4             & 5             & 42            &
     75\it\color{red}value                     & 0             & 1             & 2             & 3             & 4             & 5             & {\color{red}42}               &
    7276\end{tabular}
    7377\end{cquote}
     
    8892\section{Motivation}
    8993
    90 Some programming languages only provide direct secondary renaming.
     94Many programming languages provide an enumeration-like mechanism, which may or may not cover the previous five fundamental enumeration aspects.
     95Hence, the term \emph{enumeration} can be confusing and misunderstood.
     96Furthermore, some languages conjoin the enumeration with other type features, making it difficult to tease apart which featuring is being used.
     97This section discusses some language features that are sometimes called an enumeration but do not provide all enumeration aspects.
     98
     99
     100\subsection{Aliasing}
     101
     102Some languages provide simple secondary aliasing (renaming), \eg:
    91103\begin{cfa}
    92104const Size = 20, Pi = 3.14159, Name = "Jane";
    93105\end{cfa}
    94 Here, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful.
    95 
    96 Secondary renaming can simulate an enumeration, but with extra effort.
     106The secondary name is logically replaced in the program text by its corresponding primary name.
     107Therefore, it is possible to compare the secondary names, \eg @Size < Pi@, only because the primary constants allow it, whereas \eg @Pi < Name@ might be disallowed depending on the language.
     108
     109Aliasing is not macro substitution, \eg @#define Size 20@, where a name is replaced by its value \emph{before} compilation, so the name is invisible to the programming language.
     110With aliasing, each secondary name is part of the language, and hence, participates fully, such as name overloading in the type system.
     111Aliasing is not an immutable variable, \eg:
     112\begin{cfa}
     113extern @const@ int Size = 20;
     114extern void foo( @const@ int @&@ size );
     115foo( Size ); // take the address of (reference) Size
     116\end{cfa}
     117Taking the address of an immutable variable makes it an \Newterm{lvalue}, which implies it has storage.
     118With separate compilation, it is necessary to choose one translation unit to perform the initialization.
     119If aliasing does require storage, its address and initialization are opaque (compiler only), similar to \CC rvalue reference @&&@.
     120
     121Aliasing does provide readability and automatic resubstitution.
     122It also provides simple enumeration properties, but with extra effort.
    97123\begin{cfa}
    98124const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
     
    102128const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
    103129\end{cfa}
    104 Finally, there is no type to create a type-checked instance or iterator cursor.
    105 Hence, there is only a weak equivalence between secondary naming and enumerations, justifying a seperate enumeration type in a programming language.
    106 
    107 A variant (algebraic) type is often promoted as a kind of enumeration, \ie a variant type can simulate an enumeration.
    108 Fundamentally, a variant type is a tagged-union, where the tag is normally opaque and the types are usually heterogeneous.
    109 \begin{cfa}[morekeywords={variant}]
    110 @variant@ Variant {
    111         @int tag;@  // optional/implicit: 0 => int, 1 => double, 2 => S
    112         @union {@ // implicit
    113                 case int i;
    114                 case double d;
    115                 case struct S { int i, j; } s;
    116         @};@
    117 };
    118 \end{cfa}
    119 Crucially, the union implies instance storage is shared by all the variant types, and therefore, before a variant type can be used in a statically-typed expression, it must be dynamically discriminated to its current contained type.
    120 Hence, knowing which type is in a variant instance is crucial for correctness.
    121 Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary.
    122 Otherwise, a tag is required to denote the particular type in the variant, and the tag is discriminated at runtime using some form of type pattern-matching, after which the value can be used in a statically-typed expression.
    123 
    124 A less frequent variant case is multiple variants with the same type, which normally requires explicit naming of the tag to disambiguate among the common types.
     130For these reasons, aliasing is sometimes called an enumeration.
     131However, there is no type to create a type-checked instance or iterator cursor, so there is no ability for enumerating.
     132Hence, there are multiple enumeration aspects not provided by aliasing, justifying a separate enumeration type in a programming language.
     133
     134
     135\subsection{Algebraic Data Type}
     136
     137An algebraic data type (ADT)\footnote{ADT is overloaded with abstract data type.} is another language feature often linked with enumeration, where an ADT conjoins an arbitrary type, possibly a \lstinline[language=C++]{class} or @union@, and a named constructor.
     138For example, in Haskell:
     139\begin{haskell}
     140data S = S { i::Int, d::Double }                $\C{// structure}$
     141data @Foo@ = A Int | B Double | C S             $\C{// ADT, composed of three types}$
     142foo = A 3;                                                              $\C{// type Foo is inferred}$
     143bar = B 3.5
     144baz = C S{ i = 7, d = 7.5 }
     145\end{haskell}
     146the ADT has three variants (constructors), @A@, @B@, @C@ with associated types @Int@, @Double@, and @S@.
     147The constructors create an initialized value of the specific type that is bound to the immutable variables @foo@, @bar@, and @baz@.
     148Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to access the value using dynamic pattern-matching.
    125149\begin{cquote}
    126 \begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
    127 \begin{cfa}[morekeywords={variant}]
    128 variant VariantCT {
    129         case @car@: int i;  // explicitly typed
    130         case @boat@: int i;
    131         case @bridge@: int i;
    132 };
    133 \end{cfa}
     150\setlength{\tabcolsep}{15pt}
     151\begin{tabular}{@{}ll@{}}
     152\begin{haskell}
     153prtfoo val = -- function
     154    -- pattern match on constructor
     155    case val of
     156      @A@ a -> print a
     157      @B@ b -> print b
     158      @C@ (S i d) -> do
     159          print i
     160          print d
     161\end{haskell}
    134162&
    135 \begin{cfa}[morekeywords={variant}]
    136 variant VariantCU {
    137         case @car@: ;  // empty or unit type
    138         case @boat@: ;
    139         case @bridge@: ;
    140 };
    141 \end{cfa}
     163\begin{haskell}
     164main = do
     165    prtfoo foo
     166    prtfoo bar
     167    prtfoo baz
     1683
     1693.5
     1707
     1717.5
     172\end{haskell}
    142173\end{tabular}
    143174\end{cquote}
    144 Here, the explicit tag name is used to give different meaning to the values in the common @int@ type, \eg the value 3 has different interpretations depending on the tag name.
    145 It is even possible to remove the type or use the empty @unit@ type (@struct unit {}@).
    146 It is this tag naming that is used to simulate an enumeration.
    147 
    148 Normally, the variant tag is implicitly set by the compiler based on type, but with common types, a tag name is required to resolve type ambiguity.
    149 \begin{cfa}
    150 Variant v = 3;                                                  $\C{// implicitly set tag to 0 based on type of 3}$
    151 VariantCT ve = boats.3;                                 $\C{// explicitly set tag to 1 using tag name}$
    152 \end{cfa}
    153 Type pattern-matching is then used to dynamically test the tag and branch to a section of statically-typed code to safely manipulate the value, \eg:
     175For safety, most languages require all assocaited types to be listed or a default case with no field accesses.
     176
     177A less frequent case is multiple constructors with the same type.
     178\begin{haskell}
     179data Bar = X Int | Y Int | Z Int;
     180foo = X 3;
     181bar = Y 3;
     182baz = Z 5;
     183\end{haskell}
     184Here, the constructor name gives different meaning to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching.
     185
     186Note, the term \Newterm{variant} is often associated with ADTs.
     187However, there are multiple languages with a @variant@ type that is not an ADT \see{Algol68~\cite{Algol68} or \CC \lstinline{variant}}.
     188In these languages, the variant is often a union using RTTI tags, which cannot be used to simulate an enumeration.
     189Hence, in this work the term variant is not a synonym for ADT.
     190
     191% https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.19.1.0-179c/GHC-Enum.html
     192% https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-Enum.html
     193
     194The association between ADT and enumeration occurs if all the constructors have a unit (empty) type, \eg @struct unit {}@.
     195Note, the unit type is not the same as \lstinline{void}, \eg:
     196\begin{cfa}
     197void foo( void );
     198struct unit {} u;  // empty type
     199unit bar( unit );
     200foo( foo() );        // void argument does not match with void parameter
     201bar( bar( u ) );   // unit argument does match with unit parameter
     202\end{cfa}
     203
     204For example, in the Haskell ADT:
     205\begin{haskell}
     206data Week = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving(Enum, Eq, Show)
     207\end{haskell}
     208the default type for each constructor is the unit type, and deriving from @Enum@ enforces no other type, @Eq@ allows equality comparison, and @Show@ is for printing.
     209The nullary constructors for the unit types are numbered left-to-right from $0$ to @maxBound@$- 1$, and provides enumerating operations @succ@, @pred@, @enumFrom@ @enumFromTo@.
     210\VRef[Figure]{f:HaskellEnumeration} shows enumeration comparison and iterating (enumerating).
     211
     212\begin{figure}
    154213\begin{cquote}
    155 \begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
    156 \begin{cfa}[morekeywords={match}]
    157 @match@( v ) { // know type implicitly or test tag
    158         case int { /* only access i field */ }
    159         case double { /* only access d field */ }
    160         case S { /* only access s field */ }
    161 }
    162 \end{cfa}
     214\setlength{\tabcolsep}{15pt}
     215\begin{tabular}{@{}ll@{}}
     216\begin{haskell}
     217day = Tue
     218main = do
     219    if day == Tue then
     220        print day
     221    else
     222        putStr "not Tue"
     223    print (enumFrom Mon)            -- week
     224    print (enumFromTo Mon Fri)   -- weekday
     225    print (enumFromTo Sat Sun)  -- weekend
     226\end{haskell}
    163227&
    164 \begin{cfa}[morekeywords={match}]
    165 @match@( ve ) {
    166         case car: int { /* car interpretation */ }
    167         case boat: int { /* boat interpretation */ }
    168         case bridge: int { /* bridge interpretation */ }
    169 }
    170 \end{cfa}
     228\begin{haskell}
     229Tue
     230[Mon,Tue,Wed,Thu,Fri,Sat,Sun]
     231[Mon,Tue,Wed,Thu,Fri]
     232[Sat,Sun]
     233
     234
     235
     236
     237
     238\end{haskell}
    171239\end{tabular}
    172240\end{cquote}
    173 For safety, some languages require all variant types to be listed or a @default@ case with no field accesses.
    174 
    175 To further strengthen the simulate for an enumeration with different values, each variant type can be a @const@ type or the tag becomes non-opaque, possibly taking advantage of the opaque auto-numbering.
    176 \begin{cquote}
    177 \begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
    178 \begin{cfa}
    179 variant Week {
    180         case Mon: const int = 0;
    181         ...
    182         case Sat: const int = 5;
    183         case Sun: const int = 10;
    184 };
    185 \end{cfa}
    186 &
    187 \begin{cfa}
    188 variant Week {
    189         case Mon: ;  // tag auto-numbering
    190         ...
    191         case Sat: ;
    192         case @Sun = 10@: ; // directly set tag value
    193 };
    194 \end{cfa}
    195 \end{tabular}
    196 \end{cquote}
    197 Directly setting the tag implies restrictions, like unique values.
    198 In both cases, instances of @Week@ are @const@ (immutable).
    199 However, usage between these two types becomes complex.
    200 \begin{cfa}
    201 Week day = Week.Mon;  // sets value or tag depending on type
    202 if ( day == Week.Mon )   // dereference value or tag ?
    203 \end{cfa}
    204 Here, the dereference of @day@ should return the value of the type stored in the variant, never the tag.
    205 If it does return the tag, some special meaning must be given to the empty/unit type, especially if a variant contains both regular and unit types.
    206 
    207 
    208 In general, the enumeration simulation and the variant extensions to support it, are deviating from the normal use of a variant (union) type.
    209 As well, the enumeration operations are limited to the available tag operations, \eg pattern matching.
    210 While enumerating among tag names is possible:
    211 \begin{cfa}[morekeywords={in}]
    212 for ( cursor in Week.Mon, Week.Wed, Week.Fri, Week.Sun ) ...
    213 \end{cfa}
    214 what is the type of @cursor@?
    215 If it the tag type (@int@), how is this value used?
    216 If it is the variant type, where is the instance variable, which only contains one value.
    217 Hence, either enumerating with a variant enumeration is disallowed or some unusual typing rule must be invented to make it work but only in restricted contexts.
    218 
    219 While functional programming systems regularly re-purposing variant types into enumeration types, this process seems contrived and confusing.
    220 A variant tag is not an enumeration, it is a discriminant among a restricted set of types stored in a storage block.
    221 Hence, there is only a weak equivalence between an enumeration and variant type, justifying a seperate enumeration type in a programming language.
     241\caption{Haskell Enumeration}
     242\label{f:HaskellEnumeration}
     243\end{figure}
     244
     245The key observation is the dichotomy between an ADT and enumeration: the ADT uses the associated type resulting in a union-like data structure, and the enumeration does not use the associated type, and hence, is not a union.
     246While the enumeration is constructed using the ADT mechanism, it is so restricted it is not really an ADT.
     247Furthermore, a general ADT cannot be an enumeration because the constructors generate different values making enumerating meaningless.
     248While functional programming languages regularly repurpose the ADT type into an enumeration type, this process seems contrived and confusing.
     249Hence, there is only a weak equivalence between an enumeration and ADT, justifying a separate enumeration type in a programming language.
    222250
    223251
    224252\section{Contributions}
    225253
    226 The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
     254The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a complex and safe enumeration type in the \CFA programming-language, while maintaining backwards compatibility with C.
    227255On the surface, enumerations seem like a simple type.
    228256However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
    229257
     258The contribution of this work are:
    230259\begin{enumerate}
    231260\item
     
    236265typing
    237266\item
    238 subset
     267subseting
    239268\item
    240269inheritance
Note: See TracChangeset for help on using the changeset viewer.