Changeset caf2cba
- Timestamp:
- Mar 23, 2024, 7:02:11 PM (9 months ago)
- Branches:
- master
- Children:
- 48b76d03
- Parents:
- e00b10d
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/jiada_liang_MMath/intro.tex
re00b10d rcaf2cba 1 1 \chapter{Introduction} 2 2 3 Naming values is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), etc. 4 Naming is also commonly used to represent many other numerical phenomenon, such as days of the week, months of a year, floors of a building (basement), specific times (noon, New Years). 5 Many programming languages capture this important software engineering capability through a mechanism called an \Newterm{enumeration}. 6 An enumeration is similar to other programming-language types by providing a set of constrained values, but adds the ability to name \emph{all} the values in the set. 7 Note, all enumeration names must be unique but different names can represent the same value (eight note, quaver), which are synonyms. 3 All types in a programming language must have a set of constants, and these constants have primary names, \eg integral types have constants @-1@, @17@, @12345@, \etc. 4 Constants can be overloaded among types, \eg @0@ is a null pointer, and zero for integral and floating-point types. 5 Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types. 6 In theory, there are an infinite set of primary names per type. 8 7 9 Specifically, an enumerated type restricts its values to a fixed set of named constants. 10 While all types are restricted to a fixed set of values because of the underlying von Neumann architecture, and hence, to a corresponding set of constants, \eg @3@, @3.5@, @3.5+2.1i@, @'c'@, @"abc"@, etc., these values are not named, other than the programming-language supplied constant names. 8 Secondary naming is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc. 9 Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name. 10 In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@. 11 Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{ 12 The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}. 11 13 12 Fundamentally, all enumeration systems have an \Newterm{enumeration} type with an associated set of \Newterm{enumerator} names. 13 An enumeration has three universal attributes, \Newterm{label}, \Newterm{position}, and \Newterm{value}, as shown by this representative enumeration, where position and value can be different. 14 Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a fan deck (gator-belly blue, sky-blue pink), \etc. 15 Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}. 16 \begin{quote} 17 enumerate (verb, transitive). 18 To count, ascertain the number of; 19 \emph{more 20 usually, to mention (a number of things or persons) separately, as if for the 21 purpose of counting}; 22 to specify as in a list or catalogue.~\cite{OED} 23 \end{quote} 24 Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to its secondary names. 25 It is possible to enumerate among set names without having an ordering among the set elements. 26 For example, the days of the week are enumerated Monday to Sunday, the weekdays are Monday to Friday, the weekend is Saturday and Sunday, and every second weekday is Monday, Wednesday, Friday, and Sunday. 27 This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms. 28 A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday comes before Friday and Friday comes after. 29 Ordering allows iterating among the enumeration set using relational operators and advancement, \eg 30 \begin{cfa} 31 for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ... 32 \end{cfa} 33 rather than listing a proper subset of enumeration names. 34 35 In this work, the term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name. 36 An enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}, as shown by this representative enumeration. 14 37 \begin{cquote} 15 38 \small\sf\setlength{\tabcolsep}{3pt} 16 \begin{tabular}{rccccccc cccc}17 \it\color{red}enumeration & \multicolumn{7}{c}{\it\color{red}enumerators} \\18 $\downarrow$\hspace*{25pt} & \multicolumn{7}{c}{$\downarrow$} \\19 @enum@ Weekday \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun\}; \\20 \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun\\21 \it\color{red} position & 0 & 1 & 2 & 3 & 4 & 5 & 6\\22 \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 639 \begin{tabular}{rcccccccr} 40 \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\ 41 $\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\ 42 @enum@ Weekday \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun & \}; \\ 43 \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\ 44 \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\ 45 \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 23 46 \end{tabular} 24 47 \end{cquote} 25 Here, the \Newterm{enumeration} @Weekday@ defines the ordered \Newterm{enumerator}s @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@. 26 By convention, the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of the associated enumerator constant values, implying an ordering among the enumerators. 27 As well, the value can be explicitly set so it is different from the position. 28 Because an enumerator is a constant, it cannot appear in a mutable context, \eg @Mon = Sun@ is meaningless, and an enumerator has no address, \ie it is an \Newterm{rvalue}\footnote{ 29 The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}. 48 Here, the enumeration @Weekday@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@. 49 The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values. 50 Specifying complex ordering is possible: 51 \begin{cfa} 52 enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$ 53 enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$ 54 \end{cfa} 55 For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@. 56 For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@. 57 Only flat total-ordering among enumerators is considered in this work. 58 Finally, the values are the same as the ordering, but they can be set independently and be of any type. 30 59 31 On the surface, enumerations seem like a simple type.32 However, when extended with features available in other language types, enumerations become a complex.33 60 34 The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C. 61 \section{Enumeration Motivation} 62 63 Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering. 64 \begin{cfa} 65 const Size = 20, Pi = 3.14159; 66 enum { Size = 20, Pi = 3.14159 }; // unnamed enumeration $\(\Rightarrow\)$ no ordering 67 \end{cfa} 68 Without a type name, it is impossible to create an iterating cursor. 69 It is still possible to compare the internal representations, if that is meaningful, \eg @Size < Pi@. 70 71 Secondary renaming can similate an enumeration, but with extra effort. 72 \begin{cfa} 73 const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7; 74 \end{cfa} 75 Furthermore, reordering the enumerators requires manual renumbering. 76 \begin{cfa} 77 const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7; 78 \end{cfa} 79 Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language. 80 81 A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration. 82 A variant type is a tagged-union, where the possible types may be heterogeneous. 83 \begin{cfa} 84 @variant@ Variant { 85 @int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S 86 @union {@ // implicit 87 case int i; 88 case double d; 89 case struct S { int i, j; } s; 90 @};@ 91 }; 92 \end{cfa} 93 Crucially, the union implies instance storage is shared by all of the variant types. 94 Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}. 95 Knowing which type is in a variant instance is crucial for correctness. 96 Occasionally, it is possible to statically determine, all regions where each variant type is used, so no runtime checking is necessary. 97 Otherwise, a tag is required to denote the particular type in the variant. 98 The tag can be implicitly set by the compiler on assignment, or explicitly set by the programmer. 99 Finally, some mechanism is necessary to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg type pattern-matching. 100 \begin{cfa} 101 Variant v = 3; // implicitly set tag to 0 102 switch( v ) { // know the type or test the tag 103 case int { /* only access i field in v */ } 104 case double { /* only access d field in v */ } 105 case S { /* only access s field in v */ } 106 } 107 \end{cfa} 108 For safety, either all variant types must be listed or a @default@ case must exist with no field accesses. 109 110 To simulate an enumeration with a variant, the tag is re-purposed for either ordering or value and the types are omitted. 111 \begin{cfa} 112 variant Weekday { 113 int tag; // implicit 0 => Mon, ..., 6 => Sun 114 @case Mon;@ // no type 115 ... 116 @case Sun;@ 117 }; 118 \end{cfa} 119 The type system ensures tag setting and testing are correct. 120 However, the enumeration operations are limited to the available tag operations, \eg pattern matching. 121 \begin{cfa} 122 Weekday weekday = Mon; 123 if ( @dynamic_cast(Mon)@weekday ) ... // test tag == Mon 124 \end{cfa} 125 While enumerating among tag names is possible: 126 \begin{cfa} 127 for ( Mon, Wed, Fri, Sun ) ... 128 \end{cfa} 129 ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@. 130 131 However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values. 132 Iterating using tag ordering and heterogeneous types, requires pattern matching. 133 \begin{cfa} 134 for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) { 135 switch( cursor ) { 136 case Mon { /* access special type for Mon */ } 137 ... 138 case Fri { /* access special type for Fri */ } 139 } 140 } 141 \end{cfa} 142 If the variant type or number of loop steps changes, the pattern must be adjusted. 143 If the start/stop values are dynamic, it is impossible to statically determine if all variant types are listed. 144 145 Forcing the notion of enumerating into variant types is ill formed and confusing. 146 Hence, there is only a weak equivalence between enumerations and variant type, justifying the enumeration type in a programming language. 147 35 148 36 149 \section{Contributions} 37 150 151 The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C. 152 On the surface, enumerations seem like a simple type. 153 However, when extended with advanced features, enumerations become complex for both the type system and the implementation.
Note: See TracChangeset
for help on using the changeset viewer.