Changeset 486caad for doc/theses/jiada_liang_MMath/intro.tex
- Timestamp:
- Mar 25, 2024, 7:15:30 PM (7 months ago)
- Branches:
- master
- Children:
- d734fa1
- Parents:
- df78cce (diff), bf050c5 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/jiada_liang_MMath/intro.tex
rdf78cce r486caad 1 1 \chapter{Introduction} 2 2 3 Naming values is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), etc. 4 Naming is also commonly used to represent many other numerical phenomenon, such as days of the week, months of a year, floors of a building (basement), specific times (noon, New Years). 5 Many programming languages capture this important software engineering capability through a mechanism called an \Newterm{enumeration}. 6 An enumeration is similar to other programming-language types by providing a set of constrained values, but adds the ability to name \emph{all} the values in its set. 7 Note, all enumeration names must be unique but different names can represent the same value (eight note, quaver), which are synonyms. 3 All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @12345@, \etc. 4 Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types. 5 Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types. 6 In theory, there are an infinite set of primary names per type. 8 7 9 Specifically, an enumerated type restricts its values to a fixed set of named constants. 10 While all types are restricted to a fixed set of values because of the underlying von Neumann architecture, and hence, to a corresponding set of constants, \eg @3@, @3.5@, @3.5+2.1i@, @'c'@, @"abc"@, etc., these values are not named, other than the programming-language supplied constant names. 8 \Newterm{Secondary naming} is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc. 9 Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name. 10 In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@. 11 (The names the thing.) 12 Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{ 13 The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}. 11 14 12 Fundamentally, all enumeration systems have an \Newterm{enumeration} type with an associated set of \Newterm{enumerator} names. 13 An enumeration has three universal attributes, \Newterm{position}, \Newterm{label}, and \Newterm{value}, as shown by this representative enumeration, where position and value can be different. 15 Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc. 16 Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}. 17 \begin{quote} 18 enumerate (verb, transitive). 19 To count, ascertain the number of; 20 \emph{more 21 usually, to mention (a number of things or persons) separately, as if for the 22 purpose of counting}; 23 to specify as in a list or catalogue.~\cite{OED} 24 \end{quote} 25 Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to hold only the secondary names. 26 It is possible to enumerate among set names without having an ordering among the set elements. 27 For example, the week, the weekdays, the weekend, and every second day of the week. 28 \begin{cfa}[morekeywords={in}] 29 for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$ 30 for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$ 31 for ( cursor in Thu, Fri } ... $\C{// weekend}$ 32 for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$ 33 \end{cfa} 34 This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms. 35 A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after. 36 Ordering allows iterating among the enumeration set using relational operators and advancement, \eg 37 \begin{cfa} 38 for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ... 39 \end{cfa} 40 Here the internal representations for the secondary names are \emph{generated} rather than listing a subset of names. 41 42 43 \section{Terminology} 44 45 The term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name. 46 As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}. 14 47 \begin{cquote} 15 \s mall\sf\setlength{\tabcolsep}{3pt}16 \begin{tabular}{rccccccc cccc}17 \it\color{red}enumeration & \multicolumn{7}{c}{\it\color{red}enumerators} \\18 $\downarrow$\hspace*{25pt} & \multicolumn{7}{c}{$\downarrow$} \\19 @enum@ Week day \{ & Monday, & Tuesday, & Wednesday, & Thursday,& Friday, & Saturday, & Sunday \};\\20 \it\color{red} position & 0 & 1 & 2 & 3 & 4 & 5 & 6\\21 \it\color{red} label & Monday & Tuesday & Wednesday & Thursday & Friday & Saturday & Sunday\\22 \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 648 \sf\setlength{\tabcolsep}{3pt} 49 \begin{tabular}{rcccccccr} 50 \it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\ 51 $\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\ 52 @enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun = 42 & \}; \\ 53 \it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\ 54 \it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\ 55 \it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 42 & 23 56 \end{tabular} 24 57 \end{cquote} 25 Here, the \Newterm{enumeration} @Weekday@ defines the ordered \Newterm{enumerator}s @Monday@, @Tuesday@, @Wednesday@, @Thursday@, @Friday@, @Saturday@ and @Sunday@. 26 By convention, the successor of @Tuesday@ is @Monday@ and the predecessor of @Tuesday@ is @Wednesday@, independent of the associated enumerator constant values. 27 Because an enumerator is a constant, it cannot appear in a mutable context, \eg @Mon = Sun@ is meaningless, and an enumerator has no address, it is an \Newterm{rvalue}\footnote{ 28 The term rvalue defines an expression that can only appear on the right-hand side of an assignment.}. 58 Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@. 59 The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values. 60 The value is the constant represented by the secondary name, which can be implicitly or explicitly set. 61 62 Specifying complex ordering is possible: 63 \begin{cfa} 64 enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$ 65 enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$ 66 \end{cfa} 67 For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@. 68 For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@. 69 Only flat total-ordering among enumerators is considered in this work. 70 71 72 \section{Motivation} 73 74 Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering. 75 \begin{cfa} 76 const Size = 20, Pi = 3.14159; 77 enum { Size = 20, Pi = 3.14159 }; // unnamed enumeration $\(\Rightarrow\)$ no ordering 78 \end{cfa} 79 In both cases, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful; 80 however, without an enumeration type-name, it is impossible to create an iterator cursor. 81 82 Secondary renaming can similate an enumeration, but with extra effort. 83 \begin{cfa} 84 const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7; 85 \end{cfa} 86 Furthermore, reordering the enumerators requires manual renumbering. 87 \begin{cfa} 88 const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7; 89 \end{cfa} 90 Finally, there is no common type to create a type-checked instance or iterator cursor. 91 Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language. 92 93 A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration. 94 A variant type is a tagged-union, where the possible types may be heterogeneous. 95 \begin{cfa} 96 @variant@ Variant { 97 @int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S 98 @union {@ // implicit 99 case int i; 100 case double d; 101 case struct S { int i, j; } s; 102 @};@ 103 }; 104 \end{cfa} 105 Crucially, the union implies instance storage is shared by all of the variant types. 106 Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}. 107 Knowing which type is in a variant instance is crucial for correctness. 108 Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary; 109 otherwise, a tag is required to denote the particular type in the variant and the tag checked at runtime using some form of type pattern-matching. 110 111 The tag can be implicitly set by the compiler on assignment, or explicitly set by the program\-mer. 112 Type pattern-matching is then used to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg: 113 \begin{cfa}[morekeywords={match}] 114 Variant v = 3; // implicitly set tag to 0 115 @match@( v ) { // know the type or test the tag 116 case int { /* only access i field in v */ } 117 case double { /* only access d field in v */ } 118 case S { /* only access s field in v */ } 119 } 120 \end{cfa} 121 For safety, either all variant types must be listed or a @default@ case must exist with no field accesses. 122 123 To simulate an enumeration with a variant, the tag is \emph{re-purposed} for either ordering or value and the variant types are omitted. 124 \begin{cfa} 125 variant Weekday { 126 int tag; // implicit 0 => Mon, ..., 6 => Sun 127 @case Mon;@ // no type 128 ... 129 @case Sun;@ 130 }; 131 \end{cfa} 132 The type system ensures tag setting and testing are correctly done. 133 However, the enumeration operations are limited to the available tag operations, \eg pattern matching. 134 \begin{cfa} 135 Week week = Mon; 136 if ( @dynamic_cast(Mon)@week ) ... // test tag == Mon 137 \end{cfa} 138 While enumerating among tag names is possible: 139 \begin{cfa}[morekeywords={in}] 140 for ( cursor in Mon, Wed, Fri, Sun ) ... 141 \end{cfa} 142 ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@. 143 144 However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values. 145 Iterating using tag ordering and heterogeneous types, also requires pattern matching. 146 \begin{cfa}[morekeywords={match}] 147 for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) { 148 match( cursor ) { 149 case Mon { /* access special type for Mon */ } 150 ... 151 case Fri { /* access special type for Fri */ } 152 default 153 } 154 } 155 \end{cfa} 156 If the variant type is changed by adding/removing types or the loop range changes, the pattern matching must be adjusted. 157 As well, if the start/stop values are dynamic, it may be impossible to statically determine if all variant types are listed. 158 159 Re-purposing the notion of enumerating into variant types is ill formed and confusing. 160 Hence, there is only a weak equivalence between an enumeration and variant type, justifying the enumeration type in a programming language. 161 29 162 30 163 \section{Contributions} 164 165 The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C. 166 On the surface, enumerations seem like a simple type. 167 However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation. 168 169 \begin{enumerate} 170 \item 171 overloading 172 \item 173 scoping 174 \item 175 typing 176 \item 177 subset 178 \item 179 inheritance 180 \end{enumerate}
Note: See TracChangeset
for help on using the changeset viewer.