source: doc/theses/jiada_liang_MMath/intro.tex @ c148966

Last change on this file since c148966 was 41fb996, checked in by Peter A. Buhr <pabuhr@…>, 5 months ago

word smithing and poking at rust enumerations

  • Property mode set to 100644
File size: 9.9 KB
RevLine 
[18ebc28]1\chapter{Introduction}
2
[41fb996]3All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @12345@, \etc.
[48b76d03]4Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
[caf2cba]5Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
6In theory, there are an infinite set of primary names per type.
[18ebc28]7
[41fb996]8\Newterm{Secondary naming} is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
[caf2cba]9Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
10In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
[41fb996]11(The names the thing.)
[caf2cba]12Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
13The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
14
[48b76d03]15Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
[caf2cba]16Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
17\begin{quote}
18enumerate (verb, transitive).
19To count, ascertain the number of;
20\emph{more
21usually, to mention (a number of things or persons) separately, as if for the
22purpose of counting};
23to specify as in a list or catalogue.~\cite{OED}
24\end{quote}
[41fb996]25Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to hold only the secondary names.
[caf2cba]26It is possible to enumerate among set names without having an ordering among the set elements.
[48b76d03]27For example, the week, the weekdays, the weekend, and every second day of the week.
28\begin{cfa}[morekeywords={in}]
29for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
30for ( cursor in Mon, Tue, Wed, Thu, Fri } ...   $\C{// weekday}$
31for ( cursor in Thu, Fri } ...                                  $\C{// weekend}$
32for ( cursor in Mon, Wed, Fri, Sun } ...                $\C{// every second day of week}\CRT$
33\end{cfa}
[caf2cba]34This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms.
[48b76d03]35A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
[caf2cba]36Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
37\begin{cfa}
38for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
39\end{cfa}
[48b76d03]40Here the internal representations for the secondary names are \emph{generated} rather than listing a subset of names.
41
[956299b]42
[48b76d03]43\section{Terminology}
44
45The term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name.
46As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
[956299b]47\begin{cquote}
[48b76d03]48\sf\setlength{\tabcolsep}{3pt}
[caf2cba]49\begin{tabular}{rcccccccr}
50\it\color{red}enumeration       & \multicolumn{8}{c}{\it\color{red}enumerators} \\
51$\downarrow$\hspace*{25pt}      & \multicolumn{8}{c}{$\downarrow$}                              \\
[41fb996]52@enum@ Week \{                          & Mon,  & Tue,  & Wed,  & Thu,  & Fri,  & Sat,  & Sun = 42      & \};   \\
[48b76d03]53\it\color{red}label                     & Mon   & Tue   & Wed   & Thu   & Fri   & Sat   & Sun           &               \\
54\it\color{red}order                     & 0             & 1             & 2             & 3             & 4             & 5             & 6                     &               \\
55\it\color{red}value                     & 0             & 1             & 2             & 3             & 4             & 5             & 42            &
[956299b]56\end{tabular}
57\end{cquote}
[41fb996]58Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
[caf2cba]59The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
[48b76d03]60The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
61
[caf2cba]62Specifying complex ordering is possible:
63\begin{cfa}
64enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
65enum E2 { {A, {B, C} }, { {D, E}, F };  $\C{// nesting}$
66\end{cfa}
67For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
68For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
69Only flat total-ordering among enumerators is considered in this work.
[7d9a805b]70
71
[48b76d03]72\section{Motivation}
[caf2cba]73
74Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering.
75\begin{cfa}
76const Size = 20, Pi = 3.14159;
77enum { Size = 20, Pi = 3.14159 };   // unnamed enumeration $\(\Rightarrow\)$ no ordering
78\end{cfa}
[48b76d03]79In both cases, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful;
80however, without an enumeration type-name, it is impossible to create an iterator cursor.
[caf2cba]81
82Secondary renaming can similate an enumeration, but with extra effort.
83\begin{cfa}
84const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
85\end{cfa}
86Furthermore, reordering the enumerators requires manual renumbering.
87\begin{cfa}
88const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
89\end{cfa}
[48b76d03]90Finally, there is no common type to create a type-checked instance or iterator cursor.
[caf2cba]91Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language.
92
93A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration.
94A variant type is a tagged-union, where the possible types may be heterogeneous.
95\begin{cfa}
96@variant@ Variant {
97        @int tag;@  // optional/implicit: 0 => int, 1 => double, 2 => S
98        @union {@ // implicit
99                case int i;
100                case double d;
101                case struct S { int i, j; } s;
102        @};@
103};
104\end{cfa}
105Crucially, the union implies instance storage is shared by all of the variant types.
106Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}.
107Knowing which type is in a variant instance is crucial for correctness.
[41fb996]108Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary;
[48b76d03]109otherwise, a tag is required to denote the particular type in the variant and the tag checked at runtime using some form of type pattern-matching.
110
111The tag can be implicitly set by the compiler on assignment, or explicitly set by the program\-mer.
112Type pattern-matching is then used to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg:
113\begin{cfa}[morekeywords={match}]
114Variant v = 3// implicitly set tag to 0
115@match@( v ) {    // know the type or test the tag
[caf2cba]116        case int { /* only access i field in v */ }
117        case double { /* only access d field in v */ }
118        case S { /* only access s field in v */ }
119}
120\end{cfa}
121For safety, either all variant types must be listed or a @default@ case must exist with no field accesses.
122
[41fb996]123To simulate an enumeration with a variant, the tag is \emph{re-purposed} for either ordering or value and the variant types are omitted.
[caf2cba]124\begin{cfa}
125variant Weekday {
126        int tag; // implicit 0 => Mon, ..., 6 => Sun
127        @case Mon;@ // no type
128        ...
129        @case Sun;@
130};
131\end{cfa}
[41fb996]132The type system ensures tag setting and testing are correctly done.
[caf2cba]133However, the enumeration operations are limited to the available tag operations, \eg pattern matching.
134\begin{cfa}
[41fb996]135Week week = Mon;
136if ( @dynamic_cast(Mon)@week ) ... // test tag == Mon
[caf2cba]137\end{cfa}
138While enumerating among tag names is possible:
[48b76d03]139\begin{cfa}[morekeywords={in}]
140for ( cursor in Mon, Wed, Fri, Sun ) ...
[caf2cba]141\end{cfa}
142ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@.
143
144However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values.
[48b76d03]145Iterating using tag ordering and heterogeneous types, also requires pattern matching.
[41fb996]146\begin{cfa}[morekeywords={match}]
[caf2cba]147for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) {
[41fb996]148        match( cursor ) {
[caf2cba]149                case Mon { /* access special type for Mon */ }
150                ...
151                case Fri { /* access special type for Fri */ }
[41fb996]152                default
[caf2cba]153        }
154}
155\end{cfa}
[41fb996]156If the variant type is changed by adding/removing types or the loop range changes, the pattern matching must be adjusted.
157As well, if the start/stop values are dynamic, it may be impossible to statically determine if all variant types are listed.
[caf2cba]158
[41fb996]159Re-purposing the notion of enumerating into variant types is ill formed and confusing.
[48b76d03]160Hence, there is only a weak equivalence between an enumeration and variant type, justifying the enumeration type in a programming language.
[caf2cba]161
[f9da761]162
163\section{Contributions}
[7d9a805b]164
[caf2cba]165The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
166On the surface, enumerations seem like a simple type.
[41fb996]167However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
[48b76d03]168
169\begin{enumerate}
170\item
171overloading
172\item
173scoping
174\item
175typing
176\item
177subset
178\item
179inheritance
180\end{enumerate}
Note: See TracBrowser for help on using the repository browser.