source: doc/theses/jiada_liang_MMath/intro.tex @ c148966

Last change on this file since c148966 was 41fb996, checked in by Peter A. Buhr <pabuhr@…>, 4 months ago

word smithing and poking at rust enumerations

  • Property mode set to 100644
File size: 9.9 KB
Line 
1\chapter{Introduction}
2
3All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @12345@, \etc.
4Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
5Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
6In theory, there are an infinite set of primary names per type.
7
8\Newterm{Secondary naming} is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
9Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
10In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
11(The names the thing.)
12Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
13The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
14
15Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
16Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
17\begin{quote}
18enumerate (verb, transitive).
19To count, ascertain the number of;
20\emph{more
21usually, to mention (a number of things or persons) separately, as if for the
22purpose of counting};
23to specify as in a list or catalogue.~\cite{OED}
24\end{quote}
25Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to hold only the secondary names.
26It is possible to enumerate among set names without having an ordering among the set elements.
27For example, the week, the weekdays, the weekend, and every second day of the week.
28\begin{cfa}[morekeywords={in}]
29for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
30for ( cursor in Mon, Tue, Wed, Thu, Fri } ...   $\C{// weekday}$
31for ( cursor in Thu, Fri } ...                                  $\C{// weekend}$
32for ( cursor in Mon, Wed, Fri, Sun } ...                $\C{// every second day of week}\CRT$
33\end{cfa}
34This independence from internal representation allows multiple names to have the same representation (eight note, quaver), giving synonyms.
35A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
36Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
37\begin{cfa}
38for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
39\end{cfa}
40Here the internal representations for the secondary names are \emph{generated} rather than listing a subset of names.
41
42
43\section{Terminology}
44
45The term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name.
46As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
47\begin{cquote}
48\sf\setlength{\tabcolsep}{3pt}
49\begin{tabular}{rcccccccr}
50\it\color{red}enumeration       & \multicolumn{8}{c}{\it\color{red}enumerators} \\
51$\downarrow$\hspace*{25pt}      & \multicolumn{8}{c}{$\downarrow$}                              \\
52@enum@ Week \{                          & Mon,  & Tue,  & Wed,  & Thu,  & Fri,  & Sat,  & Sun = 42      & \};   \\
53\it\color{red}label                     & Mon   & Tue   & Wed   & Thu   & Fri   & Sat   & Sun           &               \\
54\it\color{red}order                     & 0             & 1             & 2             & 3             & 4             & 5             & 6                     &               \\
55\it\color{red}value                     & 0             & 1             & 2             & 3             & 4             & 5             & 42            &
56\end{tabular}
57\end{cquote}
58Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
59The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
60The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
61
62Specifying complex ordering is possible:
63\begin{cfa}
64enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
65enum E2 { {A, {B, C} }, { {D, E}, F };  $\C{// nesting}$
66\end{cfa}
67For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
68For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
69Only flat total-ordering among enumerators is considered in this work.
70
71
72\section{Motivation}
73
74Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering.
75\begin{cfa}
76const Size = 20, Pi = 3.14159;
77enum { Size = 20, Pi = 3.14159 };   // unnamed enumeration $\(\Rightarrow\)$ no ordering
78\end{cfa}
79In both cases, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful;
80however, without an enumeration type-name, it is impossible to create an iterator cursor.
81
82Secondary renaming can similate an enumeration, but with extra effort.
83\begin{cfa}
84const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
85\end{cfa}
86Furthermore, reordering the enumerators requires manual renumbering.
87\begin{cfa}
88const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
89\end{cfa}
90Finally, there is no common type to create a type-checked instance or iterator cursor.
91Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language.
92
93A variant (algebraic) type is often promoted as a kind of enumeration, \ie a varient type can simulate an enumeration.
94A variant type is a tagged-union, where the possible types may be heterogeneous.
95\begin{cfa}
96@variant@ Variant {
97        @int tag;@  // optional/implicit: 0 => int, 1 => double, 2 => S
98        @union {@ // implicit
99                case int i;
100                case double d;
101                case struct S { int i, j; } s;
102        @};@
103};
104\end{cfa}
105Crucially, the union implies instance storage is shared by all of the variant types.
106Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}.
107Knowing which type is in a variant instance is crucial for correctness.
108Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary;
109otherwise, a tag is required to denote the particular type in the variant and the tag checked at runtime using some form of type pattern-matching.
110
111The tag can be implicitly set by the compiler on assignment, or explicitly set by the program\-mer.
112Type pattern-matching is then used to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg:
113\begin{cfa}[morekeywords={match}]
114Variant v = 3// implicitly set tag to 0
115@match@( v ) {    // know the type or test the tag
116        case int { /* only access i field in v */ }
117        case double { /* only access d field in v */ }
118        case S { /* only access s field in v */ }
119}
120\end{cfa}
121For safety, either all variant types must be listed or a @default@ case must exist with no field accesses.
122
123To simulate an enumeration with a variant, the tag is \emph{re-purposed} for either ordering or value and the variant types are omitted.
124\begin{cfa}
125variant Weekday {
126        int tag; // implicit 0 => Mon, ..., 6 => Sun
127        @case Mon;@ // no type
128        ...
129        @case Sun;@
130};
131\end{cfa}
132The type system ensures tag setting and testing are correctly done.
133However, the enumeration operations are limited to the available tag operations, \eg pattern matching.
134\begin{cfa}
135Week week = Mon;
136if ( @dynamic_cast(Mon)@week ) ... // test tag == Mon
137\end{cfa}
138While enumerating among tag names is possible:
139\begin{cfa}[morekeywords={in}]
140for ( cursor in Mon, Wed, Fri, Sun ) ...
141\end{cfa}
142ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@.
143
144However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values.
145Iterating using tag ordering and heterogeneous types, also requires pattern matching.
146\begin{cfa}[morekeywords={match}]
147for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) {
148        match( cursor ) {
149                case Mon { /* access special type for Mon */ }
150                ...
151                case Fri { /* access special type for Fri */ }
152                default
153        }
154}
155\end{cfa}
156If the variant type is changed by adding/removing types or the loop range changes, the pattern matching must be adjusted.
157As well, if the start/stop values are dynamic, it may be impossible to statically determine if all variant types are listed.
158
159Re-purposing the notion of enumerating into variant types is ill formed and confusing.
160Hence, there is only a weak equivalence between an enumeration and variant type, justifying the enumeration type in a programming language.
161
162
163\section{Contributions}
164
165The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
166On the surface, enumerations seem like a simple type.
167However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
168
169\begin{enumerate}
170\item
171overloading
172\item
173scoping
174\item
175typing
176\item
177subset
178\item
179inheritance
180\end{enumerate}
Note: See TracBrowser for help on using the repository browser.