% source: doc/theses/jiada_liang_MMath/intro.tex @ 10a99d87
%
% Last change on this file since 10a99d87 was 46651fb, checked in by Peter A. Buhr <pabuhr@…>, 43 hours ago
%
% small wording change to enumeration introduction

\chapter{Introduction}

All types in a programming language have a set of constants (symbols), and these constants represent values, \eg integer types have constants @-1@, @17@, @0xff@ representing whole numbers, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@ representing real numbers, character and string types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}} representing (human readable) text, \etc.
Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integer and floating-point types.
(In \CFA, the constants @0@ and @1@ can be overloaded for any type.)
A constant's symbolic name is dictated by language syntax related to types, \eg @5.@ (double), @5.0f@ (float), @5.0L@ (long double).
In general, the representation of a constant's value is \newterm{opaque}, so the internal representation can be chosen arbitrarily.
In theory, there is an infinite set of constant names per type representing an infinite set of values.

It is common in mathematics, engineering, and computer science to alias new constants to existing constants so they have the same value, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), and K (k), M, G, T for powers of 2\footnote{Overloaded with SI powers of 10.}, which often prefix bits (b) or bytes (B), \eg Gb, MB.
Aliasing also occurs in general situations, \eg specific times (noon, New Year's), cities (Big Apple), flowers (Lily), \etc.
An alias can bind to another alias, which transitively binds it to the specified constant.
Multiple aliases can represent the same value, \eg eighth note and quaver, giving synonyms.

Many programming languages capture this important software-engineering capability through a mechanism called \newterm{constant} or \newterm{literal} naming, where a new constant is aliased to an existing constant.
Its purpose is readability: replacing a constant name that directly represents a value with a name that is more symbolic and meaningful in the context of the program.
Thereafter, changing the binding of the new constant to another constant automatically propagates the change to every use of the name, preventing errors.
% and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
Because an aliased name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \newterm{rvalue}\footnote{
The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
In theory, there is an infinite set of possible aliases; in practice, the number of aliases per program is finite and small.

Aliased constants can form an (ordered) set, \eg days of a week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
In this case, the binding between a constant name and value can be implicit, where the values are chosen to support any set operations.
Many programming languages capture the aliasing and ordering through a mechanism called an \newterm{enumeration}.
\begin{quote}
enumerate (verb, transitive).
To count, ascertain the number of;
more usually, to mention (a number of things or persons) separately, as if for the purpose of counting;
to specify as in a list or catalogue.~\cite{OEDenumerate}
\end{quote}
Within an enumeration set, the enumeration names (aliases) must be unique, and instances of an enumerated type are \emph{often} restricted to hold only these names.

It is possible to enumerate among set names without having an ordering among the set values.
For example, the following loops enumerate the week, the weekdays, the weekend, and every second day of the week by explicitly listing the names.
\begin{cfa}[morekeywords={in}]
for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun ) ...	$\C[3.75in]{// week}$
for ( cursor in Mon, Tue, Wed, Thu, Fri ) ...	$\C{// weekday}$
for ( cursor in Sat, Sun ) ...	$\C{// weekend}$
for ( cursor in Mon, Wed, Fri, Sun ) ...	$\C{// every second day of week}\CRT$
\end{cfa}
A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after Monday.
Ordering allows iterating among the enumeration set using relational operators and advancement, \eg:
\begin{cfa}
for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
\end{cfa}
Here the values for the set names are logically \emph{generated} rather than listed as a subset of names.
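C, which this work extends, has no successor operation for enumerations, so a generated loop like the one above must be written with a hand-coded advancement function.
The following is a minimal C sketch, where the helper @succ@ is hypothetical: it returns the next enumerator in declaration order and stays at @Sun@ once the last enumerator is reached, so this particular helper cannot terminate a loop whose upper bound is @Sun@.
\begin{cfa}
#include <stdio.h>
#include <assert.h>

enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

// Hypothetical helper: C provides no successor operation for enumerations.
// Returns the next enumerator in declaration order; Sun is its own successor.
static enum Week succ( enum Week day ) {
	return day < Sun ? (enum Week)( day + 1 ) : Sun;
}

int main( void ) {
	// Weekdays generated by ordering and advancement rather than listed.
	for ( enum Week cursor = Mon; cursor <= Fri; cursor = succ( cursor ) ) {
		printf( "%d ", (int)cursor );	// prints 0 1 2 3 4
	}
	printf( "\n" );
	assert( succ( Sun ) == Sun );	// no enumerator follows Sun
	return 0;
}
\end{cfa}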

Hence, the fundamental aspects of an enumeration are:
\begin{enumerate}
\item
\begin{sloppypar}
It provides a finite set of new constants, which are implicitly or explicitly assigned values that must be appropriate for any set operations.
This aspect differentiates an enumeration from general types with an infinite set of constants.
\end{sloppypar}
\item
The alias names are constants, which follows transitively from their binding to other constants.
\item
It defines a type for creating instances (variables).
\item
For safety, an enumeration instance should be restricted to hold only its constant names.
\item
There is a mechanism for \emph{enumerating} over the enumeration names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically.
\end{enumerate}


\section{Terminology}
\label{s:Terminology}

The term \newterm{enumeration} defines a type with a set of new constants, and the term \newterm{enumerator} represents an arbitrary alias name \see{\VRef{s:CEnumeration} for the name derivations}.
An enumerated type can have three fundamental properties: \newterm{label} (name), \newterm{order} (position), and \newterm{value} (payload).
\begin{cquote}
\sf\setlength{\tabcolsep}{3pt}
\begin{tabular}{rcccccccr}
\it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
$\downarrow$\hspace*{15pt} & \multicolumn{8}{c}{$\downarrow$} \\
@enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun {\color{red}= 42} & \}; \\
\it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
\it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
\it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & {\color{red}42} &
\end{tabular}
\end{cquote}
Here, the enumeration @Week@ defines the enumerator constants @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@, and @Sun@.
The implicit ordering implies the successor of @Tue@ is @Wed@ and the predecessor of @Tue@ is @Mon@, independent of any associated enumerator values.
The value is the implicitly/explicitly assigned constant to support any enumeration operations;
the value may be hidden (opaque) or visible.
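C exposes only the value of an enumerator; its label and order are not accessible in the program.
As a minimal C sketch, assuming hand-written tables that are not part of C or of this work (@WeekLabels@, @WeekValues@, and @WeekCount@ are hypothetical names), the three properties of @Week@ in the table above can be recovered by indexing parallel arrays with the order, which shows that order 6 and value 42 are distinct properties of @Sun@.
\begin{cfa}
#include <stdio.h>
#include <assert.h>

enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun = 42 };	// C exposes only the value

// Hypothetical hand-written tables indexed by order (position); C does not generate them.
static const char * const WeekLabels[] = { "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun" };
static const enum Week WeekValues[] = { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
enum { WeekCount = sizeof( WeekValues ) / sizeof( WeekValues[0] ) };

int main( void ) {
	for ( int order = 0; order < WeekCount; order += 1 ) {
		// label, order, and value printed side by side; last line is "Sun 6 42"
		printf( "%s %d %d\n", WeekLabels[order], order, (int)WeekValues[order] );
	}
	assert( WeekValues[6] == 42 );	// order 6, but value 42
	return 0;
}
\end{cfa}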

Specifying complex ordering is possible:
\begin{cfa}
enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ };	$\C{// overlapping square brackets}$
enum E2 { {A, {B, C} }, { {D, E}, F } };	$\C{// nesting}$
\end{cfa}
For @E1@, there is a partial ordering among @A@, @B@, and @C@, and among @C@, @D@, and @E@, but not among @A@, @B@ and @D@, @E@.
For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
Only flat total-ordering among enumerators is considered in this work.


\section{Motivation}

Many programming languages provide an enumeration-like mechanism, which may or may not cover the previous five fundamental enumeration aspects.
Hence, the term \emph{enumeration} can be confusing and misunderstood.
Furthermore, some languages conjoin the enumeration with other type features, making it difficult to tease apart which feature is being used.
This section discusses some language features that are sometimes called an enumeration but do not provide all enumeration aspects.


\subsection{Aliasing}
\label{s:Aliasing}

Some languages provide simple aliasing (renaming), \eg:
\begin{cfa}
const Size = 20, Pi = 3.14159, Name = "Jane";
\end{cfa}
The alias name is logically replaced in the program text by its matching constant.
It is possible to compare aliases, if the constants allow it, \eg @Size < Pi@, whereas @Pi < Name@ might be disallowed depending on the language.

Aliasing is not macro substitution, \eg @#define Size 20@, where a name is replaced by its value \emph{before} compilation, so the name is invisible to the programming language.
With aliasing, each new name is part of the language, and hence, participates fully, such as name overloading in the type system.
Aliasing is not an immutable variable, \eg:
\begin{cfa}
extern @const@ int Size = 20;
extern void foo( @const@ int @&@ size );
foo( Size ); // take the address of (reference) Size
\end{cfa}
An immutable variable is an \newterm{lvalue}, so its address can be taken, which implies it has storage.
With separate compilation, it is necessary to choose one translation unit to perform the initialization.
If aliasing does require storage, its address and initialization are opaque (compiler only), similar to \CC rvalue reference @&&@.
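In C, an enumeration constant is the closest built-in form of aliasing: it is an rvalue with no storage, whereas a @const@ variable is an lvalue whose address can be passed.
The following minimal C sketch illustrates the difference; because C has no references, the hypothetical function @foo@ takes a pointer rather than the reference used in the example above, and @SizeVar@ is a hypothetical name for the immutable variable.
\begin{cfa}
#include <stdio.h>
#include <assert.h>

enum { Size = 20 };				// alias: an rvalue with no storage
static const int SizeVar = 20;	// immutable variable: an lvalue with storage

// Hypothetical function requiring the address of its argument.
static int foo( const int * size ) { return *size; }

int main( void ) {
	int buffer[Size];			// the alias is a constant expression, not a VLA bound
	printf( "%zu %d\n", sizeof( buffer ) / sizeof( buffer[0] ), foo( &SizeVar ) );	// prints 20 20
	// foo( &Size );			// error: an enumeration constant has no address
	assert( foo( &SizeVar ) == Size );
	return 0;
}
\end{cfa}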

Aliasing does provide readability and automatic resubstitution.
It also provides simple enumeration properties, but with effort.
\begin{cfa}
const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
\end{cfa}
Any reordering of the enumerators requires manual renumbering.
\begin{cfa}
const @Sun = 1@, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
\end{cfa}
For these reasons, aliasing is sometimes called an enumeration.
However, there is no type to create a type-checked instance or iterator cursor, so enumerating is not possible.
Hence, there are multiple enumeration aspects not provided by aliasing, justifying a separate enumeration type in a programming language.


\subsection{Algebraic Data Type}
\label{s:AlgebraicDataType}

An algebraic data type (ADT)\footnote{ADT is overloaded with abstract data type.} is another language feature often linked with enumeration, where an ADT conjoins an arbitrary type, possibly a \lstinline[language=C++]{class} or @union@, and a named constructor.
For example, in Haskell:
\begin{haskell}
data S = S { i::Int, d::Double }	$\C{// structure}$
data @Foo@ = A Int | B Double | C S	$\C{// ADT, composed of three types}$
foo = A 3;	$\C{// type Foo is inferred}$
bar = B 3.5
baz = C S{ i = 7, d = 7.5 }
\end{haskell}
the ADT has three variants (constructors), @A@, @B@, and @C@, with associated types @Int@, @Double@, and @S@.
The constructors create an initialized value of the specific type that is bound to the immutable variables @foo@, @bar@, and @baz@.
Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to initialize and access the value using dynamic pattern-matching.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}ll@{}}
\begin{haskell}
prtfoo val = -- function
    -- pattern match on constructor
    case val of
      @A@ a -> print a
      @B@ b -> print b
      @C@ (S i d) -> do
          print i
          print d
\end{haskell}
&
\begin{haskell}
main = do
    prtfoo foo
    prtfoo bar
    prtfoo baz
3
3.5
7
7.5
\end{haskell}
\end{tabular}
\end{cquote}
For safety, most languages require a pattern match to cover all constructors or to provide a default case with no field accesses.
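For comparison, C can only approximate the ADT @Foo@ with a tagged union, where a C enumeration names the constructor and a @union@ holds the associated value.
The following minimal sketch assumes this encoding (the member names @tag@ and @u@ are hypothetical), and the @switch@ in @prtfoo@ plays the role of pattern matching; unlike Haskell, C does not check that the tag agrees with the union member accessed.
\begin{cfa}
#include <stdio.h>
#include <assert.h>

struct S { int i; double d; };
// Hypothetical tagged-union encoding of: data Foo = A Int | B Double | C S
struct Foo {
	enum { A, B, C } tag;						// which constructor built the value
	union { int a; double b; struct S c; } u;	// the associated value
};

// Analogue of prtfoo: switching on the tag stands in for pattern matching.
static void prtfoo( struct Foo val ) {
	switch ( val.tag ) {
	  case A: printf( "%d\n", val.u.a ); break;
	  case B: printf( "%g\n", val.u.b ); break;
	  case C: printf( "%d\n%g\n", val.u.c.i, val.u.c.d ); break;
	}
}

int main( void ) {
	struct Foo foo = { A, { .a = 3 } };
	struct Foo bar = { B, { .b = 3.5 } };
	struct Foo baz = { C, { .c = { 7, 7.5 } } };
	prtfoo( foo );								// prints 3
	prtfoo( bar );								// prints 3.5
	prtfoo( baz );								// prints 7 and 7.5 on separate lines
	assert( baz.tag == C && baz.u.c.d == 7.5 );
	return 0;
}
\end{cfa}
Here the tag is itself a C enumeration, but the union payload is what makes the whole structure union-like rather than an enumeration.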

A less frequent case is multiple constructors with the same type.
\begin{haskell}
data Bar = X Int | Y Int | Z Int;
foo = X 3;
bar = Y 3;
baz = Z 5;
\end{haskell}
Here, the constructor name gives different meaning to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching.

Note, the term \newterm{variant} is often associated with ADTs.
However, there are multiple languages with a @variant@ type that is not an ADT \see{Algol68~\cite{Algol68} or \CC \lstinline{variant}}.
In these languages, the variant is often a union using RTTI tags for discrimination, which cannot be used to simulate an enumeration.
Hence, in this work the term variant is not a synonym for ADT.

% https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.19.1.0-179c/GHC-Enum.html
% https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-Enum.html

The association between ADT and enumeration occurs if all the constructors have a unit (empty) type, \eg @struct unit {}@.
Note, the unit type is not the same as \lstinline{void}, \eg:
\begin{cfa}
void foo( void );
struct unit {} u;	$\C[1.5in]{// empty type}$
unit bar( unit );
foo( foo() );	$\C{// void argument does not match with void parameter}$
bar( bar( u ) );	$\C{// unit argument does match with unit parameter}\CRT$
\end{cfa}

For example, in the Haskell ADT:
\begin{haskell}
data Week = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving(Enum, Eq, Show)
\end{haskell}
the default type for each constructor is the unit type, and deriving @Enum@ enforces that no constructor has an associated type, @Eq@ allows equality comparison, and @Show@ is for printing.
The nullary constructors for the unit types are numbered left-to-right from $0$ to $N-1$ for $N$ constructors, and @Enum@ provides the enumerating operations @succ@, @pred@, @enumFrom@, and @enumFromTo@.
\VRef[Figure]{f:HaskellEnumeration} shows enumeration comparison and iterating (enumerating).

\begin{figure}
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}ll@{}}
\begin{haskell}
day = Tue
main = do
    if day == Tue then
        print day
    else
        putStr "not Tue"
    print (enumFrom Mon)        -- week
    print (enumFromTo Mon Fri)  -- weekday
    print (enumFromTo Sat Sun)  -- weekend
\end{haskell}
&
\begin{haskell}
Tue
[Mon,Tue,Wed,Thu,Fri,Sat,Sun]
[Mon,Tue,Wed,Thu,Fri]
[Sat,Sun]





\end{haskell}
\end{tabular}
\end{cquote}
\caption{Haskell Enumeration}
\label{f:HaskellEnumeration}
\end{figure}

The key observation is the dichotomy between an ADT and enumeration: the ADT uses the associated type, resulting in a union-like data structure, whereas the enumeration does not use the associated type and, hence, is not a union.
While an enumeration is constructed using the ADT mechanism, it is so restricted that it is not an ADT.
Furthermore, a general ADT cannot be an enumeration because the constructors generate different values, making enumerating meaningless.
While functional programming languages regularly repurpose the ADT type into an enumeration type, this process seems contrived and confusing.
Hence, there is only a weak equivalence between an enumeration and ADT, justifying a separate enumeration type in a programming language.


\section{Contributions}

The goal of this work is to extend the simple and unsafe enumeration type in the C programming language into a complex and safe enumeration type in the \CFA programming language, while maintaining backwards compatibility with C.
On the surface, enumerations seem like a simple type.
However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.

The contributions of this work are:
\begin{enumerate}
\item
overloading
\item
scoping
\item
typing
\item
subsetting
\item
inheritance
\end{enumerate}


\begin{comment}
Date: Wed, 1 May 2024 13:41:58 -0400
Subject: Re: Enumeration
To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
From: Gregor Richards <gregor.richards@uwaterloo.ca>

I think I have only one comment and one philosophical quibble to make:

Comment: I really can't agree with putting MB in the same category as the
others. MB is both a quantity and a unit, and the suggestion that MB *is* one
million evokes the rather disgusting comparison 1MB = 1000km.  Unit types are
not in the scope of this work.

Philosophical quibble: Pi *is* 3.14159...etc. Monday is not 0; associating
Monday with 0 is just a consequence of the language. The way this is written
suggests that the intentional part is subordinate to the implementation detail,
which seems backwards to me. Calling the number "primary" and the name
"secondary" feels like you're looking out from inside of the compiler, instead
of looking at the language from the outside. And, calling secondary values
without visible primary values "opaque"-which yes, I realize is my own term
;)-suggests that you insist that the primary value is a part of the design, or
at least mental model, of the program. Although as a practical matter there is
some system value associated with the constructor/tag of an ADT, that value is
not part of the mental model, and so calling it "primary" and calling the name
"secondary" and "opaque" seems either (a) very odd or (b) very C-biased. Or
both.

With valediction,
  - Gregor Richards


Date: Thu, 30 May 2024 23:15:23 -0400
Subject: Re: Meaning?
To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
CC: <ajbeach@uwaterloo.ca>, <j82liang@uwaterloo.ca>
From: Gregor Richards <gregor.richards@uwaterloo.ca>

I have to disagree with this being agreeing to disagree, since we agree
here. My core point was that it doesn't matter whether you enumerate over the
names or the values. This is a distinction without a difference in any case
that matters. If any of the various ways of looking at it are actually
different from each other, then that's because the enumeration has failed to be
an enumeration in some other way, not because of the actual process of
enumeration. Your flag enum is a 1-to-1 map of names and values, so whether you
walk through names or walk through values is not an actual distinction. It
could be distinct in the *order* that it walks through, but that doesn't
actually matter, it's just a choice that has to be made. Walking through entire
range of machine values, including ones that aren't part of the enumeration,
would be bizarre in any case.

Writing these out has crystallized some thoughts, albeit perhaps not in a way
that's any help to y'all. An enumeration is a set of names; ideally an ordered
set of names. The state of enumerations in programming languages muddies things
because they often expose the machine value underlying those names, resulting
in a possibly ordered set of names and a definitely ordered set of values. And,
muddying things further, because those underlying values are exposed, enums are
used in ways that *depend* on the underlying values being exposed, making that
a part of the definition. But, an enumeration is conceptually just *one* set,
not both. So much of the difficulty is that you're trying to find a way to make
a concept that should be a single set agree with an implementation that's two
sets. If those sets have a 1-to-1 mapping, then who cares, they're just
aliases. It's the possibility of the map being surjective (having multiple
names for the same underlying values) that breaks everything. Personally, I
think that an enum with aliases isn't an enumeration anyway, so who cares about
the rest; if you're not wearing the gourd as a shoe, then it's not an
enumeration.

With valediction,
  - Gregor Richards
\end{comment}