Context Navigation

source: doc/theses/jiada_liang_MMath/intro.tex @ 4da9142

Last change on this file since 4da9142 was 4da9142, checked in by Peter A. Buhr <pabuhr@…>, 3 months ago
more proofreading on enumerations
Property mode set to `100644`
File size: 12.9 KB

Line
1	\chapter{Introduction}
2
3	All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @0xff@, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}}, \etc.
4	Con\-stants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
5	(In \CFA, the primary constants @0@ and @1@ can be overloaded for any type.)
6	Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
7	In theory, there are an infinite set of primary names per type.
8
9	\Newterm{Secondary naming} is a common practice in mathematics, engineering and computer science, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MB (mega byte, 1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
10	Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
11	Its common purpose is to eliminate duplication of the primary constant throughout a program.
12	For example, the secondary name replaces its primary name, thereafter changing the binding of the secondary to primary name automatically distributes the rebinding throughout the program.
13	In some cases, secondary naming is \Newterm{opaque}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
14	Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
15	The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
16
17	Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
18	Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
19	\begin{quote}
20	enumerate (verb, transitive).
21	To count, ascertain the number of;
22	more usually, to mention (a number of things or persons) separately, as if for the purpose of counting;
23	to specify as in a list or catalogue.~\cite{OEDenumerate}
24	\end{quote}
25	Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are \emph{often} restricted to hold only the secondary names.
26	It is possible to enumerate among set names without having an ordering among the set elements.
27	For example, the week, the weekdays, the weekend, and every second day of the week.
28	\begin{cfa}[morekeywords={in}]
29	for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
30	for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$
31	for ( cursor in Sat, Sun } ... $\C{// weekend}$
32	for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$
33	\end{cfa}
34	This independence from internal representation allows multiple names to have the same representation (eighth note, quaver), giving synonyms.
35	A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
36	Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
37	\begin{cfa}
38	for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
39	\end{cfa}
40	Here the internal representations for the secondary names are logically \emph{generated} rather than listing a subset of names.
41
42	Hence, the fundamental aspects of an enumeration are:
43	\begin{enumerate}
44	\item
45	It defines a type from which instants can be generated.
46	\item
47	The type lists a finite set of secondary names, which become its primary constants.
48	This differentiates an enumeration from general types with an infinite number of primary constants.
49	\item
50	An enumeration's secondary names represent constants, which follows from their binding (aliasing) to primary names, which are constants.
51	\item
52	For safety, an enumeration instance is restricted to hold only its type's secondary names.
53	\item
54	There is a mechanism for \emph{enumerating} over the secondary names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically.
55	\end{enumerate}
56
57
58	\section{Terminology}
59	\label{s:Terminology}
60
61	The term \Newterm{enumeration} defines a type with a set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name \see{\VRef{s:CEnumeration}}.
62	As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
63	\begin{cquote}
64	\sf\setlength{\tabcolsep}{3pt}
65	\begin{tabular}{rcccccccr}
66	\it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
67	$\downarrow$\hspace*{25pt} & \multicolumn{8}{c}{$\downarrow$} \\
68	@enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun = 42 & \}; \\
69	\it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
70	\it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
71	\it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & 42 &
72	\end{tabular}
73	\end{cquote}
74	Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
75	The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
76	The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
77
78	Specifying complex ordering is possible:
79	\begin{cfa}
80	enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
81	enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$
82	\end{cfa}
83	For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
84	For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
85	Only flat total-ordering among enumerators is considered in this work.
86
87
88	\section{Motivation}
89
90	Some programming languages only provide direct secondary renaming.
91	\begin{cfa}
92	const Size = 20, Pi = 3.14159, Name = "Jane";
93	\end{cfa}
94	Here, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful.
95
96	Secondary renaming can simulate an enumeration, but with extra effort.
97	\begin{cfa}
98	const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
99	\end{cfa}
100	Any reordering of the enumerators requires manual renumbering.
101	\begin{cfa}
102	const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
103	\end{cfa}
104	Finally, there is no type to create a type-checked instance or iterator cursor.
105	Hence, there is only a weak equivalence between secondary naming and enumerations, justifying a seperate enumeration type in a programming language.
106
107	A variant (algebraic) type is often promoted as a kind of enumeration, \ie a variant type can simulate an enumeration.
108	Fundamentally, a variant type is a tagged-union, where the tag is normally opaque and the types are usually heterogeneous.
109	\begin{cfa}[morekeywords={variant}]
110	@variant@ Variant {
111	@int tag;@ // optional/implicit: 0 => int, 1 => double, 2 => S
112	@union {@ // implicit
113	case int i;
114	case double d;
115	case struct S { int i, j; } s;
116	@};@
117	};
118	\end{cfa}
119	Crucially, the union implies instance storage is shared by all the variant types, and therefore, before a variant type can be used in a statically-typed expression, it must be dynamically discriminated to its current contained type.
120	Hence, knowing which type is in a variant instance is crucial for correctness.
121	Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking is unnecessary.
122	Otherwise, a tag is required to denote the particular type in the variant, and the tag is discriminated at runtime using some form of type pattern-matching, after which the value can be used in a statically-typed expression.
123
124	A less frequent variant case is multiple variants with the same type, which normally requires explicit naming of the tag to disambiguate among the common types.
125	\begin{cquote}
126	\begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
127	\begin{cfa}[morekeywords={variant}]
128	variant VariantCT {
129	case @car@: int i; // explicitly typed
130	case @boat@: int i;
131	case @bridge@: int i;
132	};
133	\end{cfa}
134	&
135	\begin{cfa}[morekeywords={variant}]
136	variant VariantCU {
137	case @car@: ; // empty or unit type
138	case @boat@: ;
139	case @bridge@: ;
140	};
141	\end{cfa}
142	\end{tabular}
143	\end{cquote}
144	Here, the explicit tag name is used to give different meaning to the values in the common @int@ type, \eg the value 3 has different interpretations depending on the tag name.
145	It is even possible to remove the type or use the empty @unit@ type (@struct unit {}@).
146	It is this tag naming that is used to simulate an enumeration.
147
148	Normally, the variant tag is implicitly set by the compiler based on type, but with common types, a tag name is required to resolve type ambiguity.
149	\begin{cfa}
150	Variant v = 3; $\C{// implicitly set tag to 0 based on type of 3}$
151	VariantCT ve = boats.3; $\C{// explicitly set tag to 1 using tag name}$
152	\end{cfa}
153	Type pattern-matching is then used to dynamically test the tag and branch to a section of statically-typed code to safely manipulate the value, \eg:
154	\begin{cquote}
155	\begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
156	\begin{cfa}[morekeywords={match}]
157	@match@( v ) { // know type implicitly or test tag
158	case int { /* only access i field */ }
159	case double { /* only access d field */ }
160	case S { /* only access s field */ }
161	}
162	\end{cfa}
163	&
164	\begin{cfa}[morekeywords={match}]
165	@match@( ve ) {
166	case car: int { /* car interpretation */ }
167	case boat: int { /* boat interpretation */ }
168	case bridge: int { /* bridge interpretation */ }
169	}
170	\end{cfa}
171	\end{tabular}
172	\end{cquote}
173	For safety, some languages require all variant types to be listed or a @default@ case with no field accesses.
174
175	To further strengthen the simulate for an enumeration with different values, each variant type can be a @const@ type or the tag becomes non-opaque, possibly taking advantage of the opaque auto-numbering.
176	\begin{cquote}
177	\begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
178	\begin{cfa}
179	variant Week {
180	case Mon: const int = 0;
181	...
182	case Sat: const int = 5;
183	case Sun: const int = 10;
184	};
185	\end{cfa}
186	&
187	\begin{cfa}
188	variant Week {
189	case Mon: ; // tag auto-numbering
190	...
191	case Sat: ;
192	case @Sun = 10@: ; // directly set tag value
193	};
194	\end{cfa}
195	\end{tabular}
196	\end{cquote}
197	Directly setting the tag implies restrictions, like unique values.
198	In both cases, instances of @Week@ are @const@ (immutable).
199	However, usage between these two types becomes complex.
200	\begin{cfa}
201	Week day = Week.Mon; // sets value or tag depending on type
202	if ( day == Week.Mon ) // dereference value or tag ?
203	\end{cfa}
204	Here, the dereference of @day@ should return the value of the type stored in the variant, never the tag.
205	If it does return the tag, some special meaning must be given to the empty/unit type, especially if a variant contains both regular and unit types.
206
207
208	In general, the enumeration simulation and the variant extensions to support it, are deviating from the normal use of a variant (union) type.
209	As well, the enumeration operations are limited to the available tag operations, \eg pattern matching.
210	While enumerating among tag names is possible:
211	\begin{cfa}[morekeywords={in}]
212	for ( cursor in Week.Mon, Week.Wed, Week.Fri, Week.Sun ) ...
213	\end{cfa}
214	what is the type of @cursor@?
215	If it the tag type (@int@), how is this value used?
216	If it is the variant type, where is the instance variable, which only contains one value.
217	Hence, either enumerating with a variant enumeration is disallowed or some unusual typing rule must be invented to make it work but only in restricted contexts.
218
219	While functional programming systems regularly re-purposing variant types into enumeration types, this process seems contrived and confusing.
220	A variant tag is not an enumeration, it is a discriminant among a restricted set of types stored in a storage block.
221	Hence, there is only a weak equivalence between an enumeration and variant type, justifying a seperate enumeration type in a programming language.
222
223
224	\section{Contributions}
225
226	The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintain backwards compatibility with C.
227	On the surface, enumerations seem like a simple type.
228	However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
229
230	\begin{enumerate}
231	\item
232	overloading
233	\item
234	scoping
235	\item
236	typing
237	\item
238	subset
239	\item
240	inheritance
241	\end{enumerate}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: