Context Navigation

source: doc/theses/jiada_liang_MMath/intro.tex@ bf4fe05

Visit:

Last change on this file since bf4fe05 was ec20ab9, checked in by Peter A. Buhr <pabuhr@…>, 18 months ago
small updates, and more proofreading of the related-works chapter
Property mode set to `100644`
File size: 14.1 KB

Rev	Line
[18ebc28]	1	\chapter{Introduction}
	2
[caaf424]	3	All types in a programming language must have a set of constants, and these constants have \newterm{primary names}, \eg integral types have constants @-1@, @17@, @0xff@, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}}, \etc.
[4da9142]	4	Con\-stants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
	5	(In \CFA, the primary constants @0@ and @1@ can be overloaded for any type.)
[caf2cba]	6	Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
[314c9d8]	7	In theory, there are an infinite set of primary constant names per type.
[18ebc28]	8
[caaf424]	9	\newterm{Secondary naming} is a common practice in mathematics, engineering and computer science, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MB (megabyte, 1E6), and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc.
	10	Many programming languages capture this important software-engineering capability through a mechanism called \newterm{constant} or \newterm{literal} naming, where a secondary name is aliased to a primary name.
[314c9d8]	11	Its purpose is for readability and to eliminate duplication of the primary constant throughout a program.
	12	For example, a meaningful secondary name replaces a primary name throughout a program;
	13	thereafter, changing the binding of the secondary to primary name automatically distributes the rebinding, preventing errors.
[caaf424]	14	In some cases, secondary naming is \newterm{opaque}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
	15	Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \newterm{rvalue}\footnote{
[caf2cba]	16	The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
	17
[314c9d8]	18	Secondary names can form an (ordered) set, \eg days of a week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
[caaf424]	19	Many programming languages capture these groupings through a mechanism called an \newterm{enumeration}.
[caf2cba]	20	\begin{quote}
	21	enumerate (verb, transitive).
	22	To count, ascertain the number of;
[4da9142]	23	more usually, to mention (a number of things or persons) separately, as if for the purpose of counting;
	24	to specify as in a list or catalogue.~\cite{OEDenumerate}
[caf2cba]	25	\end{quote}
[4da9142]	26	Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are \emph{often} restricted to hold only the secondary names.
[caf2cba]	27	It is possible to enumerate among set names without having an ordering among the set elements.
[48b76d03]	28	For example, the week, the weekdays, the weekend, and every second day of the week.
	29	\begin{cfa}[morekeywords={in}]
	30	for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun } ... $\C[3.75in]{// week}$
	31	for ( cursor in Mon, Tue, Wed, Thu, Fri } ... $\C{// weekday}$
[4da9142]	32	for ( cursor in Sat, Sun } ... $\C{// weekend}$
[48b76d03]	33	for ( cursor in Mon, Wed, Fri, Sun } ... $\C{// every second day of week}\CRT$
	34	\end{cfa}
[4da9142]	35	This independence from internal representation allows multiple names to have the same representation (eighth note, quaver), giving synonyms.
[48b76d03]	36	A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after.
[314c9d8]	37	Ordering allows iterating among the enumeration set using relational operators and advancement, \eg:
[caf2cba]	38	\begin{cfa}
	39	for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
	40	\end{cfa}
[314c9d8]	41	Here the internal representation for the secondary names are logically \emph{generated} rather than listing a subset of names.
[4da9142]	42
	43	Hence, the fundamental aspects of an enumeration are:
	44	\begin{enumerate}
	45	\item
[314c9d8]	46	\begin{sloppypar}
	47	It provides a finite set of secondary names, which become its primary constants.
	48	This differentiates an enumeration from general types with an infinite set
	49	of primary constants.
	50	\end{sloppypar}
[4da9142]	51	\item
[314c9d8]	52	The secondary names are constants, which follows transitively from their binding (aliasing) to primary names, which are constants.
[4da9142]	53	\item
[314c9d8]	54	Defines a type for generating instants (variables).
[4da9142]	55	\item
[314c9d8]	56	For safety, an enumeration instance should be restricted to hold only its type's secondary names.
[4da9142]	57	\item
	58	There is a mechanism for \emph{enumerating} over the secondary names, where the ordering can be implicit from the type, explicitly listed, or generated arithmetically.
	59	\end{enumerate}
[48b76d03]	60
[956299b]	61
[48b76d03]	62	\section{Terminology}
[4da9142]	63	\label{s:Terminology}
[48b76d03]	64
[caaf424]	65	The term \newterm{enumeration} defines a type with a set of secondary names, and the term \newterm{enumerator} represents an arbitrary secondary name \see{\VRef{s:CEnumeration} for the name derivation}.
	66	As well, an enumerated type can have three fundamental properties, \newterm{label}, \newterm{order}, and \newterm{value}.
[956299b]	67	\begin{cquote}
[48b76d03]	68	\sf\setlength{\tabcolsep}{3pt}
[caf2cba]	69	\begin{tabular}{rcccccccr}
	70	\it\color{red}enumeration & \multicolumn{8}{c}{\it\color{red}enumerators} \\
[314c9d8]	71	$\downarrow$\hspace*{15pt} & \multicolumn{8}{c}{$\downarrow$} \\
	72	@enum@ Week \{ & Mon, & Tue, & Wed, & Thu, & Fri, & Sat, & Sun {\color{red}= 42} & \}; \\
[48b76d03]	73	\it\color{red}label & Mon & Tue & Wed & Thu & Fri & Sat & Sun & \\
	74	\it\color{red}order & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \\
[314c9d8]	75	\it\color{red}value & 0 & 1 & 2 & 3 & 4 & 5 & {\color{red}42} &
[956299b]	76	\end{tabular}
	77	\end{cquote}
[41fb996]	78	Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
[caf2cba]	79	The implicit ordering implies the successor of @Tue@ is @Mon@ and the predecessor of @Tue@ is @Wed@, independent of any associated enumerator values.
[48b76d03]	80	The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
	81
[caf2cba]	82	Specifying complex ordering is possible:
	83	\begin{cfa}
	84	enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
	85	enum E2 { {A, {B, C} }, { {D, E}, F }; $\C{// nesting}$
	86	\end{cfa}
	87	For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
	88	For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
	89	Only flat total-ordering among enumerators is considered in this work.
[7d9a805b]	90
	91
[48b76d03]	92	\section{Motivation}
[caf2cba]	93
[314c9d8]	94	Many programming languages provide an enumeration-like mechanism, which may or may not cover the previous five fundamental enumeration aspects.
	95	Hence, the term \emph{enumeration} can be confusing and misunderstood.
	96	Furthermore, some languages conjoin the enumeration with other type features, making it difficult to tease apart which featuring is being used.
	97	This section discusses some language features that are sometimes called an enumeration but do not provide all enumeration aspects.
	98
	99
	100	\subsection{Aliasing}
[f6321173]	101	\label{s:Aliasing}
[314c9d8]	102
	103	Some languages provide simple secondary aliasing (renaming), \eg:
[caf2cba]	104	\begin{cfa}
[4da9142]	105	const Size = 20, Pi = 3.14159, Name = "Jane";
[caf2cba]	106	\end{cfa}
[314c9d8]	107	The secondary name is logically replaced in the program text by its corresponding primary name.
	108	Therefore, it is possible to compare the secondary names, \eg @Size < Pi@, only because the primary constants allow it, whereas \eg @Pi < Name@ might be disallowed depending on the language.
[caf2cba]	109
[314c9d8]	110	Aliasing is not macro substitution, \eg @#define Size 20@, where a name is replaced by its value \emph{before} compilation, so the name is invisible to the programming language.
	111	With aliasing, each secondary name is part of the language, and hence, participates fully, such as name overloading in the type system.
	112	Aliasing is not an immutable variable, \eg:
	113	\begin{cfa}
	114	extern @const@ int Size = 20;
	115	extern void foo( @const@ int @&@ size );
	116	foo( Size ); // take the address of (reference) Size
	117	\end{cfa}
[caaf424]	118	Taking the address of an immutable variable makes it an \newterm{lvalue}, which implies it has storage.
[314c9d8]	119	With separate compilation, it is necessary to choose one translation unit to perform the initialization.
	120	If aliasing does require storage, its address and initialization are opaque (compiler only), similar to \CC rvalue reference @&&@.
	121
	122	Aliasing does provide readability and automatic resubstitution.
	123	It also provides simple enumeration properties, but with extra effort.
[caf2cba]	124	\begin{cfa}
	125	const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
	126	\end{cfa}
[4da9142]	127	Any reordering of the enumerators requires manual renumbering.
[caf2cba]	128	\begin{cfa}
	129	const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
	130	\end{cfa}
[314c9d8]	131	For these reasons, aliasing is sometimes called an enumeration.
	132	However, there is no type to create a type-checked instance or iterator cursor, so there is no ability for enumerating.
	133	Hence, there are multiple enumeration aspects not provided by aliasing, justifying a separate enumeration type in a programming language.
	134
	135
	136	\subsection{Algebraic Data Type}
[ec20ab9]	137	\label{s:AlgebraicDataType}
[4da9142]	138
[314c9d8]	139	An algebraic data type (ADT)\footnote{ADT is overloaded with abstract data type.} is another language feature often linked with enumeration, where an ADT conjoins an arbitrary type, possibly a \lstinline[language=C++]{class} or @union@, and a named constructor.
	140	For example, in Haskell:
	141	\begin{haskell}
	142	data S = S { i::Int, d::Double } $\C{// structure}$
	143	data @Foo@ = A Int \| B Double \| C S $\C{// ADT, composed of three types}$
	144	foo = A 3; $\C{// type Foo is inferred}$
	145	bar = B 3.5
	146	baz = C S{ i = 7, d = 7.5 }
	147	\end{haskell}
	148	the ADT has three variants (constructors), @A@, @B@, @C@ with associated types @Int@, @Double@, and @S@.
	149	The constructors create an initialized value of the specific type that is bound to the immutable variables @foo@, @bar@, and @baz@.
	150	Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to access the value using dynamic pattern-matching.
[4da9142]	151	\begin{cquote}
[314c9d8]	152	\setlength{\tabcolsep}{15pt}
	153	\begin{tabular}{@{}ll@{}}
	154	\begin{haskell}
	155	prtfoo val = -- function
	156	-- pattern match on constructor
	157	case val of
	158	@A@ a -> print a
	159	@B@ b -> print b
	160	@C@ (S i d) -> do
	161	print i
	162	print d
	163	\end{haskell}
[4da9142]	164	&
[314c9d8]	165	\begin{haskell}
	166	main = do
	167	prtfoo foo
	168	prtfoo bar
	169	prtfoo baz
	170	3
	171	3.5
	172	7
	173	7.5
	174	\end{haskell}
[4da9142]	175	\end{tabular}
	176	\end{cquote}
[314c9d8]	177	For safety, most languages require all assocaited types to be listed or a default case with no field accesses.
	178
	179	A less frequent case is multiple constructors with the same type.
	180	\begin{haskell}
	181	data Bar = X Int \| Y Int \| Z Int;
	182	foo = X 3;
	183	bar = Y 3;
	184	baz = Z 5;
	185	\end{haskell}
	186	Here, the constructor name gives different meaning to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching.
	187
[caaf424]	188	Note, the term \newterm{variant} is often associated with ADTs.
[314c9d8]	189	However, there are multiple languages with a @variant@ type that is not an ADT \see{Algol68~\cite{Algol68} or \CC \lstinline{variant}}.
	190	In these languages, the variant is often a union using RTTI tags, which cannot be used to simulate an enumeration.
	191	Hence, in this work the term variant is not a synonym for ADT.
	192
	193	% https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.19.1.0-179c/GHC-Enum.html
	194	% https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-Enum.html
[48b76d03]	195
[314c9d8]	196	The association between ADT and enumeration occurs if all the constructors have a unit (empty) type, \eg @struct unit {}@.
	197	Note, the unit type is not the same as \lstinline{void}, \eg:
[4da9142]	198	\begin{cfa}
[314c9d8]	199	void foo( void );
	200	struct unit {} u; // empty type
	201	unit bar( unit );
	202	foo( foo() ); // void argument does not match with void parameter
	203	bar( bar( u ) ); // unit argument does match with unit parameter
[4da9142]	204	\end{cfa}
[caf2cba]	205
[314c9d8]	206	For example, in the Haskell ADT:
	207	\begin{haskell}
	208	data Week = Mon \| Tue \| Wed \| Thu \| Fri \| Sat \| Sun deriving(Enum, Eq, Show)
	209	\end{haskell}
	210	the default type for each constructor is the unit type, and deriving from @Enum@ enforces no other type, @Eq@ allows equality comparison, and @Show@ is for printing.
	211	The nullary constructors for the unit types are numbered left-to-right from $0$ to @maxBound@$- 1$, and provides enumerating operations @succ@, @pred@, @enumFrom@ @enumFromTo@.
	212	\VRef[Figure]{f:HaskellEnumeration} shows enumeration comparison and iterating (enumerating).
	213
	214	\begin{figure}
[4da9142]	215	\begin{cquote}
[314c9d8]	216	\setlength{\tabcolsep}{15pt}
	217	\begin{tabular}{@{}ll@{}}
	218	\begin{haskell}
	219	day = Tue
	220	main = do
	221	if day == Tue then
	222	print day
	223	else
	224	putStr "not Tue"
	225	print (enumFrom Mon) -- week
	226	print (enumFromTo Mon Fri) -- weekday
	227	print (enumFromTo Sat Sun) -- weekend
	228	\end{haskell}
[4da9142]	229	&
[314c9d8]	230	\begin{haskell}
	231	Tue
	232	[Mon,Tue,Wed,Thu,Fri,Sat,Sun]
	233	[Mon,Tue,Wed,Thu,Fri]
	234	[Sat,Sun]
[caf2cba]	235
	236
[4da9142]	237
[314c9d8]	238
	239
	240	\end{haskell}
	241	\end{tabular}
	242	\end{cquote}
	243	\caption{Haskell Enumeration}
	244	\label{f:HaskellEnumeration}
	245	\end{figure}
	246
	247	The key observation is the dichotomy between an ADT and enumeration: the ADT uses the associated type resulting in a union-like data structure, and the enumeration does not use the associated type, and hence, is not a union.
	248	While the enumeration is constructed using the ADT mechanism, it is so restricted it is not really an ADT.
	249	Furthermore, a general ADT cannot be an enumeration because the constructors generate different values making enumerating meaningless.
	250	While functional programming languages regularly repurpose the ADT type into an enumeration type, this process seems contrived and confusing.
	251	Hence, there is only a weak equivalence between an enumeration and ADT, justifying a separate enumeration type in a programming language.
[caf2cba]	252
[f9da761]	253
	254	\section{Contributions}
[7d9a805b]	255
[314c9d8]	256	The goal of this work is to to extend the simple and unsafe enumeration type in the C programming-language into a complex and safe enumeration type in the \CFA programming-language, while maintaining backwards compatibility with C.
[caf2cba]	257	On the surface, enumerations seem like a simple type.
[41fb996]	258	However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
[48b76d03]	259
[314c9d8]	260	The contribution of this work are:
[48b76d03]	261	\begin{enumerate}
	262	\item
	263	overloading
	264	\item
	265	scoping
	266	\item
	267	typing
	268	\item
[314c9d8]	269	subseting
[48b76d03]	270	\item
	271	inheritance
	272	\end{enumerate}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: