intro.tex

Timestamp:

Mar 25, 2024, 7:15:30 PM (7 months ago)

Author:

JiadaL <j82liang@…>

Branches:

master

Children:

d734fa1

Parents:

df78cce (diff), bf050c5 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:

: 1 edited

doc/theses/jiada_liang_MMath/intro.tex (modified) (1 diff)

doc/theses/jiada_liang_MMath/intro.tex

-                      rdf78cce
+                      r486caad
 \chapter{Introduction}
+Naming values is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), \etc.
+Naming is also commonly used to represent many other numerical phenomena, such as days of the week, months of a year, floors of a building (basement), specific times (noon, New Year's).
+Many programming languages capture this important software engineering capability through a mechanism called an \Newterm{enumeration}.
+An enumeration is similar to other programming-language types by providing a set of constrained values, but adds the ability to name \emph{all} the values in its set.
+Note, all enumeration names must be unique, but different names can represent the same value (eighth note, quaver), making them synonyms.
+All types in a programming language must have a set of constants, and these constants have \Newterm{primary names}, \eg integral types have constants @-1@, @17@, @12345@, \etc.
+Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integral and floating-point types.
+Hence, each primary constant has a symbolic name referring to its internal representation, and these names are dictated by language syntax related to types.
+In theory, there is an infinite set of primary names per type.
+Specifically, an enumerated type restricts its values to a fixed set of named constants.
+While all types are restricted to a fixed set of values because of the underlying von Neumann architecture, and hence, to a corresponding set of constants, \eg @3@, @3.5@, @3.5+2.1i@, @'c'@, @"abc"@, etc., these values are not named, other than the programming-language supplied constant names.
+\Newterm{Secondary naming} is a common practice in mathematics and engineering, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), MHz (1E6), and in general situations, \eg specific times (noon, New Year's), cities (Big Apple), flowers (Lily), \etc.
+Many programming languages capture this important software-engineering capability through a mechanism called \Newterm{constant} or \Newterm{literal} naming, where a secondary name is aliased to a primary name.
+In some cases, secondary naming is \Newterm{pure}, where the matching internal representation can be chosen arbitrarily, and only equality operations are available, \eg @O_RDONLY@, @O_WRONLY@, @O_CREAT@, @O_TRUNC@, @O_APPEND@.
+(The name is the thing.)
+Because a secondary name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \Newterm{rvalue}\footnote{
+The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}.
+Fundamentally, all enumeration systems have an \Newterm{enumeration} type with an associated set of \Newterm{enumerator} names.
+An enumeration has three universal attributes, \Newterm{position}, \Newterm{label}, and \Newterm{value}, as shown by this representative enumeration, where position and value can be different.
+Secondary names can form an (ordered) set, \eg days of the week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc.
+Many programming languages capture these groupings through a mechanism called an \Newterm{enumeration}.
+\begin{quote}
+enumerate (verb, transitive).
+To count, ascertain the number of;
+\emph{more
+usually, to mention (a number of things or persons) separately, as if for the
+purpose of counting};
+to specify as in a list or catalogue.~\cite{OED}
+\end{quote}
+Within an enumeration set, the enumeration names must be unique, and instances of an enumerated type are restricted to hold only the secondary names.
+It is possible to enumerate among set names without having an ordering among the set elements.
+For example, the week, the weekdays, the weekend, and every second day of the week.
+\begin{cfa}[morekeywords={in}]
+for ( cursor in Mon, Tue, Wed, Thu, Fri, Sat, Sun ) ... $\C[3.75in]{// week}$
+for ( cursor in Mon, Tue, Wed, Thu, Fri ) ...   $\C{// weekday}$
+for ( cursor in Sat, Sun ) ...                                  $\C{// weekend}$
+for ( cursor in Mon, Wed, Fri, Sun ) ...                $\C{// every second day of week}\CRT$
+\end{cfa}
+This independence from internal representation allows multiple names to have the same representation (eighth note, quaver), giving synonyms.
+A set can have a partial or total ordering, making it possible to compare set elements, \eg Monday is before Friday and Friday is after Monday.
+Ordering allows iterating among the enumeration set using relational operators and advancement, \eg
+\begin{cfa}
+for ( cursor = Monday; cursor @<=@ Friday; cursor = @succ@( cursor ) ) ...
+\end{cfa}
+Here the internal representations for the secondary names are \emph{generated} rather than listing a subset of names.
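The ordered iteration above can be approximated in plain C (not \CFA), as a minimal sketch that assumes the enumerator values are contiguous.
The names @succ@ and @count_days@ are hypothetical helpers introduced only for illustration; C itself provides no successor operation for enumerations.

```c
#include <assert.h>

/* Enumeration mirroring the Week example; C assigns the values 0..6 implicitly. */
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

/* Hypothetical helper: C has no built-in succ, so advance by integer arithmetic.
   This is only meaningful because the enumerator values are contiguous. */
enum Week succ( enum Week day ) {
	return (enum Week)( day + 1 );
}

/* Count the enumerators visited when iterating from first to last, inclusive. */
int count_days( enum Week first, enum Week last ) {
	assert( first <= last );	/* assumed precondition: a non-empty ascending range */
	int n = 0;
	for ( enum Week cursor = first; cursor <= last; cursor = succ( cursor ) ) {
		n += 1;
	}
	return n;
}
```

Note the cursor steps one past @last@ before the loop exits, so the sketch relies on C permitting an enumeration object to hold a value that is not one of its enumerators.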
+\section{Terminology}
+The term \Newterm{enumeration} defines the set of secondary names, and the term \Newterm{enumerator} represents an arbitrary secondary name.
+As well, an enumerated type has three fundamental properties, \Newterm{label}, \Newterm{order}, and \Newterm{value}.
 \begin{cquote}
 \small\sf\setlength{\tabcolsep}{3pt}
 \begin{tabular}{rccccccccccc}
 \it\color{red}enumeration & \multicolumn{7}{c}{\it\color{red}enumerators}       \\
 $\downarrow$\hspace*{25pt} & \multicolumn{7}{c}{$\downarrow$}                           \\
 @enum@ Weekday \{                               & Monday,       & Tuesday,      & Wednesday,    & Thursday,& Friday,    & Saturday,     & Sunday \}; \\
 \it\color{red}position                  & 0                     & 1                     & 2                             & 3                             & 4                     & 5                     & 6                     \\
 \it\color{red}label                             & Monday        & Tuesday       & Wednesday             & Thursday              & Friday        & Saturday      & Sunday        \\
 \it\color{red}value                             & 0                     & 1                     & 2                             & 3                             & 4                     & 5             & 6
+\sf\setlength{\tabcolsep}{3pt}
+\begin{tabular}{rcccccccr}
+\it\color{red}enumeration       & \multicolumn{8}{c}{\it\color{red}enumerators} \\
+$\downarrow$\hspace*{25pt}      & \multicolumn{8}{c}{$\downarrow$}                              \\
+@enum@ Week \{                          & Mon,  & Tue,  & Wed,  & Thu,  & Fri,  & Sat,  & Sun = 42      & \};   \\
+\it\color{red}label                     & Mon   & Tue   & Wed   & Thu   & Fri   & Sat   & Sun           &               \\
+\it\color{red}order                     & 0             & 1             & 2             & 3             & 4             & 5             & 6                     &               \\
+\it\color{red}value                     & 0             & 1             & 2             & 3             & 4             & 5             & 42            &
 \end{tabular}
 \end{cquote}
+Here, the \Newterm{enumeration} @Weekday@ defines the ordered \Newterm{enumerator}s @Monday@, @Tuesday@, @Wednesday@, @Thursday@, @Friday@, @Saturday@ and @Sunday@.
+By convention, the successor of @Tuesday@ is @Wednesday@ and the predecessor of @Tuesday@ is @Monday@, independent of the associated enumerator constant values.
+Because an enumerator is a constant, it cannot appear in a mutable context, \eg @Mon = Sun@ is meaningless, and an enumerator has no address, \ie it is an \Newterm{rvalue}\footnote{
+The term rvalue defines an expression that can only appear on the right-hand side of an assignment.}.
+Here, the enumeration @Week@ defines the enumerator labels @Mon@, @Tue@, @Wed@, @Thu@, @Fri@, @Sat@ and @Sun@.
+The implicit ordering implies the successor of @Tue@ is @Wed@ and the predecessor of @Tue@ is @Mon@, independent of any associated enumerator values.
+The value is the constant represented by the secondary name, which can be implicitly or explicitly set.
+Specifying complex ordering is possible:
+\begin{cfa}
+enum E1 { $\color{red}[\(_1\)$ {A, B}, $\color{blue}[\(_2\)$ C $\color{red}]\(_1\)$, {D, E} $\color{blue}]\(_2\)$ }; $\C{// overlapping square brackets}$
+enum E2 { {A, {B, C} }, { {D, E}, F } };  $\C{// nesting}$
+\end{cfa}
+For @E1@, there is the partial ordering among @A@, @B@ and @C@, and @C@, @D@ and @E@, but not among @A@, @B@ and @D@, @E@.
+For @E2@, there is the total ordering @A@ $<$ @{B, C}@ $<$ @{D, E}@ $<$ @F@.
+Only flat total-ordering among enumerators is considered in this work.
+\section{Motivation}
+Some programming languages only provide secondary renaming, which can be simulated by an enumeration without ordering.
+\begin{cfa}
+const Size = 20, Pi = 3.14159;
+enum { Size = 20, Pi = 3.14159 };   // unnamed enumeration $\(\Rightarrow\)$ no ordering
+\end{cfa}
+In both cases, it is possible to compare the secondary names, \eg @Size < Pi@, if that is meaningful;
+however, without an enumeration type-name, it is impossible to create an iterator cursor.
+Secondary renaming can simulate an enumeration, but with extra effort.
+\begin{cfa}
+const Mon = 1, Tue = 2, Wed = 3, Thu = 4, Fri = 5, Sat = 6, Sun = 7;
+\end{cfa}
+Furthermore, reordering the enumerators requires manual renumbering.
+\begin{cfa}
+const Sun = 1, Mon = 2, Tue = 3, Wed = 4, Thu = 5, Fri = 6, Sat = 7;
+\end{cfa}
+Finally, there is no common type to create a type-checked instance or iterator cursor.
+Hence, there is only a weak equivalence between secondary naming and enumerations, justifying the enumeration type in a programming language.
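As a hedged illustration of this difference in plain C (not \CFA), the following sketch uses implicit enumerator numbering, so reordering the list needs no manual renumbering, and uses the enumeration as a common type for a parameter.
The function @is_weekend@ is a hypothetical example, and the range check is written by hand because C does not enforce it.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the first enumerator is numbered; the rest follow implicitly (Mon == 2, ...).
   Moving an enumerator within the list renumbers the others automatically. */
enum Week { Sun = 1, Mon, Tue, Wed, Thu, Fri, Sat };

/* The enumeration supplies a common type for parameters and iterator cursors. */
bool is_weekend( enum Week day ) {
	assert( Sun <= day && day <= Sat );	/* C does not enforce this range itself */
	return day == Sat || day == Sun;
}
```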
+A variant (algebraic) type is often promoted as a kind of enumeration, \ie a variant type can simulate an enumeration.
+A variant type is a tagged-union, where the possible types may be heterogeneous.
+\begin{cfa}
+@variant@ Variant {
+        @int tag;@  // optional/implicit: 0 => int, 1 => double, 2 => S
+        @union {@ // implicit
+                case int i;
+                case double d;
+                case struct S { int i, j; } s;
+        @};@
+};
+\end{cfa}
+Crucially, the union implies instance storage is shared by all of the variant types.
+Hence, a variant is dynamically typed, as in a dynamic-typed programming-language, but the set of types is statically bound, similar to some aspects of dynamic gradual-typing~\cite{Gradual Typing}.
+Knowing which type is in a variant instance is crucial for correctness.
+Occasionally, it is possible to statically determine all regions where each variant type is used, so a tag and runtime checking are unnecessary;
+otherwise, a tag is required to denote the particular type in the variant, and the tag is checked at runtime using some form of type pattern-matching.
+The tag can be implicitly set by the compiler on assignment, or explicitly set by the program\-mer.
+Type pattern-matching is then used to dynamically test the tag and branch to a section of code to safely manipulate the value, \eg:
+\begin{cfa}[morekeywords={match}]
+Variant v = 3;  // implicitly set tag to 0
+@match@( v ) {    // know the type or test the tag
+        case int { /* only access i field in v */ }
+        case double { /* only access d field in v */ }
+        case S { /* only access s field in v */ }
+}
+\end{cfa}
+For safety, either all variant types must be listed or a @default@ case must exist with no field accesses.
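The tagged-union idea can be sketched by hand in plain C, which has no @variant@ or @match@ construct.
This is an illustrative assumption, not the mechanism of any particular language: the names @Tag@, @make_int@, @make_double@, and @as_double@ are hypothetical, the tag is explicit, and a @switch@ on the tag stands in for type pattern-matching.

```c
#include <assert.h>

struct S { int i, j; };

/* Explicit tag naming the member currently stored in the union. */
enum Tag { TAG_INT, TAG_DOUBLE, TAG_S };

struct Variant {
	enum Tag tag;
	union {				/* storage shared by all alternatives */
		int i;
		double d;
		struct S s;
	} u;
};

/* Hypothetical constructors: set the tag and the matching member together. */
struct Variant make_int( int i ) {
	struct Variant v;
	v.tag = TAG_INT;
	v.u.i = i;
	return v;
}

struct Variant make_double( double d ) {
	struct Variant v;
	v.tag = TAG_DOUBLE;
	v.u.d = d;
	return v;
}

/* Manual "match": test the tag, then access only the matching member. */
double as_double( struct Variant v ) {
	switch ( v.tag ) {
	  case TAG_INT: return (double)v.u.i;
	  case TAG_DOUBLE: return v.u.d;
	  case TAG_S: return (double)( v.u.s.i + v.u.s.j );
	}
	assert( 0 && "unknown tag" );	/* plays the role of a default case with no field access */
	return 0.0;
}
```

Unlike a language-level variant, nothing here stops a programmer from setting the tag and the member inconsistently, which is the safety a type system is expected to add.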
+To simulate an enumeration with a variant, the tag is \emph{re-purposed} for either ordering or value and the variant types are omitted.
+\begin{cfa}
+variant Weekday {
+        int tag; // implicit 0 => Mon, ..., 6 => Sun
+        @case Mon;@ // no type
+        ...
+        @case Sun;@
+};
+\end{cfa}
+The type system ensures tag setting and testing are correctly done.
+However, the enumeration operations are limited to the available tag operations, \eg pattern matching.
+\begin{cfa}
+Weekday week = Mon;
+if ( @dynamic_cast(Mon)@week ) ... // test tag == Mon
+\end{cfa}
+While enumerating among tag names is possible:
+\begin{cfa}[morekeywords={in}]
+for ( cursor in Mon, Wed, Fri, Sun ) ...
+\end{cfa}
+ordering for iteration would require a \emph{magic} extension, such as a special @enum@ variant, because it has no meaning for a regular variant, \ie @int@ < @double@.
+However, if a special @enum@ variant allows the tags to be heterogeneously typed, ordering must fall back on case positioning, as many types have incomparable values.
+Iterating using tag ordering and heterogeneous types also requires pattern matching.
+\begin{cfa}[morekeywords={match}]
+for ( cursor = Mon; cursor <= Fri; cursor = succ( cursor) ) {
+        match( cursor ) {
+                case Mon { /* access special type for Mon */ }
+                ...
+                case Fri { /* access special type for Fri */ }
+                default
+        }
+}
+\end{cfa}
+If the variant type is changed by adding/removing types or the loop range changes, the pattern matching must be adjusted.
+As well, if the start/stop values are dynamic, it may be impossible to statically determine if all variant types are listed.
+Re-purposing the notion of enumerating into variant types is ill-formed and confusing.
+Hence, there is only a weak equivalence between an enumeration and variant type, justifying the enumeration type in a programming language.
 \section{Contributions}
+The goal of this work is to extend the simple and unsafe enumeration type in the C programming-language into a sophisticated and safe type in the \CFA programming-language, while maintaining backwards compatibility with C.
+On the surface, enumerations seem like a simple type.
+However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation.
+\begin{enumerate}
+\item
+overloading
+\item
+scoping
+\item
+typing
+\item
+subset
+\item
+inheritance
+\end{enumerate}
