Changeset ab11ab1
- Timestamp:
- Aug 8, 2024, 3:51:52 PM (5 weeks ago)
- Branches:
- master
- Children:
- c4aca65
- Parents:
- 5b4c8df
- Location:
- doc/theses/jiada_liang_MMath
- Files:
-
- 3 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/jiada_liang_MMath/background.tex
r5b4c8df rab11ab1 89 89 Here, the aliased constants are: 20, 10, 20, 21, and -7. 90 90 Direct initialization is by a compile-time expression generating a constant value. 91 Indirect initialization (without initializer, @Max10Plus1@) is called \newterm{auto-initialization}, where enumerators are assigned values from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.91 Indirect initialization (without an initializer, @Max10Plus1@) is called \newterm{auto-initialization}, where enumerators are assigned values from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@. 92 92 Because multiple independent enumerators can be combined, enumerators with the same values can occur. 93 93 The enumerators are rvalues, so assignment is disallowed. … … 112 112 enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun }; 113 113 \end{clang} 114 and adopts the same semantics with respect to direct and auto intialization.114 and adopts the same semantics as direct and auto initialization. 115 115 For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@. 116 116 As well, initialization may occur in any order. … … 123 123 Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{ 124 124 A terminating comma appears in other C syntax, \eg the initializer list.} 125 This feature allow enumerator lines to be interchanged without moving a comma.125 This feature allows enumerator lines to be interchanged without moving a comma. 126 126 Named enumerators are also unscoped. 127 127 … … 136 136 A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11} 137 137 \end{quote} 138 However, @int@ means a4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.138 However, @int@ means 4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture. 139 139 Whereas, @long int@ means 4 bytes on a 32-bit and 8 bytes on 64-bit architectures, and @long long int@ means 8 bytes on both 32/64-bit architectures, where 64-bit operations are simulated on 32-bit architectures. 140 140 \VRef[Figure]{f:gccEnumerationStorageSize} shows both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based its initialization. … … 204 204 \end{cfa} 205 205 Here, serendipitously the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@. 206 This @N@ is often used as the dimension for an array assoc ated with the enumeration.206 This @N@ is often used as the dimension for an array associated with the enumeration. 207 207 \begin{cfa} 208 208 E array[@N@]; … … 235 235 \section{\texorpdfstring{\CFA}{Cforall}} 236 236 237 \CFA in \emph{not} an object-oriented programming-language, \ie functions cannot be nested in aggregate types, and hence, there is no \newterm{receiver} notation for calling functions, \eg @obj.method(...)@, where the first argument proceeds the call and becomes an 237 \CFA in \emph{not} an object-oriented programming-language, \ie functions cannot be nested in aggregate types, and hence, there is no \newterm{receiver} notation for calling functions, \eg @obj.method(...)@, where the first argument proceeds the call and becomes an implicit first (\lstinline[language=C++]{this}) parameter. 238 238 The following sections provide short descriptions of \CFA features needed further in the thesis. 239 239 Other \CFA features are presented in-situ with short explanations, or no explanation because the feature is obvious to C programmers. … … 263 263 s1 = @?+?@( s1, s2 ); $\C{// direct call}\CRT$ 264 264 \end{cfa} 265 The type system examines each call size and selects the best matching overloaded function based on the number and types of thearguments.265 The type system examines each call size and selects the best matching overloaded function based on the number and types of arguments. 266 266 If there are mixed-mode operands, @2 + 3.5@, the type system, like in C/\CC, attempts (safe) conversions, converting the argument type(s) to the parameter type(s). 267 267 … … 308 308 \subsection{Constructor and Destructor} 309 309 310 While \CFA i n not objectoriented, it adopts many language features commonly used in object-oriented languages;311 these features are independent fromobject-oriented programming.310 While \CFA is not object-oriented, it adopts many language features commonly used in object-oriented languages; 311 these features are independent of object-oriented programming. 312 312 313 313 All objects in \CFA are initialized by @constructors@ \emph{after} allocation and de-initialized \emph{before} deallocation. … … 378 378 At the call size, the type parameter @T@ is bounded to @int@ from the argument @42@. 379 379 380 For polymorphic functions to be useful, the @forall@ clause needs \newterm{type assertion}s that restrict sthe polymorphic types it accepts.380 For polymorphic functions to be useful, the @forall@ clause needs \newterm{type assertion}s that restrict the polymorphic types it accepts. 381 381 \begin{cfa} 382 382 forall( T @| { void foo( T ); }@ ) void bar( T t ) { @foo( t );@ } … … 392 392 393 393 A @forall@ clause can assert many restrictions on multiple types. 394 A common practice is to refactor the assertions into a named \newterm{trait}, similar to other l naguages, like Go and Rust.394 A common practice is to refactor the assertions into a named \newterm{trait}, similar to other languages, like Go and Rust. 395 395 \begin{cfa} 396 396 forall(T) trait @Bird@ { … … 416 416 When multiple best matches exist, the resolution is ambiguous. 417 417 418 The \CFA resolver attempts to identi ty abest candidate based on: first, the number of parameters and types, and second, when no exact match exists, the fewest implicit conversions and polymorphic variables.418 The \CFA resolver attempts to identify the best candidate based on: first, the number of parameters and types, and second, when no exact match exists, the fewest implicit conversions and polymorphic variables. 419 419 Finding an exact match is not discussed here, because the mechanism is fairly straightforward, even when the search space is large; 420 420 only finding a non-exact match is discussed in detail. … … 425 425 426 426 Most programming languages perform some implicit conversions among basic types to facilitate mixed-mode arithmetic; 427 otherwise, the program becomes littered with many explicit casts, which does not match with programmer expectation.427 otherwise, the program becomes littered with many explicit casts, which does not match with programmer's expectations. 428 428 C is an aggressive language as it provides conversions among almost all of the basic types, even when the conversion is potentially unsafe or not meaningful, \ie @float@ to @bool@. 429 429 C defines the resolution pattern as ``usual arithmetic conversion''~\cite[\S~6.3.1.8]{C11}, in which C looks for a \newterm{common type} between operands, and converts one or both operands to the common type. 430 Loosely defined, a common type is a the smallest type in terms ofsize of representation that both operands can be converted into without losing their precision, called a \newterm{widening} or \newterm{safe conversion}.430 Loosely defined, a common type is the smallest type in terms of the size of representation that both operands can be converted into without losing their precision, called a \newterm{widening} or \newterm{safe conversion}. 431 431 432 432 \CFA generalizes ``usual arithmetic conversion'' to \newterm{conversion cost}. … … 441 441 \end{enumerate} 442 442 Sum of degree is a method to quantify C's integer and floating-point rank. 443 Every pair of widening conversion types is assigned a \newterm{distance}, and distance between the two same typeis 0.444 For example, the distance from @char@ to @int@ is 2, distance from @int@ to @long@ is 1, anddistance from @int@ to @long long int@ is 2.443 Every pair of widening conversion types is assigned a \newterm{distance}, and the distance between the two same types is 0. 444 For example, the distance from @char@ to @int@ is 2, the distance from @int@ to @long@ is 1, and the distance from @int@ to @long long int@ is 2. 445 445 This distance does not mirror C's rank system. 446 446 For example, the rank of @char@ and @signed char@ are the same in C, but the distance from @char@ to @signed char@ is assigned 1. 447 @safe@ cost is summing all pairs of argument to parameter safe conversion distances.447 @safe@ cost is summing all pairs of arguments to parameter safe conversion distances. 448 448 Among the three costs in Bilson's model, @unsafe@ is the most significant cost and @safe@ is the least significant, with an implication that \CFA always choose a candidate with the lowest @unsafe@, if possible. 449 449 450 450 For example, assume the overloaded function @foo@ is called with two @int@ parameter. 451 The cost for every overloaded @foo@ has been list along:451 The cost for every overloaded @foo@ has been listed along: 452 452 \begin{cfa} 453 453 void foo( char, char ); $\C[2.5in]{// (1) (2, 0, 0)}$ … … 462 462 foo( i, j ); $\C{// convert j to long and call (8)}\CRT$ 463 463 \end{cfa} 464 The overloaded instances are ordered from the highest to the lowest cost, and \CFA select the last candidate (8).464 The overloaded instances are ordered from the highest to the lowest cost, and \CFA selects the last candidate (8). 465 465 466 466 In the next iteration of \CFA, Schluntz and Aaron~\cite{Moss18} expanded conversion cost to a 7-tuple with 4 additional categories, @(unsafe, poly, safe, sign, vars, specialization, reference)@, with the following interpretations: … … 469 469 \item \textit{Poly} 470 470 \item \textit{Safe} 471 \item \textit{Sign} is the number of sign/unsign variable conversion .472 \item \textit{Vars} is the number of polymorphic s type variable.473 \item \textit{Specialization} is negative value of the number of type assertion.471 \item \textit{Sign} is the number of sign/unsign variable conversions. 472 \item \textit{Vars} is the number of polymorphic type variables. 473 \item \textit{Specialization} is the negative value of the number of type assertions. 474 474 \item \textit{Reference} is number of reference-to-rvalue conversion. 475 475 \end{itemize} 476 476 The extended conversion-cost model looks for candidates that are more specific and less generic. 477 477 @vars@ disambiguates @forall( T, V ) foo( T, V )@ and @forall( T ) void foo( T, T )@, where the extra type parameter @V@ makes is more generic. 478 A more generic type means lessconstraints on its parameter types.478 A more generic type means fewer constraints on its parameter types. 479 479 \CFA favours candidates with more restrictions on polymorphism, so @forall( T ) void foo( T, T )@ has lower cost. 480 480 @specialization@ is an arbitrary count-down value starting at zero. … … 483 483 484 484 \CFA defines two special cost values: @zero@ and @infinite@. 485 A conversion cost is @zero@ when argument and parameter hasan exact match, and a conversion cost is @infinite@ when there is no defined conversion between two types.485 A conversion cost is @zero@ when the argument and parameter have an exact match, and a conversion cost is @infinite@ when there is no defined conversion between two types. 486 486 For example, the conversion cost from @int@ to a @struct S@ is @infinite@. 487 487 488 488 In \CFA, the meaning of a C-style cast is determined by its @Cast Cost@. 489 For most cast-expression resolution , a cast cost is equal to a conversion cost.489 For most cast-expression resolutions, a cast cost is equal to a conversion cost. 490 490 Cast cost exists as an independent matrix for conversion that cannot happen implicitly, while being possible with an explicit cast. 491 These conversions are often defined to have infinite conversion cost andnon-infinite cast cost.491 These conversions are often defined to have a infinite conversion cost and a non-infinite cast cost. -
doc/theses/jiada_liang_MMath/intro.tex
r5b4c8df rab11ab1 1 1 \chapter{Introduction} 2 2 3 All basic types in a programming language have a set of constants (symbols), and these constants represent computable values, \eg integer types have constants @-1@, @17@, @0xff@ representing whole numbers, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@ representing 3 All basic types in a programming language have a set of constants (symbols), and these constants represent computable values, \eg integer types have constants @-1@, @17@, @0xff@ representing whole numbers, floating-point types have constants @5.3@, @2.3E-5@, @0xff.ffp0@ representing real numbers, character types have constants @'a'@, @"abc\n"@, \mbox{\lstinline{u8"}\texttt{\guillemotleft{na\"{i}ve}\guillemotright}\lstinline{"}} representing (human readable) text, \etc. 4 4 Constants can be overloaded among types, \eg @0@ is a null pointer for all pointer types, and the value zero for integer and floating-point types. 5 5 (In \CFA, the constants @0@ and @1@ can be overloaded for any type.) … … 12 12 A constant's symbolic name is dictated by language syntax related to types, \eg @5.@ (double), @5.0f@ (float), @5l@ (long double). 13 13 In general, the representation of a constant's value is \newterm{opaque}, so the internal representation can be chosen arbitrarily, \eg two's complement, IEEE floating-point. 14 In theory, there arean infinite set of constant names per type representing an infinite set of values.14 In theory, there is an infinite set of constant names per type representing an infinite set of values. 15 15 16 16 It is common in mathematics, engineering, and computer science to alias new constants to existing constants so they have the same value, \eg $\pi$, $\tau$ (2$\pi$), $\phi$ (golden ratio), K(k), M, G, T for powers of 2\footnote{Overloaded with SI powers of 10.} often prefixing bits (b) or bytes (B), \eg Gb, MB, and in general situations, \eg specific times (noon, New Years), cities (Big Apple), flowers (Lily), \etc. … … 23 23 Because an aliased name is a constant, it cannot appear in a mutable context, \eg \mbox{$\pi$ \lstinline{= 42}} is meaningless, and a constant has no address, \ie it is an \newterm{rvalue}\footnote{ 24 24 The term rvalue defines an expression that can only appear on the right-hand side of an assignment expression.}. 25 In theory, there are an infinite set of possible aliasing,in practice, the number of aliasing per program is finite and small.25 In theory, there is an infinite set of possible aliasing; in practice, the number of aliasing per program is finite and small. 26 26 27 27 Aliased constants can form an (ordered) set, \eg days of a week, months of a year, floors of a building (basement, ground, 1st), colours in a rainbow, \etc. … … 59 59 \end{sloppypar} 60 60 \item 61 The alias names are constants, which follow stransitively from their binding to other constants.61 The alias names are constants, which follow transitively from their binding to other constants. 62 62 \item 63 63 Defines a type for generating instants (variables). … … 105 105 Hence, the term \emph{enumeration} can be confusing and misunderstood. 106 106 Furthermore, some languages conjoin the enumeration with other type features, making it difficult to tease apart which feature is being used. 107 This section discusses some language features that are sometimes called anenumeration but do not provide all enumeration aspects.107 This section discusses some language features that are sometimes called enumeration but do not provide all enumeration aspects. 108 108 109 109 … … 140 140 \end{cfa} 141 141 For these reasons, aliasing is sometimes called an enumeration. 142 However, there is no type to create a type-checked instance or iterator cursor, so there is no ability for enumerating.142 However, there is no type to create a type-checked instance or iterator cursor, so there is no ability to enumerate. 143 143 Hence, there are multiple enumeration aspects not provided by aliasing, justifying a separate enumeration type in a programming language. 144 144 … … 158 158 the ADT has three variants (constructors), @A@, @B@, @C@, with associated types @Int@, @Double@, and @S@. 159 159 The constructors create an initialized value of the specific type that is bound to the immutable variables @foo@, @bar@, and @baz@. 160 Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to in tialize and access the value using dynamic pattern-matching.160 Hence, the ADT @Foo@ is like a union containing values of the associated types, and a constructor name is used to initialize and access the value using dynamic pattern-matching. 161 161 \begin{cquote} 162 162 \setlength{\tabcolsep}{20pt} … … 194 194 baz = Z 5; 195 195 \end{haskell} 196 Here, the constructor name gives different meaning to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching.196 Here, the constructor name gives different meanings to the values in the common \lstinline[language=Haskell]{Int} type, \eg the value @3@ has different interpretations depending on the constructor name in the pattern matching. 197 197 198 198 Note, the term \newterm{variant} is often associated with ADTs. 199 199 However, there are multiple languages with a @variant@ type that is not an ADT \see{Algol68~\cite{Algol68} or \CC \lstinline{variant}}. 200 Here, the type (and possibly the position for equivalent types) is used to discrimina ntthe specific \emph{variant} within the variant instance.200 Here, the type (and possibly the position for equivalent types) is used to discriminate the specific \emph{variant} within the variant instance. 201 201 For example, \VRef[Figure]{f:C++variant} shows the \CC equivalent of the two Haskell ADT types using variant types. 202 202 In these languages, the variant cannot be used to simulate an enumeration. … … 246 246 data Week = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving(Enum, Eq, Show) 247 247 \end{haskell} 248 the default type for each constructor is the unit type, and deriving from @Enum@ enforces no other associated types ,@Eq@ allows equality comparison, and @Show@ is for printing.249 The nullary constructors for the unit types are numbered left-to-right from $0$ to @maxBound@$- 1$, and provide senumerating operations @succ@, @pred@, @enumFrom@, @enumFromTo@.248 the default type for each constructor is the unit type, and deriving from @Enum@ enforces no other associated types. The @Eq@ allows equality comparison, and @Show@ is for printing. 249 The nullary constructors for the unit types are numbered left-to-right from $0$ to @maxBound@$- 1$, and provide enumerating operations @succ@, @pred@, @enumFrom@, @enumFromTo@. 250 250 \VRef[Figure]{f:HaskellEnumeration} shows enumeration comparison and iterating (enumerating). 251 251 … … 296 296 However, when extended with advanced features, enumerations become complex for both the type system and the runtime implementation. 297 297 298 The contribution of this work are:298 The contributions of this work are: 299 299 \begin{enumerate} 300 300 \item … … 303 303 overloading: Provide a pattern to overload functions, literals, and variables for polymorphic enumerations using the \CFA type system. 304 304 \item 305 scoping: Add a name 305 scoping: Add a namespace for enumerations and qualified access into the namespace to deal with the naming problem. 306 306 \item 307 307 generalization: Support all language types for enumerators with associated values providing enumeration constants for any type. -
doc/theses/jiada_liang_MMath/uw-ethesis-frontpgs.tex
r5b4c8df rab11ab1 141 141 One of the key properties of an enumeration is the ability to enumerate (iterate) through the constants, and hence, access their values, if present. 142 142 C restricts an enumeration to the integral type @signed int@, while \CC extends enumerations to all integral types, meaning enumeration names must bind to integer constants. 143 Other modern programming languages provide bindings to any type and additional features to extend enumeration capabilities for better software -engineering practices.144 145 The \CFA (C-for-all) programming language is an evolutionary refinement of the C program ing language.146 One of its distinctive feature is a parametric-polymorphic generic type.147 However, legacy data types from C, such as enumerations, do not adapt well into the \CFA generic type-system.143 Other modern programming languages provide bindings to any type and additional features to extend enumeration capabilities for better software engineering practices. 144 145 The \CFA (C-for-all) programming language is an evolutionary refinement of the C programming language. 146 One of its distinctive features is a parametric-polymorphic generic type. 147 However, legacy data types from C, such as enumerations, do not adapt well to the \CFA generic type system. 148 148 149 149 This thesis extends the simple and unsafe enumeration type in the C programming language into a complex and safe enumeration type in the \CFA programming-language, while maintaining backwards compatibility with C. 150 150 The major contribution is an adaptation of enumerated types with the \CFA type-system in a way that integrates naturally with the generic types. 151 This thesis also presents a number of smaller refinementto the \CFA overload resolution rules for enumerated types, each of which improves the intuitive nature of enumeration name resolution by the compiler.151 This thesis also presents several smaller refinements to the \CFA overload resolution rules for enumerated types, each of which improves the intuitive nature of enumeration name resolution by the compiler. 152 152 Finally, this work adds other useful features to enumerations that better support software-engineering practices and simplify program development. 153 153 … … 166 166 Thanks to Gregor Richards and Yzihou Zhang for reading my thesis. 167 167 168 Special thanks to Andrew James Beach for your insight on the theory development o nthe thesis.168 Special thanks to Andrew James Beach for your insight on the theory development of the thesis. 169 169 170 170 Thanks to Michael Brooks, Fangran Yu, Colby Parsons, Thierry Delisle, Mubeen Zulifiqar, 171 and the entire \CFA team for development of the \CFA language, making it the best language it can be.171 and the entire \CFA team for the development of the \CFA language, making it the best language it can be. 172 172 173 173 Finally, a special thank you to Huawei Canada for funding this work.
Note: See TracChangeset
for help on using the changeset viewer.