Changeset bd72f517 for doc/theses/fangren_yu_MMath
- Timestamp:
- May 13, 2025, 1:17:50 PM (13 months ago)
- Branches:
- master, stuck-waitfor-destruct
- Children:
- 0528d79
- Parents:
- 7d02d35 (diff), 2410424 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)links above to see all the changes relative to each parent. - Location:
- doc/theses/fangren_yu_MMath
- Files:
-
- 7 edited
-
background.tex (modified) (1 diff)
-
features.tex (modified) (24 diffs)
-
future.tex (modified) (6 diffs)
-
intro.tex (modified) (41 diffs)
-
resolution.tex (modified) (28 diffs)
-
uw-ethesis.bib (modified) (1 diff)
-
uw-ethesis.tex (modified) (1 diff)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/fangren_yu_MMath/background.tex
r7d02d35 rbd72f517 21 21 Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as @void *@, \ie only pointer types and @int@. 22 22 In \CFA terms, all Cyclone polymorphism must be dtype-static. 23 While the Cyclone design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, it is more restrictive than \CFA's general model.23 While the Cyclone design provides the efficiency benefits discussed in~\VRef{s:GenericImplementation} for dtype-static polymorphism, it is more restrictive than \CFA's general model. 24 24 Smith and Volpano~\cite{Smith98} present Polymorphic C, an ML dialect with polymorphic functions, C-like syntax, and pointer types; 25 25 it lacks many of C's features, most notably structure types, and hence, is not a practical C replacement. -
doc/theses/fangren_yu_MMath/features.tex
r7d02d35 rbd72f517 13 13 Here, manipulating the pointer address is the primary operation, while dereferencing the pointer to its value is the secondary operation. 14 14 For example, \emph{within} a data structure, \eg stack or queue, all operations involve pointer addresses and the pointer may never be dereferenced because the referenced object is opaque. 15 Alternatively, use a reference when its primary purpose is to alias a value, \eg a function parameter that does not copy the argument (performance reason).15 Alternatively, use a reference when its primary purpose is to alias a value, \eg a function parameter that does not copy the argument, for performance reasons. 16 16 Here, manipulating the value is the primary operation, while changing the pointer address is the secondary operation. 17 17 Succinctly, if the address changes often, use a pointer; … … 23 23 \CFA adopts a uniform policy between pointers and references where mutability is a separate property made at the declaration. 24 24 25 The following examples show show pointers and references are treated uniformly in \CFA.25 The following examples show how pointers and references are treated uniformly in \CFA. 26 26 \begin{cfa}[numbers=left,numberblanklines=false] 27 27 int x = 1, y = 2, z = 3;$\label{p:refexamples}$ … … 36 36 @&@r3 = @&@y; @&&@r3 = @&&@r4; $\C{// change r1, r2}$ 37 37 \end{cfa} 38 Like pointers, reference can be cascaded, \ie a reference to a reference, \eg @&& r2@.\footnote{38 Like pointers, references can be cascaded, \ie a reference to a reference, \eg @&& r2@.\footnote{ 39 39 \CC uses \lstinline{&&} for rvalue reference, a feature for move semantics and handling the \lstinline{const} Hell problem.} 40 40 Usage of a reference variable automatically performs the same number of dereferences as the number of references in its declaration, \eg @r2@ becomes @**r2@. … … 64 64 The call applies an implicit dereference once to @x@ so the call is typed @f( int & )@ with @T = int@, rather than with @T = int &@. 65 65 66 As fora pointer type, a reference type may have qualifiers, where @const@ is most common.66 As with a pointer type, a reference type may have qualifiers, where @const@ is most common. 67 67 \begin{cfa} 68 68 int x = 3; $\C{// mutable}$ … … 101 101 Interestingly, C does not give a warning/error if a @const@ pointer is not initialized, while \CC does. 102 102 Hence, type @& const@ is similar to a \CC reference, but \CFA does not preclude initialization with a non-variable address. 103 For example, in system 's programming, there are cases where an immutable address is initialized to a specific memory location.103 For example, in systems programming, there are cases where an immutable address is initialized to a specific memory location. 104 104 \begin{cfa} 105 105 int & const mem_map = *0xe45bbc67@p@; $\C{// hardware mapped registers ('p' for pointer)}$ … … 122 122 \end{cfa} 123 123 the call to @foo@ must pass @x@ by value, implying auto-dereference, while the call to @bar@ must pass @x@ by reference, implying no auto-dereference. 124 Without any restrictions, this ambiguity limits the behaviour of reference types in \CFA polymorphic functions, where a type @T@ can bind to a reference or non-reference type. 124 125 \PAB{My analysis shows} without any restrictions, this ambiguity limits the behaviour of reference types in \CFA polymorphic functions, where a type @T@ can bind to a reference or non-reference type. 125 126 This ambiguity prevents the type system treating reference types the same way as other types, even if type variables could be bound to reference types. 126 127 The reason is that \CFA uses a common \emph{object trait}\label{p:objecttrait} (constructor, destructor and assignment operators) to handle passing dynamic concrete type arguments into polymorphic functions, and the reference types are handled differently in these contexts so they do not satisfy this common interface. … … 157 158 \end{cfa} 158 159 While it is possible to write a reference type as the argument to a generic type, it is disallowed in assertion checking, if the generic type requires the object trait \see{\VPageref{p:objecttrait}} for the type argument, a fairly common use case. 159 Even if the object trait can be made optional, the current type systemoften misbehaves by adding undesirable auto-dereference on the referenced-to value rather than the reference variable itself, as intended.160 Even if the object trait can be made optional, the current compiler implementation often misbehaves by adding undesirable auto-dereference on the referenced-to value rather than the reference variable itself, as intended. 160 161 Some tweaks are necessary to accommodate reference types in polymorphic contexts and it is unclear what can or cannot be achieved. 161 162 Currently, there are contexts where the \CFA programmer is forced to use a pointer type, giving up the benefits of auto-dereference operations and better syntax with reference types. … … 188 189 @[x, y, z]@ = foo( 3, 4 ); // return 3 values into a tuple 189 190 \end{cfa} 190 Along with making returning multiple values a first-class feature, tuples were extended to simplify a number of other common context that normally require multiple statements and/or additional declarations, all of which reduces coding time and errors.191 Along with making returning multiple values a first-class feature, tuples were extended to simplify a number of other common contexts that normally require multiple statements and/or additional declarations. 191 192 \begin{cfa} 192 193 [x, y, z] = 3; $\C[2in]{// x = 3; y = 3; z = 3, where types may be different}$ … … 205 206 Only when returning a tuple from a function is there the notion of a tuple value. 206 207 207 Overloading in the \CFA type-system must support complex composition of tuples and C type conversions using a co stingscheme giving lower cost to widening conversions that do not truncate a value.208 Overloading in the \CFA type-system must support complex composition of tuples and C type conversions using a conversion cost scheme giving lower cost to widening conversions that do not truncate a value. 208 209 \begin{cfa} 209 210 [ int, int ] foo$\(_1\)$( int ); $\C{// overloaded foo functions}$ … … 213 214 \end{cfa} 214 215 The type resolver only has the tuple return types to resolve the call to @bar@ as the @foo@ parameters are identical. 215 The resul tion involves unifying the flattened @foo@ return values with @bar@'s parameter list.216 The resulution involves unifying the flattened @foo@ return values with @bar@'s parameter list. 216 217 However, no combination of @foo@s is an exact match with @bar@'s parameters; 217 218 thus, the resolver applies C conversions to obtain a best match. … … 223 224 bar( foo( 3 ) ) // only one tuple returning call 224 225 \end{lstlisting} 225 Hence, program ers cannot take advantage of the full power of tuples but type match is straightforward.226 Hence, programmers cannot take advantage of the full power of tuples but type match is straightforward. 226 227 227 228 K-W C also supported tuple variables, but with a strong distinction between tuples and tuple values/variables. … … 286 287 \end{figure} 287 288 288 The primary issues for tuples in the \CFA type system are polymorphism and conversions.289 \PAB{I identified} the primary issues for tuples in the \CFA type system are polymorphism and conversions. 289 290 Specifically, does it make sense to have a generic (polymorphic) tuple type, as is possible for a structure? 290 291 \begin{cfa} … … 303 304 \section{Tuple Implementation} 304 305 305 As noted, tradition languages manipulate multiple values by in/out parameters and/or structures.306 As noted, traditional languages manipulate multiple values by in/out parameters and/or structures. 306 307 K-W C adopted the structure for tuple values or variables, and as needed, the fields are extracted by field access operations. 307 308 As well, for the tuple-assignment implementation, the left-hand tuple expression is expanded into assignments of each component, creating temporary variables to avoid unexpected side effects. … … 356 357 \end{figure} 357 358 358 Interestingly, in the third implementation of \CFA tuples by Robert Schluntz~\cite[\S~3]{Schluntz17}, the MVR functions revert back to structure based, where itremains in the current version of \CFA.359 Interestingly, in the third implementation of \CFA tuples by Robert Schluntz~\cite[\S~3]{Schluntz17}, the MVR functions revert back to structure based, and this remains in the current version of \CFA. 359 360 The reason for the reversion is a uniform approach for tuple values/variables making tuples first-class types in \CFA, \ie allow tuples with corresponding tuple variables. 360 361 This reversion was possible, because in parallel with Schluntz's work, generic types were added independently by Moss~\cite{Moss19}, and the tuple variables leveraged the same implementation techniques as for generic variables~\cite[\S~3.7]{Schluntz17}. … … 384 385 Scala, like \CC, provides tuple types through a library using this structural expansion, \eg Scala provides tuple sizes 1 through 22 via hand-coded generic data-structures. 385 386 386 However, after experience gained building the \CFA runtime system, making tuple-types first-class seems to add little benefit.387 However, after experience gained building the \CFA runtime system, \PAB{I convinced them} making tuple-types first-class seems to add little benefit. 387 388 The main reason is that tuples usages are largely unstructured, 388 389 \begin{cfa} … … 512 513 looping is used to traverse the argument pack from left to right. 513 514 The @va_list@ interface is walking up the stack (by address) looking at the arguments pushed by the caller. 514 ( Magicknowledge is needed for arguments pushed using registers.)515 (Compiler-specific ABI knowledge is needed for arguments pushed using registers.) 515 516 516 517 \begin{figure} … … 571 572 572 573 Currently in \CFA, variadic polymorphic functions are the only place tuple types are used. 573 And because \CFA compiles polymorphic functions versus template expansion, many wrapper functions are generated to implement both user-defined generic-types and polymorphism with variadics.574 \PAB{My analysis showed} many wrapper functions are generated to implement both user-defined generic-types and polymorphism with variadics, because \CFA compiles polymorphic functions versus template expansion. 574 575 Fortunately, the only permitted operations on polymorphic function parameters are given by the list of assertion (trait) functions. 575 576 Nevertheless, this small set of functions eventually needs to be called with flattened tuple arguments. 576 577 Unfortunately, packing the variadic arguments into a rigid @struct@ type and generating all the required wrapper functions is significant work and largely wasted because most are never called. 577 578 Interested readers can refer to pages 77-80 of Robert Schluntz's thesis to see how verbose the translator output is to implement a simple variadic call with 3 arguments. 578 As the number of arguments increases, \eg a call with 5 arguments, the translator generates aconcrete @struct@ types for a 4-tuple and a 3-tuple along with all the polymorphic type data for them.579 As the number of arguments increases, \eg a call with 5 arguments, the translator generates concrete @struct@ types for a 4-tuple and a 3-tuple along with all the polymorphic type data for them. 579 580 An alternative approach is to put the variadic arguments into an array, along with an offset array to retrieve each individual argument. 580 581 This method is similar to how the C @va_list@ object is used (and how \CFA accesses polymorphic fields in a generic type), but the \CFA variadics generate the required type information to guarantee type safety (like the @printf@ format string). … … 683 684 684 685 Nested \emph{named} aggregates are allowed in C but there is no qualification operator, like the \CC type operator `@::@', to access an inner type. 685 \emph{To compensate for the missing type operator, all named nested aggregates are hoisted to global scope, regardless of the nesting depth, and type usages within the nested type are replaced with global type name.} 686 To compensate for the missing type operator, all named nested aggregates are hoisted to global scope, regardless of the nesting depth, and type usages within the nested type are replaced with global type name. 686 687 Hoisting nested types can result in name collisions among types at the global level, which defeats the purpose of nesting the type. 687 688 \VRef[Figure]{f:NestedNamedAggregate} shows the nested type @T@ is hoisted to the global scope and the declaration rewrites within structure @S@. … … 729 730 \end{figure} 730 731 731 For good reasons,\CC chose to change this semantics:732 \CC chose to change this semantics: 732 733 \begin{cquote} 733 734 \begin{description}[leftmargin=*,topsep=0pt,itemsep=0pt,parsep=0pt] … … 769 770 Like an anonymous nested type, a named Plan-9 nested type has its field names hoisted into @struct S@, so there is direct access, \eg @s.x@ and @s.i@. 770 771 Hence, the field names must be unique, unlike \CC nested types, but the type names are at a nested scope level, unlike type nesting in C. 771 In addition, a pointer to a structure is automatically converted to a pointer to an anonymous field for assignments and function calls, providing containment inheritance with implicit subtyping, \ie @U@ $ \subset$ @S@ and @W@ $\subset$ @S@, \eg:772 In addition, a pointer to a structure is automatically converted to a pointer to an anonymous field for assignments and function calls, providing containment inheritance with implicit subtyping, \ie @U@ $<:$ @S@ and @W@ $<:$ @S@, \eg: 772 773 \begin{cfa} 773 774 void f( union U * u ); … … 781 782 Note, there is no value assignment, such as, @w = s@, to copy the @W@ field from @S@. 782 783 783 Unfortunately, the Plan-9 designers did not look ahead to other useful features, specifically nested types.784 Unfortunately, the Plan-9 designers did not look ahead to other useful features, specifically nested types. 784 785 This nested type compiles in \CC and \CFA. 785 786 \begin{cfa} … … 808 809 In addition, a semi-non-compatible change is made so that Plan-9 syntax means a forward declaration in a nested type. 809 810 Since the Plan-9 extension is not part of C and rarely used, this change has minimal impact. 810 Hence, all Plan-9 semantics are denoted by the @inline@ qualifier, which is good ``eye-candy'' when reading a structure definition to spotPlan-9 definitions.811 Hence, all Plan-9 semantics are denoted by the @inline@ qualifier, which clearly indicates the usage of Plan-9 definitions. 811 812 Finally, the following code shows the value and pointer polymorphism. 812 813 \begin{cfa} … … 821 822 822 823 In general, non-standard C features (@gcc@) do not need any special treatment, as they are directly passed through to the C compiler. 823 However, the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the \CFA type resolver must take this information into account.824 However, \PAB{I found} the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the \CFA type resolver must take this information into account. 824 825 Therefore, the \CFA resolver must implement the Plan-9 features and insert necessary type conversions into the translated code output. 825 826 In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C arithmetic conversions. … … 847 848 \end{c++} 848 849 and again the expression @d.x@ is ambiguous. 849 While \CC has no direct syntax to disambiguate @x@, \ ie@d.B.x@ or @d.C.x@, it is possible with casts, @((B)d).x@ or @((C)d).x@.850 While \CC has no direct syntax to disambiguate @x@, \eg @d.B.x@ or @d.C.x@, it is possible with casts, @((B)d).x@ or @((C)d).x@. 850 851 Like \CC, \CFA compiles the Plan-9 version and provides direct qualification and casts to disambiguate @x@. 851 While ambiguous definitions are allowed, duplicate field names ispoor practice and should be avoided if possible.852 While ambiguous definitions are allowed, duplicate field names are poor practice and should be avoided if possible. 852 853 However, when a programmer does not control all code, this problem can occur and a naming workaround must exist. -
doc/theses/fangren_yu_MMath/future.tex
r7d02d35 rbd72f517 4 4 The following are feature requests related to type-system enhancements that have surfaced during the development of the \CFA language and library, but have not been implemented yet. 5 5 Currently, developers must work around these missing features, sometimes resulting in inefficiency. 6 \PAB{The following sections discuss new features I am proposing to fix these problems.} 6 7 7 8 8 9 \section{Closed Trait Types} 9 10 10 Currently, \CFA does not have any closed types, as open type are the basis of its unique type-system, allowing new functions to be added at any time to override existing ones for trait satisfaction.11 Currently, \CFA does not have any closed types, as open types are the basis of its unique type-system, allowing new functions to be added at any time to override existing ones for trait satisfaction. 11 12 Locally-declared nested-functions,\footnote{ 12 13 Nested functions are not a feature in C but supported by \lstinline{gcc} for multiple decades and are used heavily in \CFA.} … … 17 18 Library implementers normally do not want users to override certain operations and cause the behaviour of polymorphic invocations to change. 18 19 \item 19 Caching and reusing resolution results in the compiler is effected, as newly introduced declarations can participate in assertion resolution;20 Caching and reusing resolution results in the compiler is affected, as newly introduced declarations can participate in assertion resolution; 20 21 as a result, previously invalid subexpressions suddenly become valid, or alternatively cause ambiguity in assertions. 21 22 \end{enumerate} … … 70 71 \end{figure} 71 72 72 A \CFA closed trait type is similar to a Haskell type class requiringan explicit instance declaration.73 A \CFA closed trait type is planned to be working similarly to a Haskell type class that requires an explicit instance declaration. 73 74 The syntax for the closed trait might look like: 74 75 \begin{cfa} … … 91 92 92 93 \section{Associated Types} 94 \label{s:AssociatedTypes} 93 95 94 The analysis presented in \VRef{s:AssertionSatisfaction} shows if all type parameters have to be bound before assertion resolution, the complexity of resolving assertions become much lower as every assertion parameter can be resolved independently.96 The analysis presented in \VRef{s:AssertionSatisfaction} shows if all type parameters have to be bound before assertion resolution, the complexity of resolving assertions becomes much lower as every assertion parameter can be resolved independently. 95 97 That is, by utilizing information from higher up the expression tree for return value overloading, most of the type bindings can be resolved. 96 98 However, there are scenarios where some intermediate types need to be involved in certain operations, which are neither input nor output types. … … 152 154 Note that the type @list *@ satisfies both @pointer_like( list *, int )@ and @pointer_like( list *,@ @list )@ (the latter by the built-in pointer dereference operator) and the expression @*it@ can be either a @struct list@ or an @int@. 153 155 Requiring associated types to be unique makes the @pointer_like@ trait not applicable to @list *@, which is undesirable. 154 I have not attempted to implement associated types in \CFA compiler, but based on the above discussions, one option is to make associated type resolution and return type overloading coexist:156 I have not attempted to implement associated types in the \CFA compiler, but based on the above discussions, one option is to make associated type resolution and return type overloading coexist: 155 157 when the associated type appears in returns, it is deduced from the context and then verify the trait with ordinary assertion resolution; 156 158 when it does not appear in the returns, the type is required to be uniquely determined by the expression that defines the associated type. … … 159 161 \section{User-defined Conversions} 160 162 161 Missing type-system featureis a scheme for user-defined conversions.163 A missing type-system feature in \CFA is a scheme for user-defined conversions. 162 164 Conversion means one type goes through an arbitrary complex process of changing its value to some meaningful value in another type. 163 165 Because the conversion process can be arbitrarily complex, it requires the power of a function. -
doc/theses/fangren_yu_MMath/intro.tex
r7d02d35 rbd72f517 30 30 31 31 \section{Overloading} 32 32 \label{s:Overloading} 33 34 \vspace*{-5pt} 33 35 \begin{quote} 34 36 There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. --- Phil Karlton 35 37 \end{quote} 38 \vspace*{-5pt} 36 39 Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files. 37 Experience from \CC and \CFA developers shows the type system can implicitly and correctly disambiguate sthe majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.40 Experience from \CC and \CFA developers shows the type system can implicitly and correctly disambiguate the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions. 38 41 In many cases, a programmer is unaware of name clashes, as they are silently resolved, simplifying the development process. 39 42 40 43 Disambiguating among overloads is implemented by examining each call site and selecting the best matching overloaded function based on criteria like the types and number of arguments and the return context. 41 Since the hardware does not support mixed-mode operands, @2 + 3.5@, the type system must disallow it or (safely) convert the operands to a common type.44 Since the hardware does not support mixed-mode operands, such as @2 + 3.5@, the type system must disallow it or (safely) convert the operands to a common type. 42 45 Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process. 43 46 This approach matches with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values. … … 49 52 As well, many namespace systems provide a mechanism to open their scope returning to normal overloading, \ie no qualification. 50 53 While namespace mechanisms are very important and provide a number of crucial program-development features, protection from overloading is overstated. 51 Similarly, lexical nesting is another place where overloading occurs.54 Similarly, lexical nesting is another place where duplicate naming issues arise. 52 55 For example, in object-oriented programming, class member names \newterm{shadow} names within members. 53 Some programmers ,qualify all member names with @class::@ or @this->@ to make them unique from names defined in members.54 Even nested lexical blocks result in shadowing, \eg multiple nested loop-indices called @i@ .55 Again, coding styles exist requiring all variables in nested block to be unique to prevent name shadowing .56 Some programmers qualify all member names with @class::@ or @this->@ to make them unique from names defined in members. 57 Even nested lexical blocks result in shadowing, \eg multiple nested loop-indices called @i@, silently changing the meaning of @i@ at lower scope levels. 58 Again, coding styles exist requiring all variables in nested block to be unique to prevent name shadowing problems. 56 59 Depending on the language, these possible ambiguities can be reported (as warnings or errors) and resolved explicitly using some form of qualification and/or cast. 57 58 Formally, overloading is defined by Strachey as \newterm{ad hoc polymorphism}: 60 For example, if variables can be overloaded, shadowed variables of different type can produce ambiguities, indicating potential problems in lower scopes. 61 62 Formally, overloading is defined by Strachey as one kind of \newterm{ad hoc polymorphism}: 63 \vspace*{-5pt} 59 64 \begin{quote} 60 65 In ad hoc polymorphism there is no single systematic way of determining the type of the result from the type of the arguments. … … 63 68 It seems, moreover, that the automatic insertion of transfer functions by the compiling system is limited to this.~\cite[p.~37]{Strachey00} 64 69 \end{quote} 70 \vspace*{-5pt} 65 71 where a \newterm{transfer function} is an implicit conversion to help find a matching overload: 72 \vspace*{-5pt} 66 73 \begin{quote} 67 74 The problem of dealing with polymorphic operators is complicated by the fact that the range of types sometimes overlap. … … 69 76 The functions which perform this operation are known as transfer functions and may either be used explicitly by the programmer, or, in some systems, inserted automatically by the compiling system.~\cite[p.~35]{Strachey00} 70 77 \end{quote} 78 \vspace*{-5pt} 71 79 The differentiating characteristic between parametric polymorphism and overloading is often stated as: polymorphic functions use one algorithm to operate on arguments of many different types, whereas overloaded functions use a different algorithm for each type of argument. 72 80 A similar differentiation is applicable for overloading and default parameters. … … 128 136 \end{cfa} 129 137 the overloaded names @S@ and @E@ are separated into the type and object domain, and C uses the type kinds @struct@ and @enum@ to disambiguate the names. 130 In general, types are not overloaded because inferencing them is difficult to imagine in a statically programming language.138 In general, types are not overloaded because inferencing them is difficult to imagine in a statically typed programming language. 131 139 \begin{cquote} 132 140 \setlength{\tabcolsep}{26pt} … … 181 189 \noindent 182 190 \newterm{General overloading} occurs when the type-system \emph{knows} a function's parameters and return types (or a variable's type for variable overloading). 183 In functional programming-languages, there is always a return type (except for a monad).191 In functional programming-languages, there is always a return type. 184 192 If a return type is specified, the compiler does not have to inference the function body. 185 193 For example, the compiler has complete knowledge about builtin types and their overloaded arithmetic operators. … … 214 222 Hence, parametric overloading requires additional information about the universal types to make them useful. 215 223 216 This additional information often comes as a set of operations a type must supply (@trait@/-@concept@) and these operations can then be used in the body of the function. 217 \begin{cfa} 218 forall( T | T ?@++@( T, T ) ) T inc( T t ) { return t@++@; } 224 This additional information often comes as a set of operations that must be supply for a type, \eg \CFA/Rust/Go have traits, \CC template has concepts, Haskell has type-classes. 225 These operations can then be used in the body of the function to manipulate the type's value. 226 Here, a type binding to @T@ must have available a @++@ operation with the specified signature. 227 \begin{cfa} 228 forall( T | @T ?++( T, T )@ ) // trait 229 T inc( T t ) { return t@++@; } // change type value 219 230 int i = 3 220 231 i = inc( i ) … … 222 233 \end{cfa} 223 234 Given a qualifying trait, are its elements inferred or declared? 224 In the above example, the type system infers @int@ for @T@, infers it needs a @++@ operator that takes an @int@ and returns an @int@, and finds this function in the enclosing environment (\eg standard prelude).235 In the example, the type system infers @int@ for @T@, infers it needs an appropriately typed @++@ operator, and finds it in the enclosing environment, possibly in the language's prelude defining basic types and their operations. 225 236 This implicit inferencing is expensive if matched with implicit conversions when there is no exact match. 226 237 Alternatively, types opt-in to traits via declarations. … … 420 431 \subsection{Operator Overloading} 421 432 422 Virtually all programming languages provide general overloading of the arithmetic operatorsacross the basic computational types using the number and type of parameters and returns.433 Many programming languages provide general overloading of the arithmetic operators~\cite{OperOverloading} across the basic computational types using the number and type of parameters and returns. 423 434 However, in some languages, arithmetic operators may not be first class, and hence, cannot be overloaded. 424 435 Like \CC, \CFA allows general operator overloading for user-defined types. … … 438 449 \subsection{Function Overloading} 439 450 440 Both \CFA and \CC allow general overloading for functions, as long as their prototypes differ in the number and type of parameters and returns. 451 Many programming languages provide general overloading for functions~\cite{FuncOverloading}, as long as their prototypes differ in the number and type of parameters. 452 A few programming languages also use the return type for selecting overloaded functions \see{below}. 441 453 \begin{cfa} 442 454 void f( void ); $\C[2in]{// (1): no parameter}$ … … 445 457 f( 'A' ); $\C{// select (2)}\CRT$ 446 458 \end{cfa} 447 The type system examines each call si ze and first looks for an exact match and then a best match using conversions.459 The type system examines each call site and first looks for an exact match and then a best match using conversions. 448 460 449 461 Ada, Scala, and \CFA type-systems also use the return type in resolving a call, to pinpoint the best overloaded name. 450 Essent ailly, the return types are \emph{reversed curried} into output parameters of the function.462 Essentially, the return types are \emph{reversed curried} into output parameters of the function. 451 463 For example, in many programming languages with overloading, the following functions are ambiguous without using the return type. 452 464 \begin{cfa} … … 478 490 \begin{cfa} 479 491 void foo( double d ); 480 int v; $\C[2in]{// (1)}$492 int v; $\C[2in]{// (1)}$ 481 493 double v; $\C{// (2) variable overloading}$ 482 494 foo( v ); $\C{// select (2)}$ … … 487 499 } 488 500 \end{cfa} 489 It is interesting that shadow overloadingis considered a normal programming-language feature with only slight software-engineering problems.501 It is interesting that shadowing \see{namespace pollution in \VRef{s:Overloading}} is considered a normal programming-language feature with only slight software-engineering problems. 490 502 However, variable overloading within a scope is often considered dangerous, without any evidence to corroborate this claim. 491 503 In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems. … … 554 566 The following covers these issues, and why this scheme is not amenable with the \CFA type system. 555 567 556 One of the first and powerful type-inferencing systemis Hindley--Milner~\cite{Damas82}.568 One of the first and most powerful type-inferencing systems is Hindley--Milner~\cite{Damas82}. 557 569 Here, the type resolver starts with the types of the program constants used for initialization and these constant types flow throughout the program, setting all variable and expression types. 558 570 \begin{cfa} … … 579 591 Note, return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match. 580 592 581 In simpler type-inferencing systems, such as C/\CC/\CFA, there are more specific usages.593 There are multiple ways to indirectly specify a variable's type, \eg from a prior variable or expression. 582 594 \begin{cquote} 583 595 \setlength{\tabcolsep}{10pt} … … 606 618 \end{tabular} 607 619 \end{cquote} 608 The two important capabilities are: 620 Here, @type(expr)@ computes the same type as @auto@ righ-hand expression. 621 The advantages are: 609 622 \begin{itemize}[topsep=0pt] 610 623 \item … … 616 629 This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages. 617 630 \item 618 Ensuring the type of secondary variables ,match a primary variable.631 Ensuring the type of secondary variables match a primary variable. 619 632 \begin{cfa} 620 633 int x; $\C{// primary variable}$ … … 625 638 \end{itemize} 626 639 Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing. 640 \begin{cquote} 641 \setlength{\tabcolsep}{20pt} 642 \begin{tabular}{@{}ll@{}} 627 643 \begin{cfa} 628 644 int x; … … 630 646 type(x) z = ... // complex expression 631 647 \end{cfa} 632 Here, the types of @y@ and @z@ are fixed (branded), whereas with type inferencing, the types of @y@ and @z@ are potentially unknown. 648 & 649 \begin{cfa} 650 int x; 651 auto y = ... // complex expression 652 auto z = ... // complex expression 653 \end{cfa} 654 \end{tabular} 655 \end{cquote} 656 On the left, the types of @y@ and @z@ are fixed (branded), whereas on the right, the types of @y@ and @z@ can fluctuate. 633 657 634 658 635 659 \subsection{Type-Inferencing Issues} 636 660 637 Each kind of type-inferencing system has its own set of issues that flow ontothe programmer in the form of convenience, restrictions, or confusions.661 Each kind of type-inferencing system has its own set of issues that affect the programmer in the form of convenience, restrictions, or confusions. 638 662 639 663 A convenience is having the compiler use its overarching program knowledge to select the best type for each variable based on some notion of \emph{best}, which simplifies the programming experience. … … 643 667 For example, if a change is made in an initialization expression, it can cascade type changes producing many other changes and/or errors. 644 668 At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or be in error when it changes. 645 Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the comp lier can report a mismatch with the constant initialization.669 Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the compiler can report a mismatch with the constant initialization. 646 670 \begin{cfa} 647 671 void f( @int@ x, @int@ y ) { // brand function prototype … … 657 681 As a result, understanding and changing the code becomes almost impossible. 658 682 Types provide important clues as to the behaviour of the code, and correspondingly to correctly change or add new code. 659 In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancyIDE that can re-engineer types for them.683 In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on an IDE that can re-engineer types for them. 660 684 For example, given: 661 685 \begin{cfa} … … 670 694 In this situation, having the type name or its short alias is essential. 671 695 672 \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.696 \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint correct types within an expression. 673 697 Type inferencing defeats this goal because there is no left-hand type. 674 Fundamentally, type inferencing tries to magic away variable types from the programmer. 675 However, this results in lazy programming with the potential for poor performance and safety concerns. 676 Types are as important as control-flow in writing a good program, and should not be masked, even if it requires the programmer to think! 677 A similar issue is garbage collection, where storage management is magicked away, often resulting in poor program design and performance.\footnote{ 678 There are full-time Java consultants, who are hired to find memory-management problems in large Java programs.} 679 The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming. 680 Understanding space and time issues is an essential part of the programming craft. 681 Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, the decision was made not to support implicit type-inferencing in the type system. 698 Fundamentally, type inferencing tries to remove explicit typing from programming. 699 However, writing down types is an important aspect of good programming, as it provides a check of the programmer's expected type and the actual type. 700 Thinking carefully about types is similar to thinking carefully about date structures, often resulting in better performance and safety. 701 Similarly, thinking carefully about storage management in unmanaged languages is an important aspect of good programming, versus implicit storage management (garbage collection) in managed language.\footnote{ 702 There are full-time Java consultants, who are hired to find memory-management problems in large Java programs, \eg Monika Beckworth.} 703 Given @typedef@ and @typeof@, and the strong desire to use the left-hand type in resolution, no attempt has been made in \CFA to support implicit type-inferencing. 682 704 Should a significant need arise, this decision can be revisited. 683 705 … … 702 724 int i, * ip = identity( &i ); 703 725 \end{cfa} 704 Unlike \CC template functions, \CFA polymorphic functions are compatible with C\emph{separate compilation}, preventing compilation and code bloat.726 Unlike \CC template functions, \CFA polymorphic functions are compatible with \emph{separate compilation}, preventing compilation and code bloat. 705 727 706 728 To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. … … 710 732 int val = twice( twice( 3 ) ); $\C{// val == 12}$ 711 733 \end{cfa} 712 Parametric polymorphism and assertions occur in existing type-unsafe (@void *@) Cfunctions, like @qsort@ for sorting an array of unknown values.734 The closest approximation to parametric polymorphism and assertions in C is type-unsafe (@void *@) functions, like @qsort@ for sorting an array of unknown values. 713 735 \begin{cfa} 714 736 void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) ); … … 761 783 The @sized@ assertion passes size and alignment as a data object has no implicit assertions. 762 784 Both assertions are used in @malloc@ via @sizeof@ and @_Alignof@. 763 In practi se, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided producing a type-safe call to @malloc@ or @memalign@.785 In practice, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided producing a type-safe call to @malloc@ or @memalign@. 764 786 765 787 This mechanism is used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions. … … 787 809 forall( T @| sumable( T )@ ) // use trait 788 810 T sum( T a[$\,$], size_t size ) { 789 @T@ total = 0; // initialize by 0 constructor811 @T@ total = 0; // initialize by 0 constructor 790 812 for ( i; size ) 791 total @+=@ a[i]; // select appropriate +813 total @+=@ a[i]; // select appropriate + 792 814 return total; 793 815 } … … 795 817 \end{tabular} 796 818 \end{cquote} 797 Traits are implemented by flatten them at use points, as if written in full by the programmer.819 Traits are implemented by flattening them at use points, as if written in full by the programmer. 798 820 Flattening often results in overlapping assertions, \eg operator @+@. 799 821 Hence, trait names play no part in type equivalence. … … 821 843 Write bespoke data structures for each context. 822 844 While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures. 845 823 846 \item 824 847 Use @void *@-based polymorphism, \eg the C standard library functions @bsearch@ and @qsort@, which allow for the reuse of code with common functionality. 825 848 However, this approach eliminates the type checker's ability to ensure argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary. 849 826 850 \item 827 Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret. 828 Furthermore, writing and using complex preprocessor macros is difficult and inflexible. 851 Use an internal macro capability, like \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret. 852 Furthermore, writing complex template macros is difficult and complex. 853 854 \item 855 Use an external macro capability, like M4~\cite{M4}, to generate code that is generic code, but errors may be difficult to interpret. 856 Like internal macros, writing and using external macros is equally difficult and complex. 829 857 \end{enumerate} 830 858 … … 857 885 \end{tabular} 858 886 \end{cquote} 887 \label{s:GenericImplementation} 859 888 \CFA generic types are \newterm{fixed} or \newterm{dynamic} sized. 860 889 Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on the type parameters. … … 883 912 For software-engineering reasons, the set assertions would be refactored into a trait to allow alternative implementations, like a Java \lstinline[language=java]{interface}. 884 913 885 In summation, the \CFA type system inherits \newterm{nominal typing} for concrete types from C, and adds \newterm{structural typing} for polymorphic types. 886 Traits are used like interfaces in Java or abstract base-classes in \CC, but without the nominal inheritance relationships. 887 Instead, each polymorphic function or generic type defines the structural type needed for its execution, which is fulfilled at each call site from the lexical environment, like Go~\cite{Go} or Rust~\cite{Rust} interfaces. 888 Hence, new lexical scopes and nested functions are used extensively to create local subtypes, as in the @qsort@ example, without having to manage a nominal inheritance hierarchy. 914 In summation, the \CFA type system inherits \newterm{nominal typing} for concrete types from C; 915 however, without inheritance in \CFA, nominal typing cannot be extended to polymorphic subtyping. 916 Instead, \CFA adds \newterm{structural typing} and uses it to generate polymorphism. 917 Here, traits are like interfaces in Java or abstract base-classes in \CC, but without the nominal inheritance relationships. 918 Instead, each polymorphic function or generic type defines the structural requirements needed for its execution, which is fulfilled at each call site from the lexical environment, like Go~\cite{Go} or Rust~\cite{Rust} interfaces. 919 Hence, lexical scopes and nested functions are used extensively to mimic subtypes, as in the @qsort@ example, without managing a nominal inheritance hierarchy. 889 920 890 921 … … 902 933 general\footnote{overloadable entities: V $\Rightarrow$ variable, O $\Rightarrow$ operator, F $\Rightarrow$ function, M $\Rightarrow$ member} 903 934 & O\footnote{except assignment}/F & O/F/M & V/O/F & M\footnote{not universal} & O/M & O/F/M & no & no \\ 904 general constraints\footnote{T $\Rightarrow$ parameter type, \# $\Rightarrow$ parameter number, N $\Rightarrow$ parameter name; R $\Rightarrow$ return type}935 general constraints\footnote{T $\Rightarrow$ parameter type, \# $\Rightarrow$ parameter count, N $\Rightarrow$ parameter name; R $\Rightarrow$ return type} 905 936 & T/\#//R\footnote{parameter names can be used to disambiguate among overloads but not create overloads} 906 937 & T/\# & T/\#/R & T/\# & T/\#/N/R & T/\#/N/R & T/\#/N & T/R \\ … … 999 1030 1000 1031 However, the parameter operations are severely restricted because universal types have few operations. 1001 For example, swift provides a @print@ operation for its universal type, and the java @Object@ class provides general methods: @toString@, @hashCode@, @equals@, @finalize@, \etc.1032 For example, Swift provides a @print@ operation for its universal type, and the Java @Object@ class provides general methods: @toString@, @hashCode@, @equals@, @finalize@, \etc. 1002 1033 This restricted mechanism still supports a few useful functions, where the parameters are abstract entities, \eg: 1003 1034 \begin{swift} … … 1009 1040 \end{swift} 1010 1041 To make a universal function useable, an abstract description is needed for the operations used on the parameters within the function body. 1011 Type matching these operations can occur by discoverusing techniques like \CC template expansion, or explicit stating, \eg interfaces, subtyping (inheritance), assertions (traits), type classes, type bounds.1042 Type matching these operations can be done by using techniques like \CC template expansion, or explicit stating, \eg interfaces, subtyping (inheritance), assertions (traits), type classes, type bounds. 1012 1043 The mechanism chosen can affect separate compilation or require runtime type information (RTTI). 1013 1044 \begin{description} … … 1030 1061 1031 1062 \begin{figure} 1032 \setlength{\tabcolsep}{1 5pt}1063 \setlength{\tabcolsep}{12pt} 1033 1064 \begin{tabular}{@{}ll@{}} 1034 1065 \multicolumn{1}{c}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{Haskell}} \\ … … 1036 1067 forall( T ) trait sumable { 1037 1068 void ?{}( T &, zero_t ); 1038 T ?+=?( T &, T ); 1039 }; 1069 T ?+=?( T &, T ); }; 1040 1070 forall( T | sumable( T ) ) 1041 1071 T sum( T a[], size_t size ) { 1042 1072 T total = 0; 1043 1073 for ( i; size ) total += a[i]; 1044 return total; 1045 } 1074 return total; } 1046 1075 struct S { int i, j; }; 1047 1076 void ?{}( S & s, zero_t ) { s.[i, j] = 0; } … … 1049 1078 void ?{}( S & s, int i, int j ) { s.[i, j] = [i, j]; } 1050 1079 S ?+=?( S & l, S r ) { l.[i, j] += r.[i, j]; } 1051 1052 int main() { 1053 int ia[] = { 1, 2, 3 }; 1054 sout | sum( ia, 3 ); // trait inference 1055 double da[] = { 1.5, 2.5, 3.5 }; 1056 sout | sum( da, 3 ); // trait inference 1057 S sa[] = { {1, 1}, {2, 2}, {3, 3 } }; 1058 sout | sum( sa, 3 ).[i, j]; // trait inference 1059 } 1080 int main() { // trait inferencing 1081 sout | sum( (int []){ 1, 2, 3 }, 3 ); 1082 sout | sum( (double []){ 1.5, 2.5, 3.5 }, 3 ); 1083 sout | sum( (S []){ {1,1}, {2,2}, {3,3} }, 3 ).[i, j]; } 1084 1085 1086 1060 1087 \end{cfa} 1061 1088 & … … 1064 1091 szero :: a 1065 1092 sadd :: a -> a -> a 1066 1067 1093 ssum :: Sumable a $=>$ [a] -> a 1068 1094 ssum (x:xs) = sadd x (ssum xs) 1069 1095 ssum [] = szero 1070 1071 1072 1073 1096 data S = S Int Int deriving Show 1074 1097 @instance Sumable Int@ where … … 1077 1100 @instance Sumable Float@ where 1078 1101 szero = 0.0 1079 sadd = (+)1102 sadd = (+) 1080 1103 @instance Sumable S@ where 1081 1104 szero = S 0 0 … … 1087 1110 \end{haskell} 1088 1111 \end{tabular} 1089 \caption{Implicit ly/ExplicitlyTrait Inferencing}1090 \label{f:Implicit lyExplicitlyTraitInferencing}1112 \caption{Implicit/Explicit Trait Inferencing} 1113 \label{f:ImplicitExplicitTraitInferencing} 1091 1114 \end{figure} 1092 1115 1093 1116 One differentiating feature among these specialization techniques is the ability to implicitly or explicitly infer the trait information at a class site. 1094 \VRef[Figure]{f:Implicit lyExplicitlyTraitInferencing} compares the @sumable@ trait and polymorphic @sum@ function \see{\VRef{s:Traits}} for \CFA and Haskell.1117 \VRef[Figure]{f:ImplicitExplicitTraitInferencing} compares the @sumable@ trait and polymorphic @sum@ function \see{\VRef{s:Traits}} for \CFA and Haskell. 1095 1118 Here, the \CFA type system inferences the trait functions at each call site, so no additional specification is necessary by the programmer. 1096 1119 The Haskell program requires the programmer to explicitly bind the trait and to each type that can be summed. … … 1102 1125 \end{ada} 1103 1126 Finally, there is a belief that certain type systems cannot support general overloading, \eg Haskell. 1104 As \VRef[Table]{t:OverloadingFeatures} shows, there are multiple languages with both general and parametric overloading, so the decision to not support general overloading is based on the opinion of the language designers and the type system they choose,not any reason in type theory.1127 As \VRef[Table]{t:OverloadingFeatures} shows, there are multiple languages with both general and parametric overloading, so the decision to not support general overloading is based on design choices made by the language designers not any reason in type theory. 1105 1128 1106 1129 The fourth row classifies if conversions are attempted beyond exact match. … … 1121 1144 The details of compiler optimization work are covered in a previous technical report~\cite{Yu20}, which essentially forms part of this thesis. 1122 1145 \item 1123 Th ethesis presents a systematic review of the new features added to the \CFA language and its type system.1146 This thesis presents a systematic review of the new features added to the \CFA language and its type system. 1124 1147 Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested during development due to the limitation of compiler performance. 1125 1148 Several issues coming from the interactions of various language features are identified and discussed in this thesis; -
doc/theses/fangren_yu_MMath/resolution.tex
r7d02d35 rbd72f517 2 2 \label{c:content2} 3 3 4 Recapping, the\CFA's type-system provides expressive polymorphism: variables can be overloaded, functions can be overloaded by argument and return types, tuple types, generic (polymorphic) functions and types (aggregates) can have multiple type parameters with assertion restrictions;4 Recapping, \CFA's type-system provides expressive polymorphism: variables can be overloaded, functions can be overloaded by argument and return types, tuple types, generic (polymorphic) functions and types (aggregates) can have multiple type parameters with assertion restrictions; 5 5 in addition, C's multiple implicit type-conversions must be respected. 6 6 This generality leads to internal complexity and correspondingly higher compilation cost directly related to type resolution. … … 24 24 \end{enumerate} 25 25 \VRef[Table]{t:SelectedFileByCompilerBuild} shows improvements for selected tests with accumulated reductions in compile time across each of the 5 fixes. 26 T o this day, the large reduction in compilation time significantly improves the development of the \CFA's runtime because of its frequent compilation cycles.26 The large reduction in compilation time significantly improves the development of the \CFA's runtime because of its frequent compilation cycles. 27 27 28 28 \begin{table}[htb] … … 54 54 Some of those problems arise from the newly introduced language features described in the previous chapter. 55 55 In addition, fixing unexpected interactions within the type system has presented challenges. 56 This chapter describes in detail the type-resolution rules currently in use and some major problems that have beenidentified.56 This chapter describes in detail the type-resolution rules currently in use and some major problems \PAB{I} have identified. 57 57 Not all of those problems have immediate solutions, because fixing them may require redesigning parts of the \CFA type system at a larger scale, which correspondingly affects the language design. 58 58 … … 69 69 \begin{enumerate}[leftmargin=*] 70 70 \item \textbf{Unsafe} cost representing a narrowing conversion of arithmetic types, \eg @int@ to @short@, and qualifier-dropping conversions for pointer and reference types. 71 Narrowing conversions have the potential to lose (truncat ion) data.71 Narrowing conversions have the potential to lose (truncate) data. 72 72 A programmer must decide if the computed data-range can safely be shorted in the smaller storage. 73 73 Warnings for unsafe conversions are helpful. … … 86 86 87 87 \item \textbf{Safe} cost representing a widening conversion \eg @short@ to @int@, qualifier-adding conversions for pointer and reference types, and value conversion for enumeration constants. 88 Even when conversions are safe, the fewest conversions itranked better, \eg @short@ to @int@ versus @short@ to @long int@.88 When all conversions are safe, closer conversions are ranked better, \eg @short@ to @int@ versus @short@ to @long int@. 89 89 \begin{cfa} 90 90 void f( long int p ); $\C[2.5in]{// 1}$ … … 103 103 104 104 \item \textbf{Specialization} cost counting the number of restrictions introduced by type assertions. 105 Fewer restriction means few sparametric variables passed at the function call giving better performance.105 Fewer restriction means fewer parametric variables passed at the function call giving better performance. 106 106 \begin{cfa} 107 107 forall( T | { T ?+?( T, T ) } ) void f( T ); $\C[3.25in]{// 1}$ … … 110 110 \end{cfa} 111 111 \end{enumerate} 112 Cost tuples are compared bylexicographical order, from unsafe (highest) to specialization (lowest), with ties moving to the next lowest item.112 Cost tuples are compared in lexicographical order, from unsafe (highest) to specialization (lowest), with ties moving to the next lowest item. 113 113 At a subexpression level, the lowest cost candidate for each result type is included as a possible interpretation of the expression; 114 114 at the top level, all possible interpretations of different types are considered (generating a total ordering) and the overall lowest cost is selected as the final interpretation of the expression. 115 115 Glen Ditchfield first proposed this costing model~\cite[\S~4.4.5]{Ditchfield92} to generate a resolution behaviour that is reasonable to C programmers based on existing conversions in the C programming language. 116 116 This model carried over into the first implementation of the \CFA type-system by Richard Bilson~\cite[\S~2.2]{Bilson03}, and was extended but not redesigned by Aaron Moss~\cite[chap.~4]{Moss19}. 117 Moss's work began to show problems with the underlying cost ingmodel;117 Moss's work began to show problems with the underlying cost model; 118 118 these design issues are part of this work. 119 119 … … 152 152 Therefore, at each resolution step, the arguments are already given unique interpretations, so the ordering only needs to compare different sets of conversion targets (function parameter types) on the same set of input. 153 153 154 In \CFA, trying to use such a system is problematic because of the presence of return-type overloading of functions and variable.154 \PAB{My conclusion} is that trying to use such a system in \CFA is problematic because of the presence of return-type overloading of functions and variables. 155 155 Specifically, \CFA expression resolution considers multiple interpretations of argument subexpressions with different types, \eg: 156 156 so it is possible that both the selected function and the set of arguments are different, and cannot be compared with a partial-ordering system. … … 165 165 \end{quote} 166 166 However, I was unable to generate any Ada example program that demonstrates this preference. 167 In contrast, the \CFA overload resolution-system is at the other end of the spectrum, as it tries to order everylegal interpretations of an expression and chooses the best one according to cost, occasionally giving unexpected results rather than an ambiguity.167 In contrast, the \CFA overload resolution-system is at the other end of the spectrum, as it tries to order all legal interpretations of an expression and chooses the best one according to cost, occasionally giving unexpected results rather than an ambiguity. 168 168 169 169 Interestingly, the \CFA cost-based model can sometimes make expression resolution too permissive because it always attempts to select the lowest cost option, and only when there are multiple options tied at the lowest cost does it report the expression is ambiguous. … … 171 171 Other than the case of multiple exact matches, where all have cost zero, incomparable candidates under a partial ordering can often have different expression costs since different kinds of implicit conversions are involved, resulting in seemingly arbitrary overload selections. 172 172 173 There are currently at least three different situations where the polymorphic cost element of the cost model does not yield a candidate selection that is clearly justifiable, and one of them is straight upwrong.173 There are currently at least three different situations where the polymorphic cost element of the cost model does not yield a candidate selection that is justifiable, and one of them is clearly wrong. 174 174 \begin{enumerate}[leftmargin=*] 175 175 \item Polymorphic exact match versus non-polymorphic inexact match. … … 193 193 \end{itemize} 194 194 In this example, option 1 produces the prototype @void f( int )@, which gives an exact match and therefore takes priority. 195 The \CC resolution rules effectively make s option 2 a specialization that only applies to type @long@ exactly,\footnote{\CC does have explicit template specializations, however they do not participate directly in overload resolution and can sometimes lead to unintuitive results.} while the current \CFA rules make option 2 apply for all integral types below @long@.195 The \CC resolution rules effectively make option 2 a specialization that only applies to type @long@ exactly,\footnote{\CC does have explicit template specializations, however they do not participate directly in overload resolution and can sometimes lead to unintuitive results.} while the current \CFA rules make option 2 apply for all integral types ranked lower than @long@ as well. 196 196 This difference could be explained as compensating for \CFA polymorphic functions being separately compiled versus template inlining; 197 197 hence, calling them requires passing type information and assertions increasing the runtime cost. … … 211 211 Although it is true that both the sequence 1, 2 and 1, 3, 4 are increasingly more constrained on the argument types, option 2 is not comparable to either of option 3 or 4; 212 212 they actually describe independent constraints on the two arguments. 213 Specifically, option 2 says the two arguments must have the same type, while option 3 states the second argument must have type @int@ ,213 Specifically, option 2 says the two arguments must have the same type, while option 3 states the second argument must have type @int@. 214 214 Because two constraints can independently be satisfied, neither should be considered a better match when trying to resolve a call to @f@ with argument types @(int, int)@; 215 215 reporting such an expression as ambiguous is more appropriate. … … 227 227 Passing a @pair@ variable to @f@ 228 228 \begin{cfa} 229 pair p;229 pair(int, double) p; 230 230 f( p ); 231 231 \end{cfa} 232 232 gives a cost of 1 poly, 2 variable for the @pair@ overload, versus a cost of 1 poly, 1 variable for the unconstrained overload. 233 233 Programmer expectation is to select option 1 because of the exact match, but the cost model selects 2; 234 while either could work, the type system should select a call that meets expectation of say the call is ambiguous, forcing the programmer to mediate.234 it is not possible to write a specialization for @f@ that works on any pair type and gets selected by the type resolver as intended. 235 235 As a result, simply counting the number of polymorphic type variables is no longer correct to order the function candidates as being more constrained. 236 236 \end{enumerate} 237 237 238 These inconsistencies are not easily solvable in the current cost-model, meaning th e currently\CFA codebase has to workaround these defects.238 These inconsistencies are not easily solvable in the current cost-model, meaning that currently the \CFA codebase has to workaround these defects. 239 239 One potential solution is to mix the conversion cost and \CC-like partial ordering of specializations. 240 240 For example, observe that the first three elements (unsafe, polymorphic and safe conversions) in the \CFA cost-tuple are related to the argument/parameter types, while the other two elements (polymorphic variable and assertion counts) are properties of the function declaration. … … 352 352 Here, the unsafe cost of signed to unsigned is factored into the ranking, so the safe conversion is selected over an unsafe one. 353 353 Furthermore, an integral option is taken before considering a floating option. 354 This model locally matches the C approach, but provides an ordering when there are many overload ed alternative.354 This model locally matches the C approach, but provides an ordering when there are many overload alternatives. 355 355 However, as Moss pointed out overload resolution by total cost has problems, \eg handling cast expressions. 356 356 \begin{cquote} … … 379 379 if an expression has any legal interpretations as a C builtin operation, only the lowest cost one is kept, regardless of the result type. 380 380 381 \VRef[Figure]{f:CFAArithmeticConversions} shows analternative \CFA partial-order arithmetic-conversions graphically.381 \VRef[Figure]{f:CFAArithmeticConversions} shows \PAB{my} alternative \CFA partial-order arithmetic-conversions graphically. 382 382 The idea here is to first look for the best integral alternative because integral calculations are exact and cheap. 383 383 If no integral solution is found, than there are different rules to select among floating-point alternatives. … … 387 387 \section{Type Unification} 388 388 389 Type unification is the algorithm that assigns values to each (free) type parameter ssuch that the types of the provided arguments and function parameters match.389 Type unification is the algorithm that assigns values to each (free) type parameter such that the types of the provided arguments and function parameters match. 390 390 391 391 \CFA does not attempt to do any type \textit{inference} \see{\VRef{s:IntoTypeInferencing}}: it has no anonymous functions (\ie lambdas, commonly found in functional programming and also used in \CC and Java), and the variable types must all be explicitly defined (no auto typing). … … 408 408 With the introduction of generic record types, the parameters must match exactly as well; currently there are no covariance or contravariance supported for the generics. 409 409 410 One simplification was madeto the \CFA language that makes modelling the type system easier: polymorphic function pointer types are no longer allowed.410 \PAB{I made} one simplification to the \CFA language that makes modelling the type system easier: polymorphic function pointer types are no longer allowed. 411 411 The polymorphic function declarations themselves are still treated as function pointer types internally, however the change means that formal parameter types can no longer be polymorphic. 412 412 Previously it was possible to write function prototypes such as … … 418 418 A function operates on the call-site arguments together with any local and global variables. 419 419 When the function is polymorphic, the types are inferred at each call site. 420 On each invocation, the types to be operate on are determined from the arguments provided, and therefore, there is no need to pass a polymorphic function pointer, which can take any type in principle.420 On each invocation, the types to be operated on are determined from the arguments provided, and therefore, there is no need to pass a polymorphic function pointer, which can take any type in principle. 421 421 For example, consider a polymorphic function that takes one argument of type @T@ and polymorphic function pointer. 422 422 \begin{cfa} … … 441 441 The assertion set that needs to be resolved is just the declarations on the function prototype, which also simplifies the assertion satisfaction algorithm, which is discussed further in the next section. 442 442 443 Animplementation sketch stores type unification results in a type-environment data-structure, which represents all the type variables currently in scope as equivalent classes, together with their bound types and information such as whether the bound type is allowed to be opaque (\ie a forward declaration without definition in scope) and whether the bounds are allowed to be widened.443 \PAB{My} implementation sketch stores type unification results in a type-environment data-structure, which represents all the type variables currently in scope as equivalent classes, together with their bound types and information such as whether the bound type is allowed to be opaque (\ie a forward declaration without definition in scope) and whether the bounds are allowed to be widened. 444 444 In the general approach commonly used in functional languages, the unification variables are given a lower bound and an upper bound to account for covariance and contravariance of types. 445 445 \CFA does not implement any variance with its generic types and does not allow polymorphic function types, therefore no explicit upper bound is needed and one binding value for each equivalence class suffices. … … 469 469 470 470 471 In previous versions of \CFA, this number was set at 4; as the compiler becomes more optimized and capable of handling more complex expressions in a reasonable amount of time, I have increased the limit to 8 and it does not lead to problems.471 In previous versions of \CFA, this number was set at 4; as the compiler has become more optimized and capable of handling more complex expressions in a reasonable amount of time, I have increased the limit to 8 and it has not led to problems. 472 472 Only rarely is there a case where the infinite recursion produces an exponentially growing assertion set, causing minutes of time wasted before the limit is reached. 473 473 Fortunately, it is very hard to generate this situation with realistic \CFA code, and the ones that have occurred have clear characteristics, which can be prevented by alternative approaches. … … 475 475 One example is analysed in this section. 476 476 477 While the assertion satisfaction problem in isolation looks like just another expression to resolve, its recursive nature makes some techniques for expression resolution no longer possible.477 \PAB{My analysis shows that} while the assertion satisfaction problem in isolation looks like just another expression to resolve, its recursive nature makes some techniques for expression resolution no longer possible. 478 478 The most significant impact is that type unification has a side effect, namely editing the type environment (equivalence classes and bindings), which means if one expression has multiple associated assertions it is dependent, as the changes to the type environment must be compatible for all the assertions to be resolved. 479 479 Particularly, if one assertion parameter can be resolved in multiple different ways, all of the results need to be checked to make sure the change to type variable bindings are compatible with other assertions to be resolved. … … 481 481 In many cases, these problems can be avoided by examining other assertions that provide insight on the desired type binding: if one assertion parameter can only be matched by a unique option, the type bindings can be updated confidently without the need for backtracking. 482 482 483 The Moss algorithm currently used in \CFA was developed using a simplified type -simulator that capture most of \CFA type-system features.483 The Moss algorithm currently used in \CFA was developed using a simplified type system that captures most of \CFA's type system features. 484 484 The simulation results were then ported back to the actual language. 485 485 The simulator used a mix of breadth- and depth-first search in a staged approach. … … 494 494 If any new assertions are introduced by the selected candidates, the algorithm is applied recursively, until there are none pending resolution or the recursion limit is reached, which results in a failure. 495 495 496 However, in practice the efficiency of this algorithm can be sensitive to the order of resolving assertions.496 However, \PAB{I identify that} in practice the efficiency of this algorithm can be sensitive to the order of resolving assertions. 497 497 Suppose an unbound type variable @T@ appears in two assertions: 498 498 \begin{cfa} … … 517 517 A type variable introduced by the @forall@ clause of function declaration can appear in parameter types, return types and assertion variables. 518 518 If it appears in parameter types, it can be bound when matching the arguments to parameters at the call site. 519 If it only appears in the return type, it can be eventually bedetermined from the call-site context.519 If it only appears in the return type, it can be eventually determined from the call-site context. 520 520 Currently, type resolution cannot do enough return-type inferencing while performing eager assertion resolution: the return type information is unknown before the parent expression is resolved, unless the expression is an initialization context where the variable type is known. 521 521 By delaying the assertion resolution until the return type becomes known, this problem can be circumvented. 522 The truly problematic case occurs if a type variable does not appear in either of the parameter or return types and only appears in assertions or variables ( associate types).522 The truly problematic case occurs if a type variable does not appear in either of the parameter or return types and only appears in assertions or variables (\newterm{associate types}). 523 523 \begin{cfa} 524 524 forall( T | { void foo( @T@ ) } ) int f( float ) { … … 526 526 } 527 527 \end{cfa} 528 This case is rare so forcing every type variable to appear at least once in parameter or return types limitsdoes not limit the expressiveness of \CFA type system to a significant extent.529 The next section presents a proposal for including type declarations in traits rather than having all type variables appear in the trait parameter list, which isprovides equivalent functionality to an unbound type parameter in assertion variables, and also addresses some of the variable cost issue discussed in \VRef{s:ExpressionCostModel}.528 This case is rare so forcing every type variable to appear at least once in parameter or return types does not limit the expressiveness of \CFA type system to a significant extent. 529 \VRef{s:AssociatedTypes} presents a proposal for including type declarations in traits rather than having all type variables appear in the trait parameter list, which provides equivalent functionality to an unbound type parameter in assertion variables, and also addresses some of the variable cost issue discussed in \VRef{s:ExpressionCostModel}. 530 530 531 531 … … 535 535 Based on the experiment results, this approach can improve the performance of expression resolution in general, and sometimes allow difficult instances of assertion resolution problems to be solved that are otherwise infeasible, \eg when the resolution encounters an infinite loop. 536 536 537 The tricky problem in implementing this approach is that the resolution algorithm has side effects, namely modifying the type bindings in the environment.537 \PAB{I identify that} the tricky problem in implementing this approach is that the resolution algorithm has side effects, namely modifying the type bindings in the environment. 538 538 If the modifications are cached, \ie the results that cause the type bindings to be modified, it is also necessary to store the changes to type bindings, too. 539 539 Furthermore, in cases where multiple candidates can be used to satisfy one assertion parameter, all of them must be cached including those that are not eventually selected, since the side effect can produce different results depending on the context. … … 583 583 However, the implementation of the type environment is simplified; 584 584 it only stores a tentative type binding with a flag indicating whether \emph{widening} is possible for an equivalence class of type variables. 585 Formally speaking, this meansthe type environment used in \CFA is only capable of representing \emph{lower-bound} constraints.585 Formally speaking, \PAB{I concluded} the type environment used in \CFA is only capable of representing \emph{lower-bound} constraints. 586 586 This simplification works most of the time, given the following properties of the existing \CFA type system and the resolution algorithms: 587 587 \begin{enumerate} … … 599 599 \end{enumerate} 600 600 601 \CFA does attempt to incorporate upstream type information propagated from variable adeclaration with initializer, since the type of the variable being initialized is known.601 \CFA does attempt to incorporate upstream type information propagated from a variable declaration with initializer, since the type of the variable being initialized is known. 602 602 However, the current type-environment representation is flawed in handling such type inferencing, when the return type in the initializer is polymorphic. 603 603 Currently, an inefficient workaround is performed to create the necessary effect. -
doc/theses/fangren_yu_MMath/uw-ethesis.bib
r7d02d35 rbd72f517 2 2 % For use with BibTeX 3 3 4 @misc{OperOverloading, 5 contributer = {pabuhr@plg}, 6 key = {Operator Overloading}, 7 title = {Operator Overloading}, 8 author = {{WikipediA}}, 9 howpublished= {\url{https://en.wikipedia.org/wiki/Operator_overloading}}, 10 year = 2025, 11 } 12 13 @misc{FuncOverloading, 14 contributer = {pabuhr@plg}, 15 key = {Function Overloading}, 16 title = {Function Overloading}, 17 author = {{WikipediA}}, 18 howpublished= {\url{https://en.wikipedia.org/wiki/Function_overloading}}, 19 year = 2025, 20 } -
doc/theses/fangren_yu_MMath/uw-ethesis.tex
r7d02d35 rbd72f517 100 100 \lstnewenvironment{ada}[1][]{\lstset{language=Ada,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{} 101 101 102 \newcommand{\PAB}[1]{{\color{ red}PAB:#1}}102 \newcommand{\PAB}[1]{{\color{magenta}#1}} 103 103 \newcommand{\newtermFont}{\emph} 104 104 \newcommand{\Newterm}[1]{\newtermFont{#1}}
Note:
See TracChangeset
for help on using the changeset viewer.