Changeset 399074a
- Timestamp:
- Mar 25, 2025, 12:24:05 PM (4 days ago)
- Branches:
- master
- Children:
- d73e667
- Parents:
- f5bf3c2
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
TabularUnified doc/theses/fangren_yu_MMath/intro.tex ¶
rf5bf3c2 r399074a 23 23 A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware. 24 24 These language types are thrust upon programmers with their syntactic/semantic rules and restrictions. 25 These rules are then used to transform a language expression to a hardware expression.25 These rules are then used to transform a language expression to a unique hardware expression. 26 26 Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism. 27 27 Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation. … … 40 40 \newterm{Namespace pollution} refers to loading the global or other namespaces with many names, resulting in paranoia that the compiler could make wrong choices for overloaded names causing failure. 41 41 This fear leads to coding styles where names are partitioned by language mechanisms and qualification is used to make names unique. 42 This approach defeats the purpose of overloading and places an additional coding burden on both the codedeveloper and user.42 This approach defeats the purpose of overloading and places an additional coding burden on both the developer and user. 43 43 As well, many namespace systems provide a mechanism to open their scope returning to normal overloading, \ie no qualification. 44 44 While namespace mechanisms are very important and provide a number of crucial program-development features, protection from overloading is overstated. 45 45 Similarly, lexical nesting is another place where overloading occurs. 46 For example, in object-oriented programming, class mem eber names \newterm{shadow} names within members.46 For example, in object-oriented programming, class member names \newterm{shadow} names within members. 47 47 Some programmers, qualify all member names with @class::@ or @this->@ to make them unique from names defined in members. 48 48 Even nested lexical blocks result in shadowing, \eg multiple nested loop-indices called @i@. … … 115 115 In the following C example: 116 116 \begin{cfa} 117 struct S {}; 118 struct S S; 117 struct S { int S; }; 118 struct S s; 119 void S( struct S S ) { S.S = 1; }; 119 120 enum E { E }; 120 \end{cfa} 121 the names @S@ and @E@ exist is the type and object domain, and C uses the type kinds @struct@ and @enum@ to disambiguate the names. 122 121 enum E e = E; 122 \end{cfa} 123 the overloaded names @S@ and @E@ are separated into the type and object domain, and C uses the type kinds @struct@ and @enum@ to disambiguate the names. 124 In general, types are not overloaded because inferencing them is difficult to imagine in a statically programming language. 125 \begin{cquote} 126 \setlength{\tabcolsep}{26pt} 127 \begin{tabular}{@{}ll@{}} 128 \begin{cfa} 129 struct S { int i; }; // 1 130 struct S { int i, j }; // 2 131 union S { int i; double d; }; // 3 132 typedef int S; // 4 133 \end{cfa} 134 & 135 \begin{cfa} 136 S s; 137 if ( ... ) s = (S){ 3 }; // 1 138 else s = 3; // 4 139 140 \end{cfa} 141 \end{tabular} 142 \end{cquote} 123 143 On the other hand, in ad-hoc overloading of variables and/or functions, the names are in the object domain and each overloaded object name has an anonymous associated type. 124 144 \begin{cfa} 125 @double@ foo( @int@ ); @ int@ foo( @void@ ); @int@ foo( @double, double@ );145 @double@ foo( @int@ ); @char@ foo( @void@ ); @int@ foo( @double, double@ ); 126 146 @double@ foo; @char@ foo; @int@ foo; 127 147 \end{cfa} 128 Notice, the associated type cannot be extracted using @typeof@/\lstinline[language={[11]C++}]{decltype} for typing purposes, as @typeof( foo )@ is always ambiguous. 129 Hence, overloading may not be polymorphism, as no single overloaded entity represents multiple types. 148 Here, the associated type cannot be extracted using @typeof@/\lstinline[language={[11]C++}]{decltype} for typing purposes, as @typeof( foo )@ is always ambiguous. 149 To disambiguate the type of @foo@ requires additional information from the call-site or a cast. 150 \begin{cfa} 151 double d = foo( 3 ); char c = foo(); int i = foo( 3.5, 4.1 ); $\C{// exact match}$ 152 d = foo; c = foo; i = foo; $\C{// exact match}$ 153 i = ((double (*)( int ))foo)( 3.5 ); $\C{// no exact match, explicit cast => conversions}$ 154 \end{cfa} 155 Hence, depending on your perspective, overloading may not be polymorphism, as no single overloaded entity represents multiple types. 130 156 131 157 \item … … 135 161 For exact type matches in overloading, there is a systematic way of matching arguments to parameters, and a function denotes its return type rather than using type inferencing. 136 162 This matching is just as robust as other polymorphic analysis. 137 The ad-hoc aspect isthe implicit transfer functions (conversions) applied to arguments to create an exact parameter type-match, as there may be multiple conversions leading to different exact matches.138 Note, conversion issues apply to non-overloaded and overloaded functions.139 Here, the selection of the conversion functions is based onthe \emph{opinion} of the language (type system), even if the technique used is based on sound rules, like maximizing conversion accuracy (non-lossy).163 The ad-hoc aspect might be the implicit transfer functions (conversions) applied to arguments to create an exact parameter type-match, as there may be multiple conversions leading to different exact matches. 164 Note, conversion issues apply to both non-overloaded and overloaded functions. 165 Here, the selection of the conversion functions can be considered the \emph{opinion} of the language (type system), even if the technique used is based on sound rules, like maximizing conversion accuracy (non-lossy). 140 166 The difference in opinion results when the language conversion rules differ from a programmer's expectations. 141 167 However, without implicit conversions, programmers may have to write an exponential number of functions covering all possible exact-match cases among all reasonable types. 142 168 \CFA's \emph{opinion} on conversions must match C's and then rely on programmers to understand the effects. 143 That is, let the compiler do the heavy-lifting of selecting a \emph{best} set of conversions that m inimizes safety concerns.144 Hence, removing implicit conversions from \CFA is not an option , so it must do the best possible job to get it right.145 146 \item 147 Why are there two forms of \emph{overloading} ( regular and type class) in different programming languages?169 That is, let the compiler do the heavy-lifting of selecting a \emph{best} set of conversions that maximize safety. 170 Hence, removing implicit conversions from \CFA is not an option (as is true in most other programming languages), so it must do the best possible job to get it right. 171 172 \item 173 Why are there two forms of \emph{overloading} (general and parametric) in different programming languages? 148 174 149 175 \noindent 150 \newterm{Regular overloading} occurs when the type-system \emph{knows} a function's argument and return types (or a variable's type for variable overloading). 176 \newterm{General overloading} occurs when the type-system \emph{knows} a function's parameters and return types (or a variable's type for variable overloading). 177 In functional programming-languages, there is always a return type (except for a monad). 151 178 If a return type is specified, the compiler does not have to inference the routine body. 152 179 For example, the compiler has complete knowledge about builtin types and their overloaded arithmetic operators. … … 154 181 Overload resolution then involves finding an exact match between a call and the overload prototypes based on argument type(s) and possibly return context. 155 182 If an \emph{exact} match is not found, the call is either ill formed (ambiguous) or further attempts are made to find a \emph{best} match using transfer functions (conversions). 156 As a consequence, no additional runtime information is needed per call, \ie the call is a direct transfer (branch) with pushed arguments. 157 158 \newterm{Type-class overloading} occurs when the compiler is using currying for type inferencing. 183 As a consequence, no additional runtime information is needed per call, \ie the call is a direct transfer (branch) with possibly converted arguments. 184 185 \noindent 186 \newterm{Parametric overloading} occurs when the type-system does not know a function's parameter and return types. 187 Types are inferred from constant types and/or constraint information that includes typing. 188 Parametric overloading starts with a universal/generic type used to define parameters and returns. 189 \begin{cfa} 190 forall( T ) T identity( T t ) { return t; } 191 int i = identify( 3 ); 192 double d; 193 double e = identity( d ); 194 \end{cfa} 195 Here, the type system is substituting the argument type for @T@, and ensuring the return type also matches. 196 This mechanism can be extended to overloaded parametric functions with multiple type parameters. 197 \begin{cfa} 198 forall( T ) [T, T] identity( T t1, T t2 ) { return [t1, t2 ]; } // return tuple 199 int j; 200 [i, j] = identity( 3, 4 ); 201 forall( T, U ) [T, U] identity( T t1, U t2 ) { return [t1, t2 ]; } 202 [i, d] = identity( i, d ); 203 \end{cfa} 204 Here, the type system is using techniques from general overloading, number and argument types, to disambiguate the overloaded parametric functions. 205 However, this exercise is moot because the types are so universal they have no practical operations within the function, modulo any general operations carried by every type, like copy or print. 206 This kind of polymorphism is restricted to moving type instances as abstract entities; 207 if type erasure is used, there is no way to recover the original type to perform useful actions on its value(s). 208 Hence, parametric overloading requires additional information about the universal types to make them useful. 209 210 This additional information often comes as a set of operations a type must supply (@trait@/-@concept@) and these operations can then be used in the body of the function. 211 \begin{cfa} 212 forall( T | T ?@++@( T, T ) ) T inc( T t ) { return t@++@; } 213 int i = 3 214 i = inc( i ) 215 double d = inc( 3.5 ); 216 \end{cfa} 217 Given a qualifying trait, are its elements inferred or declared? 218 In the above example, the type system infers @int@ for @T@, infers it needs a @++@ operator that takes an @int@ and returns an @int@, and finds this function in the enclosing environment (\eg standard prelude). 219 This implicit inferencing is expensive if matched with implicit conversions when there is no exact match. 220 Alternatively, types opt-in to traits via declarations. 221 \begin{cfa} 222 trait postinc { T ?@++@( T, T ) } 223 type int provide postinc; 224 \end{cfa} 225 In this approach, after inferring @int@ for @T@, @int@ is examined for any necessary traits required in the function, so there is no searching. 226 Once all traits are satisfied, the compiler can inline them, pass functions pointers directly, or pass a virtual table composing the trait. 227 \end{enumerate} 228 The following sections illustrate how overloading is provided in \CFA. 229 230 \begin{comment} 159 231 \begin{lstlisting}[language=Haskell] 160 232 f( int, int ); f( int, float ); -- return types to be determined … … 185 257 \end{cfa} 186 258 187 188 259 In this case, the compiler implicitly changes the overloaded function to a parametrically polymorphic one. 189 260 Hence, the programmer does specify any additional information for the overloading to work. 190 261 Explicit overloading occurs when the compiler has to be told what operations are associated with a type programmer separately defines the associate type and subsequently associates the type with overloaded name. 191 \end{ enumerate}262 \end{comment} 192 263 193 264 \begin{comment} … … 343 414 \subsection{Operator Overloading} 344 415 345 Virtually all programming languages overloadthe arithmetic operators across the basic computational types using the number and type of parameters and returns.346 However, in many programming languages, arithmetic operators are not first class, and hence, they cannot be overloaded by programmers.347 Like \CC, \CFA maps operators to named functions allowing them to be overloaded withuser-defined types.348 The syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.416 Virtually all programming languages provide general overloading of the arithmetic operators across the basic computational types using the number and type of parameters and returns. 417 However, in some languages, arithmetic operators may not be first class, and hence, cannot be overloaded. 418 Like \CC, \CFA does allow general operator overloading for user-defined types. 419 The \CFA syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@. 349 420 Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type. 350 421 \begin{cfa} … … 359 430 Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be converted to a common type. 360 431 Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process. 361 This approach does not matchwith programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.432 This approach matches with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values. 362 433 Depending on the language, mix-mode conversions can be explicitly controlled using some form of cast. 363 434 … … 365 436 \subsection{Function Overloading} 366 437 367 Both \CFA and \CC allow function names to be overloaded, as long as their prototypes differ in the number and type of parameters and returns.438 Both \CFA and \CC allow general overloading for functions, as long as their prototypes differ in the number and type of parameters and returns. 368 439 \begin{cfa} 369 440 void f( void ); $\C[2in]{// (1): no parameter}$ … … 372 443 f( 'A' ); $\C{// select (2)}\CRT$ 373 444 \end{cfa} 374 In this case, the name @f@ is overloaded depending on the number and parameter types. 375 The type system examines each call size and selects the best match based on the number and types of the arguments. 376 Here, there is a perfect match for the call, @f( 'A' )@ with the number and parameter type of function (2). 445 The type system examines each call size and first looks for an exact match and then a best match using conversions. 377 446 378 447 Ada, Scala, and \CFA type-systems also use the return type in resolving a call, to pinpoint the best overloaded name. 448 Essentailly, the return types are \emph{reversed curried} into output parameters of the function. 379 449 For example, in many programming languages with overloading, the following functions are ambiguous without using the return type. 380 450 \begin{cfa} … … 393 463 \end{cfa} 394 464 Again, there is an exact match for each call. 395 As for operator overloading, if there is no exact match, a set of minimal , an implicit conversioncan be added to find a best match.465 As for operator overloading, if there is no exact match, a set of minimal implicit conversions can be added to find a best match. 396 466 \begin{cfa} 397 467 short int = random(); $\C[2in]{// select (1), unsafe}$ … … 416 486 \end{cfa} 417 487 It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems. 418 However, variable overloading within a scope is considered extremelydangerous, without any evidence to corroborate this claim.488 However, variable overloading within a scope is often considered dangerous, without any evidence to corroborate this claim. 419 489 In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems. 420 490 … … 430 500 \end{cfa} 431 501 Hence, the name @MAX@ can replace all the C type-specific names, \eg @INT_MAX@, @LONG_MAX@, @DBL_MAX@, \etc. 432 The result is a significant reduction in names to access typed constants. 433 434 As an aside, C has a separate namespace for types and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate. 435 \begin{cfa} 436 void S() { 437 struct @S@ { int S; }; 438 @struct S@ S; 439 void S( @struct S@ S ) { S.S = 1; }; 440 } 441 \end{cfa} 442 Here the name @S@ is an aggregate type and field, and a variable and parameter of type @S@. 502 The result is a significant reduction in names to access common constants. 443 503 444 504 … … 472 532 473 533 474 \section{Overload Resolution Strategies}475 476 For languages with user-defined overloading,477 Given an overloaded constant, variable, or (generic) function, there must exist strategies for differentiating among them and selecting the most appropriate one in a given context.478 The criteria commonly used to match operator/function/method names with definitions are: number of parameters, parameter types, parameter order or name, return type, implicit argument type conversions (safe/unsafe), generic, where some features are missing in certain programming languages.479 \VRef[Table]{t:OverloadingFeatures} shows a subset of popular programming languages with overloading and the discriminating features used to disambiguate among overloadings.480 Language C, Go and Rust have no overloading beyond basic types and operators.481 482 \begin{table}483 \caption{Overload Discriminating Features in Programming Languages}484 \label{t:OverloadingFeatures}485 \centering486 487 % https://doc.rust-lang.org/rust-by-example/trait/impl_trait.html488 % https://dl.acm.org/doi/10.1145/75277.75283489 490 \begin{minipage}{\linewidth}491 \setlength{\tabcolsep}{5pt}492 \begin{tabular}{@{}r|cccccccc@{}}493 Feature\,{\textbackslash}\,Language & Ada & \CC & \CFA & Java & Scala & Swift & Rust & Haskell \\494 \hline495 Operator/Function/Method name & O\footnote{except assignment}/F & O/F/M & O/F & M & O/M & O/F/M & X & X \\496 generic name & no & yes\footnote{compile-time only, using template expansion} & yes & yes & yes & yes & X & X \\497 parameter number & yes & yes & yes & yes & yes & yes & X & X \\498 parameter types & yes & yes & yes & yes & yes & yes & X & X \\499 parameter name & no & no & no & no & yes & yes & X & X \\500 return type & yes & no & yes & no & no & yes & X & X \\501 Safe/Unsafe argument conversion & none & yes\footnote{no conversions allowed during template parameter deduction} & S/U502 & S\footnote{unsafe (narrowing) conversion only allowed in assignment or initialization to a primitive variable} & S503 & no\footnote{literals only, Int -> Double (Safe)} & X & X504 \end{tabular}505 \end{minipage}506 \end{table}507 508 509 534 \section{Type Inferencing} 510 535 \label{s:IntoTypeInferencing} … … 861 886 862 887 888 \section{Language Comparison} 889 890 Because \CFA's type system is focused on overloading and overload resolution, \VRef[Table]{t:OverloadingFeatures} compares \CFA with a representive set of popular programming languages and their take on overloading. 891 The first row classifies whether there is general overloading, and what enities may be overloaded. 892 \begin{cfa} 893 int foo( int ); int foo; 894 double foo( int, int ); double foo; 895 \end{cfa} 896 The second row classifies the specialzation mechanisms used distinished among the general overloads. 897 The third row classifies if generic functions can be overloaded based on the number and differing type variables. 898 \begin{cfa} 899 forall( T ) T foo( T t ); 900 forall( T ) T foo( T t, T s ); 901 forall( T, U ) T foo( T t, U s ); 902 \end{cfa} 903 The fourth row classifies the mechnaism used to specialize provide if generic functions can be overloaded based on the number and differing type variables. 904 The fifth row classifies if conversions are attempted beyond exact match. 905 906 \begin{table} 907 \caption{Overload Discriminating Features in Programming Languages} 908 \label{t:OverloadingFeatures} 909 \centering 910 \begin{minipage}{\linewidth} 911 \setlength{\tabcolsep}{5pt} 912 \begin{tabular}{@{}r|cccccccc@{}} 913 Feature\,{\textbackslash}\,Language & Ada & \CC & \CFA & Java & Scala & Swift & Rust & Haskell \\ 914 \hline 915 general\footnote{overloadable entities: V $\Rightarrow$ variable, O $\Rightarrow$ operator, F $\Rightarrow$ function, M $\Rightarrow$ member} 916 & O\footnote{except assignment}/F & O/F/M & V/O/F & M\footnote{not universal} & O/M & O/F/M & no & no \\ 917 general constraints\footnote{\# $\Rightarrow$ number, T $\Rightarrow$ type, N $\Rightarrow$ name, R $\Rightarrow$ return type} 918 & \#/T/R & \#/T & \#/T/R & \#/T & \#/T/N/R & \#/T/N/R & \#/T/N & T/R \\ 919 type parameters & no & yes & yes & yes & yes & yes & yes & yes \\ 920 type constraint\footnote{T $\Rightarrow$ trait/concept, B $\Rightarrow$ type bounds, TC $\Rightarrow$ type class} 921 & no & T & T & T & B & T & T & TC \\ 922 %parameter number & yes & yes & yes & yes & yes & yes & maybe & no \\ 923 %parameter types & yes & yes & yes & yes & yes & yes & yes & yes \\ 924 %parameter name & no & no & no & no & yes & yes & no & no \\ 925 %return type & yes & no & yes & no & no & yes & no & yes \\ 926 Safe/Unsafe arg. conv. & no & S/U\footnote{no conversions allowed during template parameter deduction} & S/U 927 & S\footnote{unsafe (narrowing) conversion only allowed in assignment or initialization to a primitive (builtin) type} & S 928 & no\footnote{literals only, Int $\rightarrow$ Double (Safe)} & no & no 929 \end{tabular} 930 \end{minipage} 931 % https://doc.rust-lang.org/rust-by-example/trait/impl_trait.html 932 % https://dl.acm.org/doi/10.1145/75277.75283 933 \end{table} 934 935 863 936 \section{Contributions} 864 937
Note: See TracChangeset
for help on using the changeset viewer.