Ignore:
Timestamp:
Dec 7, 2024, 6:47:05 PM (11 days ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
0b98381
Parents:
9861ef2
Message:

thesis proofreading

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/fangren_yu_MMath/intro.tex

    r9861ef2 rb4c6e10  
    22
    33This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically, the type resolver used to select polymorphic types among overloaded names.
    4 The \CFA type-system has a number of unique features making it different from all other programming languages.
    5 
    64Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
    75\begin{quote}
     
    108Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
    119In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process.
    12 Depending on the language, ambiguous cases are resolved using some form of qualification and/or casting.
    13 
    14 One of the key goals in \CFA is to push the boundary on overloading, and hence, overload resolution.
     10Depending on the language, any ambiguous cases are resolved using some form of qualification and/or casting.
     11
     12Therefore, one of the key goals in \CFA is to push the boundary on overloading, and hence, overload resolution.
     13As well, \CFA follows the current trend of replacing nominal inheritance with traits.
     14Together, the resulting \CFA type-system has a number of unique features making it different from all other programming languages.
    1515
    1616
     
    2222
    2323All computers have multiple types because computer architects optimize the hardware around a few basic types with well defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
    24 A programming language and its compiler present ways to declare types that ultimately map into those provided by the underlying hardware.
    25 These language types are thrust upon programmers, with their syntactic and semantic rules, and resulting restrictions.
    26 A language type-system defines these rules and uses them to understand how an expression is to be evaluated by the hardware.
     24A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
     25These language types are thrust upon programmers, with their syntactic and semantic rules and restrictions.
     26These rules are used to transform a language expression to a hardware expression.
    2727Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
    2828Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined once at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
     
    3434Virtually all programming languages overload the arithmetic operators across the basic computational types using the number and type of parameters and returns.
    3535Like \CC, \CFA also allows these operators to be overloaded with user-defined types.
    36 The syntax for operator names uses the @'?'@ character to denote a parameter, \eg unary operators: @?++@, @++?@, binary operator @?+?@.
     36The syntax for operator names uses the @'?'@ character to denote a parameter, \eg unary left and right operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
    3737Here, a user-defined type is extended with an addition operation with the same syntax as builtin types.
    3838\begin{cfa}
     
    4343s1 = @?+?@( s1, s2 );   $\C{// direct call}\CRT$
    4444\end{cfa}
    45 The type system examines each call size and selects the best matching overloaded function based on the number and types of arguments.
     45The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
    4646If there are mixed-mode operands, @2 + 3.5@, the type system, like in C/\CC, attempts (safe) conversions, converting the argument type(s) to the parameter type(s).
    4747Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be the same type.
     
    7171double d = f( 3 );              $\C{// select (2)}\CRT$
    7272\end{cfa}
    73 However, if the type system looks at the return type, there is an exact match for each call, which matches with programmer intuition and expectation.
     73Alternatively, if the type system looks at the return type, there is an exact match for each call, which matches with programmer intuition and expectation.
    7474This capability can be taken to the extreme, where there are no function parameters.
    7575\begin{cfa}
     
    9898}
    9999\end{cfa}
    100 It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems, but variable overloading within a scope is often considered extremely dangerous.
     100It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
     101However, variable overloading within a scope is considered extremely dangerous, without any evidence to corroborate this claim.
     102Similarly, function overloading occurs silently within the global scope in \CC from @#include@ files all the time without problems.
    101103
    102104In \CFA, the type system simply treats overloaded variables as an overloaded function returning a value with no parameters.
    103 Hence, no significant effort is required to support this feature.
    104 Leveraging the return type to disambiguate is essential because variables have no parameters.
     105Hence, no significant effort is required to support this feature by leveraging the return type to disambiguate as variables have no parameters.
    105106\begin{cfa}
    106107int MAX = 2147483647;   $\C[2in]{// (1); overloaded on return type}$
    107 double MAX = ...;               $\C{// (2); overloaded on return type}$
     108long int MAX = ...;             $\C{// (2); overloaded on return type}$
     109double MAX = ...;               $\C{// (3); overloaded on return type}$
    108110int i = MAX;                    $\C{// select (1)}$
    109 double d = MAX;                 $\C{// select (2)}\CRT$
    110 \end{cfa}
     111long int i = MAX;               $\C{// select (2)}$
     112double d = MAX;                 $\C{// select (3)}\CRT$
     113\end{cfa}
     114Hence, the name @MAX@ can replace all the C type-specific names, \eg @INT_MAX@, @LONG_MAX@, @DBL_MAX@, \etc.
     115The result is a significant reduction in names to access typed constants.
     116% Paraphrasing Shakespeare, ``The \emph{name} is the thing.''.
     117
     118
     119\section{Constant Overloading}
     120
     121\CFA is unique in providing restricted constant overloading for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
     122In addition, several operations are defined in terms of values @0@ and @1@, \eg:
     123\begin{cfa}
     124if (x) x++                                                                      $\C{// if (x != 0) x += 1;}$
     125\end{cfa}
     126Every @if@ and iteration statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
     127These two constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts.
     128The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
     129\begin{cfa}
     130struct S { int i, j; };
     131void ?{}( S & s, zero_t ) { s.[i,j] = 0; }
     132void ?{}( S & s, one_t ) { s.[i,j] = 1; }
     133int ?!=?( S s, zero_t ) { return s.i != 0 && s.j != 0; }
     134S ?+=?( S & s, one_t ) { s.i++; s.j++; return s; }
     135S ?-=?( S & s, one_t ) { s.i--; s.j--; return s; }
     136S ?=?( S & dst, one_t ) { return dst.[i,j] = 1; }
     137void foo() {
     138        S s = @0@;
     139        s = @1@;
     140        if ( @s@ ) @s++@;
     141}
     142\end{cfa}
     143Hence, type @S@ is first-class with respect to the basic types, working with all existing implicit C mechanisms.
    111144
    112145
     
    121154z = "abc";                              $\C{// assignment}$
    122155\end{cfa}
     156For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
    123157For type-and-initialization, the specified and initialization types may not agree.
    124 Similarity, for assignment the current variable and expression types may not agree.
    125 For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
    126 In the other cases, the compiler may select the type by melding programmer and context information.
     158For initialization-only, the compiler may select the type by melding programmer and context information.
    127159When the compiler participates in type selection, it is called \newterm{type inferencing}.
    128160Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed values and performs a (possibly lossy) action to convert one value to the type of the other variable.
     161Finally, for assignment, the current variable and expression types may not agree.
    129162
    130163One of the first and powerful type-inferencing system is Hindley--Milner~\cite{Damas82}.
     
    147180\end{cfa}
    148181In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
    149 Note, like template meta-programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of the @f@.
     182Note, like template meta-programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
    150183Inferring type constraints, by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
    151184in this case, parametric polymorphism can allow separate compilation.
    152185In languages with type inferencing, there is often limited overloading to reduce the search space, which introduces the naming problem.
    153 Return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match.
    154 
    155 In simpler type inferencing systems, such as C/\CC/\CFA, there are more specific usages.
     186Note, return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match.
     187
     188In simpler type-inferencing systems, such as C/\CC/\CFA, there are more specific usages.
    156189\begin{cquote}
    157190\setlength{\tabcolsep}{10pt}
     
    190223This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages.
    191224\item
    192 Ensuring the type of secondary variables, always matches a primary variable.
     225Ensuring the type of secondary variables, always match a primary variable.
    193226\begin{cfa}
    194227int x; $\C{// primary variable}$
    195228typeof(x) y, z, w; $\C{// secondary variables match x's type}$
    196229\end{cfa}
    197 If the type of @x@ changes, the types of the secondary variables correspondingly update.
     230If the type of @x@ changes, the type of the secondary variables correspondingly updates.
    198231\end{itemize}
    199232Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
     
    208241\section{Type-Inferencing Issues}
    209242
    210 Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions or confusions.
     243Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions, or confusions.
    211244
    212245A convenience is having the compiler use its overarching program knowledge to select the best type for each variable based on some notion of \emph{best}, which simplifies the programming experience.
     
    227260In Haskell, it is common for programmers to brand (type) function parameters.
    228261
    229 A confusion is large blocks of code where all declarations are @auto@.
     262A confusion is large blocks of code where all declarations are @auto@, as is now common in \CC.
    230263As a result, understanding and changing the code becomes almost impossible.
    231264Types provide important clues as to the behaviour of the code, and correspondingly to correctly change or add new code.
    232 In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types.
     265In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types for them.
    233266For example, given:
    234267\begin{cfa}
     
    243276In this situation, having the type name or its short alias is essential.
    244277
    245 \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
     278The \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
    246279Type inferencing defeats this goal because there is no left-hand type.
    247280Fundamentally, type inferencing tries to magic away variable types from the programmer.
    248281However, this results in lazy programming with the potential for poor performance and safety concerns.
    249282Types are as important as control-flow in writing a good program, and should not be masked, even if it requires the programmer to think!
    250 A similar issue is garbage collection, where storage management is masked, resulting in poor program design and performance.
     283A similar issue is garbage collection, where storage management is magicked away, often resulting in poor program design and performance.\footnote{
     284There are full-time Java consultants, who are hired to find memory-management problems in large Java programs.}
    251285The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
    252286Understanding space and time issues are an essential part of the programming craft.
Note: See TracChangeset for help on using the changeset viewer.