Changeset 0f070a4
- Timestamp: Jan 26, 2025, 6:33:14 PM (6 weeks ago)
- Branches: master
- Children: de8a0a4
- Parents: c0beea3
- Files: 1 edited
Legend:
- Unmodified (no prefix)
- Added (+)
- Removed (-)
doc/theses/fangren_yu_MMath/intro.tex
rc0beea3 → r0f070a4

 \begin{cfa}
 T sum( T a[$\,$], size_t size ) {
-	@T@ total = { @0@ }; // size, 0 for type T
+	@T@ total = { @0@ }; $\C[1.75in]{// size, 0 for type T}$
 	for ( size_t i = 0; i < size; i += 1 )
-		total @+=@ a@[@i@]@; // + and subscript for T
+		total @+=@ a@[@i@]@; $\C{// + and subscript for T}\CRT$
 	return total;
 }
…
 All computers have multiple types because computer architects optimize the hardware around a few basic types with well defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings.
 A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware.
-These language types are thrust upon programmers, with their syntactic and semantic rules and restrictions.
-These rules are used to transform a language expression to a hardware expression.
+These language types are thrust upon programmers with their syntactic/semantic rules and restrictions.
+These rules are then used to transform a language expression to a hardware expression.
 Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism.
 Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined once at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation.
…
 \section{Overloading}
 
-Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
 \begin{quote}
 There are only two hard things in Computer Science: cache invalidation and \emph{naming things}.
 --- Phil Karlton
 \end{quote}
+Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files.
 Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
 In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process.
-Depending on the language, any ambiguous cases are resolved using some form of qualification and/or casting.
+Depending on the language, any ambiguous cases are resolved explicitly using some form of qualification and/or cast.
…
 Like \CC, \CFA maps operators to named functions and allows these operators to be overloaded with user-defined types.
 The syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
-Here, a user-defined type is extended with an addition operation with the same syntax as builtin types.
+Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type.
 \begin{cfa}
 struct S { int i, j };
…
 The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
 If there are mixed-mode operands, @2 + 3.5@, the type system attempts (safe) conversions, like in C/\CC, converting the argument type(s) to the parameter type(s).
-Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be the same type.
-Note, without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
+Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be converted to a common type.
+Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
+Without implicit conversions, programmers must write an exponential number of functions covering all possible exact-match cases among all possible types.
 This approach does not match with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
+Depending on the language, mix-mode conversions can be explicitly controlled using some form of cast.
…
 double d = f( 3 ); $\C{// select (2)}\CRT$
 \end{cfa}
-Alternatively, if the type system looks at the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
-This capability can be taken to the extreme, where there are no function parameters.
+Alternatively, if the type system uses the return type, there is an exact match for each call, which again matches with programmer intuition and expectation.
+This capability can be taken to the extreme, where the only differentiating factor is the return type.
 \begin{cfa}
 int random( void ); $\C[2in]{// (1); overloaded on return type}$
…
 \end{cfa}
 Again, there is an exact match for each call.
-If there is no exact match, a set of minimal, safe conversions can be added to find a best match, as for operator overloading.
+As for operator overloading, if there is no exact match, a minimal set of implicit conversions can be added to find a best match.
+\begin{cfa}
+short int s = random(); $\C[2in]{// select (1), unsafe}$
+long double ld = random(); $\C{// select (2), safe}\CRT$
+\end{cfa}
 
…
 Unlike most programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
-(Shadow overloading is also possible for functions, if a language supports nested function declarations, \eg \CC named, nested, lambda functions.)
+Shadow overloading is also possible for functions, in languages supporting nested-function declarations, \eg \CC named, nested, lambda functions.
 \begin{cfa}
 void foo( double d );
…
 \end{cfa}
 It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems.
-However, variable overloading within a scope is often considered extremely dangerous, without any evidence to corroborate this claim.
+However, variable overloading within a scope is considered extremely dangerous, without any evidence to corroborate this claim.
 In contrast, function overloading in \CC occurs silently within the global scope from @#include@ files all the time without problems.
 
-In \CFA, the type system simply treats overloaded variables as an overloaded function returning a value with no parameters.
-Hence, no significant effort is required to support this feature by leveraging the return type to disambiguate, as variables have no parameters.
+In \CFA, the type system simply treats an overloaded variable as an overloaded function returning a value with no parameters.
+Hence, no effort is required to support this feature as it is available for differentiating among overloaded functions with no parameters.
 \begin{cfa}
 int MAX = 2147483647; $\C[2in]{// (1); overloaded on return type}$
…
 The result is a significant reduction in names to access typed constants.
 
-As an aside, C has a separate namespace for type and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
+As an aside, C has a separate namespace for types and variables allowing overloading between the namespaces, using @struct@ (qualification) to disambiguate.
 \begin{cfa}
 void S() {
…
 }
+Here the name @S@ is an aggregate type and field, and a variable and parameter of type @S@.
 
…
 for ( ; x; --x ) => for ( ; x @!= 0@; x @-= 1@ )
 \end{cfa}
-To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work with the special @0@ and @1@ contexts.
+To generalize this feature, both constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly work within the special @0@ and @1@ contexts.
 The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work.
 \begin{cfa}
…
 \end{cfa}
 For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages.
-For type-and-initialization, the specified and initialization types may not agree.
+For type-and-initialization, the specified and initialization types may not agree, requiring an implicit/explicit conversion.
 For initialization-only, the compiler may select the type by melding programmer and context information.
 When the compiler participates in type selection, it is called \newterm{type inferencing}.
-Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed values and performs a (possibly lossy) action to convert one value to the type of the other variable.
+Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed variables and performs a (possibly lossy) value conversion from one type to the other.
 Finally, for assignment, the current variable and expression types may not agree.
 Discovering a variable or function type is complex and has limitations.
-The following covers these issues, and why some schemes are not amenable with the \CFA type system.
+The following covers these issues, and why this scheme is not amenable with the \CFA type system.
 
 One of the first and most powerful type-inferencing systems is Hindley--Milner~\cite{Damas82}.
…
 \end{cfa}
 In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions.
-Note, like template meta programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
+Like template meta-programming, there can be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@.
 Inferring type constraints by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types;
 in this case, parametric polymorphism can allow separate compilation.
…
 This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages.
 \item
-Ensuring the type of secondary variables, match a primary variable(s).
+Ensuring the types of secondary variables match a primary variable.
 \begin{cfa}
 int x; $\C{// primary variable}$
…
 \end{cfa}
 If the type of @x@ changes, the type of the secondary variables correspondingly updates.
+There can be strong software-engineering reasons for binding the types of these variables.
 \end{itemize}
 Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing.
…
 
 A restriction is the conundrum in type inferencing of when to \emph{brand} a type.
-That is, when is the type of the variable/function more important than the type of its initialization expression.
-For example, if a change is made in an initialization expression, it can cause cascading type changes and/or errors.
-At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or in error when it changes.
+That is, when is the type of the variable/function more important than the type of its initialization expression(s).
+For example, if a change is made in an initialization expression, it can cascade type changes producing many other changes and/or errors.
+At some point, a variable's type needs to remain constant and the initializing expression needs to be modified or be in error when it changes.
 Often type-inferencing systems allow restricting (\newterm{branding}) a variable or function type, so the compiler can report a mismatch with the constant initialization.
 \begin{cfa}
…
 In Haskell, it is common for programmers to brand (type) function parameters.
 
-A confusion is large blocks of code where all declarations are @auto@, as is now common in \CC.
+A confusion is blocks of code where all declarations are @auto@, as is now common in \CC.
 As a result, understanding and changing the code becomes almost impossible.
 Types provide important clues as to the behaviour of the code, and correspondingly to correctly change or add new code.
…
 In this situation, having the type name or its short alias is essential.
 
-The \CFA type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
+\CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.
 Type inferencing defeats this goal because there is no left-hand type.
 Fundamentally, type inferencing tries to magic away variable types from the programmer.
…
 The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming.
 Understanding space and time issues is an essential part of the programming craft.
-Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, implicit type-inferencing is unsupported.
-Should a significant need arise, this feature can be revisited.
+Given @typedef@ and @typeof@ in \CFA, and the strong desire to use the left-hand type in resolution, the decision was made not to support implicit type-inferencing in the type system.
+Should a significant need arise, this decision can be revisited.
…
 
 To constrain polymorphic types, \CFA uses \newterm{type assertions}~\cite[pp.~37-44]{Alphard} to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable.
-For example, the function @twice@ works for any type @T@ with a matching addition operator.
+Here, the function @twice@ works for any type @T@ with a matching addition operator.
 \begin{cfa}
 forall( T @| { T ?+?(T, T); }@ ) T twice( T x ) { return x @+@ x; }
 int val = twice( twice( 3 ) ); $\C{// val == 12}$
 \end{cfa}
-For example, parametric polymorphism and assertions occur in the existing type-unsafe (@void *@) C @qsort@ to sort an array.
+Parametric polymorphism and assertions occur in existing type-unsafe (@void *@) C functions, like @qsort@ for sorting an array of unknown values.
 \begin{cfa}
 void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) );
…
 }
 // select type and size from left-hand side
-int * ip = malloc(); double * dp = malloc(); $@$[aligned(64)] struct S {...} * sp = malloc();
+int * ip = malloc(); double * dp = malloc(); [[aligned(64)]] struct S {...} * sp = malloc();
 \end{cfa}
 The @sized@ assertion passes size and alignment as a data object has no implicit assertions.
 Both assertions are used in @malloc@ via @sizeof@ and @_Alignof@.
+In practice, this polymorphic @malloc@ is unwrapped by the C compiler and the @if@ statement is elided, producing a type-safe call to @malloc@ or @memalign@.
 
-These mechanisms are used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
-Hence, existing C legacy code is leveraged as much as possible;
+This mechanism is used to construct type-safe wrapper-libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions.
+Here, existing C legacy code is leveraged as much as possible;
 other programming languages must build supporting libraries from scratch, even in \CC.
…
 \end{tabular}
 \end{cquote}
-Traits are simply flattened at the use point, as if written in full by the programmer, where traits often contain overlapping assertions, \eg operator @+@.
+Traits are implemented by flattening them at use points, as if written in full by the programmer.
+Flattening often results in overlapping assertions, \eg operator @+@.
 Hence, trait names play no part in type equivalence.
-Note, the type @T@ is an object type, and hence, has the implicit internal trait @otype@.
+In the example, type @T@ is an object type, and hence, has the implicit internal trait @otype@.
 \begin{cfa}
 trait otype( T & | sized(T) ) {
…
 };
 \end{cfa}
-The implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
+These implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
 
 If the array type is not a builtin type, an extra type parameter and assertions are required, like subscripting.
…
 \begin{enumerate}[leftmargin=*]
 \item
-Write bespoke data structures for each context they are needed.
+Write bespoke data structures for each context.
 While this approach is flexible and supports integration with the C type checker and tooling, it is tedious and error prone, especially for more complex data structures.
 \item
…
 \item
 Use preprocessor macros, similar to \CC @templates@, to generate code that is both generic and type checked, but errors may be difficult to interpret.
-Furthermore, writing and using preprocessor macros is difficult and inflexible.
+Furthermore, writing and using complex preprocessor macros is difficult and inflexible.
 \end{enumerate}
 
 \CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
 \CFA generic types integrate efficiently and naturally with the existing polymorphic functions, while retaining backward compatibility with C and providing separate compilation.
-However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates.
+For concrete parameters, the generic-type definition can be inlined, like \CC templates, if its definition appears in a header file (\eg @static inline@).
 
 A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration and instantiated using a parenthesized list of types after the type name.
…
 \end{cquote}
 \CFA generic types are \newterm{fixed} or \newterm{dynamic} sized.
-Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on their type parameters.
+Fixed-size types have a fixed memory layout regardless of type parameters, whereas dynamic types vary in memory layout depending on the type parameters.
 For example, the type variable @T *@ is fixed size and is represented by @void *@ in code generation;
 whereas, the type variable @T@ is dynamic and set at the point of instantiation.
 The difference between fixed and dynamic is the complexity and cost of field access.
 For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
-For dynamic, field offsets are computed at compile time at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
+For dynamic, field offsets are compile-time computed at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
 See~\cite[\S~3.2]{Moss19} for complete implementation details.
…
 \section{Contributions}
 
+The \CFA compiler performance and type capability have been greatly improved through my development work.
 \begin{enumerate}
-\item The \CFA compiler performance and capability have been greatly improved through recent development. The compilation time of various \CFA library units and test programs has been reduced from the order of minutes down to 10-20 seconds, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features. The details of compiler optimization work are covered in a previous technical report.
-\item The thesis presents a systematic review of the new features that have been added to the \CFA language and its type system. Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested when they were being developed, due to the limitation of compiler performance. Several issues coming from the interactions of various language features are identified and discussed in this thesis; some of them are now fully resolved, while others are given temporary fixes and need to be reworked in the future.
-\item Finally, this thesis provides constructive ideas for fixing the outstanding issues in \CFA language design and implementation, and gives a path for future improvements to the \CFA language and compiler.
+\item
+The compilation time of various \CFA library units and test programs has been reduced by an order of magnitude, from minutes to seconds \see{\VRef[Table]{t:SelectedFileByCompilerBuild}}, which made it possible to develop and test more complicated \CFA programs that utilize sophisticated type system features.
+The details of compiler optimization work are covered in a previous technical report~\cite{Yu20}, which essentially forms part of this thesis.
+\item
+The thesis presents a systematic review of the new features added to the \CFA language and its type system.
+Some of the more recent inclusions to \CFA, such as tuples and generic structure types, were not well tested during development due to the limitation of compiler performance.
+Several issues coming from the interactions of various language features are identified and discussed in this thesis;
+some of them I have resolved, while others are given temporary fixes and need to be reworked in the future.
+\item
+Finally, this thesis provides constructive ideas for fixing a number of high-level issues in the \CFA language design and implementation, and gives a path for future improvements to the language and compiler.
 \end{enumerate}
 