Changeset b4c6e10
- Timestamp:
- Dec 7, 2024, 6:47:05 PM (9 months ago)
- Branches:
- master
- Children:
- 0b98381
- Parents:
- 9861ef2
- Location:
- doc
- Files:
-
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/bibliography/pl.bib
r9861ef2 rb4c6e10 6305 6305 } 6306 6306 6307 @techreport{Yu20, 6308 keywords = {Cforall, cfa-cc, transpiler}, 6309 contributer = {pabuhr@plg}, 6310 title = {Optimization of \textsf{C}$\mathbf{\forall}$ Compiler with Case Studies}, 6311 author = {Fangren Yu}, 6312 institution = {School of Computer Science}, 6313 address = {University of Waterloo, Waterloo, Ontario, Canada}, 6314 month = jan, 6315 year = {2021}, 6316 note = {\url{https://cforall.uwaterloo.ca/doc/Fangren_Yu_Report_F20.pdf}}, 6317 } 6318 6307 6319 @article{Ford82, 6308 6320 keywords = {}, -
doc/theses/fangren_yu_MMath/content1.tex
r9861ef2 rb4c6e10 674 674 In addition, a semi-non-compatible change is made so that Plan-9 syntax means a forward declaration in a nested type. 675 675 Since the Plan-9 extension is not part of C and rarely used, this change has minimal impact. 676 Hence, all Plan-9 semantics are denoted by the @inline@ qualifier, which good ``eye-candy'' when reading a structure definition to spot Plan-9 definitions.676 Hence, all Plan-9 semantics are denoted by the @inline@ qualifier, which is good ``eye-candy'' when reading a structure definition to spot Plan-9 definitions. 677 677 Finally, the following code shows the value and pointer polymorphism. 678 678 \begin{cfa} -
doc/theses/fangren_yu_MMath/content2.tex
r9861ef2 rb4c6e10 2 2 \label{c:content2} 3 3 4 \CFA's type system is powerful but fairly complicated, which leads to higher compilation cost. 5 The compiler needs to analyze each expression with many possible forms of overloading. 6 Variables can be overloaded in \CFA, and functions can be overloaded by the argument types as well as the return types. 7 Combined with the polymorphism introduced by forall clauses and generic types, the complexity of expression analysis can go up very quickly. 8 Designing a rule set that behaves mostly as expected, and implementing it as an efficient algorithm for actual use, are both very challenging tasks. 9 As the \CFA translator's performance improves to a level that can compile a mid-sized program in a reasonable amount of time, the development of \CFA's standard library also speeds up and many new features utilizing the expressiveness of \CFA's type system has been implemented, such as generic container types similar to those in \CC's standard template library. 10 During the process of development, many weaknesses and design flaws of \CFA type system have been discovered. 11 Some of those problems arise from the newly introduced language features described in the previous chapter, and fixing those unexpected interactions with the type system is especially difficult. 12 This chapter describes the type resolution rules currently in use and some major problems that have been identified. 13 Not all of those problems have got solutions yet, because fixing them may require redesigning parts of the \CFA language at a larger scale. 14 15 16 \section{Expression Cost Model} 17 18 \CFA has been using an expression cost model to resolve ambiguity of overloaded expressions from the very beginning. 19 Since most operators can be overloaded in \CFA (excluding a few operators that have special semantics, such as the comma operator, and the short-circuit logical operators \&\& and ||, which require the operands to be evaluated in order), they are treated the same way as other function calls and the same rules for overload resolution must apply to them as well. 20 21 In \CFA, candidates of an overloaded expression are ranked by numerical cost elements, that accounts for the type conversions needed from the argument type to the corresponding declared function parameter type, as well as the polymorphic types and restrictions introduces by the forall clause. 4 The \CFA's type-system provides expressive polymorphism: variables can be overloaded, functions can be overloaded by argument and return types, tuple types, generic (polymorphic) functions and types (aggregates) can have multiple type parameters with associated restrictions, and C's multiple implicit type-conversions must be respected. 5 This generality leads to internal complexity and correspondingly higher compilation cost. 6 The reason is that the type resolver must analyze \emph{each} component of an expression with many possible forms of overloading, type restrictions, and conversions. 7 Designing a ruleset that behaves as expected, \ie matches with C programmer expectations, and implementing an efficient algorithm is a very challenging task. 8 9 My first work on the \CFA type-system was as a co-op student. 10 At that time, compiling a medium-sized \CFA program using advanced polymorphism took multiple minutes (5+ minutes)~\cite[\S~5]{Yu20}. 11 After finding and fixing the following resolution problems: 12 \begin{enumerate}[itemsep=0pt] 13 \item 14 new AST data structure 15 \item 16 special symbol table and argument-dependent lookup 17 \item 18 late assertion-satisfaction 19 \item 20 revised function-type representation 21 \item 22 skip pruning on expressions for function types 23 \end{enumerate} 24 \VRef[Table]{t:SelectedFileByCompilerBuild} shows the improvement of selected tests with accumulated reductions in compile time across each of the 5 fixes. 25 The large reduction in compilation times, significantly improved the development of the \CFA's runtime because of its frequent compilation cycles. 26 27 \begin{table}[htb] 28 \centering 29 \caption{Compile time of selected files by compiler build in seconds} 30 \label{t:SelectedFileByCompilerBuild} 31 \lstset{deletekeywords={mutex,thread}} 32 \setlength{\extrarowheight}{1pt} 33 \vspace*{-4pt} 34 \begin{tabular}{l|r|rrrrr} 35 & \multicolumn{1}{c|}{Original} & \multicolumn{5}{c}{Accumulative fixes} \\ 36 \cline{3-7} 37 Test case & \multicolumn{1}{c|}{resolver} & \multicolumn{1}{c}{1} & \multicolumn{1}{c}{2} & \multicolumn{1}{c}{3} & \multicolumn{1}{c}{4} & \multicolumn{1}{c}{5} \\ 38 \hline 39 @lib/fstream@ & 86.4 & 25.2 & 10.8 & 9.5 & 7.8 & 7.1 \\ 40 @lib/mutex@ & 210.4 & 77.4 & 16.7 & 15.1 & 12.6 & 11.7 \\ 41 @lib/vector@ & 17.2 & 8.9 & 3.1 & 2.8 & 2.4 & 2.2 \\ 42 @lib/stdlib@ & 16.6 & 8.3 & 3.2 & 2.9 & 2.6 & 2.4 \\ 43 @test/io2@ & 300.8 & 53.6 & 43.2 & 27.9 & 19.1 & 16.3 \\ 44 @test/thread@ & 210.9 & 73.5 & 17.0 & 15.1 & 12.6 & 11.8 \\ 45 \end{tabular} 46 \smallskip 47 \newline 48 Results are average of 5 runs (3 runs if time exceeds 100 seconds) 49 \end{table} 50 51 Since then, many new features utilizing the expressiveness of \CFA's type system have been implemented, such as generic container types similar to those in \CC's standard template library. 52 During the development of multiple \CFA programs and libraries, more weaknesses and design flaws have been discovered within the type system. 53 Some of those problems arise from the newly introduced language features described in the previous chapter. 54 In addition, fixing unexpected interactions within the type system has presented challenges. 55 This chapter describes in detail the type-resolution rules currently in use and some major problems that have been identified. 56 Not all of those problems have solutions, because fixing them may require redesigning parts of the \CFA type system at a larger scale, which correspondingly affects the language design. 57 58 59 \section{Expression Cost-Model} 60 61 \CFA has been using an expression cost-model to resolve ambiguity of overloaded expressions from the very beginning. 62 Most \CFA operators can be overloaded in \CFA\footnote{ 63 Excluding the comma operator, short-circuit logical operators \lstinline{&&} and \lstinline{||} (control structures) and ternary conditional operator \lstinline{?} (control structure). 64 While \CC overloads the short-circuit logical operators, the overloaded operators do not exhibit the short-circuit semantics, which is confusing.}; 65 hence, they are treated the same way as other function calls, and the same rules for overload resolution must apply to them as well. 66 67 In \CFA, candidates of an overloaded expression are ranked by numerical cost elements, that accounts for the type conversions needed from the argument type to the corresponding declared function parameter type, as well as the polymorphic types and restrictions introduces by the @forall@ clause. 22 68 Currently the expression cost used in \CFA has the following components, ranked from higher to lower by importance: 23 69 \begin{enumerate} -
doc/theses/fangren_yu_MMath/intro.tex
r9861ef2 rb4c6e10 2 2 3 3 This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically, the type resolver used to select polymorphic types among overloaded names. 4 The \CFA type-system has a number of unique features making it different from all other programming languages.5 6 4 Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources, like include files. 7 5 \begin{quote} … … 10 8 Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions. 11 9 In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process. 12 Depending on the language, ambiguous cases are resolved using some form of qualification and/or casting. 13 14 One of the key goals in \CFA is to push the boundary on overloading, and hence, overload resolution. 10 Depending on the language, any ambiguous cases are resolved using some form of qualification and/or casting. 11 12 Therefore, one of the key goals in \CFA is to push the boundary on overloading, and hence, overload resolution. 13 As well, \CFA follows the current trend of replacing nominal inheritance with traits. 14 Together, the resulting \CFA type-system has a number of unique features making it different from all other programming languages. 15 15 16 16 … … 22 22 23 23 All computers have multiple types because computer architects optimize the hardware around a few basic types with well defined (mathematical) operations: boolean, integral, floating-point, and occasionally strings. 24 A programming language and its compiler present ways to declare types that ultimately map into th oseprovided by the underlying hardware.25 These language types are thrust upon programmers, with their syntactic and semantic rules , and resultingrestrictions.26 A language type-system defines these rules and uses them to understand how an expression is to be evaluated by the hardware.24 A programming language and its compiler present ways to declare types that ultimately map into the ones provided by the underlying hardware. 25 These language types are thrust upon programmers, with their syntactic and semantic rules and restrictions. 26 These rules are used to transform a language expression to a hardware expression. 27 27 Modern programming-languages allow user-defined types and generalize across multiple types using polymorphism. 28 28 Type systems can be static, where each variable has a fixed type during execution and an expression's type is determined once at compile time, or dynamic, where each variable can change type during execution and so an expression's type is reconstructed on each evaluation. … … 34 34 Virtually all programming languages overload the arithmetic operators across the basic computational types using the number and type of parameters and returns. 35 35 Like \CC, \CFA also allows these operators to be overloaded with user-defined types. 36 The syntax for operator names uses the @'?'@ character to denote a parameter, \eg unary operators: @?++@, @++?@, binary operator @?+?@.36 The syntax for operator names uses the @'?'@ character to denote a parameter, \eg unary left and right operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@. 37 37 Here, a user-defined type is extended with an addition operation with the same syntax as builtin types. 38 38 \begin{cfa} … … 43 43 s1 = @?+?@( s1, s2 ); $\C{// direct call}\CRT$ 44 44 \end{cfa} 45 The type system examines each call si ze and selects the best matching overloaded function based on the number and types of arguments.45 The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments. 46 46 If there are mixed-mode operands, @2 + 3.5@, the type system, like in C/\CC, attempts (safe) conversions, converting the argument type(s) to the parameter type(s). 47 47 Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be the same type. … … 71 71 double d = f( 3 ); $\C{// select (2)}\CRT$ 72 72 \end{cfa} 73 However, if the type system looks at the return type, there is an exact match for each call, which matches with programmer intuition and expectation.73 Alternatively, if the type system looks at the return type, there is an exact match for each call, which matches with programmer intuition and expectation. 74 74 This capability can be taken to the extreme, where there are no function parameters. 75 75 \begin{cfa} … … 98 98 } 99 99 \end{cfa} 100 It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems, but variable overloading within a scope is often considered extremely dangerous. 100 It is interesting that shadow overloading is considered a normal programming-language feature with only slight software-engineering problems. 101 However, variable overloading within a scope is considered extremely dangerous, without any evidence to corroborate this claim. 102 Similarly, function overloading occurs silently within the global scope in \CC from @#include@ files all the time without problems. 101 103 102 104 In \CFA, the type system simply treats overloaded variables as an overloaded function returning a value with no parameters. 103 Hence, no significant effort is required to support this feature. 104 Leveraging the return type to disambiguate is essential because variables have no parameters. 105 Hence, no significant effort is required to support this feature by leveraging the return type to disambiguate as variables have no parameters. 105 106 \begin{cfa} 106 107 int MAX = 2147483647; $\C[2in]{// (1); overloaded on return type}$ 107 double MAX = ...; $\C{// (2); overloaded on return type}$ 108 long int MAX = ...; $\C{// (2); overloaded on return type}$ 109 double MAX = ...; $\C{// (3); overloaded on return type}$ 108 110 int i = MAX; $\C{// select (1)}$ 109 double d = MAX; $\C{// select (2)}\CRT$ 110 \end{cfa} 111 long int i = MAX; $\C{// select (2)}$ 112 double d = MAX; $\C{// select (3)}\CRT$ 113 \end{cfa} 114 Hence, the name @MAX@ can replace all the C type-specific names, \eg @INT_MAX@, @LONG_MAX@, @DBL_MAX@, \etc. 115 The result is a significant reduction in names to access typed constants. 116 % Paraphrasing Shakespeare, ``The \emph{name} is the thing.''. 117 118 119 \section{Constant Overloading} 120 121 \CFA is unique in providing restricted constant overloading for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context. 122 In addition, several operations are defined in terms of values @0@ and @1@, \eg: 123 \begin{cfa} 124 if (x) x++ $\C{// if (x != 0) x += 1;}$ 125 \end{cfa} 126 Every @if@ and iteration statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result. 127 These two constants are given types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts. 128 The types @zero_t@ and @one_t@ have special builtin implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work. 129 \begin{cfa} 130 struct S { int i, j; }; 131 void ?{}( S & s, zero_t ) { s.[i,j] = 0; } 132 void ?{}( S & s, one_t ) { s.[i,j] = 1; } 133 int ?!=?( S s, zero_t ) { return s.i != 0 && s.j != 0; } 134 S ?+=?( S & s, one_t ) { s.i++; s.j++; return s; } 135 S ?-=?( S & s, one_t ) { s.i--; s.j--; return s; } 136 S ?=?( S & dst, one_t ) { return dst.[i,j] = 1; } 137 void foo() { 138 S s = @0@; 139 s = @1@; 140 if ( @s@ ) @s++@; 141 } 142 \end{cfa} 143 Hence, type @S@ is first-class with respect to the basic types, working with all existing implicit C mechanisms. 111 144 112 145 … … 121 154 z = "abc"; $\C{// assignment}$ 122 155 \end{cfa} 156 For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages. 123 157 For type-and-initialization, the specified and initialization types may not agree. 124 Similarity, for assignment the current variable and expression types may not agree. 125 For type-only, the programmer specifies the initial type, which remains fixed for the variable's lifetime in statically typed languages. 126 In the other cases, the compiler may select the type by melding programmer and context information. 158 For initialization-only, the compiler may select the type by melding programmer and context information. 127 159 When the compiler participates in type selection, it is called \newterm{type inferencing}. 128 160 Note, type inferencing is different from type conversion: type inferencing \emph{discovers} a variable's type before setting its value, whereas conversion has two typed values and performs a (possibly lossy) action to convert one value to the type of the other variable. 161 Finally, for assignment, the current variable and expression types may not agree. 129 162 130 163 One of the first and powerful type-inferencing system is Hindley--Milner~\cite{Damas82}. … … 147 180 \end{cfa} 148 181 In both overloads of @f@, the type system works from the constant initializations inwards and/or outwards to determine the types of all variables and functions. 149 Note, like template meta-programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of the@f@.182 Note, like template meta-programming, there could be a new function generated for the second @f@ depending on the types of the arguments, assuming these types are meaningful in the body of @f@. 150 183 Inferring type constraints, by analysing the body of @f@ is possible, and these constraints must be satisfied at each call site by the argument types; 151 184 in this case, parametric polymorphism can allow separate compilation. 152 185 In languages with type inferencing, there is often limited overloading to reduce the search space, which introduces the naming problem. 153 Return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match.154 155 In simpler type 186 Note, return-type inferencing goes in the opposite direction to Hindley--Milner: knowing the type of the result and flowing back through an expression to help select the best possible overloads, and possibly converting the constants for a best match. 187 188 In simpler type-inferencing systems, such as C/\CC/\CFA, there are more specific usages. 156 189 \begin{cquote} 157 190 \setlength{\tabcolsep}{10pt} … … 190 223 This issue is exaggerated with \CC templates, where type names are 100s of characters long, resulting in unreadable error messages. 191 224 \item 192 Ensuring the type of secondary variables, always match esa primary variable.225 Ensuring the type of secondary variables, always match a primary variable. 193 226 \begin{cfa} 194 227 int x; $\C{// primary variable}$ 195 228 typeof(x) y, z, w; $\C{// secondary variables match x's type}$ 196 229 \end{cfa} 197 If the type of @x@ changes, the type s of the secondary variables correspondingly update.230 If the type of @x@ changes, the type of the secondary variables correspondingly updates. 198 231 \end{itemize} 199 232 Note, the use of @typeof@ is more restrictive, and possibly safer, than general type-inferencing. … … 208 241 \section{Type-Inferencing Issues} 209 242 210 Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions or confusions.243 Each kind of type-inferencing system has its own set of issues that flow onto the programmer in the form of convenience, restrictions, or confusions. 211 244 212 245 A convenience is having the compiler use its overarching program knowledge to select the best type for each variable based on some notion of \emph{best}, which simplifies the programming experience. … … 227 260 In Haskell, it is common for programmers to brand (type) function parameters. 228 261 229 A confusion is large blocks of code where all declarations are @auto@ .262 A confusion is large blocks of code where all declarations are @auto@, as is now common in \CC. 230 263 As a result, understanding and changing the code becomes almost impossible. 231 264 Types provide important clues as to the behaviour of the code, and correspondingly to correctly change or add new code. 232 In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types .265 In these cases, a programmer is forced to re-engineer types, which is fragile, or rely on a fancy IDE that can re-engineer types for them. 233 266 For example, given: 234 267 \begin{cfa} … … 243 276 In this situation, having the type name or its short alias is essential. 244 277 245 \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression.278 The \CFA's type system tries to prevent type-resolution mistakes by relying heavily on the type of the left-hand side of assignment to pinpoint the right types within an expression. 246 279 Type inferencing defeats this goal because there is no left-hand type. 247 280 Fundamentally, type inferencing tries to magic away variable types from the programmer. 248 281 However, this results in lazy programming with the potential for poor performance and safety concerns. 249 282 Types are as important as control-flow in writing a good program, and should not be masked, even if it requires the programmer to think! 250 A similar issue is garbage collection, where storage management is masked, resulting in poor program design and performance. 283 A similar issue is garbage collection, where storage management is magicked away, often resulting in poor program design and performance.\footnote{ 284 There are full-time Java consultants, who are hired to find memory-management problems in large Java programs.} 251 285 The entire area of Computer-Science data-structures is obsessed with time and space, and that obsession should continue into regular programming. 252 286 Understanding space and time issues are an essential part of the programming craft.
Note:
See TracChangeset
for help on using the changeset viewer.