- Timestamp:
- Aug 4, 2024, 8:45:45 AM (5 months ago)
- Branches:
- master
- Children:
- 1e12f07
- Parents:
- e15293b
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/jiada_liang_MMath/background.tex
re15293b rd39d8a4 1 1 \chapter{Background} 2 2 3 \vspace*{-8pt} 4 5 \CFA is a backwards-compatible extension of the C programming language, therefore, it must support C-style enumerations. 6 The following discussion covers C enumerations. 3 This chapter covers background material for C enumerations and \CFA features used in later discussion. 4 5 6 \section{C} 7 7 8 8 As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants. … … 18 18 \item 19 19 The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{ 20 C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} immediate oper\-ands of assembler instructions, and occup ystorage.20 C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} immediate oper\-ands of assembler instructions, and occupies storage. 21 21 \begin{clang} 22 22 $\$$ nm test.o … … 28 28 29 29 30 \s ection{C \lstinline{const}}30 \subsection{C \lstinline{const}} 31 31 \label{s:Cconst} 32 32 … … 34 34 \begin{cquote} 35 35 \begin{tabular}{@{}ll@{}} 36 \multicolumn{1}{@{}c}{\textbf{static initialization}} & \multicolumn{1}{c@{}}{\textbf{dynamic in tialization}} \\36 \multicolumn{1}{@{}c}{\textbf{static initialization}} & \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\ 37 37 \begin{clang} 38 38 static const int one = 0 + 1; … … 58 58 Again, this form of aliasing is not an enumeration. 59 59 60 \section{C Enumeration} 60 61 \subsection{C Enumeration} 61 62 \label{s:CEnumeration} 62 63 … … 78 79 79 80 80 \subs ection{Type Name}81 \subsubsection{Type Name} 81 82 \label{s:TypeName} 82 83 … … 88 89 Here, the aliased constants are: 20, 10, 20, 21, and -7. 89 90 Direct initialization is by a compile-time expression generating a constant value. 90 Indirect initialization (without initializ ation, @Max10Plus1@) is \newterm{auto-initialized}:from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.91 Indirect initialization (without initializer, @Max10Plus1@) is called \newterm{auto-initialization}, where enumerators are assigned values from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@. 91 92 Because multiple independent enumerators can be combined, enumerators with the same values can occur. 92 93 The enumerators are rvalues, so assignment is disallowed. 93 Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.94 Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) outside into the enclosing scope of the @enum@ type. 94 95 For unnamed enumerations, this semantic is required because there is no type name for scoped qualification. 95 96 … … 126 127 127 128 128 \subs ection{Implementation}129 \subsubsection{Implementation} 129 130 \label{s:CenumImplementation} 130 131 … … 135 136 A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11} 136 137 \end{quote} 137 Howeve er, @int@ means a 4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.138 However, @int@ means a 4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture. 138 139 Whereas, @long int@ means 4 bytes on a 32-bit and 8 bytes on 64-bit architectures, and @long long int@ means 8 bytes on both 32/64-bit architectures, where 64-bit operations are simulated on 32-bit architectures. 139 In reality, both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based its initialization. 140 \VRef[Figure]{f:gccEnumerationStorageSize} shows both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based its initialization. 141 Hence, initialization in the range @INT_MIN@..@INT_MAX@ results in a 4-byte enumerator, and outside this range the enumerator is 8 bytes. 142 Note that @sizeof( typeof( IMin ) ) != sizeof( E )@, making the size of an enumerator different than is containing enumeration type, which seems inconsistent, \eg @sizeof( typeof( 3 ) ) == sizeof( int )@. 143 144 \begin{figure} 140 145 \begin{cfa} 141 146 enum E { IMin = INT_MIN, IMax = INT_MAX, … … 143 148 ILLMin = LLONG_MIN, ILLMax = LLONG_MAX }; 144 149 int main() { 145 printf( "%zd %d %d\n%zd %ld %ld\n%zd %ld %ld\n", 146 sizeof(IMin), IMin, IMax, 147 sizeof(ILMin), ILMin, ILMax, 148 sizeof(ILLMin), ILLMin, ILLMax ); 149 } 150 printf( "%zd %zd\n%zd %zd\n%zd %d %d\n%zd %ld %ld\n%zd %ld %ld\n", 151 sizeof(enum E), sizeof(typeof(IMin)), 152 sizeof(int), sizeof(long int), 153 sizeof(IMin), IMin, IMax, 154 sizeof(ILMin), ILMin, ILMax, 155 sizeof(ILLMin), ILLMin, ILLMax ); 156 } 157 8 4 150 158 4 -2147483648 2147483647 151 159 8 -9223372036854775808 9223372036854775807 152 160 8 -9223372036854775808 9223372036854775807 153 161 \end{cfa} 154 Hence, initialization in the range @INT_MIN@..@INT_MAX@ is 4 bytes, and outside this range is 8 bytes. 155 156 \subsection{Usage} 162 \caption{\lstinline{gcc}/\lstinline{clang} Enumeration Storage Size} 163 \label{f:gccEnumerationStorageSize} 164 \end{figure} 165 166 167 \subsubsection{Usage} 157 168 \label{s:Usage} 158 169 159 C proves an implicit \emph{bidirectional} conversion between an enumeration and its integral type, and between two different enumeration .170 C proves an implicit \emph{bidirectional} conversion between an enumeration and its integral type, and between two different enumerations. 160 171 \begin{clang} 161 172 enum Week week = Mon; $\C{// week == 0}$ … … 164 175 @week = 10000;@ $\C{// UNDEFINED! implicit conversion to Week}$ 165 176 166 enum Season { Spring, Summer, Fall, Winter };177 enum Season { Spring, Summer, Fall, Winter }; 167 178 @week = Winter;@ $\C{// UNDEFINED! implicit conversion to Week}$ 168 179 \end{clang} 169 While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration typeis a common source of error.180 While converting an enumerator to its underlying type is useful, the implicit conversion from the base or another enumeration type to an enumeration is a common source of error. 170 181 171 182 Enumerators can appear in @switch@ and looping statements. … … 182 193 } 183 194 \end{cfa} 184 For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.185 For example, a gap introduced by @Thu = 10@, results in iterating over the values 0--13, where values 3--9 are not @Week@ values.195 For iterating using arithmetic to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values. 196 For example, a previous gap introduced by @Thu = 10@, results in iterating over the values 0--13, where values 3--9 are not @Week@ values. 186 197 Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@. 187 198 For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@. … … 192 203 for ( enum E e = A; e < @N@; e += 1 ) ... 193 204 \end{cfa} 194 Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.195 @N@ is often used as the dimension for an array assocated with the enumeration.205 Here, serendipitously the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@. 206 This @N@ is often used as the dimension for an array assocated with the enumeration. 196 207 \begin{cfa} 197 208 E array[@N@]; … … 200 211 } 201 212 \end{cfa} 202 However, for non- integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.203 204 This idiom is used inanother C idiom for matching companion information.205 For example, an enumeration islinked with a companion array of printable strings.213 However, for non-consecutive ordering and non-integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails. 214 215 This idiom is often used with another C idiom for matching companion information. 216 For example, an enumeration may be linked with a companion array of printable strings. 206 217 \begin{cfa} 207 218 enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES }; … … 211 222 "signed int", "unsigned int", ... 212 223 }; 213 enum Integral_Type integral_type= ...224 enum Integral_Type @integral_type@ = ... 214 225 printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name 215 226 \end{cfa} 216 227 However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}. 217 The needto harmonize is at best indicated by a comment before the enumeration.228 The requirement to harmonize is at best indicated by a comment before the enumeration. 218 229 This issue is exacerbated if enumeration and companion array are in different translation units. 219 230 220 231 \bigskip 221 While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features in other programming languages. 222 223 \section{\CFA Polymorphism} 232 While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful/advanced enumeration features found in other programming languages. 233 234 235 \section{\CFA} 236 237 \CFA in \emph{not} an object-oriented programming-language, \ie functions cannot be nested in aggregate types, and hence, there is no receive notation for calling functions, \eg @obj.method(...)@, where the first argument proceeds the call. 238 The following section provide short descriptions of \CFA features mentioned further in the thesis. 239 240 241 \subsection{Operator Overloading} 242 243 Virtually all programming languages overload the arithmetic operators across the basic types using the number and type of parameters and returns. 244 Like \CC, \CFA also allows these operators to be overloaded with user-defined types. 245 The syntax for operator names uses the @'?'@ character to denote a parameter, \eg prefix and infix increment operators: @?++@, @++?@, and @?+?@. 246 \begin{cfa} 247 struct S { int i, j }; 248 S @?+?@( S op1, S op2 ) { return (S){ op1.i + op2.i, op1.j + op2.j }; } 249 S s1, s2; 250 s1 = s1 @+@ s2; $\C[1.75in]{// infix call}$ 251 s1 = @?+?@( s1, s2 ); $\C{// direct call}\CRT$ 252 \end{cfa} 253 The type system examines each call size and selects the best matching overloaded function based on the number and types of the arguments. 254 If there are intermixed operands, @2 + 3.5@, the type system attempts (safe) conversions changing the arguments to one or more of the parameter type(s). 255 256 224 257 \subsection{Function Overloading} 225 Function overloading is programming languages feature wherein functions may share the same name, but with different function signatures. In both C++ and \CFA, function names can be overloaded 226 with different entities as long as they are different in terms of the number and type of parameters. 227 228 \begin{cfa} 229 void f(); // (1) 230 void f(int); // (2); Overloaded on the number of parameters 231 void f(char); // (3); Overloaded on parameter type 232 233 f('A'); 234 \end{cfa} 235 In this case, the name f is overloaded with a nullity function and two arity functions with different parameters types. Exactly which precedures being executed 236 is determined based on the passing arguments. The last expression of the preceding example calls f with one arguments, narrowing the possible candidates down to (2) and (3). 237 Between those, function argument 'A' is an exact match to the parameter expected by (3), while needing an @implicit conversion@ to call (2). The compiler determines (3) is the better candidates among 238 and procedure (3) is being executed. 239 240 \begin{cfa} 241 int f(int); // (4); Overloaded on return type 242 [int, int] f(int); // (5) Overloaded on the number of return value 243 \end{cfa} 244 The function declarations (4) and (5) show the ability of \CFA functions overloaded with different return value, a feature that is not shared by C++. 245 246 247 \subsection{Operator Overloading} 248 Operators in \CFA are specialized function and are overloadable by with specially-named functions represents the syntax used to call the operator. 249 % For example, @bool ?==?T(T lhs, T rhs)@ overloads equality operator for type T, where @?@ is the placeholders for operands for the operator. 250 \begin{cfa} 251 enum Weekday { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday }; 252 bool ?<?(const Weekday a, const Weekday b) { 253 return ((int)a + 1); 254 } 255 Monday < Sunday; // False 256 ?<?( Monday, Sunday ); // Equivalent syntax 257 \end{cfa} 258 Unary operators are functions that takes one argument and have name @operator?@ or @?operator@, where @?@ is the placeholders for operands. 259 Binary operators are function with two parameters. They are overloadable with function name @?operator?@. 258 259 Both \CFA and \CC allow function names to be overloaded, as long as their prototypes differ in the number and type of parameters and returns. 260 \begin{cfa} 261 void f( void ); $\C[1.75in]{// (1): no parameter}$ 262 void f( char ); $\C{// (2): overloaded on the number and parameter type}$ 263 void f( int, int ); $\C{// (3): overloaded on the number and parameter type}$ 264 f( 'A' ); $\C{// select (2)}\CRT$ 265 \end{cfa} 266 In this case, the name @f@ is overloaded depending on the number and parameter types. 267 The type system examines each call size and selects the best matching based on the number and types of the arguments. 268 Here, there is a perfect match for the call, @f( 'A' )@ with the number and parameter type of function (2). 269 270 Ada, Scala, and \CFA type-systems also use the return type in resolving a call. 271 \begin{cfa} 272 int f( void ); $\C[1.75in]{// (4); overloaded on return type}$ 273 double f( void ); $\C{// (5); overloaded on return type}$ 274 int i = f(); $\C{// select (4)}$ 275 double d = f(); $\C{// select (5)}\CRT$ 276 \end{cfa} 277 278 279 \subsection{Variable Overloading} 280 Unlike almost all programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes. 281 \begin{cfa} 282 void foo( double d ); 283 int v; $\C[1.75in]{// (1)}$ 284 double v; $\C{// (2) variable overloading}$ 285 foo( v ); $\C{// select (2)}$ 286 { 287 int v; $\C{// (3) shadow overloading}$ 288 double v; $\C{// (4) and variable overloading}$ 289 foo( v ); $\C{// select (4)}\CRT$ 290 } 291 \end{cfa} 292 The \CFA type system simply treats overloaded variables as an overloaded function returning a value with no parameters. 293 260 294 261 295 \subsection{Constructor and Destructor} 262 In \CFA, all objects are initialized by @constructors@ during its allocation, including basic types, 263 which are initialized by auto-generated basic type constructors. 264 265 Constructors are overloadable functions with name @?{}@, return @void@, and have at least one parameter, which is a reference 266 to the object being constructored (Colloquially referred to "this" or "self" in other language). 267 296 297 While \CFA in not object oriented, it adopts many language features commonly used in object-oriented languages; 298 these features are independent from object-oriented programming. 299 300 All objects in \CFA are initialized by @constructors@ \emph{after} allocation and de-initialized \emph{before} deallocation. 301 \CC cannot have constructors for basic-types because they have no aggregate type \lstinline[language=C++]{struct/class} in which to insert a constructor definition. 302 Like \CC, \CFA has multiple auto-generated constructors for every type. 303 304 The prototype for the constructor/destructor are @void ?{}( T &, ... )@ and @void ^?{}( T &, ... )@, respectively. 305 The first parameter is logically, the \lstinline[language=C++]{this} or \lstinline[language=java]{self} in other object-oriented languages, and implicitly passed. 268 306 \begin{cfa} 269 307 struct Employee { 270 c onst char * name;308 char * name; 271 309 double salary; 272 310 }; 273 274 void ?{}( Employee& this, const char * name, double salary ) { 275 this.name = name; 276 this.salary = salary; 277 } 278 279 Employee Sara { "Sara Schmidt", 20.5 }; 280 \end{cfa} 281 Like Python, the "self" reference is implicitly passed to a constructor. The Employee constructors takes two additional arugments used in its 282 field initialization. 283 284 A destructor in \CFA is a function that has name @^?{}@. It returns void, and take only one arugment as its "self". 285 \begin{cfa} 286 void ^?{}( Employee& this ) { 287 free(this.name); 288 this.name = 0p; 289 this.salary = 0; 290 } 291 \end{cfa} 292 Destructor can be explicitly evoked as a function call, or implicitly called at the end of the block in which the object is delcared. 293 \begin{cfa} 311 void @?{}@( Employee & this, char * name, double salary ) { 312 this.name = aalloc( sizeof(name) ); 313 strcpy( this.name, name ); 314 this.salary = salary; 315 } 316 void @^?{}@( Employee & this ) { 317 free( this.name ); 318 } 294 319 { 295 ^Sara{}; 296 Sara{ "Sara Craft", 20 }; 297 } // ^Sara{} 298 \end{cfa} 299 300 \subsection{Variable Overloading} 301 C and C++ disallow more than one variable declared in the same scope with the same name. When a variable declare in a inner scope has the same name as 302 a variable in an outer scope, the outer scope variable is "shadowed" by the inner scope variable and cannot be accessed directly. 303 304 \CFA has variable overloading: multiple variables can share the same name in the same scope, as long as they have different type. Name shadowing only 305 happens when the inner scope variable and the outer scope ones have the same type. 306 \begin{cfa} 307 double i = 6.0; 308 int i = 5; 309 void foo( double i ) { sout | i; } // 6.0 310 \end{cfa} 320 Employee name = { "Sara Schmidt", 20.5 }; 321 } // implicit destructor call 322 \end{cfa} 323 Both constructor and destructor can be explicitly called. 324 \begin{cfa} 325 Employee name = { "Sara Schmidt", 20.5 }; 326 ... // use name 327 ^?{}( name ); // de-initialize 328 ?{}( name, "Jack Smith", 10.5 }; // re-initialize 329 ... // use name 330 \end{cfa} 331 311 332 312 333 \subsection{Special Literals} 313 Literal 0 has special meanings within different contexts: it can means "nothing" or "empty", an additive identity in arithmetic, a default value as in C (null pointer), 314 or an initial state. 315 Awaring of its significance, \CFA provides a special type for the 0 literal, @zero_t@, to define the logical @zero@ for custom types. 334 335 The C constants @0@ and @1@ have special meaning. 336 @0@ is the null pointer and used in conditional expressions, where @if ( p )@ is rewritten @if ( p != 0 )@; 337 @1@ is an additive identity in unary operators @++@ and @--@. 338 Aware of their significance, \CFA provides a special type @zero_t@ and @one_t@ for custom types. 316 339 \begin{cfa} 317 340 struct S { int i, j; }; 318 341 void ?{}( S & this, @zero_t@ ) { this.i = 0; this.j = 0; } // zero_t, no parameter name allowed 319 S s0 = @0@;320 \end{cfa}321 Overloading @zero_t@ for S provides new definition for @zero@ of type S.322 323 According to the C standard, @0@ is the @only@ false value. Any values compares equals to @0@ is false, and not euqals @0@ is true. As a consequence, control structure324 such as @if()@ and @while()@ only runs it true clause when its predicate @not equals@ to @0@.325 326 \CFA generalizes this concept and allows to logically overloads the boolean value for any type by overloading @not equal@ comparison against @zero_t@.327 \begin{cfa}328 342 int ?@!=@?( S this, @zero_t@ ) { return this.i != 0 && this.j != 0; } 329 \end{cfa} 330 331 % In C, the literal 0 represents the Boolean value false. The expression such as @if (x)@ is equivalent to @if (x != 0)@ . 332 % \CFA allows user to define the logical zero for a custom type by overloading the @!=@ operation against a special type, @zero_t@, 333 % so that an expression with the custom type can be used as a predicate without the need of conversion to the literal 0. 334 % \begin{cfa} 335 % struct S s; 336 % int ?!=?(S, zero_t); 337 % if (s) {} 338 % \end{cfa} 339 Literal 1 is also special. Particularly in C, the pre-increment operator and post-increment operator can be interpreted in terms of @+= 1@. 340 The logical @1@ in \CFA is represented by special type @one_t@. 341 \begin{cfa} 342 void ?{}( S & this, one_t ) { this.i = 1; this.j = 1; } // one_t, no parameter name allowed 343 S & ?+=?( S & this, one_t ) { this.i += 1; this.j += 1; return op; } 344 \end{cfa} 345 Without explictly overloaded by a user, \CFA uses the user-defined @+=(S&, one_t)@ to interpret @?++@ and @++?@, as both are polymorphic functions in \CFA. 346 347 \subsection{Polymorphics Functions} 348 Parametric-Polymorphics functions are the functions that applied to all types. \CFA functions are parametric-polymorphics when 349 they are written with the @forall@ clause. 350 351 \begin{cfa} 352 forall(T) 353 T identity(T x) { return x; } 354 identity(42); 355 \end{cfa} 356 The identity function accepts a value from any type as an arugment, and the type parameter @T@ is bounded to @int@ when the function 357 is called with 42. 358 359 The forall clause can takes @type assertions@ that restricts the polymorphics type. 360 \begin{cfa} 361 forall( T | { void foo(T); } ) 362 void bar(T t) { foo(t); } 363 364 struct S {} s; 365 void foo(struct S); 366 367 bar(s); 368 \end{cfa} 369 The assertion on @T@ restricts the range of types for bar to only those implements foo with the matching a signature, so that bar() 370 can call @foo@ in its body with type safe. 371 Calling on type with no mathcing @foo()@ implemented, such as int, causes a compile time type assertion error. 372 373 \subsection{trait} 374 A @forall@ clause can asserts on multiple types and with multiple asserting functions. A common practice in \CFA is to group 375 the asserting functions in to a named \newterm{trait}. 376 377 \begin{cfa} 378 trait Bird(T) { 379 int days_can_fly(T i); 380 void fly(T t); 343 S s = @0@; 344 if ( s @!= 0@ ) ... 345 \end{cfa} 346 Similarity, for @one_t@. 347 \begin{cfa} 348 void ?{}( S & this, @one_t@ ) { this.i = 1; this.j = 1; } // one_t, no parameter name allowed 349 S & ?++( S & this, @one_t@ ) { return (S){ this.i++, this.j++ }; } 350 \end{cfa} 351 352 353 \subsection{Polymorphic Functions} 354 355 Polymorphic functions are the functions that apply to all types. 356 \CFA provides \newterm{parametric polymorphism} written with the @forall@ clause. 357 \begin{cfa} 358 @forall( T )@ T identity( T v ) { return v; } 359 identity( 42 ); 360 \end{cfa} 361 The @identity@ function accepts a value from any type as an argument and returns that value. 362 At the call size, the type parameter @T@ is bounded to @int@ from the argument @42@. 363 364 For polymorphic functions to be useful, the @forall@ clause needs \newterm{type assertion}s that restricts the polymorphic types it accepts. 365 \begin{cfa} 366 forall( T @| { void foo( T ); }@ ) void bar( T t ) { @foo( t );@ } 367 struct S { ... } s; 368 void foo( struct S ); 369 bar( s ); 370 \end{cfa} 371 The assertion on @T@ restricts the range of types that can be manipulated by @bar@ to only those that have an implementation of @foo@ with the matching signature, allowing @bar@'s call to @foo@ in its body. 372 373 374 \subsection{Trait} 375 376 A @forall@ clause can assert many restrictions on multiple types. 377 A common practice is to refactor the assertions into a named \newterm{trait}. 378 \begin{cfa} 379 forall(T) trait @Bird@ { 380 int days_can_fly( T ); 381 void fly( T ); 381 382 }; 382 383 forall(B | Bird(B)) { 384 void bird_fly(int days_since_born, B bird) { 385 if (days_since_born > days_can_fly(bird)) { 386 fly(bird); 387 } 388 } 389 } 390 391 struct Robin {} r; 392 int days_can_fly(Robin r) { return 23; } 393 void fly(Robin r) {} 394 395 bird_fly( r ); 396 \end{cfa} 397 398 Grouping type assertions into named trait effectively create a reusable interface for parametrics polymorphics types. 383 forall( B | @Bird@( B ) ) 384 void bird_fly( int days_since_born, B bird ) { 385 if ( days_since_born > days_can_fly( bird )) fly( bird ); 386 } 387 struct Robin {} robin; 388 int days_can_fly( Robin robin ) { return 23; } 389 void fly( Robin robin ) {} 390 bird_fly( 23, robin ); 391 \end{cfa} 392 Grouping type assertions into a named trait effectively creates a reusable interface for parametric-polymorphic types. 393 399 394 400 395 \section{Expression Resolution} 401 396 402 The overloading feature poses a challenge in \CFA expression resolution. Overloadeded identifiers can refer multiple 403 candidates, with multiples being simultaneously valid. The main task of \CFA resolver is to identity a best candidate that 404 involes less implicit conversion and polymorphism. 397 Overloading poses a challenge for all expression-resolution systems. 398 Multiple overloaded names give multiple candidates at a call site, and a resolver must pick a \emph{best} match, where ``best'' is defined by a series of heuristics based on safety and programmer intuition/expectation. 399 When multiple best matches exist, the resolution is ambiguous. 400 401 The \CFA resolver attempts to identity a best candidate based on: first, the number of parameters and types, and second, when no exact match exists, the fewest implicit conversions and polymorphic variables. 402 Finding an exact match is not discussed here, because the mechanism is fairly straightforward, even when the search space is large; 403 only finding a non-exact match is discussed in detail. 404 405 405 406 406 \subsection{Conversion Cost} 407 407 \label{s:ConversionCost} 408 In C, function call arguments and function parameters do not need to be a exact match. When types mismatch, C performs an \newterm{implicit conversion} 409 on argument to parameter type. The process is trivial with the exception on binary operators; When types of operands are different, 410 C nees to decide which operands need implicit conversion. C defines the resolution pattern as "usual arithmetic conversion", 411 in which C looks for a \newterm{common type} between operands, and convert either one or both operands to the common type. 412 Loosely defined, a common type is a the smallest type in terms of size of representation that both operands can be converted into without losing their precision. 413 Such conversion is called "widening" or "safe conversion". 414 415 C binary operators can be restated as 2-arity functions that overloaded with different parameters. "Usual arithmetic conversion" is to find a overloaded 416 instance that for both arguments, the conversion to parameter type is a widening conversion to the smallest type. 417 418 \CFA generalizes "usual arithmetic conversion" to \newterm{conversion cost}. In the first design by Bilson, conversion cost is a 3-tuple, 419 @(unsafe, poly, safe)@, where @unsafe@ the number of unsafe (narrorow conversion) from argument to parameter, 420 @poly@ is the number of polymorphic function parameter, 421 and @safe@ is sum of degree of safe (widening) conversion. 408 409 Most programming languages perform some implicit conversions among basic types to facilitate mixed-mode arithmetic; 410 otherwise, the program becomes littered with many explicit casts, which is not match programmer expectation. 411 C is an aggressive language as it provides conversions among almost all of the basic types, even when the conversion is potentially unsafe or not meaningful, \ie @float@ to @bool@. 412 C defines the resolution pattern as ``usual arithmetic conversion''~\cite[\S~6.3.1.8]{C11}, in which C looks for a \newterm{common type} between operands, and converts one or both operands to the common type. 413 Loosely defined, a common type is a the smallest type in terms of size of representation that both operands can be converted into without losing their precision, called a \newterm{widening} or \newterm{safe conversion}. 414 415 \CFA generalizes ``usual arithmetic conversion'' to \newterm{conversion cost}. 416 In the first design by Bilson~\cite{Bilson03}, conversion cost is a 3-tuple, @(unsafe, poly, safe)@ applied between each argument/parameter type, where: 417 \begin{enumerate} 418 \item 419 @unsafe@ is the number of precision losing (\newterm{narrowing} conversions), 420 \item 421 @poly@ is the number of polymorphic function parameters, and 422 \item 423 @safe@ is sum of the degree of safe (widening) conversions. 424 \end{enumerate} 422 425 Sum of degree is a method to quantify C's integer and floating-point rank. 423 Every pair of widening conversion types has been assigned with a \newterm{distance}, and distance between the two same type is 0. 424 For example, the distance from char to int is 2, distance from integer to long is 1, and distance from int to long long int is 2. 425 The distance does not mirror C's rank system. For example, the rank of char and signed char are the same in C, but the distance from char to signed char is assigned with 1. 426 @safe@ cost is summing all pair of argument to parameter safe conversion distance. 427 Among the three in Bilson's model, @unsafe@ is the most significant cost and @safe@ is the least significant one, with an implication that \CFA always choose 428 a candidate with the lowest @unsafe@ if possible. 429 430 Assume the overloaded function @foo@ is called with two @int@ parameter. The cost for every overloaded @foo@ has been list along: 431 \begin{cfa} 432 void foo(char, char); // (2, 0, 0) 433 void foo(char, int); // (1, 0, 0) 434 forall(T, V) void foo(T, V); // (0, 2, 0) 435 forall(T) void foo(T, T); // (0, 2, 0) 436 forall(T) void foo(T, int); // (0, 1, 0) 437 void foo(long long, long); // (0, 0, 3) 438 void foo(long, long); // (0, 0, 2) 439 void foo(int, long); // (0, 0, 1) 440 441 int i, j; foo(i, j); 442 \end{cfa} 443 The overloaded instances are ordered from the highest to the lowest cost, and \CFA select the last candidate. 444 445 In the later iteration of \CFA, Schluntz and Aaron expanded conversion cost to a 7-tuple with 4 additional categories, 446 @@(unsafe, poly, safe, sign, vars, specialization, reference)@@. 447 with interpretation listed below: 448 \begin{itemize} 449 \item Unsafe 450 \item Poly 451 \item Safe 452 \item Sign is the number of sign/unsign variable conversion. 453 \item Vars is the number of polymorphics type variable. 454 \item Specialization is negative value of the number of type assertion. 455 \item Reference is number of reference-to-rvalue conversion. 456 \end{itemize} 457 The extended conversion cost models looks for candidates that are more specific and less generic. 458 @Var@s was introduced by Aaron to disambugate @forall(T, V) void foo(T, V)@ and @forall(T) void foo(T, T)@. The extra type parameter @V@ 459 makes it more generic and less specific. More generic type means less constraints on types of its parameters. \CFA is in favor of candidates with more 460 restrictions on polymorphism so @forall(T) void foo(T, T)@ has lower cost. @Specialization@ is a value that always less than or equal to zero. For every type assertion in @forall@ clause, \CFA subtracts one from 461 @specialization@, starting from zero. More type assertions often means more constraints on argument type, and making the function less generic. 462 463 \CFA defines two special cost value: @zero@ and @infinite@ A conversion cost is @zero@ when argument and parameter has exact match, and a conversion cost is @infinite@ when 464 there is no defined conversion between two types. For example, the conversion cost from int to a struct type S is @infinite@. 426 Every pair of widening conversion types is assigned a \newterm{distance}, and distance between the two same type is 0. 427 For example, the distance from @char@ to @int@ is 2, distance from @int@ to @long@ is 1, and distance from @int@ to @long long int@ is 2. 428 This distance does not mirror C's rank system. 429 For example, the rank of @char@ and @signed char@ are the same in C, but the distance from @char@ to @signed char@ is assigned 1. 430 @safe@ cost is summing all pairs of argument to parameter safe conversion distances. 431 Among the three costs in Bilson's model, @unsafe@ is the most significant cost and @safe@ is the least significant, with an implication that \CFA always choose a candidate with the lowest @unsafe@, if possible. 432 433 For example, assume the overloaded function @foo@ is called with two @int@ parameter. 434 The cost for every overloaded @foo@ has been list along: 435 \begin{cfa} 436 void foo( char, char ); $\C[2.5in]{// (1) (2, 0, 0)}$ 437 void foo( char, int ); $\C{// (2) (1, 0, 0)}$ 438 forall( T, V ) void foo( T, V ); $\C{// (3) (0, 2, 0)}$ 439 forall( T ) void foo( T, T ); $\C{// (4) (0, 2, 0)}$ 440 forall( T ) void foo( T, int ); $\C{// (5) (0, 1, 0)}$ 441 void foo( long long, long ); $\C{// (6) (0, 0, 3)}$ 442 void foo( long, long ); $\C{// (7) (0, 0, 2)}$ 443 void foo( int, long ); $\C{// (8) (0, 0, 1)}$ 444 int i, j; 445 foo( i, j ); $\C{// convert j to long and call (8)}\CRT$ 446 \end{cfa} 447 The overloaded instances are ordered from the highest to the lowest cost, and \CFA select the last candidate (8). 448 449 In the next iteration of \CFA, Schluntz and Aaron~\cite{Moss18} expanded conversion cost to a 7-tuple with 4 additional categories, @(unsafe, poly, safe, sign, vars, specialization, reference)@, with the following interpretations: 450 \begin{enumerate} 451 \item @unsafe@ from Bilson 452 \item @poly@ 453 \item @safe@ 454 \item @sign@ is the number of sign/unsigned variable conversions 455 \item @vars@ is the number of polymorphic type variables 456 \item @specialization@ is a negative value of the number of type assertions 457 \item @reference@ is the number of reference-to-rvalue conversions 458 \end{enumerate} 459 The extended conversion-cost model looks for candidates that are more specific and less generic. 460 @vars@ disambiguates @forall( T, V ) foo( T, V )@ and @forall( T ) void foo( T, T )@, where the extra type parameter @V@ makes is more generic. 461 A more generic type means less constraints on its parameter types. 462 \CFA favours candidates with more restrictions on polymorphism, so @forall( T ) void foo( T, T )@ has lower cost. 463 @specialization@ is an arbitrary count-down value starting at zero. 464 For every type assertion in @forall@ clause (no assertions in the above example), \CFA subtracts one from @specialization@. 465 More type assertions means more constraints on argument types, making the function less generic. 466 467 \CFA defines two special cost values: @zero@ and @infinite@. 468 A conversion cost is @zero@ when argument and parameter has an exact match, and a conversion cost is @infinite@ when there is no defined conversion between two types. 469 For example, the conversion cost from @int@ to a @struct S@ is @infinite@.
Note: See TracChangeset
for help on using the changeset viewer.