source: doc/theses/jiada_liang_MMath/background.tex @ ad47ec4

Last change on this file since ad47ec4 was 7568e5c, checked in by JiadaL <j82liang@…>, 3 months ago

Minor update on the thesis (add auto initialization and update future work

\chapter{Background}

This chapter covers background material for C enumerations and \CFA features used in later discussions.


\section{C}

As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
\begin{clang}
#define Mon 0
static const int Mon = 0;
enum { Mon };
\end{clang}
\begin{enumerate}[leftmargin=*]
\item
For @#define@, the programmer must explicitly manage the constant name and value.
Furthermore, these C preprocessor macro names are outside the C type system and can unintentionally replace unrelated text in a program.
\item
The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
C allows variable-length array declarations (VLA), so this case does work. Still, it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} and immediate oper\-ands of assembler instructions; it also occupies storage.
\begin{clang}
$\$$ nm test.o
0000000000000018 r Mon
\end{clang}
\item
Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
\end{enumerate}
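
The difference between the @const@ and @enum@ forms can be checked with a small C sketch.
The names @Size@, @Limit@, @table@, @classify@, and @table_len@ are hypothetical and only serve this illustration.
The commented-out lines mark uses that are not constant expressions in standard C, although some compilers accept them as an extension.
\begin{clang}
#include <stddef.h>

enum { Size = 3 };                  /* enumerator: integer constant expression */
const int Limit = 3;                /* const object: occupies storage */

int table[Size];                    /* OK: file-scope array dimension */
/* int bad[Limit]; */               /* not a constant expression in standard C */

/* Returns 1 when v equals Size, otherwise 0. */
int classify( int v ) {
    switch ( v ) {
      case Size: return 1;          /* OK: enumerator in a case label */
      /* case Limit: return 2; */   /* const object is not allowed here in standard C */
      default: return 0;
    }
}

/* Number of elements in table, computed at compile time from the enumerator. */
size_t table_len( void ) { return sizeof(table) / sizeof(table[0]); }

/* Reading Limit requires a load from its storage (unless optimized away). */
int limit_value( void ) { return Limit; }
\end{clang}
Under these assumptions, @classify( Size )@ selects the first label and @table_len()@ reflects the enumerator value without any storage being allocated for @Size@.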


\subsection{C \texorpdfstring{\lstinline{const}}{const}}
\label{s:Cconst}

C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{@{}c}{\textbf{static initialization}} &  \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\
\begin{clang}
static const int one = 0 + 1;
static const void * NIL = NULL;
static const double PI = 3.14159;
static const char Plus = '+';
static const char * Fred = "Fred";
static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
        Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
\end{clang}
&
\begin{clang}
void foo() {
        // auto scope only
        const int r = random() % 100;
        int va[r];
}
\end{clang}
\end{tabular}
\end{cquote}
However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
Again, this form of aliasing is not an enumeration.


\subsection{C Enumeration}
\label{s:CEnumeration}

The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
\begin{clang}[identifierstyle=\linespread{0.9}\it]
$\it enum$-specifier:
        enum identifier$\(_{opt}\)$ { enumerator-list }
        enum identifier$\(_{opt}\)$ { enumerator-list , }
        enum identifier
enumerator-list:
        enumerator
        enumerator-list , enumerator
enumerator:
        enumeration-constant
        enumeration-constant = constant-expression
\end{clang}
The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
The C enumeration semantics are discussed using examples.


\subsubsection{Type Name}
\label{s:TypeName}

An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
However, it is restricted to integral values.
\begin{clang}
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
\end{clang}
Here, the aliased constants are 20, 10, 20, 21, and -7.
Direct initialization is achieved by a compile-time expression that generates a constant value.
Indirect initialization (without an initializer, @Max10Plus1@) is called \newterm{auto-initialization}, where enumerators are assigned values from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.
Because multiple independent enumerators can be combined, enumerators with the same values can occur.
The enumerators are @rvalues@, so assignment to an enumerator is disallowed.
Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) outside into the enclosing scope of the @enum@ type.
This semantic is required for unnamed enumerations because there is no type name for scoped qualification.
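
The initialization rules can be confirmed at compile time.
The following sketch reuses the unnamed enumeration above and checks each value with the C11 @_Static_assert@; the function @max_plus_eleven@ is a hypothetical name added only to show unqualified use of an enumerator.
\begin{clang}
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, Max10Plus1, Fred = -7 };

/* Compile-time checks of the values described in the text. */
_Static_assert( MaxPlus10 == 20, "direct initialization from a constant expression" );
_Static_assert( Max10Plus1 == 21, "auto-initialization continues from MaxPlus10" );
_Static_assert( Size == MaxPlus10, "distinct enumerators may share a value" );
_Static_assert( Fred == -7, "negative values are allowed" );

/* Size = 30; */    /* rejected: an enumerator is an rvalue and cannot be assigned */

/* Enumerators are unscoped, so they are used without qualification. */
int max_plus_eleven( void ) { return Max10Plus1; }
\end{clang}
If any of the stated values were wrong, the translation unit would fail to compile, so no run-time test is needed for the first four properties.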

As noted, this aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
While the semantics are misleading, this enumeration form matches with aggregate types:
\begin{cfa}
typedef struct @/* unnamed */@  { ... } S;
struct @/* unnamed */@  { ... } x, y, z;        $\C{// questionable}$
struct S {
        union @/* unnamed */@ {                                 $\C{// unscoped fields}$
                int i;  double d ;  char ch;
        };
};
\end{cfa}
Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.

A \emph{named} enumeration is an enumeration:
\begin{clang}
enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
\end{clang}
and adopts the same direct and auto-initialization semantics.
For example, @Mon@ to @Wed@ are implicitly assigned constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned constants @11@--@13@.
As well, initialization may occur in any order.
\begin{clang}
enum Week {
        Thu@ = 10@, Fri, Sat, Sun,
        Mon@ = 0@, Tue, Wed@,@                  $\C{// terminating comma}$
};
\end{clang}
Note that the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
A terminating comma appears in other C syntax, \eg the initializer list.}
This feature allows enumerator lines to be interchanged without moving a comma.
Named enumerators are also unscoped.


\subsubsection{Implementation}
\label{s:CenumImplementation}

Theoretically, a C enumeration \emph{variable} has an implementation-defined integral type large enough to hold all enumerator values.
In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the type of enumeration constants, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
However, type @int@ is defined as:
\begin{quote}
A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
\end{quote}
Yet @int@ is 4 bytes on both 32- and 64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
In contrast, @long int@ is 4 bytes on 32-bit and 8 bytes on 64-bit architectures, and @long long int@ is 8 bytes on both, where 64-bit operations are simulated on 32-bit architectures.
\VRef[Figure]{f:gccEnumerationStorageSize} shows both @gcc@ and @clang@ partially ignore this specification and choose the integral size of an enumerator based on its initialization.
Hence, initialization in the range @INT_MIN@..@INT_MAX@ results in a 4-byte enumerator, and outside this range, the enumerator is 8 bytes.
Note that @sizeof( typeof( IMin ) ) != sizeof( E )@, making the size of an enumerator different from its containing enumeration type, which seems inconsistent, \eg @sizeof( typeof( 3 ) ) == sizeof( int )@.

\begin{figure}
\begin{cfa}
#include <stdio.h>
#include <limits.h>
enum E { IMin = INT_MIN, IMax = INT_MAX,
                         ILMin = LONG_MIN, ILMax = LONG_MAX,
                         ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
int main() {
        printf( "%zu %zu\n%zu %zu\n%zu %d %d\n%zu %ld %ld\n%zu %ld %ld\n",
                        sizeof(enum E), sizeof(typeof(IMin)),
                        sizeof(int), sizeof(long int),
                        sizeof(IMin), IMin, IMax,
                        sizeof(ILMin), ILMin, ILMax,
                        sizeof(ILLMin), ILLMin, ILLMax );
}
8 4
4 8
4 -2147483648 2147483647
8 -9223372036854775808 9223372036854775807
8 -9223372036854775808 9223372036854775807
\end{cfa}
\caption{\lstinline{gcc}/\lstinline{clang} Enumeration Storage Size}
\label{f:gccEnumerationStorageSize}
\end{figure}


\subsubsection{Usage}
\label{s:Usage}

C provides an implicit \emph{bidirectional} conversion between an enumeration and its integral type and between two different enumerations.
\begin{clang}
enum Week week = Mon;                           $\C{// week == 0}$
week = Fri;                                                     $\C{// week == 11}$
int i = Sun;                                            $\C{// implicit conversion to int, i == 13}$
@week = 10000;@                                         $\C{// UNDEFINED! implicit conversion to Week}$

enum Season { Spring, Summer, Fall, Winter };
@week = Winter;@                                        $\C{// UNDEFINED! implicit conversion to Week}$
\end{clang}
While converting an enumerator to its underlying type is sound, the implicit conversion from the base or another enumeration type to an enumeration is a common source of error.
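
Because C performs the conversion from @int@ to an enumeration without any check, validation has to be written by hand.
The following is a minimal sketch, assuming the @Week@ declaration with @Thu = 10@ from the previous section; the helper @week_from_int@ is hypothetical and is not part of C or \CFA.
\begin{clang}
#include <stdbool.h>

enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

/* Hypothetical helper: store v in *out only if v is the value of a Week enumerator. */
bool week_from_int( int v, enum Week * out ) {
    switch ( v ) {
      case Mon: case Tue: case Wed: case Thu:
      case Fri: case Sat: case Sun:
        *out = (enum Week)v;
        return true;
      default:
        return false;               /* e.g., 3..9 or 10000 are not Week values */
    }
}
\end{clang}
A caller tests the result before using the enumeration variable, which makes the otherwise silent conversion explicit at the cost of listing every enumerator again.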

Enumerators can appear in @switch@ and looping statements.
\begin{cfa}
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
switch ( week ) {
        case Mon ... Fri:                               $\C{// gcc case range}$
                printf( "weekday\n" );
                break;
        case Sat: case Sun:
                printf( "weekend\n" );
}
for ( enum Week day = Mon; day <= Sun; day += 1 ) { $\C{// step of 1}$
        printf( "day %d\n", day ); // 0-6
}
\end{cfa}
For iterating using arithmetic to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
For example, the gap introduced earlier by @Thu = 10@ results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
Note that the bidirectional conversion allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
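
The effect of the gap can be made concrete by counting loop iterations.
This sketch assumes the @Week@ declaration with @Thu = 10@; the functions @is_week_value@ and @count_iterations@ are hypothetical names introduced for the illustration.
\begin{clang}
enum Week { Mon, Tue, Wed, Thu = 10, Fri, Sat, Sun };

/* Returns 1 if v is the value of some Week enumerator, otherwise 0. */
static int is_week_value( int v ) {
    return ( Mon <= v && v <= Wed ) || ( Thu <= v && v <= Sun );
}

/* Counts loop iterations; *valid receives how many visited values are enumerators. */
int count_iterations( int * valid ) {
    int visited = 0;
    *valid = 0;
    for ( enum Week day = Mon; day <= Sun; day += 1 ) {
        visited += 1;
        if ( is_week_value( day ) ) *valid += 1;
    }
    return visited;
}
\end{clang}
With this declaration, the loop body runs for every integer from @Mon@ to @Sun@, so only half of the visited values correspond to enumerators.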

There is a C idiom that computes the number of enumerators in an enumeration automatically.
\begin{cfa}
enum E { A, B, C, D, @N@ };  // N == 4
for ( enum E e = A; e < @N@; e += 1 ) ...
\end{cfa}
Serendipitously, auto-initialization counts the number of enumerators and puts the total into the last enumerator @N@.
This @N@ is often used as the dimension for an array associated with the enumeration.
\begin{cfa}
E array[@N@];
for ( enum E e = A; e < N; e += 1 ) {
        array[e] = e;
}
\end{cfa}
However, for non-consecutive ordering and non-integral typed enumerations \see{\VRef{f:EumeratorTyping}}, this idiom fails.

This idiom is often used with another C idiom for matching companion information.
For example, an enumeration may be linked with a companion array of printable strings.
\begin{cfa}
enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
char * Integral_Name[@NO_OF_ITYPES@] = {
        "char", "signed char", "unsigned char",
        "signed short int", "unsigned short int",
        "signed int", "unsigned int", ...
};
enum Integral_Type @integral_type@ = ...
printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
\end{cfa}
However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
The requirement to harmonize is, at best, indicated by a comment before the enumeration.
This issue is exacerbated if the enumeration and companion array are in different translation units.
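
One partial mitigation in plain C, offered here only as a sketch, is to let the compiler check that the companion array has exactly one entry per enumerator.
The enumeration below is a shortened, hypothetical variant of @Integral_Type@, and @integral_name@ is a hypothetical accessor; the C11 @_Static_assert@ detects a missing or extra string, but not strings listed in the wrong order.
\begin{clang}
enum Integral_Type { chr, schar, uschar, sint, usint, NO_OF_ITYPES };

/* Companion array without an explicit dimension: its length comes from the initializer. */
static const char * const Integral_Name[] = {
    "char", "signed char", "unsigned char",
    "signed int", "unsigned int",
};

/* Fails to compile when the enumeration and the array get out of step. */
_Static_assert( sizeof(Integral_Name) / sizeof(Integral_Name[0]) == NO_OF_ITYPES,
                "Integral_Name must have one entry per Integral_Type enumerator" );

/* Returns the printable name, or "?" for values outside the enumeration. */
const char * integral_name( enum Integral_Type t ) {
    return ( (unsigned)t < (unsigned)NO_OF_ITYPES ) ? Integral_Name[t] : "?";
}
\end{clang}
The check only works when both declarations are visible in the same translation unit, so it does not remove the harmonizing problem described above.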

\bigskip
While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide the helpful/advanced enumeration features found in other programming languages.


\section{\texorpdfstring{\CFA}{Cforall}}

\CFA is \emph{not} an object-oriented programming language, \ie functions cannot be nested in aggregate types, and hence, there is no \newterm{receiver} notation for calling functions, \eg @obj.method(...)@, where the first argument precedes the call and becomes an implicit first (\lstinline[language=C++]{this}) parameter.
The following sections provide short descriptions of \CFA features needed later in the thesis.
Other \CFA features are presented in situ with short or no explanation because the feature is obvious to C programmers.


\subsection{Overloading}

Overloading allows programmers to use the most meaningful names without fear of name clashes within a program or from external sources like included files.
\begin{quote}
There are only two hard things in Computer Science: cache invalidation and naming things. --- Phil Karlton
\end{quote}
Experience from \CC and \CFA developers is that the type system implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
In many cases, a programmer has no idea there are name clashes, as they are silently resolved, simplifying the development process.
Depending on the language, ambiguous cases are resolved using some form of qualification and/or casting.


\subsection{Operator Overloading}

Virtually all programming languages overload the arithmetic operators across the basic types using the number and type of parameters and returns.
Like \CC, \CFA also allows these operators to be overloaded with user-defined types.
The syntax for operator names uses the @'?'@ character to denote a parameter, \eg the postfix-increment, prefix-increment, and infix-addition operators: @?++@, @++?@, and @?+?@.
\begin{cfa}
struct S { int i, j; };
S @?+?@( S op1, S op2 ) { return (S){ op1.i + op2.i, op1.j + op2.j }; }
S s1, s2;
s1 = s1 @+@ s2;                 $\C[1.75in]{// infix call}$
s1 = @?+?@( s1, s2 );   $\C{// direct call}\CRT$
\end{cfa}
The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
If there are mixed-mode operands, \eg @2 + 3.5@, the type system, like in C/\CC, attempts (safe) conversions, converting the argument type(s) to the parameter type(s).


\subsection{Function Overloading}

Both \CFA and \CC allow function names to be overloaded as long as their prototypes differ in the number and type of parameters.
\begin{cfa}
void f( void );                 $\C[1.75in]{// (1): no parameter}$
void f( char );                 $\C{// (2): overloaded on the number and parameter type}$
void f( int, int );             $\C{// (3): overloaded on the number and parameter type}$
f( 'A' );                               $\C{// select (2)}\CRT$
\end{cfa}
In this case, the name @f@ is overloaded depending on the number and parameter types.
The type system examines each call site and selects the best match based on the number and types of arguments.
Here, the call @f( 'A' )@ is a perfect match for the number and parameter type of function (2).

Ada, Scala, and \CFA type-systems also use the return type to pinpoint the best overloaded name in resolving a call.
\begin{cfa}
int f( void );                  $\C[1.75in]{// (4); overloaded on return type}$
double f( void );               $\C{// (5); overloaded on return type}$
int i = f();                    $\C{// select (4)}$
double d = f();                 $\C{// select (5)}\CRT$
\end{cfa}


\subsection{Variable Overloading}

Unlike almost all programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
\begin{cfa}
void foo( double d );
int v;                              $\C[1.75in]{// (1)}$
double v;                               $\C{// (2) variable overloading}$
foo( v );                               $\C{// select (2)}$
{
        int v;                          $\C{// (3) shadow overloading}$
        double v;                       $\C{// (4) and variable overloading}$
        foo( v );                       $\C{// select (4)}\CRT$
}
\end{cfa}
The \CFA type system treats an overloaded variable as an overloaded function with no parameters that returns a value.
Hence, no significant effort is required to support this feature.


\subsection{Constructor and Destructor}

While \CFA is not object-oriented, it adopts many language features commonly used in object-oriented languages;
these features are independent of object-oriented programming.

All objects in \CFA are initialized by @constructors@ \emph{after} allocation and de-initialized by @destructors@ \emph{before} deallocation.
\CC cannot have constructors for basic types because they have no aggregate type \lstinline[language=C++]{struct/class} in which to insert a constructor definition.
Like \CC, \CFA has multiple auto-generated constructors for every type.

The prototypes for the constructor and destructor are @void ?{}( T &, ... )@ and @void ^?{}( T &, ... )@, respectively.
The first parameter is logically the \lstinline[language=C++]{this} or \lstinline[language=Python]{self} in other object-oriented languages and is implicitly passed.
\VRef[Figure]{f:CFAConstructorDestructor} shows an example of creating and using a constructor and destructor.
Both constructor and destructor can be explicitly called to reuse a variable.

\begin{figure}
\begin{cfa}
struct Employee {
        char * name;
        double salary;
};
void @?{}@( Employee & emp, char * nname, double nsalary ) with( emp ) { // auto qualification
        name = aalloc( strlen( nname ) + 1 ); // room for the string and its terminator
        strcpy( name, nname );
        salary = nsalary;
}
void @^?{}@( Employee & emp ) {
        free( emp.name );
}
{
        Employee emp = { "Sara Schmidt", 20.5 }; $\C{// initialize with implicit constructor call}$
        ... // use emp
        ^?{}( emp ); $\C{// explicit de-initialize}$
        ?{}( emp, "Jack Smith", 10.5 ); $\C{// explicit re-initialize}$
        ... // use emp
} $\C{// de-initialize with implicit destructor call}$
\end{cfa}
\caption{\CFA Constructor and Destructor}
\label{f:CFAConstructorDestructor}
\end{figure}


\subsection{Special Literals}

The C constants @0@ and @1@ have special meaning.
@0@ is the null pointer and is used in conditional expressions, where @if ( p )@ is rewritten @if ( p != 0 )@;
@1@ is the implicit amount added or subtracted by the unary operators @++@ and @--@.
Aware of their significance, \CFA provides the special types @zero_t@ and @one_t@ for custom types.
\begin{cfa}
struct S { int i, j; };
void ?{}( S & this, @zero_t@ ) { this.i = 0; this.j = 0; } // zero_t, no parameter name allowed
int ?@!=@?( S this, @zero_t@ ) { return this.i != 0 || this.j != 0; }
S s = @0@;
if ( s @!= 0@ ) ...
\end{cfa}
Similarly for @one_t@.
\begin{cfa}
void ?{}( S & this, @one_t@ ) { this.i = 1; this.j = 1; } // one_t, no parameter name allowed
S & ?++( S & this, @one_t@ ) { return (S){ this.i++, this.j++ }; }
\end{cfa}


\subsection{Polymorphic Functions}

Polymorphic functions are functions that apply to all types.
\CFA provides \newterm{parametric polymorphism} written with the @forall@ clause.
\begin{cfa}
@forall( T )@ T identity( T v ) { return v; }
identity( 42 );
\end{cfa}
The @identity@ function accepts a value of any type as an argument and returns that value.
At the call site, the type parameter @T@ is bound to @int@ from the argument @42@.

For polymorphic functions to be useful, the @forall@ clause needs \newterm{type assertion}s that restrict the polymorphic types it accepts.
\begin{cfa}
forall( T @| { void foo( T ); }@ ) void bar( T t ) { @foo( t );@ }
struct S { ... } s;
void foo( struct S );
bar( s );
\end{cfa}
The assertion on @T@ restricts the range of types that can be manipulated by @bar@ to only those that implement @foo@ with the matching signature, allowing @bar@'s call to @foo@ in its body.
Unlike templates in \CC, which are macro expansions at the call site, \CFA polymorphic functions are compiled, passing the call-site assertion functions as hidden parameters.
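
The hidden-parameter idea can be approximated in plain C.
The following is only a conceptual sketch of the calling convention, assuming a type-erased argument and one function pointer per assertion; it is not the code generated by the \CFA compiler, and the names @bar_lowered@, @foo_S@, @foo_total@, and @call_bar@ are hypothetical.
\begin{clang}
struct S { int i; };

static int foo_total = 0;           /* accumulates values seen by foo_S, for demonstration */

/* The foo that satisfies the assertion for struct S, with a type-erased parameter. */
static void foo_S( void * t ) { foo_total += ((struct S *)t)->i; }

/* Stand-in for: forall( T | { void foo( T ); } ) void bar( T t ) { foo( t ); }
   The assertion function arrives as an extra (hidden) parameter. */
void bar_lowered( void * t, void (*foo)( void * ) ) { foo( t ); }

/* Stand-in for the call bar( s ): the call site supplies foo_S. */
int call_bar( int value ) {
    struct S s = { value };
    bar_lowered( &s, foo_S );
    return foo_total;
}
\end{clang}
The body of @bar_lowered@ is compiled once and works for any type whose call site can supply a matching function, which is the property that distinguishes this approach from per-instantiation expansion.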


\subsection{Trait}

A @forall@ clause can assert many restrictions on multiple types.
A common practice is refactoring the assertions into a named \newterm{trait}, similar to other languages like Go and Rust.
\begin{cfa}
forall(T) trait @Bird@ {
        int days_can_fly( T );
        void fly( T );
};
forall( B | @Bird@( B ) )
void bird_fly( int days_since_born, B bird ) {
        if ( days_since_born > days_can_fly( bird )) fly( bird );
}
struct Robin {} robin;
int days_can_fly( Robin robin ) { return 23; }
void fly( Robin robin ) {}
bird_fly( 23, robin );
\end{cfa}
Grouping type assertions into a named trait effectively creates a reusable interface for parametric polymorphic types.


\section{Expression Resolution}

Overloading poses a challenge for all expression-resolution systems.
Multiple overloaded names give multiple candidates at a call site, and a resolver must pick a \emph{best} match, where ``best'' is defined by a series of heuristics based on safety and programmer intuition/expectation.
When multiple best matches exist, the resolution is ambiguous.

The \CFA resolver attempts to identify the best candidate based on: first, the number of parameters and types, and second, when no exact match exists, the fewest implicit conversions and polymorphic variables.
Finding an exact match is not discussed here, because the mechanism is fairly straightforward, even when the search space is large;
only finding a non-exact match is discussed in detail.


\subsection{Conversion Cost}
\label{s:ConversionCost}

Most programming languages perform some implicit conversions among basic types to facilitate mixed-mode arithmetic;
otherwise, the program becomes littered with explicit casts, which does not match the programmer's expectations.
C is an aggressive language, providing conversions among almost all basic types, even when the conversion is potentially unsafe or not meaningful, \eg @float@ to @bool@.
C defines the resolution pattern as the ``usual arithmetic conversions''~\cite[\S~6.3.1.8]{C11}, in which C looks for a \newterm{common type} between operands, and converts one or both operands to the common type.
A common type is the smallest type in terms of the size of representation that both operands can be converted into without losing their precision, called a \newterm{widening} or \newterm{safe conversion}.
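
The common type chosen by C can be observed with the C11 @_Generic@ selection, which reports the type of an expression without evaluating it.
The macro @TYPE_NAME@ and the function names below are hypothetical and cover only a handful of types.
\begin{clang}
/* Maps the type of an expression to a printable name (small, illustrative subset). */
#define TYPE_NAME( e ) _Generic( (e), \
        int: "int", unsigned int: "unsigned int", long: "long", \
        float: "float", double: "double", default: "other" )

/* Each function returns the name of the common type of a mixed-mode expression. */
const char * int_plus_double( void )   { return TYPE_NAME( 2 + 3.5 ); }
const char * int_plus_float( void )    { return TYPE_NAME( 1 + 2.0f ); }
const char * int_plus_long( void )     { return TYPE_NAME( 1 + 1L ); }
const char * int_plus_unsigned( void ) { return TYPE_NAME( -1 + 1u ); }
\end{clang}
The last case shows the aggressive side of the rules: the signed operand is converted to @unsigned int@, even though that conversion does not preserve a negative value.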

\CFA generalizes the ``usual arithmetic conversions'' to \newterm{conversion cost}.
In the first design by Bilson~\cite{Bilson03}, conversion cost is a 3-tuple, @(unsafe, poly, safe)@, applied between each argument/parameter type, where:
\begin{enumerate}
\item
@unsafe@ is the number of precision-losing (\newterm{narrowing}) conversions,
\item
@poly@ is the number of polymorphic function parameters, and
\item
@safe@ is the sum of the degree of safe (widening) conversions.
\end{enumerate}
Sum of degree is a method to quantify C's integer and floating-point rank.
Every pair of widening conversion types is assigned a \newterm{distance}, and the distance between two identical types is 0.
For example, the distance from @char@ to @int@ is 2, from @int@ to @long@ is 1, and from @int@ to @long long int@ is 2.
This distance does not mirror C's rank system.
For example, the @char@ and @signed char@ ranks are the same in C, but the distance from @char@ to @signed char@ is assigned 1.
The @safe@ cost is the sum of the safe-conversion distances over all argument/parameter pairs.
Among the three costs in Bilson's model, @unsafe@ is the most significant cost, and @safe@ is the least significant, implying that \CFA always chooses a candidate with the lowest @unsafe@, if possible.

For example, assume the overloaded function @foo@ is called with two @int@ arguments.
The cost of each overloaded @foo@ is listed alongside its declaration:
\begin{cfa}
void foo( char, char );                         $\C[2.5in]{// (1) (2, 0, 0)}$
void foo( char, int );                          $\C{// (2) (1, 0, 0)}$
forall( T, V ) void foo( T, V );        $\C{// (3) (0, 2, 0)}$
forall( T ) void foo( T, T );           $\C{// (4) (0, 2, 0)}$
forall( T ) void foo( T, int );         $\C{// (5) (0, 1, 0)}$
void foo( long long, long );            $\C{// (6) (0, 0, 3)}$
void foo( long, long );                         $\C{// (7) (0, 0, 2)}$
void foo( int, long );                          $\C{// (8) (0, 0, 1)}$
int i, j;
foo( i, j );                                            $\C{// convert j to long and call (8)}\CRT$
\end{cfa}
The overloaded instances are ordered from the highest to the lowest cost, and \CFA selects the last candidate (8).
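
The ordering of the 3-tuple costs is lexicographic, which can be sketched in C as follows.
This is an illustration of the comparison rule only, not the \CFA resolver; the names @Cost@, @cost_compare@, @best_candidate@, and @foo_costs@ are hypothetical, and the cost values are copied from the listing above.
\begin{clang}
#include <stddef.h>

/* Bilson-style cost tuple; fields are compared left to right. */
struct Cost { int unsafe, poly, safe; };

/* Returns <0, 0, or >0 when a is cheaper than, equal to, or more expensive than b. */
int cost_compare( struct Cost a, struct Cost b ) {
    if ( a.unsafe != b.unsafe ) return a.unsafe - b.unsafe;
    if ( a.poly != b.poly ) return a.poly - b.poly;
    return a.safe - b.safe;
}

/* Returns the index of the unique cheapest candidate, or -1 if the minimum is tied. */
int best_candidate( const struct Cost costs[], size_t n ) {
    int best = -1, tied = 0;
    for ( size_t k = 0; k < n; k += 1 ) {
        int c = ( best < 0 ) ? -1 : cost_compare( costs[k], costs[best] );
        if ( c < 0 ) { best = (int)k; tied = 0; }
        else if ( c == 0 ) tied = 1;
    }
    return tied ? -1 : best;
}

/* Costs of candidates (1)-(8) for the call foo( i, j ). */
const struct Cost foo_costs[8] = {
    { 2, 0, 0 }, { 1, 0, 0 }, { 0, 2, 0 }, { 0, 2, 0 },
    { 0, 1, 0 }, { 0, 0, 3 }, { 0, 0, 2 }, { 0, 0, 1 },
};
\end{clang}
Applied to all eight costs, the sketch picks the last entry, matching candidate (8); applied to only the first four, candidates (3) and (4) tie, which is the kind of ambiguity the later cost categories are meant to resolve.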

In the next iteration of \CFA, Schluntz and Moss~\cite{Moss18} expanded conversion cost to a 7-tuple with 4 additional categories, @(unsafe, poly, safe, sign, vars, specialization, reference)@, with the following interpretations:
\begin{itemize}
\item \textit{Unsafe}
\item \textit{Poly}
\item \textit{Safe}
\item \textit{Sign} is the number of signed/unsigned variable conversions.
\item \textit{Vars} is the number of polymorphic type variables.
\item \textit{Specialization} is the negative value of the number of type assertions.
\item \textit{Reference} is the number of reference-to-rvalue conversions.
\end{itemize}
The extended conversion-cost model looks for candidates that are more specific and less generic.
@vars@ disambiguates @forall( T, V ) void foo( T, V )@ and @forall( T ) void foo( T, T )@, where the extra type parameter @V@ makes it more generic.
A more generic type means fewer constraints on its parameter types.
\CFA favours candidates with more restrictions on polymorphism, so @forall( T ) void foo( T, T )@ has lower cost.
@specialization@ is an arbitrary count-down value starting at zero.
For every type assertion in the @forall@ clause (no assertions in the above example), \CFA subtracts one from @specialization@.
More type assertions mean more constraints on argument types, making the function less generic.

\CFA defines two special cost values: @zero@ and @infinite@.
A conversion cost is @zero@ when the argument and parameter have an exact match, and a conversion cost is @infinite@ when there is no defined conversion between the two types.
For example, the conversion cost from @int@ to a @struct S@ is @infinite@.

In \CFA, the meaning of a C-style cast is determined by its @Cast Cost@.
For most cast-expression resolutions, a cast cost equals a conversion cost.
Cast cost exists as an independent matrix for conversions that cannot happen implicitly but are possible with an explicit cast.
These conversions are often defined as having an infinite conversion cost and a non-infinite cast cost.