\chapter{Background}

\vspace*{-8pt}

\CFA is a backwards-compatible extension of the C programming language; therefore, it must support C-style enumerations.
The following discussion covers C enumerations.

As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
\begin{clang}
#define Mon 0
static const int Mon = 0;
enum { Mon };
\end{clang}
\begin{enumerate}[leftmargin=*]
\item
For @#define@, the programmer has to explicitly manage the constant name and value.
Furthermore, these C preprocessor macro names are outside of the C type-system and can unintentionally replace arbitrary text in a program.
\item
The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.}
and immediate oper\-ands of assembler instructions; furthermore, the variable occupies storage.
\begin{clang}
$\$$ nm test.o
0000000000000018 r Mon
\end{clang}
\item
Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
\end{enumerate}


\section{C \lstinline{const}}
\label{s:Cconst}

C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{@{}c}{\textbf{static initialization}} & \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\
\begin{clang}
static const int one = 0 + 1;
static const void * NIL = NULL;
static const double PI = 3.14159;
static const char Plus = '+';
static const char * Fred = "Fred";
static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
	Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
\end{clang}
&
\begin{clang}
void foo() {
	// auto scope only
	const int r = random() % 100;
	int va[r];
}
\end{clang}
\end{tabular}
\end{cquote}
However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
Again, this form of aliasing is not an enumeration.


\section{C Enumeration}
\label{s:CEnumeration}

The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
\begin{clang}[identifierstyle=\linespread{0.9}\it]
$\it enum$-specifier:
	enum identifier$\(_{opt}\)$ { enumerator-list }
	enum identifier$\(_{opt}\)$ { enumerator-list , }
	enum identifier
enumerator-list:
	enumerator
	enumerator-list , enumerator
enumerator:
	enumeration-constant
	enumeration-constant = constant-expression
\end{clang}
The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
The C enumeration semantics are discussed using examples.


\subsection{Type Name}
\label{s:TypeName}

An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
However, it is restricted to integral values.
\begin{clang}
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
\end{clang}
Here, the aliased constants are: 20, 10, 20, 21, and -7.
Direct initialization is by a compile-time expression generating a constant value.
Indirect initialization (without initialization, @Max10Plus1@) is \newterm{auto-initialized}: from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.
Because multiple independent enumerators can be combined, enumerators with the same values can occur.
The enumerators are rvalues, so assignment is disallowed.
Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) into the enclosing scope of the @enum@ type.
For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.

As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
While the semantics is misleading, this enumeration form matches with aggregate types:
\begin{cfa}
typedef struct @/* unnamed */@ { ... } S;
struct @/* unnamed */@ { ... } x, y, z;	$\C{// questionable}$
struct S {
	union @/* unnamed */@ {	$\C{// unscoped fields}$
		int i; double d ; char ch;
	};
};
\end{cfa}
Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.

A \emph{named} enumeration is an enumeration:
\begin{clang}
enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
\end{clang}
and adopts the same semantics with respect to direct and auto initialization.
For example, @Mon@ to @Wed@ are implicitly assigned the constants @0@--@2@, @Thu@ is explicitly set to the constant @10@, and @Fri@ to @Sun@ are implicitly assigned the constants @11@--@13@.
As well, initialization may occur in any order.
\begin{clang}
enum Week {
	Thu@ = 10@, Fri, Sat, Sun,
	Mon@ = 0@, Tue, Wed@,@	$\C{// terminating comma}$
};
\end{clang}
Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
A terminating comma appears in other C syntax, \eg the initializer list.}
This feature allows enumerator lines to be interchanged without moving a comma.
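As a minimal sketch, the auto-initialization rules described above can be checked at compile time with the C11 @static_assert@ macro.
The enumerations are the ones from the text; the assertion messages are illustrative only.
\begin{clang}
#include <assert.h>

// Auto-initialization continues from the most recent explicit initializer.
enum Week { Thu = 10, Fri, Sat, Sun, Mon = 0, Tue, Wed, };

static_assert( Fri == 11 && Sat == 12 && Sun == 13, "continues from Thu = 10" );
static_assert( Mon == 0 && Tue == 1 && Wed == 2, "restarts at Mon = 0" );

// Unnamed enumeration used for aliasing; Size and MaxPlus10 share the value 20.
enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, Max10Plus1, Fred = -7 };

static_assert( MaxPlus10 == Size, "duplicate enumerator values are allowed" );
static_assert( Max10Plus1 == 21, "auto-initialized from MaxPlus10" );
\end{clang}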
Named enumerators are also unscoped.


\subsection{Implementation}
\label{s:CenumImplementation}

In theory, a C enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the type of enumerators, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
However, type @int@ is defined as:
\begin{quote}
A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
\end{quote}
Yet, @int@ is 4 bytes on both 32-bit and 64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
In contrast, @long int@ is 4 bytes on 32-bit and 8 bytes on 64-bit architectures, and @long long int@ is 8 bytes on both 32-bit and 64-bit architectures, where 64-bit operations are simulated on 32-bit architectures.
In reality, both @gcc@ and @clang@ partially ignore this specification and choose the integral size of an enumerator based on its initializer.
\begin{cfa}
enum E { IMin = INT_MIN, IMax = INT_MAX,
		ILMin = LONG_MIN, ILMax = LONG_MAX,
		ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
int main() {
	printf( "%zu %d %d\n%zu %ld %ld\n%zu %ld %ld\n",
		sizeof(IMin), IMin, IMax,
		sizeof(ILMin), ILMin, ILMax,
		sizeof(ILLMin), ILLMin, ILLMax );
}
4 -2147483648 2147483647
8 -9223372036854775808 9223372036854775807
8 -9223372036854775808 9223372036854775807
\end{cfa}
Hence, an enumerator initialized in the range @INT_MIN@..@INT_MAX@ is 4 bytes, and one initialized outside this range is 8 bytes.


\subsection{Usage}
\label{s:Usage}

C provides an implicit \emph{bidirectional} conversion between an enumeration and its integral type.
\begin{clang}
enum Week week = Mon;	$\C{// week == 0}$
week = Fri;	$\C{// week == 11}$
int i = Sun;	$\C{// implicit conversion to int, i == 13}$
@week = 10000;@	$\C{// UNDEFINED! implicit conversion to Week}$
\end{clang}
While converting an enumerator to its underlying type is useful, the implicit conversion from the base type to an enumeration type is a common source of error.

Enumerators can appear in @switch@ and looping statements.
\begin{cfa}
enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
switch ( week ) {
	case Mon ... Fri:	$\C{// gcc case range}$
		printf( "weekday\n" );
		break;
	case Sat: case Sun:
		printf( "weekend\n" );
}
for ( enum Week day = Mon; day <= Sun; day += 1 ) {	$\C{// step of 1}$
	printf( "day %d\n", day ); // 0-6
}
\end{cfa}
For iterating to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
For example, a gap introduced by @Thu = 10@ results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.

There is a C idiom to automatically compute the number of enumerators in an enumeration.
\begin{cfa}
enum E { A, B, C, D, @N@ };  // N == 4
for ( enum E e = A; e < @N@; e += 1 ) ...
\end{cfa}
Here, the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
@N@ is often used as the dimension for an array associated with the enumeration.
\begin{cfa}
enum E array[@N@];
for ( enum E e = A; e < N; e += 1 ) {
	array[e] = e;
}
\end{cfa}
However, for non-integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.

This idiom is used in another C idiom for matching companion information.
For example, an enumeration is linked with a companion array of printable strings.
\begin{cfa}
enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
char * Integral_Name[@NO_OF_ITYPES@] = {
	"char", "signed char", "unsigned char",
	"signed short int", "unsigned short int",
	"signed int", "unsigned int", ...
};
enum Integral_Type integral_type = ...
printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
\end{cfa}
However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
The need to harmonize is at best indicated by a comment before the enumeration.
This issue is exacerbated if the enumeration and companion array are in different translation units.

\bigskip
While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful enumeration features found in other programming languages.


\section{\CFA Polymorphism}

\subsection{Function Overloading}

Function overloading is a programming-language feature wherein functions may share the same name, but with different function signatures.
In both \CC and \CFA, a function name can be overloaded with different entities as long as they differ in the number and type of parameters.
\begin{cfa}
void f(); // (1)
void f(int); // (2); Overloaded on the number of parameters
void f(char); // (3); Overloaded on parameter type
f('A');
\end{cfa}
In this case, the name @f@ is overloaded with a nullary function and two unary functions with different parameter types.
Exactly which procedure is executed is determined by the arguments passed.
The last expression of the preceding example calls @f@ with one argument, narrowing the possible candidates down to (2) and (3).
Between those, function argument @'A'@ is an exact match to the parameter expected by (3), while needing an implicit conversion to call (2).
The compiler determines that (3) is the better candidate, and hence, procedure (3) is executed.

\begin{cfa}
int f(int); // (4); Overloaded on return type
[int, int] f(int); // (5) Overloaded on the number of return value
\end{cfa}
The function declarations (4) and (5) show that \CFA functions can also be overloaded on the return value, a feature that is not shared by \CC.


\subsection{Operator Overloading}

Operators in \CFA are specialized functions and are overloadable using specially-named functions that represent the syntax used to call the operator.
% For example, @bool ?==?T(T lhs, T rhs)@ overloads equality operator for type T, where @?@ is the placeholders for operands for the operator.
\begin{cfa}
enum Weekday { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };
bool ? days_can_fly(bird)) {
			fly(bird);
		}
	}
}

struct Robin {} r;
int days_can_fly(Robin r) { return 23; }
void fly(Robin r) {}

bird_fly( r );
\end{cfa}

Grouping type assertions into a named trait effectively creates a reusable interface for parametric-polymorphic types.


\section{Expression Resolution}

The overloading feature poses a challenge in \CFA expression resolution.
An overloaded identifier can refer to multiple candidates, with several of them being simultaneously valid.
The main task of the \CFA resolver is to identify the best candidate, which involves the least implicit conversion and polymorphism.


\subsection{Conversion Cost}
\label{s:ConversionCost}

In C, function call arguments and function parameters do not need to be an exact match.
When the types mismatch, C performs an \newterm{implicit conversion} from the argument type to the parameter type.
The process is trivial with the exception of binary operators; when the types of the operands are different, C needs to decide which operands need an implicit conversion.
C defines the resolution pattern as the ``usual arithmetic conversion'', in which C looks for a \newterm{common type} between the operands, and converts either one or both operands to the common type.
Loosely defined, a common type is the smallest type, in terms of size of representation, that both operands can be converted into without losing precision.
Such a conversion is called a ``widening'' or ``safe'' conversion.

C binary operators can be restated as binary functions that are overloaded with different parameter types.
The ``usual arithmetic conversion'' is then to find the overloaded instance where, for both arguments, the conversion to the parameter type is a widening conversion to the smallest type.

\CFA generalizes the ``usual arithmetic conversion'' to \newterm{conversion cost}.
In the first design by Bilson, conversion cost is a 3-tuple, @(unsafe, poly, safe)@, where @unsafe@ is the number of unsafe (narrowing) conversions from argument to parameter, @poly@ is the number of polymorphic function parameters, and @safe@ is the sum of the degrees of the safe (widening) conversions.
Sum of degree is a method to quantify C's integer and floating-point rank.
Every pair of widening conversion types is assigned a \newterm{distance}, and the distance between two identical types is 0.
For example, the distance from @char@ to @int@ is 2, the distance from @int@ to @long@ is 1, and the distance from @int@ to @long long int@ is 2.
The distance does not mirror C's rank system.
For example, the rank of @char@ and @signed char@ are the same in C, but the distance from @char@ to @signed char@ is 1.
The @safe@ cost is the sum of the safe-conversion distances over all argument-parameter pairs.
Among the three costs in Bilson's model, @unsafe@ is the most significant and @safe@ is the least significant, with the implication that \CFA always chooses a candidate with the lowest @unsafe@ cost, if possible.

Assume the overloaded function @foo@ is called with two @int@ arguments.
The cost for every overloaded @foo@ is listed alongside its declaration:
\begin{cfa}
void foo(char, char); // (2, 0, 0)
void foo(char, int); // (1, 0, 0)
forall(T, V) void foo(T, V); // (0, 2, 0)
forall(T) void foo(T, T); // (0, 2, 0)
forall(T) void foo(T, int); // (0, 1, 0)
void foo(long long, long); // (0, 0, 3)
void foo(long, long); // (0, 0, 2)
void foo(int, long); // (0, 0, 1)

int i, j;
foo(i, j);
\end{cfa}
The overloaded instances are ordered from the highest to the lowest cost, and \CFA selects the last candidate.

In a later iteration of \CFA, Schluntz and Aaron expanded conversion cost to a 7-tuple with 4 additional categories, @(unsafe, poly, safe, sign, vars, specialization, reference)@, with the interpretation listed below:
\begin{itemize}
\item Unsafe is as in Bilson's model.
\item Poly is as in Bilson's model.
\item Safe is as in Bilson's model.
\item Sign is the number of signed/unsigned conversions.
\item Vars is the number of polymorphic type variables.
\item Specialization is the negative of the number of type assertions.
\item Reference is the number of reference-to-rvalue conversions.
\end{itemize}
The extended conversion-cost model looks for candidates that are more specific and less generic.
@vars@ was introduced by Aaron to disambiguate @forall(T, V) void foo(T, V)@ and @forall(T) void foo(T, T)@.
The extra type parameter @V@ makes the former more generic and less specific.
A more generic type means fewer constraints on the types of its parameters.
\CFA favours candidates with more restrictions on polymorphism, so @forall(T) void foo(T, T)@ has the lower cost.
@specialization@ is a value that is always less than or equal to zero.
For every type assertion in the @forall@ clause, \CFA subtracts one from @specialization@, starting from zero.
More type assertions often mean more constraints on the argument types, making the function less generic.

\CFA defines two special cost values: @zero@ and @infinite@.
A conversion cost is @zero@ when the argument and parameter are an exact match, and a conversion cost is @infinite@ when there is no defined conversion between the two types.
For example, the conversion cost from @int@ to a @struct@ type @S@ is @infinite@.
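
The ordering of costs can be sketched in plain C.
The following is only an illustration of a lexicographic comparison over Bilson's 3-tuple, not the \CFA resolver's implementation: the names @Cost@, @cost_cmp@, @best_candidate@ and @foo_costs@ are hypothetical, and encoding @infinite@ as @INT_MAX@ in every component is an assumption made for this example.
The table holds the costs of the eight @foo@ candidates in declaration order.
\begin{clang}
#include <limits.h>

// Hypothetical model of Bilson's 3-tuple cost (unsafe, poly, safe).
typedef struct { int unsafe, poly, safe; } Cost;

// Assumed encodings of the two special cost values.
static const Cost cost_zero = { 0, 0, 0 };
static const Cost cost_infinite = { INT_MAX, INT_MAX, INT_MAX };

// Lexicographic comparison: unsafe is most significant, safe is least.
// Returns a negative, zero, or positive value, like strcmp.
int cost_cmp( Cost a, Cost b ) {
	if ( a.unsafe != b.unsafe ) return a.unsafe < b.unsafe ? -1 : 1;
	if ( a.poly != b.poly ) return a.poly < b.poly ? -1 : 1;
	if ( a.safe != b.safe ) return a.safe < b.safe ? -1 : 1;
	return 0;
}

// Index of the cheapest candidate, or -1 if every candidate is infinite.
// On a tie, the earliest candidate is kept.
int best_candidate( const Cost costs[], int n ) {
	int best = -1;
	for ( int i = 0; i < n; i += 1 ) {
		if ( cost_cmp( costs[i], cost_infinite ) == 0 ) continue;	// no conversion exists
		if ( best == -1 || cost_cmp( costs[i], costs[best] ) < 0 ) best = i;
	}
	return best;
}

// Costs of the foo candidates for the call foo(i, j), in declaration order.
const Cost foo_costs[8] = {
	{ 2, 0, 0 }, { 1, 0, 0 }, { 0, 2, 0 }, { 0, 2, 0 },
	{ 0, 1, 0 }, { 0, 0, 3 }, { 0, 0, 2 }, { 0, 0, 1 },
};
\end{clang}
With this table, @best_candidate( foo_costs, 8 )@ yields index 7, the candidate @foo(int, long)@ with cost @(0, 0, 1)@.
Assuming the 7-tuple is also compared lexicographically in the order it is written, the same comparison would extend by adding the four extra fields after @safe@.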