| 1 | \chapter{Background}
|
|---|
| 2 |
|
|---|
| 3 | This chapter covers background material for C enumerations and \CFA features used in later discussion.
|
|---|
| 4 |
|
|---|
| 5 |
|
|---|
| 6 | \section{C}
|
|---|
| 7 |
|
|---|
| 8 | As mentioned in \VRef{s:Aliasing}, it is common for C programmers to ``believe'' there are three equivalent forms of named constants.
|
|---|
| 9 | \begin{clang}
|
|---|
| 10 | #define Mon 0
|
|---|
| 11 | static const int Mon = 0;
|
|---|
| 12 | enum { Mon };
|
|---|
| 13 | \end{clang}
|
|---|
| 14 | \begin{enumerate}[leftmargin=*]
|
|---|
| 15 | \item
|
|---|
| 16 | For @#define@, the programmer has to explicitly manage the constant name and value.
|
|---|
| 17 | Furthermore, these C preprocessor macro names are outside of the C type-system and can incorrectly change random text in a program.
|
|---|
| 18 | \item
|
|---|
| 19 | The same explicit management is true for the @const@ declaration, and the @const@ variable cannot appear in constant-expression locations, like @case@ labels, array dimensions,\footnote{
|
|---|
| 20 | C allows variable-length array-declarations (VLA), so this case does work, but it fails in \CC, which does not support VLAs, unless it is \lstinline{g++}.} immediate oper\-ands of assembler instructions, and occupies storage.
|
|---|
| 21 | \begin{clang}
|
|---|
| 22 | $\$$ nm test.o
|
|---|
| 23 | 0000000000000018 r Mon
|
|---|
| 24 | \end{clang}
|
|---|
| 25 | \item
|
|---|
| 26 | Only the @enum@ form is managed by the compiler, is part of the language type-system, works in all C constant-expression locations, and normally does not occupy storage.
|
|---|
| 27 | \end{enumerate}
|
|---|
| 28 |
|
|---|
| 29 |
|
|---|
| 30 | \subsection{C \lstinline{const}}
|
|---|
| 31 | \label{s:Cconst}
|
|---|
| 32 |
|
|---|
| 33 | C can simulate the aliasing @const@ declarations \see{\VRef{s:Aliasing}}, with static and dynamic initialization.
|
|---|
| 34 | \begin{cquote}
|
|---|
| 35 | \begin{tabular}{@{}ll@{}}
|
|---|
| 36 | \multicolumn{1}{@{}c}{\textbf{static initialization}} & \multicolumn{1}{c@{}}{\textbf{dynamic initialization}} \\
|
|---|
| 37 | \begin{clang}
|
|---|
| 38 | static const int one = 0 + 1;
|
|---|
| 39 | static const void * NIL = NULL;
|
|---|
| 40 | static const double PI = 3.14159;
|
|---|
| 41 | static const char Plus = '+';
|
|---|
| 42 | static const char * Fred = "Fred";
|
|---|
| 43 | static const int Mon = 0, Tue = Mon + 1, Wed = Tue + 1,
|
|---|
| 44 | Thu = Wed + 1, Fri = Thu + 1, Sat = Fri + 1, Sun = Sat + 1;
|
|---|
| 45 | \end{clang}
|
|---|
| 46 | &
|
|---|
| 47 | \begin{clang}
|
|---|
| 48 | void foo() {
|
|---|
| 49 | // auto scope only
|
|---|
| 50 | const int r = random() % 100;
|
|---|
| 51 | int va[r];
|
|---|
| 52 | }
|
|---|
| 53 | \end{clang}
|
|---|
| 54 | \end{tabular}
|
|---|
| 55 | \end{cquote}
|
|---|
| 56 | However, statically initialized identifiers cannot appear in constant-expression contexts, \eg @case@.
|
|---|
| 57 | Dynamically initialized identifiers may appear in initialization and array dimensions in @g++@, which allows variable-sized arrays on the stack.
|
|---|
| 58 | Again, this form of aliasing is not an enumeration.
|
|---|
| 59 |
|
|---|
| 60 |
|
|---|
| 61 | \subsection{C Enumeration}
|
|---|
| 62 | \label{s:CEnumeration}
|
|---|
| 63 |
|
|---|
| 64 | The C enumeration has the following syntax~\cite[\S~6.7.2.2]{C11}.
|
|---|
| 65 | \begin{clang}[identifierstyle=\linespread{0.9}\it]
|
|---|
| 66 | $\it enum$-specifier:
|
|---|
| 67 | enum identifier$\(_{opt}\)$ { enumerator-list }
|
|---|
| 68 | enum identifier$\(_{opt}\)$ { enumerator-list , }
|
|---|
| 69 | enum identifier
|
|---|
| 70 | enumerator-list:
|
|---|
| 71 | enumerator
|
|---|
| 72 | enumerator-list , enumerator
|
|---|
| 73 | enumerator:
|
|---|
| 74 | enumeration-constant
|
|---|
| 75 | enumeration-constant = constant-expression
|
|---|
| 76 | \end{clang}
|
|---|
| 77 | The terms \emph{enumeration} and \emph{enumerator} used in this work \see{\VRef{s:Terminology}} come from the grammar.
|
|---|
| 78 | The C enumeration semantics are discussed using examples.
|
|---|
| 79 |
|
|---|
| 80 |
|
|---|
| 81 | \subsubsection{Type Name}
|
|---|
| 82 | \label{s:TypeName}
|
|---|
| 83 |
|
|---|
| 84 | An \emph{unnamed} enumeration is used to provide aliasing \see{\VRef{s:Aliasing}} exactly like a @const@ declaration in other languages.
|
|---|
| 85 | However, it is restricted to integral values.
|
|---|
| 86 | \begin{clang}
|
|---|
| 87 | enum { Size = 20, Max = 10, MaxPlus10 = Max + 10, @Max10Plus1@, Fred = -7 };
|
|---|
| 88 | \end{clang}
|
|---|
| 89 | Here, the aliased constants are: 20, 10, 20, 21, and -7.
|
|---|
| 90 | Direct initialization is by a compile-time expression generating a constant value.
|
|---|
| 91 | Indirect initialization (without initializer, @Max10Plus1@) is called \newterm{auto-initialization}, where enumerators are assigned values from left to right, starting at zero or the next explicitly initialized constant, incrementing by @1@.
|
|---|
| 92 | Because multiple independent enumerators can be combined, enumerators with the same values can occur.
|
|---|
| 93 | The enumerators are rvalues, so assignment is disallowed.
|
|---|
| 94 | Finally, enumerators are \newterm{unscoped}, \ie enumerators declared inside of an @enum@ are visible (projected) outside into the enclosing scope of the @enum@ type.
|
|---|
| 95 | For unnamed enumerations, this semantic is required because there is no type name for scoped qualification.
|
|---|
| 96 |
|
|---|
| 97 | As noted, this kind of aliasing declaration is not an enumeration, even though it is declared using an @enum@ in C.
|
|---|
| 98 | While the semantics is misleading, this enumeration form matches with aggregate types:
|
|---|
| 99 | \begin{cfa}
|
|---|
| 100 | typedef struct @/* unnamed */@ { ... } S;
|
|---|
| 101 | struct @/* unnamed */@ { ... } x, y, z; $\C{// questionable}$
|
|---|
| 102 | struct S {
|
|---|
| 103 | union @/* unnamed */@ { $\C{// unscoped fields}$
|
|---|
| 104 | int i; double d ; char ch;
|
|---|
| 105 | };
|
|---|
| 106 | };
|
|---|
| 107 | \end{cfa}
|
|---|
| 108 | Hence, C programmers would expect this enumeration form to exist in harmony with the aggregate form.
|
|---|
| 109 |
|
|---|
| 110 | A \emph{named} enumeration is an enumeration:
|
|---|
| 111 | \begin{clang}
|
|---|
| 112 | enum @Week@ { Mon, Tue, Wed, Thu@ = 10@, Fri, Sat, Sun };
|
|---|
| 113 | \end{clang}
|
|---|
| 114 | and adopts the same semantics with respect to direct and auto intialization.
|
|---|
| 115 | For example, @Mon@ to @Wed@ are implicitly assigned with constants @0@--@2@, @Thu@ is explicitly set to constant @10@, and @Fri@ to @Sun@ are implicitly assigned with constants @11@--@13@.
|
|---|
| 116 | As well, initialization may occur in any order.
|
|---|
| 117 | \begin{clang}
|
|---|
| 118 | enum Week {
|
|---|
| 119 | Thu@ = 10@, Fri, Sat, Sun,
|
|---|
| 120 | Mon@ = 0@, Tue, Wed@,@ $\C{// terminating comma}$
|
|---|
| 121 | };
|
|---|
| 122 | \end{clang}
|
|---|
| 123 | Note, the comma in the enumerator list can be a terminator or a separator, allowing the list to end with a dangling comma.\footnote{
|
|---|
| 124 | A terminating comma appears in other C syntax, \eg the initializer list.}
|
|---|
| 125 | This feature allow enumerator lines to be interchanged without moving a comma.
|
|---|
| 126 | Named enumerators are also unscoped.
|
|---|
| 127 |
|
|---|
| 128 |
|
|---|
| 129 | \subsubsection{Implementation}
|
|---|
| 130 | \label{s:CenumImplementation}
|
|---|
| 131 |
|
|---|
| 132 | In theory, a C enumeration \emph{variable} is an implementation-defined integral type large enough to hold all enumerator values.
|
|---|
| 133 | In practice, C defines @int@~\cite[\S~6.4.4.3]{C11} as the underlying type for enumeration variables, restricting initialization to integral constants, which have type @int@ (unless qualified with a size suffix).
|
|---|
| 134 | However, type @int@ is defined as:
|
|---|
| 135 | \begin{quote}
|
|---|
| 136 | A ``plain'' @int@ object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range @INT_MIN@ to @INT_MAX@ as defined in the header @<limits.h>@).~\cite[\S~6.2.5(5)]{C11}
|
|---|
| 137 | \end{quote}
|
|---|
| 138 | However, @int@ means a 4 bytes on both 32/64-bit architectures, which does not seem like the ``natural'' size for a 64-bit architecture.
|
|---|
| 139 | Whereas, @long int@ means 4 bytes on a 32-bit and 8 bytes on 64-bit architectures, and @long long int@ means 8 bytes on both 32/64-bit architectures, where 64-bit operations are simulated on 32-bit architectures.
|
|---|
| 140 | \VRef[Figure]{f:gccEnumerationStorageSize} shows both @gcc@ and @clang@ partially ignore this specification and type the integral size of an enumerator based its initialization.
|
|---|
| 141 | Hence, initialization in the range @INT_MIN@..@INT_MAX@ results in a 4-byte enumerator, and outside this range the enumerator is 8 bytes.
|
|---|
| 142 | Note that @sizeof( typeof( IMin ) ) != sizeof( E )@, making the size of an enumerator different than is containing enumeration type, which seems inconsistent, \eg @sizeof( typeof( 3 ) ) == sizeof( int )@.
|
|---|
| 143 |
|
|---|
| 144 | \begin{figure}
|
|---|
| 145 | \begin{cfa}
|
|---|
| 146 | enum E { IMin = INT_MIN, IMax = INT_MAX,
|
|---|
| 147 | ILMin = LONG_MIN, ILMax = LONG_MAX,
|
|---|
| 148 | ILLMin = LLONG_MIN, ILLMax = LLONG_MAX };
|
|---|
| 149 | int main() {
|
|---|
| 150 | printf( "%zd %zd\n%zd %zd\n%zd %d %d\n%zd %ld %ld\n%zd %ld %ld\n",
|
|---|
| 151 | sizeof(enum E), sizeof(typeof(IMin)),
|
|---|
| 152 | sizeof(int), sizeof(long int),
|
|---|
| 153 | sizeof(IMin), IMin, IMax,
|
|---|
| 154 | sizeof(ILMin), ILMin, ILMax,
|
|---|
| 155 | sizeof(ILLMin), ILLMin, ILLMax );
|
|---|
| 156 | }
|
|---|
| 157 | 8 4
|
|---|
| 158 | 4 -2147483648 2147483647
|
|---|
| 159 | 8 -9223372036854775808 9223372036854775807
|
|---|
| 160 | 8 -9223372036854775808 9223372036854775807
|
|---|
| 161 | \end{cfa}
|
|---|
| 162 | \caption{\lstinline{gcc}/\lstinline{clang} Enumeration Storage Size}
|
|---|
| 163 | \label{f:gccEnumerationStorageSize}
|
|---|
| 164 | \end{figure}
|
|---|
| 165 |
|
|---|
| 166 |
|
|---|
| 167 | \subsubsection{Usage}
|
|---|
| 168 | \label{s:Usage}
|
|---|
| 169 |
|
|---|
| 170 | C proves an implicit \emph{bidirectional} conversion between an enumeration and its integral type, and between two different enumerations.
|
|---|
| 171 | \begin{clang}
|
|---|
| 172 | enum Week week = Mon; $\C{// week == 0}$
|
|---|
| 173 | week = Fri; $\C{// week == 11}$
|
|---|
| 174 | int i = Sun; $\C{// implicit conversion to int, i == 13}$
|
|---|
| 175 | @week = 10000;@ $\C{// UNDEFINED! implicit conversion to Week}$
|
|---|
| 176 |
|
|---|
| 177 | enum Season { Spring, Summer, Fall, Winter };
|
|---|
| 178 | @week = Winter;@ $\C{// UNDEFINED! implicit conversion to Week}$
|
|---|
| 179 | \end{clang}
|
|---|
| 180 | While converting an enumerator to its underlying type is useful, the implicit conversion from the base or another enumeration type to an enumeration is a common source of error.
|
|---|
| 181 |
|
|---|
| 182 | Enumerators can appear in @switch@ and looping statements.
|
|---|
| 183 | \begin{cfa}
|
|---|
| 184 | enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
|
|---|
| 185 | switch ( week ) {
|
|---|
| 186 | case Mon ... Fri: $\C{// gcc case range}$
|
|---|
| 187 | printf( "weekday\n" );
|
|---|
| 188 | case Sat: case Sun:
|
|---|
| 189 | printf( "weekend\n" );
|
|---|
| 190 | }
|
|---|
| 191 | for ( enum Week day = Mon; day <= Sun; day += 1 ) { $\C{// step of 1}$
|
|---|
| 192 | printf( "day %d\n", day ); // 0-6
|
|---|
| 193 | }
|
|---|
| 194 | \end{cfa}
|
|---|
| 195 | For iterating using arithmetic to make sense, the enumerator values \emph{must} have a consecutive ordering with a fixed step between values.
|
|---|
| 196 | For example, a previous gap introduced by @Thu = 10@, results in iterating over the values 0--13, where values 3--9 are not @Week@ values.
|
|---|
| 197 | Note, it is the bidirectional conversion that allows incrementing @day@: @day@ is converted to @int@, integer @1@ is added, and the result is converted back to @Week@ for the assignment to @day@.
|
|---|
| 198 | For safety, \CC does not support the bidirectional conversion, and hence, an unsafe cast is necessary to increment @day@: @day = (Week)(day + 1)@.
|
|---|
| 199 |
|
|---|
| 200 | There is a C idiom to automatically compute the number of enumerators in an enumeration.
|
|---|
| 201 | \begin{cfa}
|
|---|
| 202 | enum E { A, B, C, D, @N@ }; // N == 4
|
|---|
| 203 | for ( enum E e = A; e < @N@; e += 1 ) ...
|
|---|
| 204 | \end{cfa}
|
|---|
| 205 | Here, serendipitously the auto-incrementing counts the number of enumerators and puts the total into the last enumerator @N@.
|
|---|
| 206 | This @N@ is often used as the dimension for an array assocated with the enumeration.
|
|---|
| 207 | \begin{cfa}
|
|---|
| 208 | E array[@N@];
|
|---|
| 209 | for ( enum E e = A; e < N; e += 1 ) {
|
|---|
| 210 | array[e] = e;
|
|---|
| 211 | }
|
|---|
| 212 | \end{cfa}
|
|---|
| 213 | However, for non-consecutive ordering and non-integral typed enumerations, \see{\VRef{f:EumeratorTyping}}, this idiom fails.
|
|---|
| 214 |
|
|---|
| 215 | This idiom is often used with another C idiom for matching companion information.
|
|---|
| 216 | For example, an enumeration may be linked with a companion array of printable strings.
|
|---|
| 217 | \begin{cfa}
|
|---|
| 218 | enum Integral_Type { chr, schar, uschar, sshort, ushort, sint, usint, ..., NO_OF_ITYPES };
|
|---|
| 219 | char * Integral_Name[@NO_OF_ITYPES@] = {
|
|---|
| 220 | "char", "signed char", "unsigned char",
|
|---|
| 221 | "signed short int", "unsigned short int",
|
|---|
| 222 | "signed int", "unsigned int", ...
|
|---|
| 223 | };
|
|---|
| 224 | enum Integral_Type @integral_type@ = ...
|
|---|
| 225 | printf( "%s\n", Integral_Name[@integral_type@] ); // human readable type name
|
|---|
| 226 | \end{cfa}
|
|---|
| 227 | However, the companion idiom results in the \emph{harmonizing} problem because an update to the enumeration @Integral_Type@ often requires a corresponding update to the companion array \snake{Integral_Name}.
|
|---|
| 228 | The requirement to harmonize is at best indicated by a comment before the enumeration.
|
|---|
| 229 | This issue is exacerbated if enumeration and companion array are in different translation units.
|
|---|
| 230 |
|
|---|
| 231 | \bigskip
|
|---|
| 232 | While C provides a true enumeration, it is restricted, has unsafe semantics, and does not provide useful/advanced enumeration features found in other programming languages.
|
|---|
| 233 |
|
|---|
| 234 |
|
|---|
| 235 | \section{\CFA}
|
|---|
| 236 |
|
|---|
| 237 | \CFA in \emph{not} an object-oriented programming-language, \ie functions cannot be nested in aggregate types, and hence, there is no receive notation for calling functions, \eg @obj.method(...)@, where the first argument proceeds the call.
|
|---|
| 238 | The following section provide short descriptions of \CFA features mentioned further in the thesis.
|
|---|
| 239 |
|
|---|
| 240 |
|
|---|
| 241 | \subsection{Operator Overloading}
|
|---|
| 242 |
|
|---|
| 243 | Virtually all programming languages overload the arithmetic operators across the basic types using the number and type of parameters and returns.
|
|---|
| 244 | Like \CC, \CFA also allows these operators to be overloaded with user-defined types.
|
|---|
| 245 | The syntax for operator names uses the @'?'@ character to denote a parameter, \eg prefix and infix increment operators: @?++@, @++?@, and @?+?@.
|
|---|
| 246 | \begin{cfa}
|
|---|
| 247 | struct S { int i, j };
|
|---|
| 248 | S @?+?@( S op1, S op2 ) { return (S){ op1.i + op2.i, op1.j + op2.j }; }
|
|---|
| 249 | S s1, s2;
|
|---|
| 250 | s1 = s1 @+@ s2; $\C[1.75in]{// infix call}$
|
|---|
| 251 | s1 = @?+?@( s1, s2 ); $\C{// direct call}\CRT$
|
|---|
| 252 | \end{cfa}
|
|---|
| 253 | The type system examines each call size and selects the best matching overloaded function based on the number and types of the arguments.
|
|---|
| 254 | If there are intermixed operands, @2 + 3.5@, the type system attempts (safe) conversions changing the arguments to one or more of the parameter type(s).
|
|---|
| 255 |
|
|---|
| 256 |
|
|---|
| 257 | \subsection{Function Overloading}
|
|---|
| 258 |
|
|---|
| 259 | Both \CFA and \CC allow function names to be overloaded, as long as their prototypes differ in the number and type of parameters and returns.
|
|---|
| 260 | \begin{cfa}
|
|---|
| 261 | void f( void ); $\C[1.75in]{// (1): no parameter}$
|
|---|
| 262 | void f( char ); $\C{// (2): overloaded on the number and parameter type}$
|
|---|
| 263 | void f( int, int ); $\C{// (3): overloaded on the number and parameter type}$
|
|---|
| 264 | f( 'A' ); $\C{// select (2)}\CRT$
|
|---|
| 265 | \end{cfa}
|
|---|
| 266 | In this case, the name @f@ is overloaded depending on the number and parameter types.
|
|---|
| 267 | The type system examines each call size and selects the best matching based on the number and types of the arguments.
|
|---|
| 268 | Here, there is a perfect match for the call, @f( 'A' )@ with the number and parameter type of function (2).
|
|---|
| 269 |
|
|---|
| 270 | Ada, Scala, and \CFA type-systems also use the return type in resolving a call.
|
|---|
| 271 | \begin{cfa}
|
|---|
| 272 | int f( void ); $\C[1.75in]{// (4); overloaded on return type}$
|
|---|
| 273 | double f( void ); $\C{// (5); overloaded on return type}$
|
|---|
| 274 | int i = f(); $\C{// select (4)}$
|
|---|
| 275 | double d = f(); $\C{// select (5)}\CRT$
|
|---|
| 276 | \end{cfa}
|
|---|
| 277 |
|
|---|
| 278 |
|
|---|
| 279 | \subsection{Variable Overloading}
|
|---|
| 280 | Unlike almost all programming languages, \CFA has variable overloading within a scope, along with shadow overloading in nested scopes.
|
|---|
| 281 | \begin{cfa}
|
|---|
| 282 | void foo( double d );
|
|---|
| 283 | int v; $\C[1.75in]{// (1)}$
|
|---|
| 284 | double v; $\C{// (2) variable overloading}$
|
|---|
| 285 | foo( v ); $\C{// select (2)}$
|
|---|
| 286 | {
|
|---|
| 287 | int v; $\C{// (3) shadow overloading}$
|
|---|
| 288 | double v; $\C{// (4) and variable overloading}$
|
|---|
| 289 | foo( v ); $\C{// select (4)}\CRT$
|
|---|
| 290 | }
|
|---|
| 291 | \end{cfa}
|
|---|
| 292 | The \CFA type system simply treats overloaded variables as an overloaded function returning a value with no parameters.
|
|---|
| 293 |
|
|---|
| 294 |
|
|---|
| 295 | \subsection{Constructor and Destructor}
|
|---|
| 296 |
|
|---|
| 297 | While \CFA in not object oriented, it adopts many language features commonly used in object-oriented languages;
|
|---|
| 298 | these features are independent from object-oriented programming.
|
|---|
| 299 |
|
|---|
| 300 | All objects in \CFA are initialized by @constructors@ \emph{after} allocation and de-initialized \emph{before} deallocation.
|
|---|
| 301 | \CC cannot have constructors for basic-types because they have no aggregate type \lstinline[language=C++]{struct/class} in which to insert a constructor definition.
|
|---|
| 302 | Like \CC, \CFA has multiple auto-generated constructors for every type.
|
|---|
| 303 |
|
|---|
| 304 | The prototype for the constructor/destructor are @void ?{}( T &, ... )@ and @void ^?{}( T &, ... )@, respectively.
|
|---|
| 305 | The first parameter is logically, the \lstinline[language=C++]{this} or \lstinline[language=java]{self} in other object-oriented languages, and implicitly passed.
|
|---|
| 306 | \begin{cfa}
|
|---|
| 307 | struct Employee {
|
|---|
| 308 | char * name;
|
|---|
| 309 | double salary;
|
|---|
| 310 | };
|
|---|
| 311 | void @?{}@( Employee & this, char * name, double salary ) {
|
|---|
| 312 | this.name = aalloc( sizeof(name) );
|
|---|
| 313 | strcpy( this.name, name );
|
|---|
| 314 | this.salary = salary;
|
|---|
| 315 | }
|
|---|
| 316 | void @^?{}@( Employee & this ) {
|
|---|
| 317 | free( this.name );
|
|---|
| 318 | }
|
|---|
| 319 | {
|
|---|
| 320 | Employee name = { "Sara Schmidt", 20.5 };
|
|---|
| 321 | } // implicit destructor call
|
|---|
| 322 | \end{cfa}
|
|---|
| 323 | Both constructor and destructor can be explicitly called.
|
|---|
| 324 | \begin{cfa}
|
|---|
| 325 | Employee name = { "Sara Schmidt", 20.5 };
|
|---|
| 326 | ... // use name
|
|---|
| 327 | ^?{}( name ); // de-initialize
|
|---|
| 328 | ?{}( name, "Jack Smith", 10.5 }; // re-initialize
|
|---|
| 329 | ... // use name
|
|---|
| 330 | \end{cfa}
|
|---|
| 331 |
|
|---|
| 332 |
|
|---|
| 333 | \subsection{Special Literals}
|
|---|
| 334 |
|
|---|
| 335 | The C constants @0@ and @1@ have special meaning.
|
|---|
| 336 | @0@ is the null pointer and used in conditional expressions, where @if ( p )@ is rewritten @if ( p != 0 )@;
|
|---|
| 337 | @1@ is an additive identity in unary operators @++@ and @--@.
|
|---|
| 338 | Aware of their significance, \CFA provides a special type @zero_t@ and @one_t@ for custom types.
|
|---|
| 339 | \begin{cfa}
|
|---|
| 340 | struct S { int i, j; };
|
|---|
| 341 | void ?{}( S & this, @zero_t@ ) { this.i = 0; this.j = 0; } // zero_t, no parameter name allowed
|
|---|
| 342 | int ?@!=@?( S this, @zero_t@ ) { return this.i != 0 && this.j != 0; }
|
|---|
| 343 | S s = @0@;
|
|---|
| 344 | if ( s @!= 0@ ) ...
|
|---|
| 345 | \end{cfa}
|
|---|
| 346 | Similarity, for @one_t@.
|
|---|
| 347 | \begin{cfa}
|
|---|
| 348 | void ?{}( S & this, @one_t@ ) { this.i = 1; this.j = 1; } // one_t, no parameter name allowed
|
|---|
| 349 | S & ?++( S & this, @one_t@ ) { return (S){ this.i++, this.j++ }; }
|
|---|
| 350 | \end{cfa}
|
|---|
| 351 |
|
|---|
| 352 |
|
|---|
| 353 | \subsection{Polymorphic Functions}
|
|---|
| 354 |
|
|---|
| 355 | Polymorphic functions are the functions that apply to all types.
|
|---|
| 356 | \CFA provides \newterm{parametric polymorphism} written with the @forall@ clause.
|
|---|
| 357 | \begin{cfa}
|
|---|
| 358 | @forall( T )@ T identity( T v ) { return v; }
|
|---|
| 359 | identity( 42 );
|
|---|
| 360 | \end{cfa}
|
|---|
| 361 | The @identity@ function accepts a value from any type as an argument and returns that value.
|
|---|
| 362 | At the call size, the type parameter @T@ is bounded to @int@ from the argument @42@.
|
|---|
| 363 |
|
|---|
| 364 | For polymorphic functions to be useful, the @forall@ clause needs \newterm{type assertion}s that restricts the polymorphic types it accepts.
|
|---|
| 365 | \begin{cfa}
|
|---|
| 366 | forall( T @| { void foo( T ); }@ ) void bar( T t ) { @foo( t );@ }
|
|---|
| 367 | struct S { ... } s;
|
|---|
| 368 | void foo( struct S );
|
|---|
| 369 | bar( s );
|
|---|
| 370 | \end{cfa}
|
|---|
| 371 | The assertion on @T@ restricts the range of types that can be manipulated by @bar@ to only those that have an implementation of @foo@ with the matching signature, allowing @bar@'s call to @foo@ in its body.
|
|---|
| 372 |
|
|---|
| 373 |
|
|---|
| 374 | \subsection{Trait}
|
|---|
| 375 |
|
|---|
| 376 | A @forall@ clause can assert many restrictions on multiple types.
|
|---|
| 377 | A common practice is to refactor the assertions into a named \newterm{trait}.
|
|---|
| 378 | \begin{cfa}
|
|---|
| 379 | forall(T) trait @Bird@ {
|
|---|
| 380 | int days_can_fly( T );
|
|---|
| 381 | void fly( T );
|
|---|
| 382 | };
|
|---|
| 383 | forall( B | @Bird@( B ) )
|
|---|
| 384 | void bird_fly( int days_since_born, B bird ) {
|
|---|
| 385 | if ( days_since_born > days_can_fly( bird )) fly( bird );
|
|---|
| 386 | }
|
|---|
| 387 | struct Robin {} robin;
|
|---|
| 388 | int days_can_fly( Robin robin ) { return 23; }
|
|---|
| 389 | void fly( Robin robin ) {}
|
|---|
| 390 | bird_fly( 23, robin );
|
|---|
| 391 | \end{cfa}
|
|---|
| 392 | Grouping type assertions into a named trait effectively creates a reusable interface for parametric-polymorphic types.
|
|---|
| 393 |
|
|---|
| 394 |
|
|---|
| 395 | \section{Expression Resolution}
|
|---|
| 396 |
|
|---|
| 397 | Overloading poses a challenge for all expression-resolution systems.
|
|---|
| 398 | Multiple overloaded names give multiple candidates at a call site, and a resolver must pick a \emph{best} match, where ``best'' is defined by a series of heuristics based on safety and programmer intuition/expectation.
|
|---|
| 399 | When multiple best matches exist, the resolution is ambiguous.
|
|---|
| 400 |
|
|---|
| 401 | The \CFA resolver attempts to identity a best candidate based on: first, the number of parameters and types, and second, when no exact match exists, the fewest implicit conversions and polymorphic variables.
|
|---|
| 402 | Finding an exact match is not discussed here, because the mechanism is fairly straightforward, even when the search space is large;
|
|---|
| 403 | only finding a non-exact match is discussed in detail.
|
|---|
| 404 |
|
|---|
| 405 |
|
|---|
| 406 | \subsection{Conversion Cost}
|
|---|
| 407 | \label{s:ConversionCost}
|
|---|
| 408 |
|
|---|
| 409 | Most programming languages perform some implicit conversions among basic types to facilitate mixed-mode arithmetic;
|
|---|
| 410 | otherwise, the program becomes littered with many explicit casts, which is not match programmer expectation.
|
|---|
| 411 | C is an aggressive language as it provides conversions among almost all of the basic types, even when the conversion is potentially unsafe or not meaningful, \ie @float@ to @bool@.
|
|---|
| 412 | C defines the resolution pattern as ``usual arithmetic conversion''~\cite[\S~6.3.1.8]{C11}, in which C looks for a \newterm{common type} between operands, and converts one or both operands to the common type.
|
|---|
| 413 | Loosely defined, a common type is a the smallest type in terms of size of representation that both operands can be converted into without losing their precision, called a \newterm{widening} or \newterm{safe conversion}.
|
|---|
| 414 |
|
|---|
| 415 | \CFA generalizes ``usual arithmetic conversion'' to \newterm{conversion cost}.
|
|---|
| 416 | In the first design by Bilson~\cite{Bilson03}, conversion cost is a 3-tuple, @(unsafe, poly, safe)@ applied between each argument/parameter type, where:
|
|---|
| 417 | \begin{enumerate}
|
|---|
| 418 | \item
|
|---|
| 419 | @unsafe@ is the number of precision losing (\newterm{narrowing} conversions),
|
|---|
| 420 | \item
|
|---|
| 421 | @poly@ is the number of polymorphic function parameters, and
|
|---|
| 422 | \item
|
|---|
| 423 | @safe@ is sum of the degree of safe (widening) conversions.
|
|---|
| 424 | \end{enumerate}
|
|---|
| 425 | Sum of degree is a method to quantify C's integer and floating-point rank.
|
|---|
| 426 | Every pair of widening conversion types is assigned a \newterm{distance}, and distance between the two same type is 0.
|
|---|
| 427 | For example, the distance from @char@ to @int@ is 2, distance from @int@ to @long@ is 1, and distance from @int@ to @long long int@ is 2.
|
|---|
| 428 | This distance does not mirror C's rank system.
|
|---|
| 429 | For example, the rank of @char@ and @signed char@ are the same in C, but the distance from @char@ to @signed char@ is assigned 1.
|
|---|
| 430 | @safe@ cost is summing all pairs of argument to parameter safe conversion distances.
|
|---|
| 431 | Among the three costs in Bilson's model, @unsafe@ is the most significant cost and @safe@ is the least significant, with an implication that \CFA always choose a candidate with the lowest @unsafe@, if possible.
|
|---|
| 432 |
|
|---|
| 433 | For example, assume the overloaded function @foo@ is called with two @int@ parameter.
|
|---|
| 434 | The cost for every overloaded @foo@ has been list along:
|
|---|
| 435 | \begin{cfa}
|
|---|
| 436 | void foo( char, char ); $\C[2.5in]{// (1) (2, 0, 0)}$
|
|---|
| 437 | void foo( char, int ); $\C{// (2) (1, 0, 0)}$
|
|---|
| 438 | forall( T, V ) void foo( T, V ); $\C{// (3) (0, 2, 0)}$
|
|---|
| 439 | forall( T ) void foo( T, T ); $\C{// (4) (0, 2, 0)}$
|
|---|
| 440 | forall( T ) void foo( T, int ); $\C{// (5) (0, 1, 0)}$
|
|---|
| 441 | void foo( long long, long ); $\C{// (6) (0, 0, 3)}$
|
|---|
| 442 | void foo( long, long ); $\C{// (7) (0, 0, 2)}$
|
|---|
| 443 | void foo( int, long ); $\C{// (8) (0, 0, 1)}$
|
|---|
| 444 | int i, j;
|
|---|
| 445 | foo( i, j ); $\C{// convert j to long and call (8)}\CRT$
|
|---|
| 446 | \end{cfa}
|
|---|
| 447 | The overloaded instances are ordered from the highest to the lowest cost, and \CFA select the last candidate (8).
|
|---|
| 448 |
|
|---|
| 449 | In the next iteration of \CFA, Schluntz and Aaron~\cite{Moss18} expanded conversion cost to a 7-tuple with 4 additional categories, @(unsafe, poly, safe, sign, vars, specialization, reference)@, with the following interpretations:
|
|---|
| 450 | \begin{enumerate}
|
|---|
| 451 | \item @unsafe@ from Bilson
|
|---|
| 452 | \item @poly@
|
|---|
| 453 | \item @safe@
|
|---|
| 454 | \item @sign@ is the number of sign/unsigned variable conversions
|
|---|
| 455 | \item @vars@ is the number of polymorphic type variables
|
|---|
| 456 | \item @specialization@ is a negative value of the number of type assertions
|
|---|
| 457 | \item @reference@ is the number of reference-to-rvalue conversions
|
|---|
| 458 | \end{enumerate}
|
|---|
| 459 | The extended conversion-cost model looks for candidates that are more specific and less generic.
|
|---|
| 460 | @vars@ disambiguates @forall( T, V ) foo( T, V )@ and @forall( T ) void foo( T, T )@, where the extra type parameter @V@ makes is more generic.
|
|---|
| 461 | A more generic type means less constraints on its parameter types.
|
|---|
| 462 | \CFA favours candidates with more restrictions on polymorphism, so @forall( T ) void foo( T, T )@ has lower cost.
|
|---|
| 463 | @specialization@ is an arbitrary count-down value starting at zero.
|
|---|
| 464 | For every type assertion in @forall@ clause (no assertions in the above example), \CFA subtracts one from @specialization@.
|
|---|
| 465 | More type assertions means more constraints on argument types, making the function less generic.
|
|---|
| 466 |
|
|---|
| 467 | \CFA defines two special cost values: @zero@ and @infinite@.
|
|---|
| 468 | A conversion cost is @zero@ when argument and parameter has an exact match, and a conversion cost is @infinite@ when there is no defined conversion between two types.
|
|---|
| 469 | For example, the conversion cost from @int@ to a @struct S@ is @infinite@.
|
|---|