Changes in doc/papers/general/Paper.tex [43c6dc82:c659968]
- File:
-
- 1 edited
-
doc/papers/general/Paper.tex (modified) (39 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/general/Paper.tex
r43c6dc82 rc659968 2 2 3 3 \usepackage{fullpage} 4 \usepackage{epic,eepic}5 4 \usepackage{xspace,calc,comment} 6 5 \usepackage{upquote} % switch curled `'" to straight … … 37 36 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 38 37 39 \newcommand{\Textbf}[ 2][red]{{\color{#1}{\textbf{#2}}}}38 \newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}} 40 39 \newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included 41 40 %\newcommand{\TODO}[1]{} % TODO elided … … 102 101 \makeatother 103 102 104 \newenvironment{cquote}{%105 \list{}{\lstset{resetmargins=true,aboveskip=0pt,belowskip=0pt}\topsep=4pt\parsep=0pt\leftmargin=\parindent\rightmargin\leftmargin}%106 \item\relax107 }{%108 \endlist109 }% cquote110 111 103 % CFA programming language, based on ANSI C (with some gcc additions) 112 104 \lstdefinelanguage{CFA}[ANSI]{C}{ … … 234 226 int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$ 235 227 \end{lstlisting} 236 The @identity@ function above can be applied to any complete \ newterm{object type} (or @otype@).228 The @identity@ function above can be applied to any complete \emph{object type} (or @otype@). 237 229 The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. 238 230 The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. 239 If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \ newterm{data type} (or @dtype@).231 If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \emph{data type} (or @dtype@). 240 232 241 233 In \CFA, the polymorphism runtime-cost is spread over each polymorphic call, due to passing more arguments to polymorphic functions; … … 243 235 A design advantage is that, unlike \CC template-functions, \CFA polymorphic-functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat. 244 236 245 Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \ newterm{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable.237 Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \emph{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. 246 238 For example, the function @twice@ can be defined using the \CFA syntax for operator overloading: 247 239 \begin{lstlisting} … … 310 302 \end{lstlisting} 311 303 Here, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@. 304 As well, restricted constant overloading is allowed for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context. 305 In addition, several operations are defined in terms values @0@ and @1@, \eg: 306 \begin{lstlisting} 307 int x; 308 if (x) x++ $\C{// if (x != 0) x += 1;}$ 309 \end{lstlisting} 310 Every @if@ and iteration statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result. 311 Due to these rewrite rules, the values @0@ and @1@ have the types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts. 312 The types @zero_t@ and @one_t@ have special built in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal. 313 312 314 313 315 \subsection{Traits} 314 316 315 \CFA provides \ newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:317 \CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration: 316 318 \begin{lstlisting} 317 319 trait summable( otype T ) { … … 337 339 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted. 338 340 339 In summation, the \CFA type-system uses \ newterm{nominal typing} for concrete types, matching with the C type-system, and \newterm{structural typing} for polymorphic types.341 In summation, the \CFA type-system uses \emph{nominal typing} for concrete types, matching with the C type-system, and \emph{structural typing} for polymorphic types. 340 342 Hence, trait names play no part in type equivalence; 341 343 the names are simply macros for a list of polymorphic assertions, which are expanded at usage sites. … … 382 384 Furthermore, writing and using preprocessor macros can be unnatural and inflexible. 383 385 384 \CC, Java, and other languages use \ newterm{generic types} to produce type-safe abstract data-types.386 \CC, Java, and other languages use \emph{generic types} to produce type-safe abstract data-types. 385 387 \CFA also implements generic types that integrate efficiently and naturally with the existing polymorphic functions, while retaining backwards compatibility with C and providing separate compilation. 386 388 However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates. … … 403 405 \end{lstlisting} 404 406 405 \CFA classifies generic types as either \ newterm{concrete} or \newterm{dynamic}.407 \CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. 406 408 Concrete types have a fixed memory layout regardless of type parameters, while dynamic types vary in memory layout depending on their type parameters. 407 A type may have polymorphic parameters but still be concrete, called \ newterm{dtype-static}.409 A type may have polymorphic parameters but still be concrete, called \emph{dtype-static}. 408 410 Polymorphic pointers are an example of dtype-static types, \eg @forall(dtype T) T *@ is a polymorphic type, but for any @T@, @T *@ is a fixed-sized pointer, and therefore, can be represented by a @void *@ in code generation. 409 411 … … 442 444 Though \CFA implements concrete generic-types efficiently, it also has a fully general system for dynamic generic types. 443 445 As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller. 444 Dynamic generic-types also have an \ newterm{offset array} containing structure-member offsets.446 Dynamic generic-types also have an \emph{offset array} containing structure-member offsets. 445 447 A dynamic generic-union needs no such offset array, as all members are at offset 0, but size and alignment are still necessary. 446 448 Access to members of a dynamic structure is provided at runtime via base-displacement addressing with the structure pointer and the member offset (similar to the @offsetof@ macro), moving a compile-time offset calculation to runtime. … … 455 457 For instance, modularity is generally provided in C by including an opaque forward-declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately-compiled @.c@ file. 456 458 \CFA supports this pattern for generic types, but the caller does not know the actual layout or size of the dynamic generic-type, and only holds it by a pointer. 457 The \CFA translator automatically generates \ newterm{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.459 The \CFA translator automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller. 458 460 These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all @sized@ parameters to the generic structure (un@sized@ parameters are forbidden from being used in a context that affects layout). 459 461 Results of these layout functions are cached so that they are only computed once per type per function. %, as in the example below for @pair@. … … 479 481 Since @pair(T *, T * )@ is a concrete type, there are no implicit parameters passed to @lexcmp@, so the generated code is identical to a function written in standard C using @void *@, yet the \CFA version is type-checked to ensure the fields of both pairs and the arguments to the comparison function match in type. 480 482 481 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \ newterm{tag-structures}.483 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \emph{tag-structures}. 482 484 Sometimes information is only used for type-checking and can be omitted at runtime, \eg: 483 485 \begin{lstlisting} … … 535 537 The addition of multiple-return-value functions (MRVF) are useless without a syntax for accepting multiple values at the call-site. 536 538 The simplest mechanism for capturing the return values is variable assignment, allowing the values to be retrieved directly. 537 As such, \CFA allows assigning multiple values from a function into multiple variables, using a square-bracketed list of lvalue expressions (as above), called a \ newterm{tuple}.538 539 However, functions also use \ newterm{composition} (nested calls), with the direct consequence that MRVFs must also support composition to be orthogonal with single-returning-value functions (SRVF), \eg:539 As such, \CFA allows assigning multiple values from a function into multiple variables, using a square-bracketed list of lvalue expressions (as above), called a \emph{tuple}. 540 541 However, functions also use \emph{composition} (nested calls), with the direct consequence that MRVFs must also support composition to be orthogonal with single-returning-value functions (SRVF), \eg: 540 542 \begin{lstlisting} 541 543 printf( "%d %d\n", div( 13, 5 ) ); $\C{// return values seperated into arguments}$ … … 570 572 printf( "%d %d\n", qr ); 571 573 \end{lstlisting} 572 \CFA also supports \ newterm{tuple indexing} to access single components of a tuple expression:574 \CFA also supports \emph{tuple indexing} to access single components of a tuple expression: 573 575 \begin{lstlisting} 574 576 [int, int] * p = &qr; $\C{// tuple pointer}$ … … 613 615 \subsection{Tuple Assignment} 614 616 615 An assignment where the left side is a tuple type is called \ newterm{tuple assignment}.616 There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called \ newterm{multiple} and \newterm{mass assignment}, respectively.617 An assignment where the left side is a tuple type is called \emph{tuple assignment}. 618 There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called \emph{multiple} and \emph{mass assignment}, respectively. 617 619 %\lstDeleteShortInline@% 618 620 %\par\smallskip … … 648 650 \subsection{Member Access} 649 651 650 It is also possible to access multiple fields from a single expression using a \ newterm{member-access}.652 It is also possible to access multiple fields from a single expression using a \emph{member-access}. 651 653 The result is a single tuple-valued expression whose type is the tuple of the types of the members, \eg: 652 654 \begin{lstlisting} … … 778 780 Matching against a @ttype@ parameter consumes all remaining argument components and packages them into a tuple, binding to the resulting tuple of types. 779 781 In a given parameter list, there must be at most one @ttype@ parameter that occurs last, which matches normal variadic semantics, with a strong feeling of similarity to \CCeleven variadic templates. 780 As such, @ttype@ variables are also called \ newterm{argument packs}.782 As such, @ttype@ variables are also called \emph{argument packs}. 781 783 782 784 Like variadic templates, the main way to manipulate @ttype@ polymorphic functions is via recursion. … … 850 852 \subsection{Implementation} 851 853 852 Tuples are implemented in the \CFA translator via a transformation into \ newterm{generic types}.854 Tuples are implemented in the \CFA translator via a transformation into \emph{generic types}. 853 855 For each $N$, the first time an $N$-tuple is seen in a scope a generic type with $N$ type parameters is generated, \eg: 854 856 \begin{lstlisting} … … 901 903 Similarly, tuple member expressions are recursively expanded into a list of member access expressions. 902 904 903 Expressions that may contain side effects are made into \ newterm{unique expressions} before being expanded by the flattening conversion.905 Expressions that may contain side effects are made into \emph{unique expressions} before being expanded by the flattening conversion. 904 906 Each unique expression is assigned an identifier and is guaranteed to be executed exactly once: 905 907 \begin{lstlisting} … … 1050 1052 \label{s:WithClauseStatement} 1051 1053 1052 Grouping heterogenous data into \newterm{aggregate}s (structure/union)is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers:1054 Grouping heterogenous data into \newterm{aggregate}s is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers: 1053 1055 \begin{cfa} 1054 struct S { $\C{// aggregate}$1055 char c; $\C{// fields}$1056 struct S { $\C{// aggregate}$ 1057 char c; $\C{// fields}$ 1056 1058 int i; 1057 1059 double d; … … 1059 1061 S s, as[10]; 1060 1062 \end{cfa} 1061 However, routines manipulating aggregates must repeatthe aggregate name to access its containing fields:1063 However, routines manipulating aggregates have repeition of the aggregate name to access its containing fields: 1062 1064 \begin{cfa} 1063 1065 void f( S s ) { 1064 `s.`c; `s.`i; `s.`d; $\C{// access containing fields}$1066 `s.`c; `s.`i; `s.`d; $\C{// access containing fields}$ 1065 1067 } 1066 1068 \end{cfa} … … 1068 1070 \begin{C++} 1069 1071 class C { 1070 char c; $\C{// fields}$1072 char c; $\C{// fields}$ 1071 1073 int i; 1072 1074 double d; 1073 int mem() { $\C{// implicit "this" parameter}$1074 `this->`c; `this->`i; `this->`d; $\C{// access containing fields}$1075 int mem() { $\C{// implicit "this" parameter}$ 1076 `this->`c; `this->`i; `this->`d;$\C{// access containing fields}$ 1075 1077 } 1076 1078 } 1077 1079 \end{C++} 1078 Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of lexical scoping. 1079 However, for other aggregate parameters, qualification is necessary: 1080 \begin{cfa} 1081 struct T { double m, n; }; 1082 int C::mem( T & t ) { $\C{// multiple aggregate parameters}$ 1083 c; i; d; $\C{\color{red}// this-\textgreater.c, this-\textgreater.i, this-\textgreater.d}$ 1084 `t.`m; `t.`n; $\C{// must qualify}$ 1085 } 1086 \end{cfa} 1080 Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of nested lexical-scoping. 1087 1081 1088 1082 % In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided. 1089 1083 % In any programming language, some functions have a naturally close relationship with a particular data type. 1090 % Object-oriented programming allows this close relationship to be codified in the language by making such functions \ newterm{class methods} of their related data type.1084 % Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type. 1091 1085 % Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type. 1092 1086 % When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code. … … 1094 1088 % \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.} 1095 1089 1096 To simplify the programmer experience, \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate qualification to fields by opening a scope containing thefield identifiers.1097 Hence, the qualified fields become variables with the side-effect that it is easier to optimizingfield references in a block.1090 \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate qualification to fields by opening a scope containing field identifiers. 1091 Hence, the qualified fields become variables, and making it easier to optimize field references in a block. 1098 1092 \begin{cfa} 1099 void f( S s ) `with( s )` { $\C{// with clause}$1100 c; i; d; $\C{\color{red}// s.c, s.i, s.d}$1093 void f( S s ) `with( s )` { $\C{// with clause}$ 1094 c; i; d; $\C{\color{red}// s.c, s.i, s.d}$ 1101 1095 } 1102 1096 \end{cfa} … … 1104 1098 \begin{cfa} 1105 1099 int mem( S & this ) `with( this )` { $\C{// with clause}$ 1106 c; i; d; $\C{\color{red}// this.c, this.i, this.d}$1100 c; i; d; $\C{\color{red}// this.c, this.i, this.d}$ 1107 1101 } 1108 1102 \end{cfa} 1109 with the generality of opening multiple aggregate-parameters:1103 The key generality over the object-oriented approach is that one aggregate parameter \lstinline[language=C++]@this@ is not treated specially over other aggregate parameters: 1110 1104 \begin{cfa} 1105 struct T { double m, n; }; 1111 1106 int mem( S & s, T & t ) `with( s, t )` { $\C{// multiple aggregate parameters}$ 1112 c; i; d; $\C{\color{red}// s.c, s.i, s.d}$1113 m; n; $\C{\color{red}// t.m, t.n}$1107 c; i; d; $\C{\color{red}// s.c, s.i, s.d}$ 1108 m; n; $\C{\color{red}// t.m, t.n}$ 1114 1109 } 1115 1110 \end{cfa} 1116 1117 In detail, the @with@ clause/statement has the form: 1111 The equivalent object-oriented style is: 1118 1112 \begin{cfa} 1119 $\emph{with-statement}$: 1120 'with' '(' $\emph{expression-list}$ ')' $\emph{compound-statement}$ 1121 \end{cfa} 1122 and may appear as the body of a routine or nested within a routine body. 1123 Each expression in the expression-list provides a type and object. 1124 The type must be an aggregate type. 1125 (Enumerations are already opened.) 1126 The object is the implicit qualifier for the open structure-fields. 1127 1128 All expressions in the expression list are open in ``parallel'' within the compound statement. 1129 This semantic is different from Pascal, which nests the openings. 1130 The difference between parallel and nesting occurs for fields with the same name but different type: 1131 \begin{cfa} 1132 struct S { int i; int j; double m; } s, w; 1133 struct T { int i; int k; int m } t, w; 1134 with( s, t ) { 1135 j + k; $\C{// unambiguous, s.j + t.m}$ 1136 m = 5.0; $\C{// unambiguous, t.m = 5.0}$ 1137 m = 1; $\C{// unambiguous, s.m = 1}$ 1138 int a = s.i + m; $\C{// unambiguous, a = s.i + t.i}$ 1139 int b = s.i + t.i; $\C{// unambiguous, qualification}$ 1140 sout | (double)m | endl; $\C{// unambiguous, cast}$ 1141 i; $\C{// ambiguous}$ 1142 } 1143 \end{cfa} 1144 \CFA's ability to overload variables means usages of field with the same names can be automatically disambiguated, eliminating most qualification. 1145 Qualification or a cast is used to disambiguate. 1146 A cast may be necessary to disambiguate between the overload variables in a @with@ expression: 1147 \begin{cfa} 1148 with( w ) { ... } $\C{// ambiguous, same name and no context}$ 1149 with( (S)w ) { ... } $\C{// unambiguous}$ 1150 \end{cfa} 1151 1152 \begin{cfa} 1153 struct S { int i, j; } sv; 1154 with( sv ) { 1155 S & sr = sv; 1156 with( sr ) { 1157 S * sp = &sv; 1158 with( *sp ) { 1159 i = 3; j = 4; $\C{\color{red}// sp-{\textgreater}i, sp-{\textgreater}j}$ 1160 } 1161 i = 3; j = 4; $\C{\color{red}// sr.i, sr.j}$ 1162 } 1163 i = 3; j = 4; $\C{\color{red}// sv.i, sv.j}$ 1113 int S::mem( T & t ) { $\C{// multiple aggregate parameters}$ 1114 c; i; d; $\C{\color{red}// this-\textgreater.c, this-\textgreater.i, this-\textgreater.d}$ 1115 `t.`m; `t.`n; 1164 1116 } 1165 1117 \end{cfa} … … 1170 1122 struct S1 { ... } s1; 1171 1123 struct S2 { ... } s2; 1172 `with( s1 )` { $\C{// with statement}$1124 `with( s1 )` { $\C{// with statement}$ 1173 1125 // access fields of s1 without qualification 1174 `with( s2 )` { $\C{// nesting}$1126 `with( s2 )` { $\C{// nesting}$ 1175 1127 // access fields of s1 and s2 without qualification 1176 1128 } … … 1182 1134 \end{cfa} 1183 1135 1136 When opening multiple structures, fields with the same name and type are ambiguous and must be fully qualified. 1137 For fields with the same name but different type, context/cast can be used to disambiguate. 1138 \begin{cfa} 1139 struct S { int i; int j; double m; } a, c; 1140 struct T { int i; int k; int m } b, c; 1141 `with( a, b )` { 1142 j + k; $\C{// unambiguous, unique names define unique types}$ 1143 i; $\C{// ambiguous, same name and type}$ 1144 a.i + b.i; $\C{// unambiguous, qualification defines unique names}$ 1145 m; $\C{// ambiguous, same name and no context to define unique type}$ 1146 m = 5.0; $\C{// unambiguous, same name and context defines unique type}$ 1147 m = 1; $\C{// unambiguous, same name and context defines unique type}$ 1148 } 1149 `with( c )` { ... } $\C{// ambiguous, same name and no context}$ 1150 `with( (S)c )` { ... } $\C{// unambiguous, same name and cast defines unique type}$ 1151 \end{cfa} 1152 1153 The components in the "with" clause 1154 1155 with ( a, b, c ) { ... } 1156 1157 serve 2 purposes: each component provides a type and object. The type must be a 1158 structure type. Enumerations are already opened, and I think a union is opened 1159 to some extent, too. (Or is that just unnamed unions?) The object is the target 1160 that the naked structure-fields apply to. The components are open in "parallel" 1161 at the scope of the "with" clause/statement, so opening "a" does not affect 1162 opening "b", etc. This semantic is different from Pascal, which nests the 1163 openings. 1164 1165 Having said the above, it seems reasonable to allow a "with" component to be an 1166 expression. The type is the static expression-type and the object is the result 1167 of the expression. Again, the type must be an aggregate. Expressions require 1168 parenthesis around the components. 1169 1170 with( a, b, c ) { ... } 1171 1172 Does this now make sense? 1173 1174 Having written more CFA code, it is becoming clear to me that I *really* want 1175 the "with" to be implemented because I hate having to type all those object 1176 names for fields. It's a great way to drive people away from the language. 1177 1184 1178 1185 1179 \subsection{Exception Handling ???} … … 1196 1190 \subsection{Alternative Declaration Syntax} 1197 1191 1198 \newcommand{\R}[1]{\Textbf{#1}}1199 \newcommand{\B}[1]{{\Textbf[blue]{#1}}}1200 \newcommand{\G}[1]{{\Textbf[OliveGreen]{#1}}}1201 1202 C declaration syntax is notoriously confusing and error prone.1203 For example, many C programmers are confused by a declaration as simple as:1204 \begin{cquote}1205 \lstDeleteShortInline@%1206 \begin{tabular}{@{}ll@{}}1207 \begin{cfa}1208 int * x[5]1209 \end{cfa}1210 &1211 \raisebox{-0.75\totalheight}{\input{Cdecl}}1212 \end{tabular}1213 \lstMakeShortInline@%1214 \end{cquote}1215 Is this an array of 5 pointers to integers or a pointer to an array of 5 integers?1216 If there is any doubt, it implies productivity and safety issues even for basic programs.1217 Another example of confusion results from the fact that a routine name and its parameters are embedded within the return type, mimicking the way the return value is used at the routine's call site.1218 For example, a routine returning a pointer to an array of integers is defined and used in the following way:1219 \begin{cfa}1220 int `(*`f`())[`5`]` {...}; $\C{// definition}$1221 ... `(*`f`())[`3`]` += 1; $\C{// usage}$1222 \end{cfa}1223 Essentially, the return type is wrapped around the routine name in successive layers (like an onion).1224 While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice.1225 1226 \CFA provides its own type, variable and routine declarations, using a different syntax.1227 The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type.1228 In the following example, \R{red} is the base type and \B{blue} is qualifiers.1229 The \CFA declarations move the qualifiers to the left of the base type, \ie move the blue to the left of the red, while the qualifiers have the same meaning but are ordered left to right to specify a variable's type.1230 \begin{cquote}1231 \lstDeleteShortInline@%1232 \lstset{moredelim=**[is][\color{blue}]{+}{+}}1233 \begin{tabular}{@{}l@{\hspace{3em}}l@{}}1234 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\1235 \begin{cfa}1236 +[5] *+ `int` x1;1237 +* [5]+ `int` x2;1238 +[* [5] int]+ f`( int p )`;1239 \end{cfa}1240 &1241 \begin{cfa}1242 `int` +*+ x1 +[5]+;1243 `int` +(*+x2+)[5]+;1244 +int (*+f`( int p )`+)[5]+;1245 \end{cfa}1246 \end{tabular}1247 \lstMakeShortInline@%1248 \end{cquote}1249 The only exception is bit field specification, which always appear to the right of the base type.1250 % Specifically, the character @*@ is used to indicate a pointer, square brackets @[@\,@]@ are used to represent an array or function return value, and parentheses @()@ are used to indicate a routine parameter.1251 However, unlike C, \CFA type declaration tokens are distributed across all variables in the declaration list.1252 For instance, variables @x@ and @y@ of type pointer to integer are defined in \CFA as follows:1253 \begin{cquote}1254 \lstDeleteShortInline@%1255 \begin{tabular}{@{}l@{\hspace{3em}}l@{}}1256 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\1257 \begin{cfa}1258 `*` int x, y;1259 \end{cfa}1260 &1261 \begin{cfa}1262 int `*`x, `*`y;1263 \end{cfa}1264 \end{tabular}1265 \lstMakeShortInline@%1266 \end{cquote}1267 The downside of this semantics is the need to separate regular and pointer declarations:1268 \begin{cquote}1269 \lstDeleteShortInline@%1270 \begin{tabular}{@{}l@{\hspace{3em}}l@{}}1271 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\1272 \begin{cfa}1273 `*` int x;1274 int y;1275 \end{cfa}1276 &1277 \begin{cfa}1278 int `*`x, y;1279 1280 \end{cfa}1281 \end{tabular}1282 \lstMakeShortInline@%1283 \end{cquote}1284 which is prescribing a safety benefit.1285 Other examples are:1286 \begin{cquote}1287 \lstDeleteShortInline@%1288 \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}}1289 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\1290 \begin{cfa}1291 [ 5 ] int z;1292 [ 5 ] * char w;1293 * [ 5 ] double v;1294 struct s {1295 int f0:3;1296 * int f1;1297 [ 5 ] * int f2;1298 };1299 \end{cfa}1300 &1301 \begin{cfa}1302 int z[ 5 ];1303 char * w[ 5 ];1304 double (* v)[ 5 ];1305 struct s {1306 int f0:3;1307 int * f1;1308 int * f2[ 5 ]1309 };1310 \end{cfa}1311 &1312 \begin{cfa}1313 // array of 5 integers1314 // array of 5 pointers to char1315 // pointer to array of 5 doubles1316 1317 // common bit field syntax1318 1319 1320 1321 \end{cfa}1322 \end{tabular}1323 \lstMakeShortInline@%1324 \end{cquote}1325 1326 All type qualifiers, \eg @const@, @volatile@, etc., are used in the normal way with the new declarations and also appear left to right, \eg:1327 \begin{cquote}1328 \lstDeleteShortInline@%1329 \begin{tabular}{@{}l@{\hspace{1em}}l@{\hspace{1em}}l@{}}1330 \multicolumn{1}{c@{\hspace{1em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{1em}}}{\textbf{C}} \\1331 \begin{cfa}1332 const * const int x;1333 const * [ 5 ] const int y;1334 \end{cfa}1335 &1336 \begin{cfa}1337 int const * const x;1338 const int (* const y)[ 5 ]1339 \end{cfa}1340 &1341 \begin{cfa}1342 // const pointer to const integer1343 // const pointer to array of 5 const integers1344 \end{cfa}1345 \end{tabular}1346 \lstMakeShortInline@%1347 \end{cquote}1348 All declaration qualifiers, \eg @extern@, @static@, etc., are used in the normal way with the new declarations but can only appear at the start of a \CFA routine declaration,\footnote{\label{StorageClassSpecifier}1349 The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature.~\cite[\S~6.11.5(1)]{C11}} \eg:1350 \begin{cquote}1351 \lstDeleteShortInline@%1352 \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}}1353 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\1354 \begin{cfa}1355 extern [ 5 ] int x;1356 static * const int y;1357 \end{cfa}1358 &1359 \begin{cfa}1360 int extern x[ 5 ];1361 const int static * y;1362 \end{cfa}1363 &1364 \begin{cfa}1365 // externally visible array of 5 integers1366 // internally visible pointer to constant int1367 \end{cfa}1368 \end{tabular}1369 \lstMakeShortInline@%1370 \end{cquote}1371 1372 The new declaration syntax can be used in other contexts where types are required, \eg casts and the pseudo-routine @sizeof@:1373 \begin{cquote}1374 \lstDeleteShortInline@%1375 \begin{tabular}{@{}l@{\hspace{3em}}l@{}}1376 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\1377 \begin{cfa}1378 y = (* int)x;1379 i = sizeof([ 5 ] * int);1380 \end{cfa}1381 &1382 \begin{cfa}1383 y = (int *)x;1384 i = sizeof(int * [ 5 ]);1385 \end{cfa}1386 \end{tabular}1387 \lstMakeShortInline@%1388 \end{cquote}1389 1390 Finally, new \CFA declarations may appear together with C declarations in the same program block, but cannot be mixed within a specific declaration.1391 Therefore, a programmer has the option of either continuing to use traditional C declarations or take advantage of the new style.1392 Clearly, both styles need to be supported for some time due to existing C-style header-files, particularly for UNIX-like systems.1393 1394 1192 1395 1193 \subsection{References} 1396 1194 1397 All variables in C have an \newterm{address}, a \newterm{value}, and a \newterm{type}; 1398 at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type. 1399 The C type-system does not always track the relationship between a value and its address; 1400 a value that does not have a corresponding address is called a \newterm{rvalue} (for ``right-hand value''), while a value that does have an address is called a \newterm{lvalue} (for ``left-hand value''). 1401 For example, in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue. 1402 Despite the nomenclature of ``left-hand'' and ``right-hand'', an expression's classification as lvalue or rvalue is entirely dependent on whether it has an address or not; in imperative programming, the address of a value is used for both reading and writing (mutating) a value, and as such lvalues can be converted to rvalues and read from, but rvalues cannot be mutated because they lack a location to store the updated value. 1403 1404 Within a lexical scope, lvalue expressions have an \newterm{address interpretation} for writing a value or a \newterm{value interpretation} to read a value. 1405 For example, in @x = y@, @x@ has an address interpretation, while @y@ has a value interpretation. 1406 Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \newterm{pointer types} to serve a similar purpose. 1407 In C, for any type @T@ there is a pointer type @T *@, the value of which is the address of a value of type @T@. 1408 A pointer rvalue can be explicitly \newterm{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@. 1195 All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type. 1196 The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue. 1197 Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses. 1198 Within a lexical scope, lvalue exressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while y is used in its value interpretation. 1199 Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose. 1200 In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@. 1409 1201 1410 1202 \begin{cfa} 1411 1203 int x = 1, y = 2, * p1, * p2, ** p3; 1412 p1 = &x; $\C{// p1 points to x}$1413 p2 = &y; $\C{// p2 points to y}$1414 p3 = &p1; $\C{// p3 points to p1}$1204 p1 = &x; $\C{// p1 points to x}$ 1205 p2 = &y; $\C{// p2 points to y}$ 1206 p3 = &p1; $\C{// p3 points to p1}$ 1415 1207 *p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15); 1416 1208 \end{cfa} … … 1418 1210 Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs. 1419 1211 For both brevity and clarity, it would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as the assignment to @*p2@ above. 1420 However, since C defines a number of forms of \ newterm{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.1212 However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators. 1421 1213 To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below: 1422 1214 1423 1215 \begin{cfa} 1424 in tx = 1, y = 2, & r1, & r2, && r3;1216 inx x = 1, y = 2, & r1, & r2, && r3; 1425 1217 &r1 = &x; $\C{// r1 points to x}$ 1426 1218 &r2 = &y; $\C{// r2 points to y}$ … … 1444 1236 This allows \CFA references to be default-initialized (\eg to a null pointer), and also to point to different addresses throughout their lifetime. 1445 1237 This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C. 1446 1447 1238 In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address. 1448 1239 In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there. … … 1458 1249 if @L@ is an lvalue of type {@T &@$_1 \cdots$@ &@$_l$} where $l \ge 0$ references (@&@ symbols) then @&L@ has type {@T `*`&@$_{\color{red}1} \cdots$@ &@$_{\color{red}l}$}, \\ \ie @T@ pointer with $l$ references (@&@ symbols). 1459 1250 \end{itemize} 1251 1460 1252 Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience. 1461 1462 By analogy to pointers, \CFA references also allow cv-qualifiers such as @const@: 1253 By analogy to pointers, \CFA references also allow cv-qualifiers: 1463 1254 1464 1255 \begin{cfa} … … 1478 1269 1479 1270 More generally, this initialization of references from lvalues rather than pointers is an instance of a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; this conversion can actually be used in any context in \CFA an implicit conversion would be allowed. 1480 Similarly, use of a the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion, and \CFA also includes a qualifier-adding ``reference-to-reference'' conversion, anal ogous to the @T *@ to @const T *@ conversion in standard C.1271 Similarly, use of a the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion, and \CFA also includes a qualifier-adding ``reference-to-reference'' conversion, analagous to the @T *@ to @const T *@ conversion in standard C. 1481 1272 The final reference conversion included in \CFA is ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary. 1482 1273 When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary. 1483 1274 This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value. 1484 \CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \newterm{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers. 1485 1275 \CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \emph{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers. 1486 1276 1487 1277 \subsection{Constructors and Destructors} … … 1489 1279 One of the strengths of C is the control over memory management it gives programmers, allowing resource release to be more consistent and precisely timed than is possible with garbage-collected memory management. 1490 1280 However, this manual approach to memory management is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory. 1491 \CC is well-known for an approach to manual memory management that addresses both these issues, Resource Aquisition Is Initialization (RAII), implemented by means of special \ newterm{constructor} and \newterm{destructor} functions; we have implemented a similar feature in \CFA.1281 \CC is well-known for an approach to manual memory management that addresses both these issues, Resource Aquisition Is Initialization (RAII), implemented by means of special \emph{constructor} and \emph{destructor} functions; we have implemented a similar feature in \CFA. 1492 1282 While RAII is a common feature of object-oriented programming languages, its inclusion in \CFA does not violate the design principle that \CFA retain the same procedural paradigm as C. 1493 1283 In particular, \CFA does not implement class-based encapsulation: neither the constructor nor any other function has privileged access to the implementation details of a type, except through the translation-unit-scope method of opaque structs provided by C. … … 1521 1311 \end{cfa} 1522 1312 1523 In the example above, a \ newterm{default constructor} (\ie one with no parameters besides the @this@ parameter) and destructor are defined for the @Array@ struct, a dynamic array of @int@.1524 @Array@ is an example of a \ newterm{managed type} in \CFA, a type with a non-trivial constructor or destructor, or with a field of a managed type.1313 In the example above, a \emph{default constructor} (\ie one with no parameters besides the @this@ parameter) and destructor are defined for the @Array@ struct, a dynamic array of @int@. 1314 @Array@ is an example of a \emph{managed type} in \CFA, a type with a non-trivial constructor or destructor, or with a field of a managed type. 1525 1315 As in the example, all instances of managed types are implicitly constructed upon allocation, and destructed upon deallocation; this ensures proper initialization and cleanup of resources contained in managed types, in this case the @data@ array on the heap. 1526 1316 The exact details of the placement of these implicit constructor and destructor calls are omitted here for brevity, the interested reader should consult \cite{Schluntz17}. 1527 1317 1528 1318 Constructor calls are intended to seamlessly integrate with existing C initialization syntax, providing a simple and familiar syntax to veteran C programmers and allowing constructor calls to be inserted into legacy C code with minimal code changes. 1529 As such, \CFA also provides syntax for \ newterm{copy initialization} and \newterm{initialization parameters}:1319 As such, \CFA also provides syntax for \emph{copy initialization} and \emph{initialization parameters}: 1530 1320 1531 1321 \begin{cfa} … … 1542 1332 In addition to initialization syntax, \CFA provides two ways to explicitly call constructors and destructors. 1543 1333 Explicit calls to constructors double as a placement syntax, useful for construction of member fields in user-defined constructors and reuse of large storage allocations. 1544 While the existing function-call syntax works for explicit calls to constructors and destructors, \CFA also provides a more concise \ newterm{operator syntax} for both:1334 While the existing function-call syntax works for explicit calls to constructors and destructors, \CFA also provides a more concise \emph{operator syntax} for both: 1545 1335 1546 1336 \begin{cfa} … … 1559 1349 For compatibility with C, a copy constructor from the first union member type is also defined. 1560 1350 For @struct@ types, each of the four functions are implicitly defined to call their corresponding functions on each member of the struct. 1561 To better simulate the behaviour of C initializers, a set of \ newterm{field constructors} is also generated for structures.1351 To better simulate the behaviour of C initializers, a set of \emph{field constructors} is also generated for structures. 1562 1352 A constructor is generated for each non-empty prefix of a structure's member-list which copy-constructs the members passed as parameters and default-constructs the remaining members. 1563 1353 To allow users to limit the set of constructors available for a type, when a user declares any constructor or destructor, the corresponding generated function and all field constructors for that type are hidden from expression resolution; similarly, the generated default constructor is hidden upon declaration of any constructor. … … 1565 1355 1566 1356 In rare situations user programmers may not wish to have constructors and destructors called; in these cases, \CFA provides an ``escape hatch'' to not call them. 1567 If a variable is initialized using the syntax \lstinline|S x @= {}| it will be an \ newterm{unmanaged object}, and will not have constructors or destructors called.1357 If a variable is initialized using the syntax \lstinline|S x @= {}| it will be an \emph{unmanaged object}, and will not have constructors or destructors called. 1568 1358 Any C initializer can be the right-hand side of an \lstinline|@=| initializer, \eg \lstinline|Array a @= { 0, 0x0 }|, with the usual C initialization semantics. 1569 1359 In addition to the expressive power, \lstinline|@=| provides a simple path for migrating legacy C code to \CFA, by providing a mechanism to incrementally convert initializers; the \CFA design team decided to introduce a new syntax for this escape hatch because we believe that our RAII implementation will handle the vast majority of code in a desirable way, and we wished to maintain familiar syntax for this common case. … … 1574 1364 \section{Literals} 1575 1365 1576 C already includes limited polymorphism for literals -- @0@ can be either an integer or a pointer literal, depending on context, while the syntactic forms of literals of the various integer and floating-point types are very similar, differing from each other only in suffix.1577 In keeping with the general \CFA approach of adding features while respecting ``the C way'' of doing things, we have extended both C's polymorphic zero and typed literal syntax to interoperate with user-defined types, while maintaining a backwards-compatible semantics.1578 1366 1579 1367 \subsection{0/1} 1580 1368 1581 In C, @0@ has the special property that it is the only ``false'' value; by the standard, any value which compares equal to @0@ is false, while any value that compares unequal to @0@ is true. 1582 As such, an expression @x@ in any boolean context (such as the condition of an @if@ or @while@ statement, or the arguments to an @&&@, @||@, or ternary operator) can be rewritten as @x != 0@ without changing its semantics. 1583 The operator overloading feature of \CFA provides a natural means to implement this truth value comparison for arbitrary types, but the C type system is not precise enough to distinguish an equality comparison with @0@ from an equality comparison with an arbitrary integer or pointer. 1584 To provide this precision, \CFA introduces a new type @zero_t@ as type type of literal @0@ (somewhat analagous to @nullptr_t@ and @nullptr@ in \CCeleven); @zero_t@ can only take the value @0@, but has implicit conversions to the integer and pointer types so that standard C code involving @0@ continues to work properly. 1585 With this addition, the \CFA compiler rewrites @if (x)@ and similar expressions to @if ((x) != 0)@ or the appropriate analogue, and any type @T@ can be made ``truthy'' by defining an operator overload @int ?!=?(T, zero_t)@. 1586 \CC makes types truthy by adding a conversion to @bool@; prior to the addition of explicit cast operators in \CCeleven this approach had the pitfall of making truthy types transitively convertable to any numeric type; our design for \CFA avoids this issue. 1587 1588 \CFA also includes a special type for @1@, @one_t@; like @zero_t@, @one_t@ has built-in implicit conversions to the various integral types so that @1@ maintains its expected semantics in legacy code. 1589 The addition of @one_t@ allows generic algorithms to handle the unit value uniformly for types where that is meaningful. 1590 \TODO{Make this sentence true} In particular, polymorphic functions in the \CFA prelude define @++x@ and @x++@ in terms of @x += 1@, allowing users to idiomatically define all forms of increment for a type @T@ by defining the single function @T& ?+=(T&, one_t)@; analogous overloads for the decrement operators are present as well. 1369 \TODO{Some text already at the end of Section~\ref{sec:poly-fns}} 1370 1591 1371 1592 1372 \subsection{Units} … … 1617 1397 \end{cfa} 1618 1398 }% 1399 1619 1400 1620 1401 \section{Evaluation} … … 1792 1573 Finally, we demonstrate that \CFA performance for some idiomatic cases is better than C and close to \CC, showing the design is practically applicable. 1793 1574 1794 There is ongoing work on a wide range of \CFA feature extensions, including arrays with size, exceptions, concurrent primitives, modules, and user-defined conversions.1575 There is ongoing work on a wide range of \CFA feature extensions, including reference types, arrays with size, exceptions, concurrent primitives and modules. 1795 1576 (While all examples in the paper compile and run, a public beta-release of \CFA will take another 8--12 months to finalize these additional extensions.) 1796 1577 In addition, there are interesting future directions for the polymorphism design.
Note:
See TracChangeset
for help on using the changeset viewer.