Changeset 86c934a for doc/papers
- Timestamp:
- Feb 6, 2018, 4:41:56 PM (5 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 834b892
- Parents:
- 53d3ab4b (diff), 7d94d805 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/general/Paper.tex
r53d3ab4b r86c934a 61 61 \newcommand{\C}[2][\@empty]{\ifx#1\@empty\else\global\setlength{\columnposn}{#1}\global\columnposn=\columnposn\fi\hfill\makebox[\textwidth-\columnposn][l]{\lst@basicstyle{\LstCommentStyle{#2}}}} 62 62 \newcommand{\CRT}{\global\columnposn=\gcolumnposn} 63 64 % Denote newterms in particular font and index them without particular font and in lowercase, e.g., \newterm{abc}. 65 % The option parameter provides an index term different from the new term, e.g., \newterm[\texttt{abc}]{abc} 66 % The star version does not lowercase the index information, e.g., \newterm*{IBM}. 67 \newcommand{\newtermFontInline}{\emph} 68 \newcommand{\newterm}{\@ifstar\@snewterm\@newterm} 69 \newcommand{\@newterm}[2][\@empty]{\lowercase{\def\temp{#2}}{\newtermFontInline{#2}}\ifx#1\@empty\index{\temp}\else\index{#1@{\protect#2}}\fi} 70 \newcommand{\@snewterm}[2][\@empty]{{\newtermFontInline{#2}}\ifx#1\@empty\index{#2}\else\index{#1@{\protect#2}}\fi} 63 71 64 72 % Latin abbreviation … … 1038 1046 The implicit targets of the current @continue@ and @break@, \ie the closest enclosing loop or @switch@, change as certain constructs are added or removed. 1039 1047 1048 \TODO{choose and fallthrough here as well?} 1049 1040 1050 1041 1051 \subsection{\texorpdfstring{\LstKeywordStyle{with} Clause / Statement}{with Clause / Statement}} 1042 1052 \label{s:WithClauseStatement} 1043 1053 1044 In any programming language, some functions have a naturally close relationship with a particular data type. 1045 Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type. 1046 Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type. 1047 When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code. 1048 1049 \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.} 1050 1051 1052 In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided. 1054 Grouping heterogenous data into \newterm{aggregate}s is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers: 1055 \begin{cfa} 1056 struct S { $\C{// aggregate}$ 1057 char c; $\C{// fields}$ 1058 int i; 1059 double d; 1060 }; 1061 S s, as[10]; 1062 \end{cfa} 1063 However, routines manipulating aggregates have repeition of the aggregate name to access its containing fields: 1064 \begin{cfa} 1065 void f( S s ) { 1066 `s.`c; `s.`i; `s.`d; $\C{// access containing fields}$ 1067 } 1068 \end{cfa} 1069 A similar situation occurs in object-oriented programming, \eg \CC: 1053 1070 \begin{C++} 1054 1071 class C { 1055 int i, j; 1056 int mem() { $\C{\color{red}// implicit "this" parameter}$ 1057 i = 1; $\C{\color{red}// this-{\textgreater}i}$ 1058 j = 2; $\C{\color{red}// this-{\textgreater}j}$ 1072 char c; $\C{// fields}$ 1073 int i; 1074 double d; 1075 int mem() { $\C{// implicit "this" parameter}$ 1076 `this->`c; `this->`i; `this->`d;$\C{// access containing fields}$ 1059 1077 } 1060 1078 } 1061 1079 \end{C++} 1062 Since \CFA is non-object-oriented, the equivalent object-oriented program looks like: 1080 Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of nested lexical-scoping. 1081 1082 % In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided. 1083 % In any programming language, some functions have a naturally close relationship with a particular data type. 1084 % Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type. 1085 % Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type. 1086 % When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code. 1087 % 1088 % \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.} 1089 1090 \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elided aggregate qualification to fields by opening a scope containing field identifiers. 1091 Hence, the qualified fields become variables, and making it easier to optimizing field references in a block. 1063 1092 \begin{cfa} 1064 struct S { int i, j; }; 1065 int mem( S & `this` ) { $\C{// explicit "this" parameter}$ 1066 `this.`i = 1; $\C{// "this" is not elided}$ 1067 `this.`j = 2; 1093 void f( S s ) `with s` { $\C{// with clause}$ 1094 c; i; d; $\C{\color{red}// s.c, s.i, s.d}$ 1068 1095 } 1069 1096 \end{cfa} 1070 but it is cumbersome having to write "this." many times in a member. 1071 1072 \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elided the "@this.@" by opening a scope containing field identifiers, changing the qualified fields into variables and giving an opportunity for optimizing qualified references. 1097 and the equivalence for object-style programming is: 1073 1098 \begin{cfa} 1074 int mem( S &this ) `with this` { $\C{// with clause}$ 1075 i = 1; $\C{\color{red}// this.i}$ 1076 j = 2; $\C{\color{red}// this.j}$ 1099 int mem( S & this ) `with this` { $\C{// with clause}$ 1100 c; i; d; $\C{\color{red}// this.c, this.i, this.d}$ 1077 1101 } 1078 1102 \end{cfa} 1079 which extends to multiple routine parameters:1103 The key generality over the object-oriented approach is that one aggregate parameter \lstinline[language=C++]@this@ is not treated specially over other aggregate parameters: 1080 1104 \begin{cfa} 1081 1105 struct T { double m, n; }; 1082 int mem2( S & this1, T & this2 ) `with this1, this2` { 1083 i = 1; j = 2; 1084 m = 1.0; n = 2.0; 1106 int mem( S & s, T & t ) `with s, t` { $\C{// multiple aggregate parameters}$ 1107 c; i; d; $\C{\color{red}// s.c, s.i, s.d}$ 1108 m; n; $\C{\color{red}// t.m, t.n}$ 1109 } 1110 \end{cfa} 1111 The equivalent object-oriented style is: 1112 \begin{cfa} 1113 int S::mem( T & t ) { $\C{// multiple aggregate parameters}$ 1114 c; i; d; $\C{\color{red}// this-\textgreater.c, this-\textgreater.i, this-\textgreater.d}$ 1115 `t.`m; `t.`n; 1085 1116 } 1086 1117 \end{cfa} … … 1091 1122 struct S1 { ... } s1; 1092 1123 struct S2 { ... } s2; 1093 `with s1` { $\C{// with statement}$1124 `with s1` { $\C{// with statement}$ 1094 1125 // access fields of s1 without qualification 1095 `with s2` { $\C{// nesting}$1126 `with s2` { $\C{// nesting}$ 1096 1127 // access fields of s1 and s2 without qualification 1097 1128 } … … 1109 1140 struct T { int i; int k; int m } b, c; 1110 1141 `with a, b` { 1111 j + k; $\C{// unambiguous, unique names define unique types}$1112 i; $\C{// ambiguous, same name and type}$1113 a.i + b.i; $\C{// unambiguous, qualification defines unique names}$1114 m; $\C{// ambiguous, same name and no context to define unique type}$1115 m = 5.0; $\C{// unambiguous, same name and context defines unique type}$1116 m = 1; $\C{// unambiguous, same name and context defines unique type}$1117 } 1118 `with c` { ... } $\C{// ambiguous, same name and no context}$1119 `with (S)c` { ... } $\C{// unambiguous, same name and cast defines unique type}$1142 j + k; $\C{// unambiguous, unique names define unique types}$ 1143 i; $\C{// ambiguous, same name and type}$ 1144 a.i + b.i; $\C{// unambiguous, qualification defines unique names}$ 1145 m; $\C{// ambiguous, same name and no context to define unique type}$ 1146 m = 5.0; $\C{// unambiguous, same name and context defines unique type}$ 1147 m = 1; $\C{// unambiguous, same name and context defines unique type}$ 1148 } 1149 `with c` { ... } $\C{// ambiguous, same name and no context}$ 1150 `with (S)c` { ... } $\C{// unambiguous, same name and cast defines unique type}$ 1120 1151 \end{cfa} 1121 1152 … … 1162 1193 \subsection{References} 1163 1194 1164 \TODO{Pull draft text from user manual; make sure to discuss nested references and rebind operator drawn from lvalue-addressof operator} 1165 1195 All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type. 1196 The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue. 1197 Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses. 1198 Within a lexical scope, lvalue exressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while y is used in its value interpretation. 1199 Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose. 1200 In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@. 1201 1202 \begin{cfa} 1203 int x = 1, y = 2, * p1, * p2, ** p3; 1204 p1 = &x; $\C{// p1 points to x}$ 1205 p2 = &y; $\C{// p2 points to y}$ 1206 p3 = &p1; $\C{// p3 points to p1}$ 1207 \end{cfa} 1208 1209 Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs. 1210 It would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as @*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);@, for both brevity and clarity. 1211 However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators. 1212 To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below: 1213 1214 \begin{cfa} 1215 inx x = 1, y = 2, & r1, & r2, && r3; 1216 &r1 = &x; $\C{// r1 points to x}$ 1217 &r2 = &y; $\C{// r2 points to y}$ 1218 &&r3 = &&r1; $\C{// r3 points to r2}$ 1219 r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15); $\C{// implicit dereferencing}$ 1220 \end{cfa} 1221 1222 Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example. 1223 Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value. 1224 One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in the reference variable declaration, so the previous example implicitly acts like: 1225 1226 \begin{cfa} 1227 `*`r2 = ((`*`r1 + `*`r2) * (`**`r3 - `*`r1)) / (`**`r3 - 15); 1228 \end{cfa} 1229 1230 References in \CFA are similar to those in \CC, but with a couple important improvements, both of which can be seen in the example above. 1231 Firstly, \CFA does not forbid references to references, unlike \CC. 1232 This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@. 1233 1234 Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable. 1235 This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime. 1236 This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C. 1237 In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address. 1238 In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there. 1239 The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above. 1240 This rebinding can occur to an arbitrary depth of reference nesting; $n$ address-of operators applied to a reference nested $m$ times will produce an lvalue pointer nested $n$ times if $n \le m$ (note that $n = m+1$ is simply the usual C rvalue address-of operator applied to the $n = m$ case). 1241 The explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x@ or @(&(&`*`)`*`)r3 = &(&`*`)r1@ or even @(&`*`)r2 = (&`*`)`*`r3@ for @&r2 = &r3@. 1242 1243 Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience. 1244 By analogy to pointers, \CFA references also allow cv-qualifiers: 1245 1246 \begin{cfa} 1247 const int cx = 5; $\C{// cannot change cx}$ 1248 const int & cr = cx; $\C{// cannot change cr's referred value}$ 1249 &cr = &cx; $\C{// rebinding cr allowed}$ 1250 cr = 7; $\C{// ERROR, cannot change cr}$ 1251 int & const rc = x; $\C{// must be initialized, like in \CC}$ 1252 &rc = &x; $\C{// ERROR, cannot rebind rc}$ 1253 rc = 7; $\C{// x now equal to 7}$ 1254 \end{cfa} 1255 1256 Given that a reference is meant to represent a lvalue, \CFA provides some syntactic shortcuts when initializing references. 1257 There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding. 1258 In each of these contexts, the address-of operator on the target lvalue may (in fact, must) be elided. 1259 The syntactic motivation for this is clearest when considering overloaded operator-assignment, \eg @int ?+=?(int &, int)@; given @int x, y@, the expected call syntax is @x += y@, not @&x += y@. 1260 1261 This initialization of references from lvalues rather than pointers can be considered a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; similarly, use of a the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion. 1262 \CFA includes one more reference conversion, an ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary. 1263 When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary. 1264 This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value. 1265 \CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \emph{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers. 1166 1266 1167 1267 \subsection{Constructors and Destructors} … … 1181 1281 1182 1282 \subsection{0/1} 1283 1284 \TODO{Some text already at the end of Section~\ref{sec:poly-fns}} 1183 1285 1184 1286
Note: See TracChangeset
for help on using the changeset viewer.