Changeset 86c934a for doc/papers/general


Ignore:
Timestamp:
Feb 6, 2018, 4:41:56 PM (6 years ago)
Author:
Rob Schluntz <rschlunt@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
834b892
Parents:
53d3ab4b (diff), 7d94d805 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:/u/cforall/software/cfa/cfa-cc

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/papers/general/Paper.tex

    r53d3ab4b r86c934a  
    6161\newcommand{\C}[2][\@empty]{\ifx#1\@empty\else\global\setlength{\columnposn}{#1}\global\columnposn=\columnposn\fi\hfill\makebox[\textwidth-\columnposn][l]{\lst@basicstyle{\LstCommentStyle{#2}}}}
    6262\newcommand{\CRT}{\global\columnposn=\gcolumnposn}
     63
     64% Denote newterms in particular font and index them without particular font and in lowercase, e.g., \newterm{abc}.
     65% The option parameter provides an index term different from the new term, e.g., \newterm[\texttt{abc}]{abc}
     66% The star version does not lowercase the index information, e.g., \newterm*{IBM}.
     67\newcommand{\newtermFontInline}{\emph}
     68\newcommand{\newterm}{\@ifstar\@snewterm\@newterm}
     69\newcommand{\@newterm}[2][\@empty]{\lowercase{\def\temp{#2}}{\newtermFontInline{#2}}\ifx#1\@empty\index{\temp}\else\index{#1@{\protect#2}}\fi}
     70\newcommand{\@snewterm}[2][\@empty]{{\newtermFontInline{#2}}\ifx#1\@empty\index{#2}\else\index{#1@{\protect#2}}\fi}
    6371
    6472% Latin abbreviation
     
    10381046The implicit targets of the current @continue@ and @break@, \ie the closest enclosing loop or @switch@, change as certain constructs are added or removed.
    10391047
     1048\TODO{choose and fallthrough here as well?}
     1049
    10401050
    10411051\subsection{\texorpdfstring{\LstKeywordStyle{with} Clause / Statement}{with Clause / Statement}}
    10421052\label{s:WithClauseStatement}
    10431053
    1044 In any programming language, some functions have a naturally close relationship with a particular data type.
    1045 Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
    1046 Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
    1047 When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
    1048 
    1049 \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.}
    1050 
    1051 
    1052 In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided.
     1054Grouping heterogenous data into \newterm{aggregate}s is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers:
     1055\begin{cfa}
     1056struct S {                                                              $\C{// aggregate}$
     1057        char c;                                                         $\C{// fields}$
     1058        int i;
     1059        double d;
     1060};
     1061S s, as[10];
     1062\end{cfa}
     1063However, routines manipulating aggregates have repeition of the aggregate name to access its containing fields:
     1064\begin{cfa}
     1065void f( S s ) {
     1066        `s.`c; `s.`i; `s.`d;                            $\C{// access containing fields}$
     1067}
     1068\end{cfa}
     1069A similar situation occurs in object-oriented programming, \eg \CC:
    10531070\begin{C++}
    10541071class C {
    1055         int i, j;
    1056         int mem() {                                     $\C{\color{red}// implicit "this" parameter}$
    1057                 i = 1;                                  $\C{\color{red}// this-{\textgreater}i}$
    1058                 j = 2;                                  $\C{\color{red}// this-{\textgreater}j}$
     1072        char c;                                                         $\C{// fields}$
     1073        int i;
     1074        double d;
     1075        int mem() {                                                     $\C{// implicit "this" parameter}$
     1076                `this->`c; `this->`i; `this->`d;$\C{// access containing fields}$
    10591077        }
    10601078}
    10611079\end{C++}
    1062 Since \CFA is non-object-oriented, the equivalent object-oriented program looks like:
     1080Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of nested lexical-scoping.
     1081
     1082% In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided.
     1083% In any programming language, some functions have a naturally close relationship with a particular data type.
     1084% Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
     1085% Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
     1086% When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
     1087%
     1088% \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.}
     1089
     1090\CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elided aggregate qualification to fields by opening a scope containing field identifiers.
     1091Hence, the qualified fields become variables, and making it easier to optimizing field references in a block.
    10631092\begin{cfa}
    1064 struct S { int i, j; };
    1065 int mem( S & `this` ) {                 $\C{// explicit "this" parameter}$
    1066         `this.`i = 1;                           $\C{// "this" is not elided}$
    1067         `this.`j = 2;
     1093void f( S s ) `with s` {                                $\C{// with clause}$
     1094        c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
    10681095}
    10691096\end{cfa}
    1070 but it is cumbersome having to write "this." many times in a member.
    1071 
    1072 \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elided the "@this.@" by opening a scope containing field identifiers, changing the qualified fields into variables and giving an opportunity for optimizing qualified references.
     1097and the equivalence for object-style programming is:
    10731098\begin{cfa}
    1074 int mem( S &this ) `with this` { $\C{// with clause}$
    1075         i = 1;                                          $\C{\color{red}// this.i}$
    1076         j = 2;                                          $\C{\color{red}// this.j}$
     1099int mem( S & this ) `with this` {               $\C{// with clause}$
     1100        c; i; d;                                                        $\C{\color{red}// this.c, this.i, this.d}$
    10771101}
    10781102\end{cfa}
    1079 which extends to multiple routine parameters:
     1103The key generality over the object-oriented approach is that one aggregate parameter \lstinline[language=C++]@this@ is not treated specially over other aggregate parameters:
    10801104\begin{cfa}
    10811105struct T { double m, n; };
    1082 int mem2( S & this1, T & this2 ) `with this1, this2` {
    1083         i = 1; j = 2;
    1084         m = 1.0; n = 2.0;
     1106int mem( S & s, T & t ) `with s, t` {   $\C{// multiple aggregate parameters}$
     1107        c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
     1108        m; n;                                                           $\C{\color{red}// t.m, t.n}$
     1109}
     1110\end{cfa}
     1111The equivalent object-oriented style is:
     1112\begin{cfa}
     1113int S::mem( T & t ) {                                   $\C{// multiple aggregate parameters}$
     1114        c; i; d;                                                        $\C{\color{red}// this-\textgreater.c, this-\textgreater.i, this-\textgreater.d}$
     1115        `t.`m; `t.`n;
    10851116}
    10861117\end{cfa}
     
    10911122        struct S1 { ... } s1;
    10921123        struct S2 { ... } s2;
    1093         `with s1` {                                     $\C{// with statement}$
     1124        `with s1` {                                             $\C{// with statement}$
    10941125                // access fields of s1 without qualification
    1095                 `with s2` {                             $\C{// nesting}$
     1126                `with s2` {                                     $\C{// nesting}$
    10961127                        // access fields of s1 and s2 without qualification
    10971128                }
     
    11091140struct T { int i; int k; int m } b, c;
    11101141`with a, b` {
    1111         j + k;                                          $\C{// unambiguous, unique names define unique types}$
    1112         i;                                                      $\C{// ambiguous, same name and type}$
    1113         a.i + b.i;                                      $\C{// unambiguous, qualification defines unique names}$
    1114         m;                                                      $\C{// ambiguous, same name and no context to define unique type}$
    1115         m = 5.0;                                        $\C{// unambiguous, same name and context defines unique type}$
    1116         m = 1;                                          $\C{// unambiguous, same name and context defines unique type}$
    1117 }
    1118 `with c` { ... }                                $\C{// ambiguous, same name and no context}$
    1119 `with (S)c` { ... }                             $\C{// unambiguous, same name and cast defines unique type}$
     1142        j + k;                                                  $\C{// unambiguous, unique names define unique types}$
     1143        i;                                                              $\C{// ambiguous, same name and type}$
     1144        a.i + b.i;                                              $\C{// unambiguous, qualification defines unique names}$
     1145        m;                                                              $\C{// ambiguous, same name and no context to define unique type}$
     1146        m = 5.0;                                                $\C{// unambiguous, same name and context defines unique type}$
     1147        m = 1;                                                  $\C{// unambiguous, same name and context defines unique type}$
     1148}
     1149`with c` { ... }                                        $\C{// ambiguous, same name and no context}$
     1150`with (S)c` { ... }                                     $\C{// unambiguous, same name and cast defines unique type}$
    11201151\end{cfa}
    11211152
     
    11621193\subsection{References}
    11631194
    1164 \TODO{Pull draft text from user manual; make sure to discuss nested references and rebind operator drawn from lvalue-addressof operator}
    1165 
     1195All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
     1196The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
     1197Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses.
     1198Within a lexical scope, lvalue exressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while y is used in its value interpretation.
     1199Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose.
     1200In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
     1201
     1202\begin{cfa}
     1203int x = 1, y = 2, * p1, * p2, ** p3;
     1204p1 = &x;  $\C{// p1 points to x}$
     1205p2 = &y;  $\C{// p2 points to y}$
     1206p3 = &p1;  $\C{// p3 points to p1}$
     1207\end{cfa}
     1208
     1209Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
     1210It would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as @*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);@, for both brevity and clarity.
     1211However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
     1212To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below:
     1213
     1214\begin{cfa}
     1215inx x = 1, y = 2, & r1, & r2, && r3;
     1216&r1 = &x;  $\C{// r1 points to x}$
     1217&r2 = &y;  $\C{// r2 points to y}$
     1218&&r3 = &&r1;  $\C{// r3 points to r2}$
     1219r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15);  $\C{// implicit dereferencing}$
     1220\end{cfa}
     1221
     1222Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example.
     1223Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value.
     1224One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in the reference variable declaration, so the previous example implicitly acts like:
     1225
     1226\begin{cfa}
     1227`*`r2 = ((`*`r1 + `*`r2) * (`**`r3 - `*`r1)) / (`**`r3 - 15);
     1228\end{cfa}
     1229
     1230References in \CFA are similar to those in \CC, but with a couple important improvements, both of which can be seen in the example above.
     1231Firstly, \CFA does not forbid references to references, unlike \CC.
     1232This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
     1233
     1234Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable.
     1235This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime.
     1236This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
     1237In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address.
     1238In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there.
     1239The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above.
     1240This rebinding can occur to an arbitrary depth of reference nesting; $n$ address-of operators applied to a reference nested $m$ times will produce an lvalue pointer nested $n$ times if $n \le m$ (note that $n = m+1$ is simply the usual C rvalue address-of operator applied to the $n = m$ case).
     1241The explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x@ or @(&(&`*`)`*`)r3 = &(&`*`)r1@ or even @(&`*`)r2 = (&`*`)`*`r3@ for @&r2 = &r3@.
     1242
     1243Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.
     1244By analogy to pointers, \CFA references also allow cv-qualifiers:
     1245
     1246\begin{cfa}
     1247const int cx = 5;               $\C{// cannot change cx}$
     1248const int & cr = cx;    $\C{// cannot change cr's referred value}$
     1249&cr = &cx;                              $\C{// rebinding cr allowed}$
     1250cr = 7;                                 $\C{// ERROR, cannot change cr}$
     1251int & const rc = x;             $\C{// must be initialized, like in \CC}$
     1252&rc = &x;                               $\C{// ERROR, cannot rebind rc}$
     1253rc = 7;                                 $\C{// x now equal to 7}$
     1254\end{cfa}
     1255
     1256Given that a reference is meant to represent a lvalue, \CFA provides some syntactic shortcuts when initializing references.
     1257There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding.
     1258In each of these contexts, the address-of operator on the target lvalue may (in fact, must) be elided.
     1259The syntactic motivation for this is clearest when considering overloaded operator-assignment, \eg @int ?+=?(int &, int)@; given @int x, y@, the expected call syntax is @x += y@, not @&x += y@.
     1260
     1261This initialization of references from lvalues rather than pointers can be considered a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; similarly, use of a the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion.
     1262\CFA includes one more reference conversion, an ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary.
     1263When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary.
     1264This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value.
     1265\CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \emph{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
    11661266
    11671267\subsection{Constructors and Destructors}
     
    11811281
    11821282\subsection{0/1}
     1283
     1284\TODO{Some text already at the end of Section~\ref{sec:poly-fns}}
    11831285
    11841286
Note: See TracChangeset for help on using the changeset viewer.