Changeset 86c934a for doc/papers

Timestamp:

Feb 6, 2018, 4:41:56 PM

Author:

Rob Schluntz <rschlunt@…>

Branches:

ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, stuck-waitfor-destruct, with_gc

Children:

834b892

Parents:

53d3ab4b (diff), 7d94d805 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:/u/cforall/software/cfa/cfa-cc

File:

1 edited

doc/papers/general/Paper.tex (modified) (6 diffs)

doc/papers/general/Paper.tex

-              r53d3ab4b
+              r86c934a
 \newcommand{\C}[2][\@empty]{\ifx#1\@empty\else\global\setlength{\columnposn}{#1}\global\columnposn=\columnposn\fi\hfill\makebox[\textwidth-\columnposn][l]{\lst@basicstyle{\LstCommentStyle{#2}}}}
 \newcommand{\CRT}{\global\columnposn=\gcolumnposn}
+% Denote newterms in particular font and index them without particular font and in lowercase, e.g., \newterm{abc}.
+% The option parameter provides an index term different from the new term, e.g., \newterm[\texttt{abc}]{abc}
+% The star version does not lowercase the index information, e.g., \newterm*{IBM}.
+\newcommand{\newtermFontInline}{\emph}
+\newcommand{\newterm}{\@ifstar\@snewterm\@newterm}
+\newcommand{\@newterm}[2][\@empty]{\lowercase{\def\temp{#2}}{\newtermFontInline{#2}}\ifx#1\@empty\index{\temp}\else\index{#1@{\protect#2}}\fi}
+\newcommand{\@snewterm}[2][\@empty]{{\newtermFontInline{#2}}\ifx#1\@empty\index{#2}\else\index{#1@{\protect#2}}\fi}
 % Latin abbreviation
 …
 The implicit targets of the current @continue@ and @break@, \ie the closest enclosing loop or @switch@, change as certain constructs are added or removed.
+\TODO{choose and fallthrough here as well?}
 \subsection{\texorpdfstring{\LstKeywordStyle{with} Clause / Statement}{with Clause / Statement}}
 \label{s:WithClauseStatement}
+In any programming language, some functions have a naturally close relationship with a particular data type.
+Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
+Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
+When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
+\TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.}
+In object-oriented programming, there is an implicit first parameter, often named @self@ or @this@, which is elided.
+Grouping heterogeneous data into \newterm{aggregate}s is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers:
+\begin{cfa}
+struct S {                                                              $\C{// aggregate}$
+        char c;                                                         $\C{// fields}$
+        int i;
+        double d;
+};
+S s, as[10];
+\end{cfa}
+However, routines manipulating aggregates must repeat the aggregate name to access its contained fields:
+\begin{cfa}
+void f( S s ) {
+        `s.`c; `s.`i; `s.`d;                            $\C{// access containing fields}$
+}
+\end{cfa}
+A similar situation occurs in object-oriented programming, \eg \CC:
 \begin{C++}
 class C {
+        int i, j;
+        int mem() {                                     $\C{\color{red}// implicit "this" parameter}$
+                i = 1;                                  $\C{\color{red}// this-{\textgreater}i}$
+                j = 2;                                  $\C{\color{red}// this-{\textgreater}j}$
+        char c;                                                         $\C{// fields}$
+        int i;
+        double d;
+        int mem() {                                                     $\C{// implicit "this" parameter}$
+                `this->`c; `this->`i; `this->`d;$\C{// access containing fields}$
+        }
+}
 \end{C++}
+Since \CFA is non-object-oriented, the equivalent object-oriented program looks like:
+Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of nested lexical-scoping.
+% In object-oriented programming, there is an implicit first parameter, often named @self@ or @this@, which is elided.
+% In any programming language, some functions have a naturally close relationship with a particular data type.
+% Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
+% Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
+% When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
+%
+% \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.}
+\CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate qualification to fields by opening a scope containing field identifiers.
+Hence, the qualified fields become variables, making it easier to optimize field references in a block.
 \begin{cfa}
+struct S { int i, j; };
+int mem( S & `this` ) {                 $\C{// explicit "this" parameter}$
+        `this.`i = 1;                           $\C{// "this" is not elided}$
+        `this.`j = 2;
+void f( S s ) `with s` {                                $\C{// with clause}$
+        c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
+}
 \end{cfa}
+but it is cumbersome having to write "this." many times in a member.
+\CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide the "@this.@" by opening a scope containing field identifiers, changing the qualified fields into variables and giving an opportunity for optimizing qualified references.
+and the equivalence for object-style programming is:
 \begin{cfa}
+int mem( S &this ) `with this` { $\C{// with clause}$
+        i = 1;                                          $\C{\color{red}// this.i}$
+        j = 2;                                          $\C{\color{red}// this.j}$
+int mem( S & this ) `with this` {               $\C{// with clause}$
+        c; i; d;                                                        $\C{\color{red}// this.c, this.i, this.d}$
+}
 \end{cfa}
 which extends to multiple routine parameters:
+The key generality over the object-oriented approach is that one aggregate parameter \lstinline[language=C++]@this@ is not treated specially over other aggregate parameters:
 \begin{cfa}
 struct T { double m, n; };
+int mem2( S & this1, T & this2 ) `with this1, this2` {
+        i = 1; j = 2;
+        m = 1.0; n = 2.0;
+int mem( S & s, T & t ) `with s, t` {   $\C{// multiple aggregate parameters}$
+        c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
+        m; n;                                                           $\C{\color{red}// t.m, t.n}$
+}
+\end{cfa}
+The equivalent object-oriented style is:
+\begin{cfa}
+int S::mem( T & t ) {                                   $\C{// multiple aggregate parameters}$
+        c; i; d;                                                        $\C{\color{red}// this-{\textgreater}c, this-{\textgreater}i, this-{\textgreater}d}$
+        `t.`m; `t.`n;
+}
 \end{cfa}
 …
         struct S1 { ... } s1;
         struct S2 { ... } s2;
         `with s1` {                                     $\C{// with statement}$
+        `with s1` {                                             $\C{// with statement}$
                 // access fields of s1 without qualification
                 `with s2` {                             $\C{// nesting}$
+                `with s2` {                                     $\C{// nesting}$
                         // access fields of s1 and s2 without qualification
+                }
 …
 struct T { int i; int k; int m; } b, c;
 `with a, b` {
         j + k;                                          $\C{// unambiguous, unique names define unique types}$
         i;                                                      $\C{// ambiguous, same name and type}$
         a.i + b.i;                                      $\C{// unambiguous, qualification defines unique names}$
         m;                                                      $\C{// ambiguous, same name and no context to define unique type}$
         m = 5.0;                                        $\C{// unambiguous, same name and context defines unique type}$
         m = 1;                                          $\C{// unambiguous, same name and context defines unique type}$
+}
 `with c` { ... }                                $\C{// ambiguous, same name and no context}$
 `with (S)c` { ... }                             $\C{// unambiguous, same name and cast defines unique type}$
+        j + k;                                                  $\C{// unambiguous, unique names define unique types}$
+        i;                                                              $\C{// ambiguous, same name and type}$
+        a.i + b.i;                                              $\C{// unambiguous, qualification defines unique names}$
+        m;                                                              $\C{// ambiguous, same name and no context to define unique type}$
+        m = 5.0;                                                $\C{// unambiguous, same name and context defines unique type}$
+        m = 1;                                                  $\C{// unambiguous, same name and context defines unique type}$
+}
+`with c` { ... }                                        $\C{// ambiguous, same name and no context}$
+`with (S)c` { ... }                                     $\C{// unambiguous, same name and cast defines unique type}$
 \end{cfa}
 …
 \subsection{References}
+\TODO{Pull draft text from user manual; make sure to discuss nested references and rebind operator drawn from lvalue-addressof operator}
+All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
+The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
+Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses.
+Within a lexical scope, lvalue expressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while @y@ is used in its value interpretation.
+Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose.
+In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
+\begin{cfa}
+int x = 1, y = 2, * p1, * p2, ** p3;
+p1 = &x;  $\C{// p1 points to x}$
+p2 = &y;  $\C{// p2 points to y}$
+p3 = &p1;  $\C{// p3 points to p1}$
+\end{cfa}
+Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
+It would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as @*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);@, for both brevity and clarity.
+However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
+To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below:
+\begin{cfa}
+int x = 1, y = 2, & r1, & r2, && r3;
+&r1 = &x;  $\C{// r1 points to x}$
+&r2 = &y;  $\C{// r2 points to y}$
+&&r3 = &&r1;  $\C{// r3 points to r1}$
+r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15);  $\C{// implicit dereferencing}$
+\end{cfa}
+Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example.
+Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value.
+One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in the reference variable declaration, so the previous example implicitly acts like:
+\begin{cfa}
+`*`r2 = ((`*`r1 + `*`r2) * (`**`r3 - `*`r1)) / (`**`r3 - 15);
+\end{cfa}
+References in \CFA are similar to those in \CC, but with a couple of important improvements, both of which can be seen in the example above.
+Firstly, \CFA does not forbid references to references, unlike \CC.
+This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
+Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable.
+This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime.
+This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
+In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address.
+In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there.
+The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above.
+This rebinding can occur to an arbitrary depth of reference nesting; $n$ address-of operators applied to a reference nested $m$ times will produce an lvalue pointer nested $n$ times if $n \le m$ (note that $n = m+1$ is simply the usual C rvalue address-of operator applied to the $n = m$ case).
+The explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x@ or @(&(&`*`)`*`)r3 = &(&`*`)r1@ or even @(&`*`)r2 = (&`*`)`*`r3@ for @&r2 = &r3@.
+Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.
+By analogy to pointers, \CFA references also allow cv-qualifiers:
+\begin{cfa}
+const int cx = 5;               $\C{// cannot change cx}$
+const int & cr = cx;    $\C{// cannot change cr's referred value}$
+&cr = &cx;                              $\C{// rebinding cr allowed}$
+cr = 7;                                 $\C{// ERROR, cannot change cr}$
+int & const rc = x;             $\C{// must be initialized, like in \CC}$
+&rc = &x;                               $\C{// ERROR, cannot rebind rc}$
+rc = 7;                                 $\C{// x now equal to 7}$
+\end{cfa}
+Given that a reference is meant to represent a lvalue, \CFA provides some syntactic shortcuts when initializing references.
+There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding.
+In each of these contexts, the address-of operator on the target lvalue may (in fact, must) be elided.
+The syntactic motivation for this is clearest when considering overloaded operator-assignment, \eg @int ?+=?(int &, int)@; given @int x, y@, the expected call syntax is @x += y@, not @&x += y@.
+This initialization of references from lvalues rather than pointers can be considered a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; similarly, use of the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion.
+\CFA includes one more reference conversion, an ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary.
+When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary.
+This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value.
+\CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \emph{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
 \subsection{Constructors and Destructors}
 …
 \subsection{0/1}
+\TODO{Some text already at the end of Section~\ref{sec:poly-fns}}
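
Reviewer note (not part of the changeset): the pointer example added to the References subsection can be exercised in plain C. The sketch below is only an illustration; the function names `eval_with_pointers`, `add_to_value`, and `advance_pointer` are hypothetical and do not appear in Paper.tex. It evaluates the explicit-dereference expression from the diff and contrasts `*p1 + x` (value interpretation) with `p1 + x` (pointer arithmetic), the two forms the text says are easy to confuse.

```c
/* Evaluates the expression from the diff through explicit pointers and
   returns the value stored into *p2. The caller must ensure **p3 != 15,
   otherwise the division is by zero. */
int eval_with_pointers(int *p1, int *p2, int **p3) {
    *p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);
    return *p2;
}

/* Value interpretation: adds x to the int that p points to. */
int add_to_value(const int *p, int x) {
    return *p + x;
}

/* Address interpretation: pointer arithmetic, advances p by x elements.
   The result is only valid if it stays within (or one past) the array. */
const int *advance_pointer(const int *p, int x) {
    return p + x;
}
```

With `x = 1`, `y = 2`, `p1 = &x`, `p2 = &y`, and `p3 = &p1` as in the diff, the numerator contains the factor `**p3 - *p1`, which is `1 - 1`, so the call stores 0 into `y`. The last two functions take the same operands yet compute different things, which is why the text argues that dereferences cannot be elided automatically for pointers.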
