Changeset 10f8142 for doc/papers/general


Ignore:
Timestamp:
Feb 6, 2018, 5:52:57 PM (7 years ago)
Author:
Rob Schluntz <rschlunt@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
695571c
Parents:
834b892
Message:

Fix some typos and with-statement syntax in paper

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/papers/general/Paper.tex

    r834b892 r10f8142  
    10811081
    10821082% In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided.
    1083 % In any programming language, some functions have a naturally close relationship with a particular data type. 
     1083% In any programming language, some functions have a naturally close relationship with a particular data type.
    10841084% Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
    1085 % Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type. 
    1086 % When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code. 
    1087 % 
     1085% Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
     1086% When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
     1087%
    10881088% \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.}
    10891089
    1090 \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elided aggregate qualification to fields by opening a scope containing field identifiers.
    1091 Hence, the qualified fields become variables, and making it easier to optimizing field references in a block.
     1090\CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate qualification to fields by opening a scope containing field identifiers.
     1091Hence, the qualified fields become variables, and making it easier to optimize field references in a block.
    10921092\begin{cfa}
    1093 void f( S s ) `with s` {                                $\C{// with clause}$
     1093void f( S s ) `with( s )` {                             $\C{// with clause}$
    10941094        c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
    10951095}
     
    10971097and the equivalence for object-style programming is:
    10981098\begin{cfa}
    1099 int mem( S & this ) `with this` {               $\C{// with clause}$
     1099int mem( S & this ) `with( this )` {            $\C{// with clause}$
    11001100        c; i; d;                                                        $\C{\color{red}// this.c, this.i, this.d}$
    11011101}
     
    11041104\begin{cfa}
    11051105struct T { double m, n; };
    1106 int mem( S & s, T & t ) `with s, t` {   $\C{// multiple aggregate parameters}$
     1106int mem( S & s, T & t ) `with( s, t )` {        $\C{// multiple aggregate parameters}$
    11071107        c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
    11081108        m; n;                                                           $\C{\color{red}// t.m, t.n}$
     
    11221122        struct S1 { ... } s1;
    11231123        struct S2 { ... } s2;
    1124         `with s1` {                                             $\C{// with statement}$
     1124        `with( s1 )` {                                          $\C{// with statement}$
    11251125                // access fields of s1 without qualification
    1126                 `with s2` {                                     $\C{// nesting}$
     1126                `with( s2 )` {                                  $\C{// nesting}$
    11271127                        // access fields of s1 and s2 without qualification
    11281128                }
    11291129        }
    1130         `with s1, s2` {
     1130        `with( s1, s2 )` {
    11311131                // access unambiguous fields of s1 and s2 without qualification
    11321132        }
     
    11391139struct S { int i; int j; double m; } a, c;
    11401140struct T { int i; int k; int m } b, c;
    1141 `with a, b` {
     1141`with( a, b )` {
    11421142        j + k;                                                  $\C{// unambiguous, unique names define unique types}$
    11431143        i;                                                              $\C{// ambiguous, same name and type}$
     
    11471147        m = 1;                                                  $\C{// unambiguous, same name and context defines unique type}$
    11481148}
    1149 `with c` { ... }                                        $\C{// ambiguous, same name and no context}$
    1150 `with (S)c` { ... }                                     $\C{// unambiguous, same name and cast defines unique type}$
     1149`with( c )` { ... }                                     $\C{// ambiguous, same name and no context}$
     1150`with( (S)c )` { ... }                                  $\C{// unambiguous, same name and cast defines unique type}$
    11511151\end{cfa}
    11521152
    11531153The components in the "with" clause
    11541154
    1155   with a, b, c { ... }
     1155  with ( a, b, c ) { ... }
    11561156
    11571157serve 2 purposes: each component provides a type and object. The type must be a
     
    11821182\section{Declarations}
    11831183
    1184 It is important to the design team that \CFA subjectively ``feel like'' C to user programmers. 
    1185 An important part of this subjective feel is maintaining C's procedural programming paradigm, as opposed to the object-oriented paradigm of other systems languages such as \CC and Rust. 
    1186 Maintaining this procedural paradigm means that coding patterns that work in C will remain not only functional but idiomatic in \CFA, reducing the mental burden of retraining C programmers and switching between C and \CFA development. 
     1184It is important to the design team that \CFA subjectively ``feel like'' C to user programmers.
     1185An important part of this subjective feel is maintaining C's procedural programming paradigm, as opposed to the object-oriented paradigm of other systems languages such as \CC and Rust.
     1186Maintaining this procedural paradigm means that coding patterns that work in C will remain not only functional but idiomatic in \CFA, reducing the mental burden of retraining C programmers and switching between C and \CFA development.
    11871187Nonetheless, some features of object-oriented languages are undeniably convienient, and the \CFA design team has attempted to adapt them to a procedural paradigm so as to incorporate their benefits into \CFA; two of these features are resource management and name scoping.
    11881188
     
    11931193\subsection{References}
    11941194
    1195 All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type. 
     1195All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
    11961196The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
    1197 Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses. 
     1197Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses.
    11981198Within a lexical scope, lvalue exressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while y is used in its value interpretation.
    1199 Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose. 
     1199Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose.
    12001200In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
    12011201
     
    12071207\end{cfa}
    12081208
    1209 Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs. 
     1209Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
    12101210It would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as @*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);@, for both brevity and clarity.
    12111211However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
     
    12201220\end{cfa}
    12211221
    1222 Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example. 
    1223 Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value. 
     1222Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example.
     1223Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value.
    12241224One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in the reference variable declaration, so the previous example implicitly acts like:
    12251225
     
    12281228\end{cfa}
    12291229
    1230 References in \CFA are similar to those in \CC, but with a couple important improvements, both of which can be seen in the example above. 
    1231 Firstly, \CFA does not forbid references to references, unlike \CC. 
    1232 This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@. 
    1233 
    1234 Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable. 
    1235 This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime. 
    1236 This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C. 
    1237 In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address. 
    1238 In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there. 
    1239 The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above. 
     1230References in \CFA are similar to those in \CC, but with a couple important improvements, both of which can be seen in the example above.
     1231Firstly, \CFA does not forbid references to references, unlike \CC.
     1232This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
     1233
     1234Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable.
     1235This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime.
     1236This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
     1237In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address.
     1238In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there.
     1239The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above.
    12401240This rebinding can occur to an arbitrary depth of reference nesting; $n$ address-of operators applied to a reference nested $m$ times will produce an lvalue pointer nested $n$ times if $n \le m$ (note that $n = m+1$ is simply the usual C rvalue address-of operator applied to the $n = m$ case).
    12411241The explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x@ or @(&(&`*`)`*`)r3 = &(&`*`)r1@ or even @(&`*`)r2 = (&`*`)`*`r3@ for @&r2 = &r3@.
    12421242
    1243 Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience. 
     1243Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.
    12441244By analogy to pointers, \CFA references also allow cv-qualifiers:
    12451245
     
    12541254\end{cfa}
    12551255
    1256 Given that a reference is meant to represent a lvalue, \CFA provides some syntactic shortcuts when initializing references. 
    1257 There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding. 
    1258 In each of these contexts, the address-of operator on the target lvalue may (in fact, must) be elided. 
    1259 The syntactic motivation for this is clearest when considering overloaded operator-assignment, \eg @int ?+=?(int &, int)@; given @int x, y@, the expected call syntax is @x += y@, not @&x += y@. 
    1260 
    1261 This initialization of references from lvalues rather than pointers can be considered a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; similarly, use of a the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion. 
    1262 \CFA includes one more reference conversion, an ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary. 
    1263 When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary. 
    1264 This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value. 
     1256Given that a reference is meant to represent a lvalue, \CFA provides some syntactic shortcuts when initializing references.
     1257There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding.
     1258In each of these contexts, the address-of operator on the target lvalue may (in fact, must) be elided.
     1259The syntactic motivation for this is clearest when considering overloaded operator-assignment, \eg @int ?+=?(int &, int)@; given @int x, y@, the expected call syntax is @x += y@, not @&x += y@.
     1260
     1261This initialization of references from lvalues rather than pointers can be considered a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; similarly, use of a the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion.
     1262\CFA includes one more reference conversion, an ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary.
     1263When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary.
     1264This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value.
    12651265\CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \emph{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
    12661266
    12671267\subsection{Constructors and Destructors}
    12681268
    1269 One of the strengths of C is the control over memory management it gives programmers, allowing resource release to be more consistent and precisely timed than is possible with garbage-collected memory management. 
    1270 However, this manual approach to memory management is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory. 
     1269One of the strengths of C is the control over memory management it gives programmers, allowing resource release to be more consistent and precisely timed than is possible with garbage-collected memory management.
     1270However, this manual approach to memory management is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory.
    12711271\CC is well-known for an approach to manual memory management that addresses both these issues, Resource Allocation Is Initialization (RAII), implemented by means of special \emph{constructor} and \emph{destructor} functions; we have implemented a similar feature in \CFA.
    12721272
     
    13361336        TIMED( "copy_int", ti = si; )
    13371337        TIMED( "clear_int", clear( &si ); )
    1338         REPEAT_TIMED( "pop_int", N, 
     1338        REPEAT_TIMED( "pop_int", N,
    13391339                int xi = pop( &ti ); if ( xi > maxi ) { maxi = xi; } )
    13401340        REPEAT_TIMED( "print_int", N/2, print( out, vali, ":", vali, "\n" ); )
     
    13461346        TIMED( "copy_pair", tp = sp; )
    13471347        TIMED( "clear_pair", clear( &sp ); )
    1348         REPEAT_TIMED( "pop_pair", N, 
     1348        REPEAT_TIMED( "pop_pair", N,
    13491349                pair(_Bool, char) xp = pop( &tp ); if ( xp > maxp ) { maxp = xp; } )
    13501350        REPEAT_TIMED( "print_pair", N/2, print( out, valp, ":", valp, "\n" ); )
     
    13631363Note, the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
    13641364
    1365 Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents. 
     1365Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents.
    13661366The graph plots the median of 5 consecutive runs of each program, with an initial warm-up run omitted.
    13671367All code is compiled at \texttt{-O2} by GCC or G++ 6.2.0, with all \CC code compiled as \CCfourteen.
     
    13971397Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries.
    13981398
    1399 \CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 73 and 54 lines, respectively. 
     1399\CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 73 and 54 lines, respectively.
    14001400On the other hand, C does not have a generic collections-library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers.
    1401 \CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @bool@, @char@, @int@, and @const char *@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library; 
     1401\CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @bool@, @char@, @int@, and @const char *@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library;
    14021402with their omission, the \CCV line count is similar to C.
    14031403We justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose.
     
    14921492In addition, there are interesting future directions for the polymorphism design.
    14931493Notably, \CC template functions trade compile time and code bloat for optimal runtime of individual instantiations of polymorphic functions.
    1494 \CFA polymorphic functions use dynamic virtual-dispatch; 
     1494\CFA polymorphic functions use dynamic virtual-dispatch;
    14951495the runtime overhead of this approach is low, but not as low as inlining, and it may be beneficial to provide a mechanism for performance-sensitive code.
    14961496Two promising approaches are an @inline@ annotation at polymorphic function call sites to create a template-specialization of the function (provided the code is visible) or placing an @inline@ annotation on polymorphic function-definitions to instantiate a specialized version for some set of types (\CC template specialization).
Note: See TracChangeset for help on using the changeset viewer.