Changeset 28bc8c8 for doc/papers/general

Timestamp:

Mar 8, 2018, 3:16:30 PM

Author:

Aaron Moss <a3moss@…>

Branches:

ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc

Children:

Parents:

Message:

Update evaluation section of paper

Location:

doc/papers/general

Files:

5 edited

Paper.tex (modified) (12 diffs)
evaluation/cfa-bench.c (modified) (1 diff)
evaluation/cfa-stack.c (modified) (2 diffs)
evaluation/timing.dat (modified) (1 diff)
evaluation/timing.xlsx (modified)


doc/papers/general/Paper.tex

-              rfb2ce27
+              r28bc8c8
 In fact, \CFA's features for generic programming can enable faster runtime execution than idiomatic @void *@-based C code.
 This claim is demonstrated through a set of generic-code-based micro-benchmarks in C, \CFA, and \CC (see stack implementations in Appendix~\ref{sec:BenchmarkStackImplementation}).
 Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks would show little runtime variance, other than in length and clarity of source code.
+Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks would show little runtime variance, differing only in length and clarity of source code.
 A more illustrative benchmark measures the costs of idiomatic usage of each language's features.
 Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked-list, a generic pair-data-structure, and a variadic @print@ function similar to that in Section~\ref{sec:variadic-tuples}.
+Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked-list.
 The benchmark test is similar for C and \CC.
 The experiment uses element types @int@ and @pair(_Bool, char)@, and pushes $N=40M$ elements on a generic stack, copies the stack, clears one of the stacks, finds the maximum value in the other stack, and prints $N/2$ (to reduce graph height) constants.
+The experiment uses element types @int@ and @pair(short, char)@, and pushes $N=40M$ elements on a generic stack, copies the stack, clears one of the stacks, and finds the maximum value in the other stack.
 \begin{figure}
 \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt]
 int main( int argc, char * argv[] ) {
+int main() {
         int max = 0, val = 42;
         stack( int ) si, ti;
         REPEAT_TIMED( "push_int", N, push( si, val ); )
         TIMED( "copy_int", ti = si; )
+        TIMED( "copy_int", ti{ si }; )
         TIMED( "clear_int", clear( si ); )
         REPEAT_TIMED( "pop_int", N,
                 int x = pop( ti ); if ( x > max ) max = x; )
         pair( _Bool, char ) max = { (_Bool)0, '\0' }, val = { (_Bool)1, 'a' };
         stack( pair( _Bool, char ) ) sp, tp;
+        pair( short, char ) max = { 0h, '\0' }, val = { 42h, 'a' };
+        stack( pair( short, char ) ) sp, tp;
         REPEAT_TIMED( "push_pair", N, push( sp, val ); )
         TIMED( "copy_pair", tp = sp; )
+        TIMED( "copy_pair", tp{ sp }; )
         TIMED( "clear_pair", clear( sp ); )
         REPEAT_TIMED( "pop_pair", N,
                 pair(_Bool, char) x = pop( tp ); if ( x > max ) max = x; )
+                pair(short, char) x = pop( tp ); if ( x > max ) max = x; )
+}
 \end{cfa}
 …
 hence runtime checks are necessary to safely down-cast objects.
 The most notable difference among the implementations is in memory layout of generic types: \CFA and \CC inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV lack such a capability and instead must store generic objects via pointers to separately-allocated objects.
+For the print benchmark, idiomatic printing is used: the C and \CFA variants used @stdio.h@, while the \CC and \CCV variants used @iostream@; preliminary tests show this distinction has negligible runtime impact.
+Note, the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
+Note that the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
 Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents.
 The graph plots the median of 5 consecutive runs of each program, with an initial warm-up run omitted.
 All code is compiled at \texttt{-O2} by gcc or g++ 6.2.0, with all \CC code compiled as \CCfourteen.
+All code is compiled at \texttt{-O2} by gcc or g++ 6.3.0, with all \CC code compiled as \CCfourteen.
 The benchmarks are run on an Ubuntu 16.04 workstation with 16 GB of RAM and a 6-core AMD FX-6300 CPU with 3.5 GHz maximum clock frequency.
 …
                                                                         & \CT{C}        & \CT{\CFA}     & \CT{\CC}      & \CT{\CCV}             \\ \hline
 maximum memory usage (MB)                       & 10001         & 2502          & 2503          & 11253                 \\
 source code size (lines)                        & 247           & 222           & 165           & 339                   \\
 redundant type annotations (lines)      & 39            & 2                     & 2                     & 15                    \\
 binary size (KB)                                        & 14            & 229           & 18            & 38                    \\
+source code size (lines)                        & 187           & 188           & 133           & 303                   \\
+redundant type annotations (lines)      & 25            & 0                     & 2                     & 16                    \\
+binary size (KB)                                        & 14            & 257           & 14            & 37                    \\
 \end{tabular}
 \end{table}
 The C and \CCV variants are generally the slowest with the largest memory footprint, because of their less-efficient memory layout and the pointer-indirection necessary to implement generic types;
 this inefficiency is exacerbated by the second level of generic types in the pair-based benchmarks.
 By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of @_Bool@ and @char@ because the storage layout is equivalent, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead.
+this inefficiency is exacerbated by the second level of generic types in the pair benchmarks.
+By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of @short@ and @char@ because the storage layout is equivalent, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead.
 \CCV is slower than C largely due to the cost of runtime type-checking of down-casts (implemented with @dynamic_cast@);
+There are two outliers in the graph for \CFA: all prints and pop of @pair@.
+Both of these cases result from the complexity of the C-generated polymorphic code, so that the gcc compiler is unable to optimize some dead code and condense nested calls.
+A compiler designed for \CFA could easily perform these optimizations.
+The outlier in the graph for \CFA, pop @pair@, results from the complexity of the generated-C polymorphic code.
+The gcc compiler is unable to optimize some dead code and condense nested calls; a compiler designed for \CFA could easily perform these optimizations.
 Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries.
 \CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 73 and 54 lines, respectively.
+\CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 41 and 42 lines, respectively.
 On the other hand, C does not have a generic collections-library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers.
 \CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @bool@, @char@, @int@, and @const char *@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library;
+\CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @char@, @short@, and @int@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library;
 with their omission, the \CCV line count is similar to C.
 We justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose.
 …
 Raw line-count, however, is a fairly rough measure of code complexity;
 another important factor is how much type information the programmer must manually specify, especially where that information is not checked by the compiler.
 Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointers arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts (\eg @object@ to @integer@ when popping a stack, or @object@ to @printable@ when printing the elements of a @pair@).
+Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointer arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts (\eg @object@ to @integer@ when popping a stack, or @object@ to @printable@ when printing the elements of a @pair@).
 To quantify this, the ``redundant type annotations'' line in Table~\ref{tab:eval} counts the number of lines on which the type of a known variable is re-specified, either as a format specifier, explicit downcast, type-specific function, or by name in a @sizeof@, struct literal, or @new@ expression.
 The \CC benchmark uses two redundant type annotations to create a new stack nodes, while the C and \CCV benchmarks have several such annotations spread throughout their code.
+The two instances in which the \CFA benchmark still uses redundant type specifiers are to cast the result of a polymorphic @malloc@ call (the @sizeof@ argument is inferred by the compiler).
+These uses are similar to the @new@ expressions in \CC, though the \CFA compiler's type resolver should shortly render even these type casts superfluous.
+The \CFA benchmark was able to eliminate all redundant type annotations through use of the polymorphic @alloc@ function discussed in Section~\ref{sec:libraries}.
 \section{Related Work}
 \subsection{Polymorphism}
 …
 \CFA
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
-forall(otype T) struct stack_node;
-forall(otype T) struct stack {
-        stack_node(T) * head;
-};
 forall(otype T) struct stack_node {
         T value;
         stack_node(T) * next;
 };
+forall(otype T) struct stack { stack_node(T) * head; };
 forall(otype T) void ?{}( stack(T) & s ) { (s.head){ 0 }; }
 forall(otype T) void ?{}( stack(T) & s, stack(T) t ) {
         stack_node(T) ** crnt = &s.head;
         for ( stack_node(T) * next = t.head; next; next = next->next ) {
+                stack_node(T) * new_node = ((stack_node(T)*)malloc());
+                (*new_node){ next->value }; /***/
+                *crnt = new_node;
+                stack_node(T) * acrnt = *crnt;
+                crnt = &acrnt->next;
+                *crnt = alloc();
+                ((*crnt)->value){ next->value };
+                crnt = &(*crnt)->next;
+        }
         *crnt = 0;
 …
 forall(otype T) _Bool empty( const stack(T) & s ) { return s.head == 0; }
 forall(otype T) void push( stack(T) & s, T value ) {
         stack_node(T) * new_node = ((stack_node(T)*)malloc());
         (*new_node){ value, s.head }; /***/
         s.head = new_node;
+        stack_node(T) * n = alloc();
+        (*n){ value, head };
+        head = n;
+}
 forall(otype T) T pop( stack(T) & s ) {
+        stack_node(T) * n = s.head;
+        s.head = n->next;
+        T v = n->value;
+        delete( n );
+        return v;
+        stack_node(T) * n = head;
+        head = n->next;
+        T x = n->value;
+        ^(*n){};
+        free( n );
+        return x;
+}
 forall(otype T) void clear( stack(T) & s ) {
         for ( stack_node(T) * next = s.head; next; ) {
+        for ( stack_node(T) * next = head; next; ) {
                 stack_node(T) * crnt = next;
                 next = crnt->next;
+                delete( crnt );
+                ^(*crnt){};
+                free(crnt);
+        }
         s.head = 0;
+        head = 0;
+}
 \end{cfa}
 …
 \CC
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
 template<typename T> class stack {
+template<typename T> struct stack {
         struct node {
                 T value;
 …
         };
         node * head;
         void copy(const stack<T>& o) {
+        void copy(const stack<T> & o) {
                 node ** crnt = &head;
                 for ( node * next = o.head;; next; next = next->next ) {
 …
                 *crnt = nullptr;
+        }
-  public:
         stack() : head(nullptr) {}
         stack(const stack<T>& o) { copy(o); }
+        stack(const stack<T> & o) { copy(o); }
         stack(stack<T> && o) : head(o.head) { o.head = nullptr; }
         ~stack() { clear(); }
         stack & operator= (const stack<T>& o) {
+        stack & operator= (const stack<T> & o) {
                 if ( this == &o ) return *this;
                 clear();
 …
         struct stack_node * next;
 };
+struct stack { struct stack_node* head; };
 struct stack new_stack() { return (struct stack){ NULL }; /***/ }
 void copy_stack(struct stack * s, const struct stack * t, void * (*copy)(const void *)) {
 …
         for ( struct stack_node * next = t->head; next; next = next->next ) {
                 *crnt = malloc(sizeof(struct stack_node)); /***/
                 **crnt = (struct stack_node){ copy(next->value) }; /***/
+                (*crnt)->value = copy(next->value);
                 crnt = &(*crnt)->next;
+        }
         *crnt = 0;
+        *crnt = NULL;
+}
 _Bool stack_empty(const struct stack * s) { return s->head == NULL; }
 …
 \CCV
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
+stack::node::node( const object & v, node * n ) : value( v.new_copy() ), next( n ) {}
+void stack::copy(const stack & o) {
+        node ** crnt = &head;
+        for ( node * next = o.head; next; next = next->next ) {
+                *crnt = new node{ *next->value };
+                crnt = &(*crnt)->next;
+struct stack {
+        struct node {
+                ptr<object> value;
+                node* next;
+                node( const object & v, node * n ) : value( v.new_copy() ), next( n ) {}
+        };
+        node* head;
+        void copy(const stack & o) {
+                node ** crnt = &head;
+                for ( node * next = o.head; next; next = next->next ) {
+                        *crnt = new node{ *next->value }; /***/
+                        crnt = &(*crnt)->next;
+                }
+                *crnt = nullptr;
+        }
+        *crnt = nullptr;
+}
+stack::stack() : head(nullptr) {}
+stack::stack(const stack & o) { copy(o); }
+stack::stack(stack && o) : head(o.head) { o.head = nullptr; }
+stack::~stack() { clear(); }
+stack & stack::operator= (const stack & o) {
+        if ( this == &o ) return *this;
+        clear();
+        copy(o);
+        return *this;
+}
+stack & stack::operator= (stack && o) {
+        if ( this == &o ) return *this;
+        head = o.head;
+        o.head = nullptr;
+        return *this;
+}
+bool stack::empty() const { return head == nullptr; }
+void stack::push(const object & value) { head = new node{ value, head }; /***/ }
+ptr<object> stack::pop() {
+        node * n = head;
+        head = n->next;
+        ptr<object> x = std::move(n->value);
+        delete n;
+        return x;
+}
+void stack::clear() {
+        for ( node * next = head; next; ) {
+                node * crnt = next;
+                next = crnt->next;
+                delete crnt;
+        stack() : head(nullptr) {}
+        stack(const stack & o) { copy(o); }
+        stack(stack && o) : head(o.head) { o.head = nullptr; }
+        ~stack() { clear(); }
+        stack & operator= (const stack & o) {
+                if ( this == &o ) return *this;
+                clear();
+                copy(o);
+                return *this;
+        }
+        head = nullptr;
+}
+        stack & operator= (stack && o) {
+                if ( this == &o ) return *this;
+                head = o.head;
+                o.head = nullptr;
+                return *this;
+        }
+        bool empty() const { return head == nullptr; }
+        void push(const object & value) { head = new node{ value, head }; /***/ }
+        ptr<object> pop() {
+                node * n = head;
+                head = n->next;
+                ptr<object> x = std::move(n->value);
+                delete n;
+                return x;
+        }
+        void clear() {
+                for ( node * next = head; next; ) {
+                        node * crnt = next;
+                        next = crnt->next;
+                        delete crnt;
+                }
+                head = nullptr;
+        }
+};
 \end{cfa}

doc/papers/general/evaluation/cfa-bench.c

-              rfb2ce27
+              r28bc8c8
 #include "cfa-pair.h"
 
-int main( int argc, char * argv[] ) {
+int main() {
         int max = 0, val = 42;
         stack( int ) si, ti;

doc/papers/general/evaluation/cfa-stack.c

-              rfb2ce27
+              r28bc8c8
         stack_node(T) ** crnt = &s.head;
         for ( stack_node(T) * next = t.head; next; next = next->next ) {
                 *crnt = malloc();
+                *crnt = alloc();
                 ((*crnt)->value){ next->value };
                 crnt = &(*crnt)->next;
 …
 forall(otype T) void push( stack(T) & s, T value ) with( s ) {
         stack_node(T)* n = malloc();
+        stack_node(T)* n = alloc();
         (*n){ value, head };
         head = n;

doc/papers/general/evaluation/timing.dat

-              rfb2ce27
+              r28bc8c8
 "400 million repetitions"       "C"     "\\CFA{}"       "\\CC{}"        "\\CC{obj}"
+"push\nint"     2976    2225    1522    3266
+"copy\nnt"      2932    7072    1526    3110
+"clear\nint"    1380    731     750     1488
+"pop\nint"      1444    1196    756     5156
+"push\npair"    3695    2257    953     6840
+"copy\npair"    6034    6650    994     7224
+"clear\npair"   2832    848     742     3297
+"pop\npair"     3009    5348    797     25235
+"push\nint"     3002    2459    1520    3305
+"copy\nint"     2985    2057    1521    3152
+"clear\nint"    1374    827     718     1469
+"pop\nint"      1416    1221    717     5467
+"push\npair"    4214    2752    946     6826
+"copy\npair"    6127    2105    993     7330
+"clear\npair"   2881    885     711     3564
+"pop\npair"     3046    5434    783     26538
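
For readers who want to compare the columns of timing.dat numerically, the following is a minimal sketch of a row parser in C. It assumes each data row keeps the shape shown above: a quoted label followed by four integer timings in the header's column order (C, \CFA, \CC, \CC obj). The names `timing_row`, `parse_row`, and `relative_to_c` are hypothetical helpers for illustration and are not part of the changeset. Diff markers such as a leading `+` are not part of the file and must be stripped before parsing.

```c
#include <assert.h>
#include <stdio.h>

/* One data row of timing.dat: a quoted label, then four timings in the
   column order given by the file's header row (C, CFA, C++, C++obj).
   The struct and field names are illustrative, not from the repository. */
struct timing_row {
    char label[32];
    long c, cfa, cpp, cppobj;
};

/* Parse a row such as:  "pop\nint"  1416  1221  717  5467
   Returns 1 on success and 0 for anything else, including the header row,
   whose second field is a quoted string rather than a number. */
int parse_row(const char *line, struct timing_row *row) {
    int n = sscanf(line, " \"%31[^\"]\" %ld %ld %ld %ld",
                   row->label, &row->c, &row->cfa, &row->cpp, &row->cppobj);
    return n == 5;
}

/* Ratio of a timing to the C timing of the same row; values below 1.0
   mean the variant took less time than the C baseline. */
double relative_to_c(long t, long c_time) {
    assert(c_time > 0);   /* guard against division by zero */
    return (double)t / (double)c_time;
}
```

After a successful `parse_row`, calling `relative_to_c(row.cfa, row.c)` gives the \CFA time as a fraction of the C time for that benchmark; the same call with `row.cpp` or `row.cppobj` covers the other two columns.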
