Changeset 28bc8c8 for doc/papers/general
- Timestamp:
- Mar 8, 2018, 3:16:30 PM (7 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 9d6f011
- Parents:
- fb2ce27
- Location:
- doc/papers/general
- Files:
-
- 5 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/general/Paper.tex
rfb2ce27 r28bc8c8 2547 2547 In fact, \CFA's features for generic programming can enable faster runtime execution than idiomatic @void *@-based C code. 2548 2548 This claim is demonstrated through a set of generic-code-based micro-benchmarks in C, \CFA, and \CC (see stack implementations in Appendix~\ref{sec:BenchmarkStackImplementation}). 2549 Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks would show little runtime variance, other thanin length and clarity of source code.2549 Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks would show little runtime variance, differing only in length and clarity of source code. 2550 2550 A more illustrative benchmark measures the costs of idiomatic usage of each language's features. 2551 Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked-list , a generic pair-data-structure, and a variadic @print@ function similar to that in Section~\ref{sec:variadic-tuples}.2551 Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked-list. 2552 2552 The benchmark test is similar for C and \CC. 2553 The experiment uses element types @int@ and @pair( _Bool, char)@, and pushes $N=40M$ elements on a generic stack, copies the stack, clears one of the stacks, finds the maximum value in the other stack, and prints $N/2$ (to reduce graph height) constants.2553 The experiment uses element types @int@ and @pair(short, char)@, and pushes $N=40M$ elements on a generic stack, copies the stack, clears one of the stacks, and finds the maximum value in the other stack. 2554 2554 2555 2555 \begin{figure} 2556 2556 \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt] 2557 int main( int argc, char * argv[]) {2557 int main() { 2558 2558 int max = 0, val = 42; 2559 2559 stack( int ) si, ti; 2560 2560 2561 2561 REPEAT_TIMED( "push_int", N, push( si, val ); ) 2562 TIMED( "copy_int", ti = si; )2562 TIMED( "copy_int", ti{ si }; ) 2563 2563 TIMED( "clear_int", clear( si ); ) 2564 2564 REPEAT_TIMED( "pop_int", N, 2565 2565 int x = pop( ti ); if ( x > max ) max = x; ) 2566 2566 2567 pair( _Bool, char ) max = { (_Bool)0, '\0' }, val = { (_Bool)1, 'a' };2568 stack( pair( _Bool, char ) ) sp, tp;2567 pair( short, char ) max = { 0h, '\0' }, val = { 42h, 'a' }; 2568 stack( pair( short, char ) ) sp, tp; 2569 2569 2570 2570 REPEAT_TIMED( "push_pair", N, push( sp, val ); ) 2571 TIMED( "copy_pair", tp = sp; )2571 TIMED( "copy_pair", tp{ sp }; ) 2572 2572 TIMED( "clear_pair", clear( sp ); ) 2573 2573 REPEAT_TIMED( "pop_pair", N, 2574 pair( _Bool, char) x = pop( tp ); if ( x > max ) max = x; )2574 pair(short, char) x = pop( tp ); if ( x > max ) max = x; ) 2575 2575 } 2576 2576 \end{cfa} … … 2583 2583 hence runtime checks are necessary to safely down-cast objects. 2584 2584 The most notable difference among the implementations is in memory layout of generic types: \CFA and \CC inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV lack such a capability and instead must store generic objects via pointers to separately-allocated objects. 2585 For the print benchmark, idiomatic printing is used: the C and \CFA variants used @stdio.h@, while the \CC and \CCV variants used @iostream@; preliminary tests show this distinction has negligible runtime impact. 2586 Note, the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically. 2585 Note that the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically. 2587 2586 2588 2587 Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents. 2589 2588 The graph plots the median of 5 consecutive runs of each program, with an initial warm-up run omitted. 2590 All code is compiled at \texttt{-O2} by gcc or g++ 6. 2.0, with all \CC code compiled as \CCfourteen.2589 All code is compiled at \texttt{-O2} by gcc or g++ 6.3.0, with all \CC code compiled as \CCfourteen. 2591 2590 The benchmarks are run on an Ubuntu 16.04 workstation with 16 GB of RAM and a 6-core AMD FX-6300 CPU with 3.5 GHz maximum clock frequency. 2592 2591 … … 2606 2605 & \CT{C} & \CT{\CFA} & \CT{\CC} & \CT{\CCV} \\ \hline 2607 2606 maximum memory usage (MB) & 10001 & 2502 & 2503 & 11253 \\ 2608 source code size (lines) & 247 & 222 & 165 & 339\\2609 redundant type annotations (lines) & 39 & 2 & 2 & 15\\2610 binary size (KB) & 14 & 2 29 & 18 & 38\\2607 source code size (lines) & 187 & 188 & 133 & 303 \\ 2608 redundant type annotations (lines) & 25 & 0 & 2 & 16 \\ 2609 binary size (KB) & 14 & 257 & 14 & 37 \\ 2611 2610 \end{tabular} 2612 2611 \end{table} 2613 2612 2614 2613 The C and \CCV variants are generally the slowest with the largest memory footprint, because of their less-efficient memory layout and the pointer-indirection necessary to implement generic types; 2615 this inefficiency is exacerbated by the second level of generic types in the pair -basedbenchmarks.2616 By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of @ _Bool@ and @char@ because the storage layout is equivalent, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead.2614 this inefficiency is exacerbated by the second level of generic types in the pair benchmarks. 2615 By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of @short@ and @char@ because the storage layout is equivalent, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead. 2617 2616 \CCV is slower than C largely due to the cost of runtime type-checking of down-casts (implemented with @dynamic_cast@); 2618 There are two outliers in the graph for \CFA: all prints and pop of @pair@. 2619 Both of these cases result from the complexity of the C-generated polymorphic code, so that the gcc compiler is unable to optimize some dead code and condense nested calls. 2620 A compiler designed for \CFA could easily perform these optimizations. 2617 The outlier in the graph for \CFA, pop @pair@, results from the complexity of the generated-C polymorphic code. 2618 The gcc compiler is unable to optimize some dead code and condense nested calls; a compiler designed for \CFA could easily perform these optimizations. 2621 2619 Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries. 2622 2620 2623 \CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 73 and 54lines, respectively.2621 \CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 41 and 42 lines, respectively. 2624 2622 On the other hand, C does not have a generic collections-library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers. 2625 \CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @ bool@, @char@, @int@, and @const char *@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library;2623 \CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @char@, @short@, and @int@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library; 2626 2624 with their omission, the \CCV line count is similar to C. 2627 2625 We justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose. … … 2629 2627 Raw line-count, however, is a fairly rough measure of code complexity; 2630 2628 another important factor is how much type information the programmer must manually specify, especially where that information is not checked by the compiler. 2631 Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointer sarguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts (\eg @object@ to @integer@ when popping a stack, or @object@ to @printable@ when printing the elements of a @pair@).2629 Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointer arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts (\eg @object@ to @integer@ when popping a stack, or @object@ to @printable@ when printing the elements of a @pair@). 2632 2630 To quantify this, the ``redundant type annotations'' line in Table~\ref{tab:eval} counts the number of lines on which the type of a known variable is re-specified, either as a format specifier, explicit downcast, type-specific function, or by name in a @sizeof@, struct literal, or @new@ expression. 2633 2631 The \CC benchmark uses two redundant type annotations to create a new stack nodes, while the C and \CCV benchmarks have several such annotations spread throughout their code. 2634 The two instances in which the \CFA benchmark still uses redundant type specifiers are to cast the result of a polymorphic @malloc@ call (the @sizeof@ argument is inferred by the compiler). 2635 These uses are similar to the @new@ expressions in \CC, though the \CFA compiler's type resolver should shortly render even these type casts superfluous. 2636 2632 The \CFA benchmark was able to eliminate all redundant type annotations through use of the polymorphic @alloc@ function discussed in Section~\ref{sec:libraries}. 2637 2633 2638 2634 \section{Related Work} 2639 2640 2635 2641 2636 \subsection{Polymorphism} … … 2765 2760 \CFA 2766 2761 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2767 forall(otype T) struct stack_node;2768 forall(otype T) struct stack {2769 stack_node(T) * head;2770 };2771 2762 forall(otype T) struct stack_node { 2772 2763 T value; 2773 2764 stack_node(T) * next; 2774 2765 }; 2766 forall(otype T) struct stack { stack_node(T) * head; }; 2775 2767 forall(otype T) void ?{}( stack(T) & s ) { (s.head){ 0 }; } 2776 2768 forall(otype T) void ?{}( stack(T) & s, stack(T) t ) { 2777 2769 stack_node(T) ** crnt = &s.head; 2778 2770 for ( stack_node(T) * next = t.head; next; next = next->next ) { 2779 stack_node(T) * new_node = ((stack_node(T)*)malloc()); 2780 (*new_node){ next->value }; /***/ 2781 *crnt = new_node; 2782 stack_node(T) * acrnt = *crnt; 2783 crnt = &acrnt->next; 2771 *crnt = alloc(); 2772 ((*crnt)->value){ next->value }; 2773 crnt = &(*crnt)->next; 2784 2774 } 2785 2775 *crnt = 0; … … 2794 2784 forall(otype T) _Bool empty( const stack(T) & s ) { return s.head == 0; } 2795 2785 forall(otype T) void push( stack(T) & s, T value ) { 2796 stack_node(T) * n ew_node = ((stack_node(T)*)malloc());2797 (*n ew_node){ value, s.head }; /***/2798 s.head = new_node;2786 stack_node(T) * n = alloc(); 2787 (*n){ value, head }; 2788 head = n; 2799 2789 } 2800 2790 forall(otype T) T pop( stack(T) & s ) { 2801 stack_node(T) * n = s.head; 2802 s.head = n->next; 2803 T v = n->value; 2804 delete( n ); 2805 return v; 2791 stack_node(T) * n = head; 2792 head = n->next; 2793 T x = n->value; 2794 ^(*n){}; 2795 free( n ); 2796 return x; 2806 2797 } 2807 2798 forall(otype T) void clear( stack(T) & s ) { 2808 for ( stack_node(T) * next = s.head; next; ) {2799 for ( stack_node(T) * next = head; next; ) { 2809 2800 stack_node(T) * crnt = next; 2810 2801 next = crnt->next; 2811 delete( crnt ); 2802 ^(*crnt){}; 2803 free(crnt); 2812 2804 } 2813 s.head = 0;2805 head = 0; 2814 2806 } 2815 2807 \end{cfa} … … 2818 2810 \CC 2819 2811 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2820 template<typename T> classstack {2812 template<typename T> struct stack { 2821 2813 struct node { 2822 2814 T value; … … 2825 2817 }; 2826 2818 node * head; 2827 void copy(const stack<T> & o) {2819 void copy(const stack<T> & o) { 2828 2820 node ** crnt = &head; 2829 2821 for ( node * next = o.head;; next; next = next->next ) { … … 2833 2825 *crnt = nullptr; 2834 2826 } 2835 public:2836 2827 stack() : head(nullptr) {} 2837 stack(const stack<T> & o) { copy(o); }2828 stack(const stack<T> & o) { copy(o); } 2838 2829 stack(stack<T> && o) : head(o.head) { o.head = nullptr; } 2839 2830 ~stack() { clear(); } 2840 stack & operator= (const stack<T> & o) {2831 stack & operator= (const stack<T> & o) { 2841 2832 if ( this == &o ) return *this; 2842 2833 clear(); … … 2877 2868 struct stack_node * next; 2878 2869 }; 2870 struct stack { struct stack_node* head; }; 2879 2871 struct stack new_stack() { return (struct stack){ NULL }; /***/ } 2880 2872 void copy_stack(struct stack * s, const struct stack * t, void * (*copy)(const void *)) { … … 2882 2874 for ( struct stack_node * next = t->head; next; next = next->next ) { 2883 2875 *crnt = malloc(sizeof(struct stack_node)); /***/ 2884 **crnt = (struct stack_node){ copy(next->value) }; /***/2876 (*crnt)->value = copy(next->value); 2885 2877 crnt = &(*crnt)->next; 2886 2878 } 2887 *crnt = 0;2879 *crnt = NULL; 2888 2880 } 2889 2881 _Bool stack_empty(const struct stack * s) { return s->head == NULL; } … … 2914 2906 \CCV 2915 2907 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2916 stack::node::node( const object & v, node * n ) : value( v.new_copy() ), next( n ) {} 2917 void stack::copy(const stack & o) { 2918 node ** crnt = &head; 2919 for ( node * next = o.head; next; next = next->next ) { 2920 *crnt = new node{ *next->value }; 2921 crnt = &(*crnt)->next; 2908 struct stack { 2909 struct node { 2910 ptr<object> value; 2911 node* next; 2912 node( const object & v, node * n ) : value( v.new_copy() ), next( n ) {} 2913 }; 2914 node* head; 2915 void copy(const stack & o) { 2916 node ** crnt = &head; 2917 for ( node * next = o.head; next; next = next->next ) { 2918 *crnt = new node{ *next->value }; /***/ 2919 crnt = &(*crnt)->next; 2920 } 2921 *crnt = nullptr; 2922 2922 } 2923 *crnt = nullptr; 2924 } 2925 stack::stack() : head(nullptr) {} 2926 stack::stack(const stack & o) { copy(o); } 2927 stack::stack(stack && o) : head(o.head) { o.head = nullptr; } 2928 stack::~stack() { clear(); } 2929 stack & stack::operator= (const stack & o) { 2930 if ( this == &o ) return *this; 2931 clear(); 2932 copy(o); 2933 return *this; 2934 } 2935 stack & stack::operator= (stack && o) { 2936 if ( this == &o ) return *this; 2937 head = o.head; 2938 o.head = nullptr; 2939 return *this; 2940 } 2941 bool stack::empty() const { return head == nullptr; } 2942 void stack::push(const object & value) { head = new node{ value, head }; /***/ } 2943 ptr<object> stack::pop() { 2944 node * n = head; 2945 head = n->next; 2946 ptr<object> x = std::move(n->value); 2947 delete n; 2948 return x; 2949 } 2950 void stack::clear() { 2951 for ( node * next = head; next; ) { 2952 node * crnt = next; 2953 next = crnt->next; 2954 delete crnt; 2923 stack() : head(nullptr) {} 2924 stack(const stack & o) { copy(o); } 2925 stack(stack && o) : head(o.head) { o.head = nullptr; } 2926 ~stack() { clear(); } 2927 stack & operator= (const stack & o) { 2928 if ( this == &o ) return *this; 2929 clear(); 2930 copy(o); 2931 return *this; 2955 2932 } 2956 head = nullptr; 2957 } 2933 stack & operator= (stack && o) { 2934 if ( this == &o ) return *this; 2935 head = o.head; 2936 o.head = nullptr; 2937 return *this; 2938 } 2939 bool empty() const { return head == nullptr; } 2940 void push(const object & value) { head = new node{ value, head }; /***/ } 2941 ptr<object> pop() { 2942 node * n = head; 2943 head = n->next; 2944 ptr<object> x = std::move(n->value); 2945 delete n; 2946 return x; 2947 } 2948 void clear() { 2949 for ( node * next = head; next; ) { 2950 node * crnt = next; 2951 next = crnt->next; 2952 delete crnt; 2953 } 2954 head = nullptr; 2955 } 2956 }; 2958 2957 \end{cfa} 2959 2958 -
doc/papers/general/evaluation/cfa-bench.c
rfb2ce27 r28bc8c8 3 3 #include "cfa-pair.h" 4 4 5 int main( int argc, char * argv[]) {5 int main() { 6 6 int max = 0, val = 42; 7 7 stack( int ) si, ti; -
doc/papers/general/evaluation/cfa-stack.c
rfb2ce27 r28bc8c8 12 12 stack_node(T) ** crnt = &s.head; 13 13 for ( stack_node(T) * next = t.head; next; next = next->next ) { 14 *crnt = malloc();14 *crnt = alloc(); 15 15 ((*crnt)->value){ next->value }; 16 16 crnt = &(*crnt)->next; … … 31 31 32 32 forall(otype T) void push( stack(T) & s, T value ) with( s ) { 33 stack_node(T)* n = malloc();33 stack_node(T)* n = alloc(); 34 34 (*n){ value, head }; 35 35 head = n; -
doc/papers/general/evaluation/timing.dat
rfb2ce27 r28bc8c8 1 1 "400 million repetitions" "C" "\\CFA{}" "\\CC{}" "\\CC{obj}" 2 "push\nint" 2976 2225 1522 3266 3 "copy\nnt" 2932 7072 1526 3110 4 "clear\nint" 1380 731 750 1488 5 "pop\nint" 1444 1196 756 5156 6 "push\npair" 3695 2257 953 6840 7 "copy\npair" 6034 6650 994 7224 8 "clear\npair" 2832 848 742 3297 9 "pop\npair" 3009 5348 797 25235 10 2 "push\nint" 3002 2459 1520 3305 3 "copy\nint" 2985 2057 1521 3152 4 "clear\nint" 1374 827 718 1469 5 "pop\nint" 1416 1221 717 5467 6 "push\npair" 4214 2752 946 6826 7 "copy\npair" 6127 2105 993 7330 8 "clear\npair" 2881 885 711 3564 9 "pop\npair" 3046 5434 783 26538
Note: See TracChangeset
for help on using the changeset viewer.