Changeset b1c220a for doc/theses/mike_brooks_MMath/string.tex
- Timestamp:
- Mar 27, 2025, 3:48:24 PM (6 months ago)
- Branches:
- master
- Children:
- 185cd94
- Parents:
- 379b6ea
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/string.tex
r379b6ea rb1c220a 459 459 460 460 There are tradeoffs associated with the copy-on-write mechanism. 461 Several qualitative matters are detailed in the \VRef{s:PerformanceAssessment} and the qualitative issue of multi-threaded support is introduced here. 462 The \CFA sting library provides a switch to disable the sharing mechanism for situations where it is inappropriate. 463 464 Because of the inter-linked string handles, any participant managing one string is also managing, directly, the neighbouring strings, and from there, a data structure of the ``set of all strings.'' This data structure is intended for sequential access. 465 A negative consequence of this decision is that multiple threads using strings need to be set up so that they avoid attempting to modify (concurrently) an instance of this structure. 466 A positive consequence is that a single-threaded program, or a program with several independent threads, can use the sharing context without locking overhead. 461 Several qualitative matters are detailed in \VRef{s:PerformanceAssessment} and the qualitative issue of multi-threaded support is introduced here. 462 The \CFA string library provides a switch to disable threads allocating from the string buffer, when string sharing is unsafe. 463 When toggled, string management is moved to the storage allocator, specifically @malloc@/@free@, where the storage allocator is assumed to be thread-safe. 464 465 In detail, string sharing has inter-linked string handles, so any participant managing one string is also managing, directly, the neighbouring strings, and from there, a data structure of the ``set of all strings.'' 466 This string structure is intended for sequential access. 467 Hence, multiple threads using shared strings need to avoid modifying (concurrently) an instance of this structure (like Java immutable strings). 468 A positive consequence of this approach is that independent threads can use the sharing buffer without locking overhead. 467 469 468 470 When the string library is running with sharing disabled, it runs without implicit thread-safety challenges, which is the same as the \CC STL, and with performance goals similar to the STL. … … 470 472 Hence, concurrent users of string objects must still bring their own mutual exclusion, but the string library does not add any cross thread uses that are not apparent in a user's code. 471 473 472 The \CFA string library provides the type @string_sharectx@ to control an ambient sharing context for thecurrent thread.474 The \CFA string library provides the type @string_sharectx@ to control an ambient sharing context for a current thread. 473 475 It allows two adjustments: to opt out of sharing entirely or to begin sharing within a private context. 474 476 Either way, the chosen mode applies only to the current thread, for the duration of the lifetime of the created @string_sharectx@ object, up to being suspended by child lifetimes of different contexts. … … 492 494 \lstinputlisting[language=CFA, firstline=10, lastline=55]{sharectx.run.cfa} 493 495 & 494 \ includegraphics{string-sharectx.pdf}496 \raisebox{-0.17\totalheight}{\includegraphics{string-sharectx.pdf}} % lower 495 497 \end{tabular} 496 498 \caption{Controlling copying vs sharing of strings using \lstinline{string_sharectx}.} … … 519 521 520 522 To discuss: general goal of ... 521 while STL makes you think about memory management, all the time, and if you do your performance can be great ...523 while STL makes you think about memory management, all the time, and if you do, your performance can be great ... 522 524 \CFA sacrifices this advantage modestly in exchange for big wins when you're not thinking about memory management. 523 525 [Does this position cover all of it?] … … 530 532 \subsection{Methodology} 531 533 532 These tests use randomly generated text fragments of varying lengths.534 These tests use randomly generated varying-length strings (string content is immaterial). 533 535 A collection of such fragments is a \emph{corpus}. 534 536 The mean length of a fragment from a corpus is a typical explanatory variable. … … 540 542 \end{description} 541 543 The geometric distribution implies that lengths much longer than the mean occur frequently. 542 The special treatment of length 16 deals with comparison to STL, given that STL has short-string optimization (see [TODO: write and cross-ref future-work SSO]), currently not implemented in \CFA.543 When success , notwithstanding SSO, is illustrated, a fixed-size or from-16 distribution ensures that extra-optimized cases are not part of the mix on the STL side.544 The special treatment of length 16 deals with comparison to STL, given that STL has a short-string optimization (see [TODO: write and cross-ref future-work SSO]), currently not implemented in \CFA. 545 When success is illustrated, notwithstanding SSO, a fixed-size or from-16 distribution ensures that extra-optimized cases are not part of the mix on the STL side. 544 546 In all experiments that use a corpus, its text is generated and loaded into the SUT before the timed phase begins. 545 547 … … 564 566 565 567 Another experimental variable is whether the user's logical allocation is fresh \vs reused. 566 Here, \emph{reusing a logical allocation}, means that the program variable, into which the user is concatenating, previously held a long string: \\568 Here, \emph{reusing a logical allocation}, means that the program variable, into which the user is concatenating, previously held a long string: 567 569 \begin{cquote} 568 570 \setlength{\tabcolsep}{20pt} 569 571 \begin{tabular}{@{}ll@{}} 572 \multicolumn{1}{c}{\textbf{fresh}} & \multicolumn{1}{c}{\textbf{reuse}} \\ 570 573 \begin{cfa} 571 574 572 575 for ( ... ) { 573 @string x;@ // fresh576 @string x;@ 574 577 for ( ... ) 575 578 x += ... … … 580 583 string x; 581 584 for ( ... ) { 582 @x = "";@ // reused585 @x = "";@ 583 586 for ( ... ) 584 587 x += ... … … 587 590 \end{tabular} 588 591 \end{cquote} 589 These benchmark drivers have an outer loop for ``until a sample-worthy amount of execution has happened'' and an inner loop for ``build up the desired-length string.'' 592 593 All benchmark drivers have an outer loop for ``until a sample-worthy amount of execution has happened'' and an inner loop for ``building up the desired-length string.'' 590 594 In general, a user should not have to care about this difference, yet the STL performs differently in these cases. 591 595 Furthermore, if a routine takes a string by reference, if cannot use the fresh approach. … … 593 597 For the STL, this cost includes obtaining a fresh buffer from the memory allocator and copying older characters into the new buffer, while \CFA-sharing hides such a cost entirely. 594 598 The fresh \vs reuse distinction is only relevant in the \emph{append} tests. 595 596 The \emph{append} tests use the varying-from-1 corpus construction, \ie they do not assume away the STL's advantage for small-string optimization.597 \PAB{To discuss: any other case variables introduced in the performance intro}598 \VRef[Figure]{fig:string-graph-peq-cppemu} shows this behaviour, by the STL and by \CFA in STL emulation mode.599 \CFA reproduces STL's performance, up to a 15\% penalty averaged over the cases shown, diminishing with larger strings, and 50\% in the worst case.600 This penalty characterizes the amount of implementation fine tuning done with STL and not done with \CFA in present state.601 The larger inherent penalty, for a user mismanaging reuse, is 40\% averaged over the cases shown, is minimally 24\%, shows up consistently between the STL and \CFA implementations, and increases with larger strings.602 599 603 600 \begin{figure} … … 609 606 \end{figure} 610 607 608 The \emph{append} tests use the varying-from-1 corpus construction, \ie they assume the STL's advantage of small-string optimization. 609 \PAB{To discuss: any other case variables introduced in the performance intro} 610 \VRef[Figure]{fig:string-graph-peq-cppemu} shows this behaviour, by the STL and by \CFA in STL emulation mode. 611 \CFA reproduces STL's performance, up to a 15\% penalty averaged over the cases shown, diminishing with larger strings, and 50\% in the worst case. 612 This penalty characterizes the amount of implementation fine tuning done with STL and not done with \CFA in present state. 613 \PAB{The larger inherent penalty, for a user mismanaging reuse, is 40\% averaged over the cases shown, is minimally 24\%, shows up consistently between the STL and \CFA implementations, and increases with larger strings.} 614 611 615 \begin{figure} 612 616 \centering … … 624 628 \includegraphics{string-graph-pta-sharing.pdf} 625 629 % \includegraphics[width=\textwidth]{string-graph-pta-sharing.png} 626 \caption{Average time per iteration (lower is better) with one \lstinline{x = x + y} invocation (new, purple bands), comparing \CFA (having implicit sharing activated) with STL.630 \caption{Average time per iteration (lower is better) with one \lstinline{x = x + y} invocation, comparing \CFA (having implicit sharing activated) with STL. 627 631 For context, the results from \VRef[Figure]{fig:string-graph-peq-sharing} are repeated as the bottom bands. 628 632 While not a design goal, and not graphed out, \CFA in STL-emulation mode outperformed STL in this case; user-managed allocation reuse did not affect any of the implementations in this case.} … … 637 641 638 642 To have introduced: STL string library forces users to think about memory management when communicating values across a function call 639 643 \begin{cfa} 644 void foo( string s ); 645 string s = "abc"; 646 foo( s ); 647 \end{cfa} 640 648 STL charges a prohibitive penalty for passing a string by value. 641 649 With implicit sharing active, \CFA treats this operation as normal and supported. … … 657 665 STL's performance worsens as string length increases, while \CFA has the same performance at all sizes. 658 666 659 The \CFA cost to pass is nontrivial.667 The \CFA cost to pass a string is nontrivial. 660 668 The contributor is adding and removing the callee's string handle from the global list. 661 669 This cost is $1.5 \times$ to $2 \times$ over STL's when small-string optimization applies, though this cost should be avoidable in the same case, given a \CFA realization of this optimization.
Note:
See TracChangeset
for help on using the changeset viewer.