Changeset ac9c0ee8


Timestamp: Apr 26, 2026, 6:41:18 PM (2 days ago)
Author: Peter A. Buhr <pabuhr@…>
Branches: master
Children: 74accc6
Parents: eeefc0c
Message: proofreading in string chapter

File:
1 edited

  • doc/theses/mike_brooks_MMath/string.tex

    reeefc0c rac9c0ee8  
    21812181This plot breaks down the time spent, comparing STL--\CFA tradeoffs, at successful len-50 with unsuccessful len-200.
    21822182Data are sourced from running the experiment under \emph{perf}, recording samples of the call stack, and imposing a mutually-exclusive classification on these call stacks.
    2183 Reading a stacked bar from the top down, \emph{text import} captures samples where a routine like @memcpy@ is running, so that the string library is copying characters from the corpus source into its allocated working space.
     2183Reading a stacked bar from the top down, \emph{text import} captures samples where a routine like @memcpy@ is running, so that the string library is copying characters from the corpus source into its allocated text buffer.
    21842184\emph{Malloc-free} means literally one of those routines (taking nontrivial time only for STL), while \emph{gc} is the direct \CFA(-only) equivalent.
    21852185All of the attributions so far occur while a string is live; further time spent in a string's lifecycle management functions is attributed \emph{ctor-dtor}.
     
    22112211If the STL monolithic compilation advantage is removed from consideration, the \emph{text-import} difference is the only reason that \CFA is not beating STL on speed, by about 10\%, across the board.
    22122212
    2213 An investigation\footnote{
    2214         \MLB{Peter, you need to be okay with this.}
    2215         The description of this investigation that appears in the current draft is my best recollection concerning work done previously.
    2216         But, so far, I have been unable to find this actual work.
    2217         90\% case: I find it or reproduce it and save the details properly; this footnote disappears.
    2218         10\% case: I can't do so; I retract the explanation above.
    2219 } into the \emph{text-import} difference revealed an interesting optimization opportunity.
    2220 Both implementations use a @memcpy@ operation, sourcing from the program's @argv@ representation, targeting the string library's working space.
    2221 The @memcpy@ action is inlined into its call site successfully, in both implementations.
    2222 But STL's, which runs faster, does the data movement with vector instructions, while \CFA's does not.
    2223 This STL-only instruction sequence appears to be correct only when the source and destination have their starting byte at the same offset within a vector chunk.
    2224 The \CFA implementation has made no provision for this quality, so it is good for correctness that \CFA does not receive the vector version.
    2225 Presumably, the optimizer (or check affecting the instruction stream) has noticed STL arranging for the destination to line up with the source.
    2226 It could do so either by matching a known alignment (statically) or choosing to match the source's unaligned chunk offset (dynamically).
    2227 Either possibility would be a choice to incur further fragmentation, when allocating working space (the copy's destination), in exchange for a faster copy.
    2228 The \CFA implementation may benefit from attempting such a scheme.
    2229 At present, incorporating the necessary fragementation into the working heap management is too disruptive.
     2213An investigation into the \emph{text-import} difference revealed an interesting optimization opportunity.
     2214Both implementations use a @memcpy@ operation, sourcing from the program's @argv@ strings (corpus strings specified on the command line), targeting the string library's text buffer.
     2215In both implementations, the @memcpy@ code is inlined.
     2216However, at runtime, the STL's version runs faster, by doing the data movement with vector instructions, while \CFA's does not.
     2217This STL optimization only occurs when the source and destination are aligned on a memory boundary matching the vector-data alignment (64-byte alignment).
     2218However, strings in the \CFA text buffer are only byte aligned, whereas the \CC SSO and @malloc@ed strings are 16-byte aligned, increasing the possibility of vector alignment or an optimization that ultimately results in vector operations.
     2219The \CFA implementation may benefit from such a scheme by wasting a small amount of space to position strings at a larger alignment boundary.
     2220At present, incorporating this optimization into the heap management is too disruptive.
    22302221So, this discovery is left as a potential improvement.
    22312222
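
Reviewer sketch (not part of the changeset): the added text proposes wasting a small amount of space to position strings at a larger alignment boundary. A minimal illustration of that idea follows, in plain C rather than \CFA. The names TextBuffer, tb_import, align_up, BUF_SIZE and STR_ALIGN are hypothetical and do not come from the \CFA string library, and the 16-byte value is only an assumption chosen to mirror the malloc alignment mentioned above. The sketch shows the space-for-alignment tradeoff only; whether a compiler then emits vector instructions for the copy is not demonstrated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 4096, STR_ALIGN = 16 };  /* hypothetical sizes, for illustration only */

/* Hypothetical bump-allocated text buffer; not the \CFA implementation. */
typedef struct {
    _Alignas(64) char data[BUF_SIZE];  /* buffer base itself sits on a 64-byte boundary */
    size_t next;                       /* offset of the first free byte */
    size_t wasted;                     /* padding bytes spent on alignment so far */
} TextBuffer;

/* Round off up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t off, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Copy len bytes of src into the buffer, starting at an aligned offset.
   Returns the start of the copy, or NULL if the buffer is full. */
static char *tb_import(TextBuffer *tb, const char *src, size_t len) {
    size_t start = align_up(tb->next, STR_ALIGN);
    if (start > BUF_SIZE || len > BUF_SIZE - start) return NULL;
    tb->wasted += start - tb->next;    /* the fragmentation cost of aligning */
    memcpy(tb->data + start, src, len);
    tb->next = start + len;
    return tb->data + start;
}
```

In this sketch, each string costs at most STR_ALIGN - 1 bytes of padding, which is the fragmentation that the text weighs against a faster copy.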