Changeset ac9c0ee8
- Timestamp: Apr 26, 2026, 6:41:18 PM (2 days ago)
- Branches: master
- Children: 74accc6
- Parents: eeefc0c
- File: 1 edited
doc/theses/mike_brooks_MMath/string.tex (modified) (2 diffs)
doc/theses/mike_brooks_MMath/string.tex
--- doc/theses/mike_brooks_MMath/string.tex (reeefc0c)
+++ doc/theses/mike_brooks_MMath/string.tex (rac9c0ee8)
@@ -2181,5 +2181,5 @@
 This plot breaks down the time spent, comparing STL--\CFA tradeoffs, at successful len-50 with unsuccessful len-200.
 Data are sourced from running the experiment under \emph{perf}, recording samples of the call stack, and imposing a mutually-exclusive classification on these call stacks.
-Reading a stacked bar from the top down, \emph{text import} captures samples where a routine like @memcpy@ is running, so that the string library is copying characters from the corpus source into its allocated working space.
+Reading a stacked bar from the top down, \emph{text import} captures samples where a routine like @memcpy@ is running, so that the string library is copying characters from the corpus source into its allocated text buffer.
 \emph{Malloc-free} means literally one of those routines (taking nontrivial time only for STL), while \emph{gc} is the direct \CFA(-only) equivalent.
 All of the attributions so far occur while a string is live; further time spent in a string's lifecycle management functions is attributed \emph{ctor-dtor}.
@@ -2211,20 +2211,11 @@
 If the STL monolithic compilation advantage is removed from consideration, the \emph{text-import} difference is the only reason that \CFA is not beating STL on speed, by about 10\%, across the board.
 
-An investigation\footnote{
-\MLB{Peter, you need to be okay with this.}
-The description of this investigation that appears in the current draft is my best recollection concerning work done previously.
-But, so far, I have been unable to find this actual work.
-90\% case: I find it or reproduce it and save the details properly; this footnote disappears.
-10\% case: I can't do so; I retract the explanation above.
-} into the \emph{text-import} difference revealed an interesting optimization opportunity.
-Both implementations use a @memcpy@ operation, sourcing from the program's @argv@ representation, targeting the string library's working space.
-The @memcpy@ action is inlined into its call site successfully, in both implementations.
-But STL's, which runs faster, does the data movement with vector instructions, while \CFA's does not.
-This STL-only instruction sequence appears to be correct only when the source and destination have their starting byte at the same offset within a vector chunk.
-The \CFA implementation has made no provision for this quality, so it is good for correctness that \CFA does not receive the vector version.
-Presumably, the optimizer (or check affecting the instruction stream) has noticed STL arranging for the destination to line up with the source.
-It could do so either by matching a known alignment (statically) or choosing to match the source's unaligned chunk offset (dynamically).
-Either possibility would be a choice to incur further fragmentation, when allocating working space (the copy's destination), in exchange for a faster copy.
-The \CFA implementation may benefit from attempting such a scheme.
-At present, incorporating the necessary fragementation into the working heap management is too disruptive.
+An investigation into the \emph{text-import} difference revealed an interesting optimization opportunity.
+Both implementations use a @memcpy@ operation, sourcing from the program's @argv@ strings (corpus strings specified on the command line), targeting the string library's text buffer.
+In both implementations, the @memcpy@ code is inlined.
+However, at runtime, the STL's version runs faster, by doing the data movement with vector instructions, while \CFA's does not.
+This STL optimization only occurs when the source and destination are aligned on a memory boundary matching with vector-data alignment (64-byte alignment).
+However, strings in the \CFA text buffer are only byte aligned, whereas the \CC SSO and @malloc@ed strings are 16-byte aligned, increasing the possibly of vector alignment or an optimization that ultimately results in vector operations.
+The \CFA implementation may benefit from such a scheme by wasting a small amount of space to position strings at a larger alignment boundary.
+At present, incorporating this optimization into the heap management is too disruptive.
 So, this discovery is left as a potential improvement.
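Reader's note on the new text: the idea of "wasting a small amount of space to position strings at a larger alignment boundary" can be sketched roughly as below. This is only an illustration under stated assumptions, not the \CFA string library's code. The names TextBuffer, TEXT_ALIGN, round_up and text_import are hypothetical, the 16-byte figure is taken from the alignment the changed text attributes to @malloc@ed strings, and nothing here guarantees that a compiler will actually emit vector instructions for the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bump-allocated text buffer; NOT the real CFA layout. */
enum { TEXT_ALIGN = 16 };   /* assumed target alignment (illustrative) */

typedef struct {
    char * base;       /* start of the buffer, assumed TEXT_ALIGN-aligned */
    size_t used;       /* bytes consumed so far, including padding */
    size_t capacity;   /* total bytes available at base */
} TextBuffer;

/* Round n up to the next multiple of align; align must be a power of two. */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Copy len bytes of src into the buffer, starting each string on a
 * TEXT_ALIGN boundary. The padding skipped before `start` is the
 * "small amount of wasted space". Returns the string's address in the
 * buffer, or NULL if it does not fit. */
static char * text_import(TextBuffer * buf, const char * src, size_t len) {
    size_t start = round_up(buf->used, TEXT_ALIGN);
    if (start > buf->capacity || len > buf->capacity - start) return NULL;
    memcpy(buf->base + start, src, len);
    buf->used = start + len;
    return buf->base + start;
}
```

With this scheme, each string wastes at most TEXT_ALIGN - 1 bytes of padding, which is the fragmentation cost the text weighs against a potentially faster copy.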