Ignore:
Timestamp:
May 13, 2025, 1:17:50 PM (4 months ago)
Author:
Mike Brooks <mlbrooks@…>
Branches:
master
Children:
0528d79
Parents:
7d02d35 (diff), 2410424 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/string.tex

    r7d02d35 rbd72f517  
    1717\begin{cquote}
    1818\begin{tabular}{@{}l|l|l|l@{}}
    19 C @char [ ]@                    &  \CC @string@                 & Java @String@     & \CFA @string@     \\
     19C @char [ ]@                    &  \CC @string@                 & Java @String@ & \CFA @string@ \\
    2020\hline
    21 @strcpy@, @strncpy@             & @=@                                   & @=@               & @=@       \\
    22 @strcat@, @strncat@             & @+@, @+=@                             & @+@, @+=@         & @+@, @+=@ \\
     21@strcpy@, @strncpy@             & @=@                                   & @=@                   & @=@   \\
     22@strcat@, @strncat@             & @+@, @+=@                             & @+@, @+=@             & @+@, @+=@     \\
    2323@strcmp@, @strncmp@             & @==@, @!=@, @<@, @<=@, @>@, @>=@
    24                                                 & @equals@, @compareTo@
    25                                                                                                                                         & @==@, @!=@, @<@, @<=@, @>@, @>=@ \\
    26 @strlen@                                & @length@, @size@              & @length@                      & @size@        \\
    27 @[ ]@                                   & @[ ]@                                 & @charAt@          & @[ ]@     \\
    28 @strncpy@                               & @substr@                              & @substring@       & @( )@, on RHS of @=@      \\
    29 @strncpy@                               & @replace@                             & @replace@         & @( )@, on LHS of @=@ \\
    30 @strstr@                                & @find@                                & @indexOf@         & @find@ \\
    31 @strcspn@                               & @find_first_of@               & @matches@         & @include@ \\
    32 @strspn@                                & @find_first_not_of@   & @matches@         & @exclude@ \\
    33 n/a                                             & @c_str@, @data@               & n/a               & @strcpy@, @strncpy@ \\
     24                                                                                                & @equals@, @compareTo@
     25                                                                                                                                & @==@, @!=@, @<@, @<=@, @>@, @>=@ \\
     26@strlen@                                & @length@, @size@              & @length@              & @size@        \\
     27@[ ]@                                   & @[ ]@                                 & @charAt@              & @[ ]@ \\
     28@strncpy@                               & @substr@                              & @substring@   & @( )@, on RHS of @=@  \\
     29@strncpy@                               & @replace@                             & @replace@             & @( )@, on LHS of @=@ \\
     30@strstr@                                & @find@                                & @indexOf@             & @find@ \\
     31@strcspn@                               & @find_first_of@               & @matches@             & @include@ \\
     32@strspn@                                & @find_first_not_of@   & @matches@             & @exclude@ \\
     33N/A                                             & @c_str@, @data@               & N/A                   & @strcpy@, @strncpy@ \\
    3434\end{tabular}
    3535\end{cquote}
     
    5353
    5454
    55 \section{\CFA \lstinline{string} type}
     55\section{\CFA \lstinline{string} Type}
    5656\label{s:stringType}
    5757
     
    272272ch = ch + 'b'; $\C[2in]{// LHS disambiguate, add character values}$
    273273s = 'a' + 'b'; $\C{// LHS disambiguate, concatenate characters}$
    274 printf( "%c\n", @'a' + 'b'@ ); $\C[2in]{// no LHS information, ambiguous}$
    275 printf( "%c\n", @(return char)@('a' + 'b') ); $\C{// disambiguate with ascription cast}$
     274printf( "%c\n", @'a' + 'b'@ ); $\C{// no LHS information, ambiguous}$
     275printf( "%c\n", @(return char)@('a' + 'b') ); $\C{// disambiguate with ascription cast}\CRT$
    276276\end{cfa}
    277277The ascription cast, @(return T)@, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion).
     
    319319ch = ch * 3; $\C[2in]{// LHS disambiguate, multiply character values}$
    320320s = 'a' * 3; $\C{// LHS disambiguate, concatenate characters}$
    321 printf( "%c\n", @'a' * 3@ ); $\C[2in]{// no LHS information, ambiguous}$
    322 printf( "%c\n", @(return char)@('a' * 3) ); $\C{// disambiguate with ascription cast}$
     321printf( "%c\n", @'a' * 3@ ); $\C{// no LHS information, ambiguous}$
     322printf( "%c\n", @(return char)@('a' * 3) ); $\C{// disambiguate with ascription cast}\CRT$
    323323\end{cfa}
    324324Fortunately, character multiplication without LHS information is even rarer than addition, so repurposing the operator @*@ for @string@ types is not a problem.
     
    600600&
    601601\begin{cfa}
    602 for ( ;; ) {
     602for () {
    603603        size_t posn = exclude( line, alpha );
    604604  if ( posn == len( line ) ) break;
     
    722722\end{tabular}
    723723\end{cquote}
    724 Input text can be gulped, including whitespace, from the current point to an arbitrary delimiter character using @getline@.
     724Input text can be \emph{gulped}, including whitespace, from the current point to an arbitrary delimiter character using @getline@.
    725725
    726726The \CFA philosophy for input is that, for every constant type in C, these constants should be usable as input.
     
    760760\end{tabular}
    761761\end{cquote}
    762 Note, the ability to read in quoted strings to match with program string constants.
     762Note, the ability to read in quoted strings with whitespace to match with program string constants.
    763763The @nl@ at the end of an input ignores the rest of the line.
    764764
     
    845845                                        & Laxed: The target's type is anything string-like; it may have a different status concerning ownership.
    846846                                                                & Strict: The target's type is the same as the source; both strings are equivalent peers concerning ownership.
    847                                                                                         & n/a           & no    & yes   & yes \\
     847                                                                                        & N/A           & no    & yes   & yes \\
    848848\hline
    849849Referent
     
    863863        The C ``string'' is @char *@, under the conventions of @<string.h>@. Because this type does not manage a text allocation, symmetry does not apply.
    864864\item
    865         The Java @String@ class is analyzed; its @StringBuffer@ class behaves similarly to @C++@.
     865        The Java @String@ class is analyzed; its @StringBuffer@ class behaves similarly to \CC.
    866866\end{itemize}
    867867\caption{Comparison of languages' strings, storage management perspective.}
     
    869869\end{figure}
    870870
    871 In C, these declarations give very different things.
     871In C, these declarations are very different.
    872872\begin{cfa}
    873873char x[$\,$] = "abcde";
    874874char * y = "abcde";
    875875\end{cfa}
    876 Both associate the declared name with fixed-six contiguous bytes, filled as @{'a', 'b', 'c', 'd', 'e', 0}@.
    877 But @x@ gets them allocated in the active stack frame (with values filled in as control passes the declaration), while @y@ refers into the executable's read-only data section.
     876Both associate the declared name with the fixed, six contiguous bytes: @{'a', 'b', 'c', 'd', 'e', 0}@.
     877But @x@ is allocated on the stack (with values filled at the declaration), while @y@ refers to the executable's read-only data-section.
    878878With @x@ representing an allocation, it offers information in @sizeof(x)@ that @y@ does not.
    879 But this extra information is second-class, as it can only be used in the immediate lexical context, \ie it cannot be passed on to string operations or user functions.
     879But this extra information is second-class, as it can only be used in the immediate lexical context, \ie it cannot be passed to string operations or user functions.
    880880Only pointers to text buffers are first-class, and discussed further.
    881 
    882881\begin{cfa}
    883882char * s = "abcde";
    884 char * s1 = s;  $\C{// alias state, n/a symmetry, variable-constrained referent}$
    885 char * s2 = &s[1];  $\C{// alias state, n/a symmetry, variable-constrained referent}\CRT$
    886 char * s3 = &s2[1];  $\C{// alias state, n/a symmetry, variable-constrained referent}
     883char * s1 = s;  $\C[2.25in]{// alias state, N/A symmetry, variable-constrained referent}$
     884char * s2 = &s[1];  $\C{// alias state, N/A symmetry, variable-constrained referent}$
     885char * s3 = &s2[1];  $\C{// alias state, N/A symmetry, variable-constrained referent}\CRT$
    887886printf( "%s %s %s %s\n", s, s1, s2, s3 );
    888887$\texttt{\small abcde abcde bcde cde}$
     
    904903string & s5 = s.substr(2,4);  $\C{// error: cannot point to temporary}\CRT$
    905904\end{cfa}
    906 The @s1@ lax symmetry reflects how its validity of depends on the lifetime of @s@.
     905The @s1@ lax symmetry reflects how its validity depends on the lifetime of @s@.
    907906It is common practice in \CC to use the @s1@-style for a by-reference function parameter.
    908907Doing so assumes that the callee only uses the referenced string for the duration of the call, \ie no storing the parameter (as a reference) for later.
    909908So, when the called function is a constructor, its definition typically uses an @s2@-style copy-initialization.
    910909Exceptions to this pattern are possible, but require the programmer to assure safety where the type system does not.
    911 The @s3@ initialization must copy the substring because it must support a subsequent @c_str@ call, which provides a null-termination, generally at a different position than the source string's.
     910The @s3@ initialization must copy the substring to support a subsequent @c_str@ call, which provides null-termination, generally at a different position than the source string's.
    912911@s2@ assignment could be made fast, by reference-counting the text area and using copy-on-write, but would require an implementation upgrade.
    913912
     
    929928With @s2@, the case for fast-copy is more subtle.
    930929Certainly, its value is not pointer-equal to @s@, implying at least a further allocation.
    931 But because Java is not constrained to use a null-terminated representation, a standard-library implementation is free to refer to the source characters in-place.
     930But because Java is \emph{not} constrained to use a null-terminated representation, a standard-library implementation is free to refer to the source characters in-place.
    932931Java does not meet the aliasing requirement because immutability makes it impossible to modify.
    933932Java's @StringBuffer@ provides aliasing (see @replace@ example on \VPageref{p:JavaReplace}), though without supporting symmetric treatment of a fragment referent, \eg @substring@ of a @StringBuffer@ is a @String@;
     
    960959
    961960
    962 
    963 \subsection{Logical overlap}
     961\subsection{Logical Overlap}
    964962
    965963It may be unfamiliar to combine \VRef[Figure]{f:StrSemanticCompare}'s alias state and fragment referent in one API, or at the same time.
    966964This section shows the capability in action.
    967965
    968 In summary, the metaphor of a GUI text editor is intended.
    969 Selecting a consecutive block of text using the mouse defines an aliased substring within the file.
    970 Typing in this state overwrites what was there before, replacing the originally selected text with more or less text.
    971 But the \emph{whole file} grows or shrinks as a result, not just the selection.
    972 This action models assigning to an aliased substring when the two strings overlap by total containment: one string is the selection, the other is the whole file.
    973 
    974 Now extend the metaphor to a multi-user online editor.
    975 If Alice selects a range of text at the bottom of the file, wile Bob is rewriting a paragraph at the top, Alice's selection holds onto the logical characters initially selected, unaffected by Bob making the total file grow/shrink, and unaffectd by Bob causing the start index of Alice's selction to vary.
    976 This action models assigning to an aliased substring when the two strings do not overlap at all: one string is Alice's selection, the other is Bob's.
    977 
    978 If a third office worker were also watching Alice's and Bob's actions on the whole file (a string with ``all the text'' is kept around), then two further single-user-edit cases give the semantics of the individual edits flowing into the whole.
    979 But, departing from the document analogy, it is not necessary to keep a such a third string:
    980 no one has to resource-manage ``the document.''
    981 When an original string, from which both the Alice- and Bob-parts came, ceases to exist, Alice and Bob are left with two independent strings.
    982 They are independent because Alice and Bob have no API for growing the bounds of a string to subsume text that may once have been around it.
    983 
    984 Edge cases, notably ``Venn-diagram overlap,'' had to have handlings chosen.
    985 The intent in fleshing out these details was to achieve the above story, with a single API, while keeping the rest as simple as possible.
    986 The remainder of this section shows the resulting decisions, played out at the API level.
    987 
    988 \CFA uses the marker @`share@ as a dynamic mechanism to indicate alias (mutations shared) \vs snapshot (not quite an immutable result, but one with subsequent mutations isolated).
     966\begin{comment}
     967The metaphor of a GUI text-editor is used to illustrate combining these features.
     968Most editors allow selecting a consecutive block of text (highlighted) to define an aliased substring within a document.
     969Typing in this area overwrites the prior text, replacing the selected text with less, same, or more text.
     970Importantly, the document also changes size, not just the selection.
     971%This alias model is assigning to an aliased substring for two strings overlapping by total containment: one is the selected string, the other is the document.
     972Extend the metaphor to two selected areas, where one area can be drag-and-dropped into another, changing the text in the drop area and correspondingly changing the document.
     973When the selected areas are indenpendent, the semantics of the drag-and-drop are straightforward.
     974However, for overlapping selections, either partial or full, there are multiple useful semantics.
     975For example, two areas overlap at the top, or bottom, or a block at a corner, where one areas is dropped into the other.
     976For selecting a smaller area within a larger, and dropping the smaller area into the larger to replace it.
     977In both cases, meaningful semantics must be constructed or the operation precluded.
     978However, without this advanced capability, certain operations become multi-step, possible requiring explicit temporaries.
     979\end{comment}
     980
     981A GUI text-editor provides a metaphor.
     982Selecting a block of text using the mouse defines an aliased substring within a document.
     983Typing in this area overwrites what was there, replacing the originally selected text with more or less text.
     984But the \emph{containing document} also grows or shrinks, not just the selection.
     985This action models assigning to an aliased substring when one string is completely contained in the other.
     986
     987Extend the metaphor to a multi-user editor.
     988If Alice selects a range of text at the bottom, while Bob is rewriting a paragraph at the top, Alice's selection holds onto the characters initially selected, unaffected by Bob making the document grow/shrink even though Alice's start index in the document is changing.
     989This action models assigning to an aliased substring when the two strings do not overlap.
     990
     991Logically, Alice's and Bob's actions on the whole document are like two single-user-edit cases, giving the semantics of the individual edits flowing into a whole.
     992But, there is no need to have two separate document strings.
     993Even if a third selection removes all the text, both Alice's and Bob's strings remain.
     994The independence of their selections assumes that the editor API does not allow the selection to be enlarged, \ie adding text from the containing environment, which may have disappeared.
     995
     996This leaves the ``Venn-diagram overlap'' cases, where Alice's and Bob's selections overlap at the top, bottom, or corner.
     997In this case, the selection areas are dependent, and so, changes in content and size in one may have an affect in the other.
     998There are multiple possible semantics for this case.
     999The remainder of this section shows the chosen semantics for all of the cases.
     1000
     1001String sharing is expressed using the @`share@ marker to indicate aliasing (mutations shared) \vs snapshot (not quite an immutable result, but one with subsequent mutations isolated).
    9891002This aliasing relationship is a sticky property established at initialization.
    9901003For example, here strings @s1@ and @s1a@ are in an aliasing relationship, while @s2@ is in a copy relationship.
    9911004\input{sharing1.tex}
    9921005Here, the aliasing (@`share@) causes partial changes (subscripting) to flow in both directions.
    993 (In the following examples, watch how @s1@ and @s1a@ change together, and @s2@ is independent.)
     1006(In the following examples, note how @s1@ and @s1a@ change together, and @s2@ is independent.)
    9941007\input{sharing2.tex}
    9951008Similarly for complete changes.
     
    9991012\input{sharing4.tex}
    10001013
    1001 Now, consider string @s1_mid@ being an alias in the middle of @s1@, along with @s2@, made by a simple copy from the middle of @s1@.
     1014Now, consider string @s1_mid@ being an alias in the middle of @s1@, along with @s2@, made by a copy from the middle of @s1@.
    10021015\input{sharing5.tex}
    10031016Again, @`share@ passes changes in both directions; copy does not.
     
    10201033When @s1_bgn@'s size increases by 3, @s1_mid@'s starting location moves from 1 to 4 and @s1_end@'s from 3 to 6,
    10211034
    1022 When changes happens on an aliasing substring that overlap.
     1035When changes happen on an aliasing substring that overlap.
    10231036\input{sharing10.tex}
    1024 Strings @s1_crs@ and @s1_mid@ overlap at character 4, @j@ because the substrings are 3,2 and 4,2.
     1037Strings @s1_crs@ and @s1_mid@ overlap at character 4, @j@, because the substrings are 3,2 and 4,2.
    10251038When @s1_crs@'s size increases by 1, @s1_mid@'s starting location moves from 4 to 5, but the overlapping character remains, changing to @'+'@.
    10261039
     
    10791092
    10801093
    1081 
    1082 \section{Storage management}
     1094\section{Storage Management}
    10831095
    10841096This section discusses issues related to storage management of strings.
     
    10991111const string s1 = "abc";
    11001112\end{cfa}
    1101 the @const@ applies to the @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage.
    1102 Hence, @s1@ is not pointing at an immutable constant, meaning its underlying string can be mutable, unless some other designation is specified, such as Java's global immutable rule.
    1103 
    1104 
    1105 
    1106 \subsection{General implementation}
     1113@const@ applies to the @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage.
     1114Hence, @s1@ is not pointing at an immutable constant and its underlying string is mutable, unless some other designation is specified, such as Java's global immutable rule.
     1115
     1116
     1117\subsection{General Implementation}
    11071118\label{string-general-impl}
    11081119
     
    11131124A string is a smart pointer into this buffer.
    11141125
    1115 This cycle of frequent cheap allocations, interspersed with infrequent expensive compactions, has obvious similarities to a general-purpose memory manager based on garbage collection (GC).
     1126This cycle of frequent cheap allocations, interspersed with infrequent expensive compactions, has obvious similarities to a general-purpose memory-manager based on garbage collection (GC).
    11161127A few differences are noteworthy.
    11171128First, in a general purpose manager, the allocated objects may contain pointers to other objects, making the transitive reachability of these objects a crucial property.
    11181129Here, the allocations are text, so one allocation never keeps another alive.
    11191130Second, in a general purpose manager, the handle that keeps an allocation alive is a bare pointer.
    1120 For strings, a fatter representation is acceptable because this pseudo-pointer is only used for enty into the string-heap, not for general data-sub-structure linking around the general heap.
     1131For strings, a fatter representation is acceptable because this pseudo-pointer is only used for entry into the string-heap, not for general data-substructure linking around the general heap.
    11211132
    11221133\begin{figure}
     
    11281139\VRef[Figure]{f:memmgr-basic} shows the representation.
    11291140The heap header and text buffer define a sharing context.
    1130 Normally, one global sharing context is appropriate for an entire program;
    1131 concurrent exceptions are discussed in \VRef{s:ControllingImplicitSharing}.
    1132 A string is a handle into the buffer and node within a linked list.
     1141Normally, one global context is appropriate for an entire program;
     1142concurrency is discussed in \VRef{s:ControllingImplicitSharing}.
     1143A string is a handle to a node in a linked list containing a information about a string text in the buffer.
    11331144The list is doubly linked for $O(1)$ insertion and removal at any location.
    11341145Strings are ordered in the list by text start address.
    1135 The header maintains a next-allocation pointer, @alloc@, pointing to the last live allocation in the buffer.
     1146The heap header maintains a next-allocation pointer, @alloc@, pointing to the last live allocation in the buffer.
    11361147No external references point into the buffer and the management procedure relocates the text allocations as needed.
    11371148A string handle references a containing string, while its string is contiguous and not null terminated.
     
    11391150String handles can be allocated in the stack or heap, and represent the string variables in a program.
    11401151Normal C life-time rules apply to guarantee correctness of the string linked-list.
    1141 The text buffer is large enough with good management so that often only one dynamic allocation is necessary during program execution.
     1152The text buffer is large enough with good management so that often only one dynamic allocation is necessary during program execution, but not so large as to cause program bloat.
    11421153% During this period, strings can vary in size dynamically.
    11431154
    11441155When the text buffer fills, \ie the next new string allocation causes @alloc@ to point beyond the end of the buffer, the strings are compacted.
    11451156The linked handles define all live strings in the buffer, which indirectly defines the allocated and free space in the buffer.
    1146 Since the string handles are in sorted order, the handle list can be traversed, copying the first live text to the start of the buffer, and subsequent strings after each other.
    1147 If, upon compaction, the amount of free storage would still be less than the new string allocation, a larger text buffer is heap-allocated, the current buffer is copied into the new buffer, and the original buffer is freed.
     1157The string handles are maintained in sorted order, so the handle list can be traversed, copying the first live text to the start of the buffer, and subsequent strings after each other.
     1158After compaction, if free storage is still be less than the new string allocation, a larger text buffer is heap-allocated, the current buffer is copied into the new buffer, and the original buffer is freed.
    11481159Note, the list of string handles is structurally unaffected during a compaction;
    11491160only the text pointers in the handles are modified to new buffer locations.
     
    11571168Both string initialization styles preserve the string module's internal invariant that the linked-list order matches the buffer order.
    11581169For string destruction, handles are removed from the list.
    1159 As a result, once a last handle using a run of buffer characters is destroyed, that buffer space gets excluded from the next compaction, making its character-count available in the compacted buffer.
    1160 
    1161 Certain string operations can result in a substring of another string.
    1162 The resulting handle is then placed in the correct sorted position in the list, possible with a short linear search to locate the position.
     1170Once the last handle using a run of buffer characters is destroyed, that buffer space is excluded from use until the next compaction.
     1171
     1172Certain string operations result in a substring of another string.
     1173The resulting handle is then placed in the correct sorted position in the list, possible requiring a short linear search to locate the position.
    11631174For string operations resulting in a new string, that string is allocated at the end of the buffer.
    11641175For shared-edit strings, handles that originally referenced containing locations need to see the new value at the new buffer location.
     
    11751186
    11761187
    1177 \subsection{RAII limitations}
     1188\subsection{RAII Limitations}
    11781189\label{string-raii-limit}
    11791190
    11801191Earlier work on \CFA~\cite[ch.~2]{Schluntz17} implemented object constructors and destructors for all types (basic and user defined).
    1181 A constructor is a user-defined function run implicitly \emph{after} an object's storage is allocated, and a destructor is a user-defined function run \emph{before} an object's storage is deallcated.
     1192A constructor is a user-defined function run implicitly \emph{after} an object's storage is allocated, and a destructor is a user-defined function run \emph{before} an object's storage is deallocated.
    11821193This feature, called Resource Acquisition Is Initialization (RAII)~\cite[p.~389]{Stroustrup94}, helps guarantee invariants for users before accessing an object and for the programming environment after an object terminates.
    11831194
     
    12131224\end{cfa}
    12141225A module providing the @T@ type can traverse @all_T@ at relevant times, to keep the objects ``good.''
    1215 Hence, declaring a @T@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service that keeps the value ``good'' in the future.
     1226Hence, declaring a @T@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service that keeps the value ``good'' during its lifetime.
    12161227Again, both \CFA and \CC support this usage style.
    12171228
    12181229A third capability concerns \emph{implicitly} requested copies.
    12191230When stack-allocated objects are used as parameter and return values, a sender's version exists in one stack frame and a receiver's version exists in another.
    1220 In the parameter direction, the language's function-call handling must arrange for a copy-constructor call to happen\footnote{
    1221         \CC also offers move constructors and return-value optimization~\cite{RVO20}.
    1222         These features help reduce unhelpful copy-constructor calls, which, for types like the example \lstinline{S}, would lead to extra memory allocations.
    1223         \CFA does not currently have these features; adding similarly-intended features to \CFA is desirable.
    1224         However, this section is about a problem in the realization of features that \CFA already supports.
    1225         To understand the problem presented, the appropriate comparison is with classic versions of \CC that treated such copy-constructor calls as necessary.}
    1226 at a time near the control transfer into the callee, with the source as the caller's (sender's) version and the target as the callee's (receiver's) version.
    1227 (In the return direction, the roles are reversed and the copy-constructor call happens near the return of control.)
    1228 \CC supports this capability without qualification.
    1229 \CFA offers limited support here; simple examples work, but implicit copying does not combine successfully with the other RAII capabilities discussed.
     1231In the parameter direction, the language's function-call handling must arrange for a copy-constructor call to happen, at a time near the control transfer into the callee. %, with the source as the caller's (sender's) version and the target as the callee's (receiver's) version.
     1232In the return direction, the roles are reversed and the copy-constructor call happens near the return of control.
     1233\CC supports this capability.% without qualification.
     1234\CFA offers limited support;
     1235simple examples work, but implicit copying does not combine successfully with the other RAII capabilities discussed.
     1236
     1237\CC also offers move constructors and return-value optimization~\cite{RVO20}.
     1238These features help reduce unhelpful copy-constructor calls, which, for types like the @S@ example, would lead to extra memory allocations.
     1239\CFA does not currently have these features; adding similarly-intended features to \CFA is desirable.
     1240However, this section is about a problem in the realization of features that \CFA already supports.
     1241Hence, the comparison continues with the classic version of \CC that treated such copy-constructor calls as necessary.
    12301242
    12311243To summarize the unsupported combinations, the relevant features are:
     
    12431255At that time, adhering to a principal of minimal intervention, this code could always be treated as passthrough:
    12441256\begin{cfa}
    1245 struct U {...};
     1257struct U { ... };
    12461258// RAII to go here
    12471259void f( U u ) { F_BODY(u) }
     
    12491261f( x );
    12501262\end{cfa}
    1251 But adding custom RAII (at ``...here'') changes things.
    1252 The common C++ lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the present CFA lowering.
    1253 
    1254 \noindent
    1255 \begin{tabular}{l|l}
    1256 \begin{cfa}
    1257 // C++, likely CFA to be
     1263But adding custom RAII (at ``...go here'') changes things.
     1264The common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the present \CFA lowering.
     1265\begin{cquote}
     1266\setlength{\tabcolsep}{15pt}
     1267\begin{tabular}{@{}l|l@{}}
     1268\begin{cfa}
     1269$\C[0.0in]{// \CC, \CFA future}\CRT$
    12581270struct U {...};
    12591271// RAII elided
    12601272void f( U * __u_orig ) {
    12611273        U u = * __u_orig;  // call copy ctor
    1262         F_BODY(u)
     1274        F_BODY( u );
    12631275        // call dtor, u
    12641276}
    12651277U x; // call default ctor
    1266 f( & x ) ;
     1278
     1279
     1280f( &x ) ;
     1281
     1282
    12671283// call dtor, x
    12681284\end{cfa}
    12691285&
    12701286\begin{cfa}
    1271 // CFA today
     1287$\C[0.0in]{// \CFA today}\CRT$
    12721288struct U {...};
    12731289// RAII elided
    12741290void f( U u ) {
    1275         F_BODY(u)
     1291
     1292        F_BODY( u );
     1293
    12761294}
    12771295U x; // call default ctor
     
    12841302\end{cfa}
    12851303\end{tabular}
    1286 
    1287 In the CFA-today scheme, the lowered form is still using a by-value C call.
    1288 C does a @memcpy@ on structs passed by value.
    1289 And so, @F_BDY@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions.
    1290 If @U@ is trying to have a style-\#2 invariant, it shows up broken in @F_BDY@: references that are supposed to be to @u@ are actually to the different location @__u_for_f@.
    1291 The \CC scheme does not have this problem because it constructs the for-@f@ copy in the correct location.
    1292 
    1293 Yet, the \CFA-today scheme is sufficient to deliver style-\#1 invariants (in this style-\#3 use case) because this scheme still does the correct number of lifecycle calls, using correct values, at correct times.  So, reference-counting or simple ownership applications get their invariants respected under call/return-by-value.
     1304\end{cquote}
     1305The current \CFA scheme is still using a by-value C call.
     1306C does a @memcpy@ on structures passed by value.
     1307And so, @F_BODY@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions.
     1308If @U@ is trying to have a style-\#2 invariant, it shows up broken in @F_BODY@: references supposedly to @u@ are actually to @__u_for_f@.
     1309The \CC scheme does not have this problem because it constructs the for @f@ copy in the correct location within @f@.
     1310
     1311Yet, the current \CFA scheme is sufficient to deliver style-\#1 invariants (in this style-\#3 use case) because this scheme still does the correct number of lifecycle calls, using correct values, at correct times.
     1312So, reference-counting or simple ownership applications get their invariants respected under call/return-by-value.
    12941313
    12951314% [Mike is not currently seeing how distinguishing initialization from assignment is relevant]
     
    13251344% The following discusses the consequences of this semantics with respect to lifetime management of \CFA strings.
    13261345
    1327 
    1328 The string API offers style \#3's pass-by-value in, for example, in the return of @"a" + "b"@.
     1346The string API offers style \#3's pass-by-value in, \eg in the return of @"a" + "b"@.
    13291347Its implementation uses the style-\#2 invariant of the string handles being linked to each other, helping to achieve high performance.
    1330 Since these two RAII styles cannont coexist, a workaround splits the API into two layers: one that provides pass-by-value, built upon the other with inter-linked handles.
     1348Since these two RAII styles cannot coexist, a workaround splits the API into two layers: one that provides pass-by-value, built upon the other with inter-linked handles.
    13311349The layer with pass-by-value incurs a performance penalty, while the layer without delivers the desired runtime performance.
    1332 The slower, friendlier High Level API (HL, type @string@) wrapps the faster, more primitive Low Level API (LL, type @string_res@, abbreviating ``resource'').
     1350The slower, friendlier High Level API (HL, type @string@) wraps the faster, more primitive Low Level API (LL, type @string_res@, abbreviating ``resource'').
    13331351Both APIs present the same features, up to return-by-value operations being unavailable in LL and implemented via the workaround in HL.
    13341352The intention is for most future code to target HL.
    1335 When the RAII issue is fixed, the full HL feature set will be acheivable using the LL-style lifetime management.
    1336 So then, there will be no need for two API levels; HL will be removed; LL's type will be renamed to @string@; programs written for current HL will run faster.
     1353When the RAII issue is fixed, the full HL feature set will be achievable using the LL-style lifetime management.
     1354Then, HL will be removed;
     1355LL's type will be renamed @string@ and programs written for current HL will run faster.
    13371356In the meantime, performance-critical sections of applications must use LL.
    13381357Subsequent performance experiments \see{\VRef{s:PerformanceAssessment}} use the LL API when comparing \CFA to other languages.
    13391358This measurement gives a fair estimate of the goal state for \CFA.
    13401359A separate measure of the HL overhead is also included.
    1341 
    1342 \VRef[Section]{string-general-impl} described the goal state for \CFA.  In present state, the type @string_res@ replaces its mention of @string@ as inter-linked handle.
    1343 
    1344 To use LL, a programmer rewrites invocations that used pass-by-value APIs into invocations where the resourcing is more explicit.
    1345 Many invocations are unaffected, notably including assignment and comparison.
    1346 Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases have revisions.
    1347 
    1348 \noindent
     1360hence, \VRef[Section]{string-general-impl} us describing the goal state for \CFA.
     1361In present state, the type @string_res@ replaces its mention of @string@ as inter-linked handle.
     1362
     1363To use LL, a programmer rewrites invocations using pass-by-value APIs into invocations where resourcing is more explicit.
     1364Many invocations are unaffected, notably assignment and comparison.
     1365Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases need revisions.
     1366\begin{cquote}
     1367\setlength{\tabcolsep}{15pt}
    13491368\begin{tabular}{ll}
    13501369HL & LL \\
    13511370\hline
    13521371\begin{cfa}
     1372
    13531373string s = "a" + "b";
    13541374\end{cfa}
     
    13631383string s = "abcde";
    13641384string s2 = s(2, 3); // s2 == "cde"
     1385
    13651386s(2,3) = "x"; // s == "abx" && s2 == "cde"
    13661387\end{cfa}
     
    13761397\begin{cfa}
    13771398string s = "abcde";
     1399
    13781400s[2] = "xxx";  // s == "abxxxde"
    13791401\end{cfa}
     
    13851407\end{cfa}
    13861408\end{tabular}
    1387 
     1409\end{cquote}
    13881410The actual HL workaround is having @string@ wrap a pointer to a uniquely owned, heap-allocated @string_res@.  This arrangement has @string@ being style-\#1 RAII, which is compatible with pass-by-value.
    13891411
    13901412
    1391 
    1392 \subsection{Sharing implementation}
     1413\subsection{Sharing Implementation}
    13931414\label{sharing-impl}
    13941415
    1395 The \CFA string module has two mechanisms to handle the case when string handles share a run of text.
    1396 
     1416The \CFA string module has two mechanisms to deal with string handles sharing text.
    13971417In the first type of sharing, the user requests that both string handles be views of the same logical, modifiable string.
    13981418This state is typically produced by the substring operation.
     
    14041424$\texttt{\small axcde xc}$
    14051425\end{cfa}
    1406 In a typical substring call, the source string-handle is referencing an entire string, and the resulting, newly made, string handle is referencing a portion of the original.
    1407 In this state, a subsequent modification made by either is visible in both.
     1426Here, the source string-handle is referencing an entire string, and the resulting, newly made, string handle is referencing a contained portion of the original.
     1427In this state, a modification made in the overlapping area is visible in both strings.
    14081428
    14091429The second type of sharing happens when the system implicitly delays the physical execution of a logical \emph{copy} operation, as part of its copy-on-write optimization.
     
    14181438In this state, a subsequent modification done on one handle triggers the deferred copy action, leaving the handles referencing different text within the buffer, holding distinct values.
    14191439
    1420 A further abstraction, in the string module's implementation, helps distinguish the two senses of sharing.
     1440A further abstraction helps distinguish the two senses of sharing.
    14211441A share-edit set (SES) is an equivalence class over string handles, being the reflexive, symmetric and transitive closure of the relationship of one string being constructed from another, with the ``share'' option given.
    14221442The SES is represented by a second linked list among the handles.
     
    14271447
    14281448
    1429 \subsection{Controlling implicit sharing}
     1449\subsection{Controlling Implicit Sharing}
    14301450\label{s:ControllingImplicitSharing}
    14311451
     
    14341454In detail, string sharing has inter-linked string handles, so managing one string is also managing the neighbouring strings, and from there, a data structure of the ``set of all strings.''
    14351455Therefore, it is useful to toggle this capability on or off when it is not providing any application benefit.
    1436 
    1437 \begin{figure}
    1438     \begin{tabular}{ll}
    1439         \lstinputlisting[language=CFA, firstline=10, lastline=55]{sharectx.run.cfa}
    1440         &
    1441         \raisebox{-0.17\totalheight}{\includegraphics{string-sharectx.pdf}} % lower
    1442     \end{tabular}
    1443         \caption{Controlling copying vs sharing of strings using \lstinline{string_sharectx}.}
    1444         \label{fig:string-sharectx}
    1445 \end{figure}
    14461456
    14471457The \CFA string library provides the type @string_sharectx@ to control an ambient sharing context.
     
    14561466Executing the example does not produce an interesting outcome, but the comments in the picture indicate when the logical copy operation runs with
    14571467\begin{description}
    1458     \item[share:] the copy being deferred, as described through the rest of this section (fast), or
    1459     \item[copy:] the copy performed eagerly (slow).
     1468        \item[share:] the copy being deferred, as described through the rest of this section (fast), or
     1469        \item[copy:] the copy performed eagerly (slow).
    14601470\end{description}
    14611471Only eager copies can cross @string_sharectx@ boundaries.
    14621472The intended use is with stack-managed lifetimes, in which the established context lasts until the current function returns, and affects all functions called that do not create their own contexts.
    14631473
    1464 [ TODO: true up with ``is thread local'' (implement that and expand this discussion to give a concurrent example, or adjust this wording) ]
    1465 
    1466 
    1467 \subsection{Sharing and threading}
     1474\begin{figure}
     1475        \begin{tabular}{ll}
     1476                \lstinputlisting[language=CFA, firstline=10, lastline=55]{sharectx.run.cfa}
     1477                &
     1478                \raisebox{-0.17\totalheight}{\includegraphics{string-sharectx.pdf}} % lower
     1479        \end{tabular}
     1480        \caption{Controlling copying vs sharing of strings using \lstinline{string_sharectx}.}
     1481        \label{fig:string-sharectx}
     1482\end{figure}
     1483
     1484
     1485\subsection{Sharing and Threading}
    14681486
    14691487The \CFA string library provides no thread safety, the same as \CC string, providing similar performance goals.
     
    14741492
    14751493
    1476 \subsection{Future work}
    1477 
    1478 Implementing the small-string optimization is straightforward, as a string header contains a pointer to the string text in the buffer.
    1479 This pointer could be marked with a flag and contain a small string.
    1480 However, there is now a conditional check required on the fast-path to switch between small and large string operations.
    1481 
    1482 It might be possible to pack 16- or 32-bit Unicode characters within the same string buffer as 8-bit characters.
    1483 Again, locations for identification flags must be found and checked along the fast path to select the correct actions.
    1484 Handling utf8 (variable length), is more problematic because simple pointer arithmetic cannot be used to stride through the variable-length characters.
    1485 Trying to use a secondary array of fixed-sized pointers/offsets to the characters is possible, but raises the question of storage management for the utf8 characters themselves.
    1486 
    1487 
    1488 \section{Performance assessment}
    1489 \label{s:PerformanceAssessment}
    1490 
    1491 I assessed the \CFA string library's speed and memory usage against strings in \CC STL.
    1492 
    1493 Overall, this analysis shows that adding support for the features shown earlier in the chapter comes at no substantial cost in the performance of featrues common to both APIs.
    1494 
    1495 Moreover, the results support the \CFA string's position as a high-level enabler of simplified text processing.
    1496 STL makes its user think about memory management.
    1497 When the user does, and is successful, STL's performance can be very good.
    1498 But when the user fails to think through the consequences of the STL representation, performance becomes poor.
    1499 The \CFA string lets the user work at the level of just putting the right text into right variables, with corresponding performance degradations reduced or eliminated.
    1500 
    1501 % The final test shows the overall win of the \CFA text-sharing mechanism.
    1502 % It exercises several operations together, showing \CFA enabling clean user code to achieve performance that STL requires less-clean user code to achieve.
    1503 
    1504 
    1505 \subsection{Methodology}
    1506 
    1507 These tests use a \emph{corpus} of strings.
    1508 Their lengths are important; the specific characters occurring in them are immaterial.
    1509 In a result graph, a corpus's mean string length is often the independent variable shown on the X axis.
    1510 
    1511 When a corpus contains strings of different lenghths, the lengths are drawn from a geometric distribution.
    1512 Therefore, strings much longer than the mean occur nontrivially and strings slightly shorter than the mean occur most often.
    1513 A corpus's string sizes are one of:
    1514 \begin{description}
    1515         \item [Fixed-size] all string lengths are of the stated size.
    1516         \item [Varying 1 and up] the string lengths are drawn from the geometric distribution with a stated mean and all lengths occur.
    1517         \item [Varying 16 and up] string lengths are drawn from the geometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16.  \PAB{Is this one unused?  May have just been for ``normalize.''}
    1518 \end{description}
    1519 The special treatment of length 16 deals with the short-string optimization (SSO) in STL @string@, currently not implemented in \CFA, though a fine future improvement to \CFA.
    1520 In the general case, an STL string handle is a pointer (to separately allocated text) and a length.
    1521 But when the text is shorter than this representation, the optimization repurposes the handle's storage to eliminate using the heap.
     1494\subsection{Short-String Optimization}
     1495
     1496\CC implements a short-string ($\le$16) optimization (SSO).
     1497As a string header contains a pointer to the string text, this pointer can be tagged and used to contain a short string, removing a dynamic memory allocation/deallocation.
    15221498\begin{c++}
    15231499class string {
     
    15291505                char sstr[sizeof(lstr)]; $\C{// short string <16 characters, text in situ}$
    15301506        };
    1531         $\C{// tagging for kind (short or long) elided}$
     1507        // some tagging for short or long strings
    15321508};
    15331509\end{c++}
    1534 
     1510However, there is now a conditional check required on the fast-path to switch between short and long string operations.
     1511
     1512It might be possible to pack 16- or 32-bit Unicode characters within the same string buffer as 8-bit characters.
     1513Again, locations for identification flags must be found and checked along the fast path to select the correct actions.
     1514Handling utf8 (variable length), is more problematic because simple pointer arithmetic cannot be used to stride through the variable-length characters.
     1515Trying to use a secondary array of fixed-sized pointers/offsets to the characters is possible, but raises the question of storage management for the utf8 characters themselves.
     1516
     1517
     1518\section{Performance Assessment}
     1519\label{s:PerformanceAssessment}
     1520
     1521I assessed the \CFA string library's speed and memory usage against strings in \CC STL.
     1522Overall, this analysis shows that adding support for the features shown earlier in the chapter comes at no substantial cost in the performance of features common to both APIs.
     1523
     1524Moreover, the results support the \CFA string's position as a high-level enabler of simplified text processing.
     1525STL makes its user think about memory management.
     1526When the user does, and is successful, STL's performance can be very good.
     1527But when the user fails to think through the consequences of the STL representation, performance becomes poor.
     1528The \CFA string lets the user work at the level of just putting the right text into the right variables, with corresponding performance degradations reduced or eliminated.
     1529
     1530% The final test shows the overall win of the \CFA text-sharing mechanism.
     1531% It exercises several operations together, showing \CFA enabling clean user code to achieve performance that STL requires less-clean user code to achieve.
     1532
     1533
     1534\subsection{Methodology}
     1535
     1536These tests use a \emph{corpus} of strings.
     1537Their lengths are important; the specific characters occurring in them are immaterial.
     1538In a result graph, a corpus's mean string length is often the independent variable shown on the X axis.
     1539
     1540When a corpus contains strings of different lengths, the lengths are drawn from a geometric distribution.
     1541Therefore, strings much longer than the mean occur less often and strings slightly shorter than the mean occur most often.
     1542A corpus's string sizes are one of:
     1543\begin{description}
     1544        \item [Fixed-size] all string lengths are of the stated size.
     1545        \item [Varying 1 and up] the string lengths are drawn from the geometric distribution with a stated mean and all lengths occur.
     1546        \item [Varying 16 and up] string lengths are drawn from the geometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16.
     1547\end{description}
     1548The special treatment of length 16 deals with the SSO in STL @string@, currently not implemented in \CFA.
    15351549A fixed-size or from-16 distribution ensures that \CC's extra-optimized cases are isolated within, or removed from, the comparison.
    1536 
    15371550In all experiments that use a corpus, its text is generated and loaded into the system under test before the timed phase begins.
    1538 
    15391551To ensure comparable results, a common memory allocator is used for \CFA and \CC.
    1540 \CFA runs the llheap allocator~\cite{Zulfiqar22}; the test rig plugs this same allocator into \CC.
     1552\CFA runs the llheap allocator~\cite{Zulfiqar22}, which is also plugged into \CC.
    15411553
    15421554The operations being measured take dozens of nanoseconds, so a succession of many invocations is run and timed as a group.
    1543 The experiments run with fixed duration (targeting approximately 5 seconds), stopping upon passing a goal time, as determined by re-checking @clock()@ every 10,000 invocations, which is never more often than once per 80 ms.
    1544 Timing outcomes reprt mean nanoseconds per invocation, which includes harness overhead and the targeted string API execution.
     1555The experiments run for a fixed duration (5 seconds), as determined by re-checking @clock()@ every 10,000 invocations, which is never more often than once per 80 ms.
     1556Timing outcomes report mean nanoseconds per invocation, which includes harness overhead and the targeted string API execution.
    15451557
    15461558\PAB{To discuss: hardware and such}
    15471559
    1548 As discussed in \VRef[Section]{string-raii-limit}, general performance comparisons are made using \CFA's faster, low-level string API, whose string type is named @string_res@.
    1549 
    1550 \VRef{s:ControllingImplicitSharing} presents an operational mode where \CFA string sharing is turned off.  In this mode, the \CFA string operates similarly to \CC's, by using a distinct heap allocation for each string's text.
    1551 Some experiments include measurements in this mode for baselining purposes.
    1552 It is called ``\CC emulation mode'' or ``nosharing'' here.
    1553 
     1560As discussed in \VRef[Section]{string-raii-limit}, general performance comparisons are made using \CFA's faster, low-level string API, named @string_res@.
     1561\VRef{s:ControllingImplicitSharing} presents an operational mode where \CFA string sharing is turned off.
     1562In this mode, the \CFA string operates similarly to \CC's, by using a heap allocation for string text.
     1563Some experiments include measurements in this mode for baselining purposes, called ``\CC emulation mode'' or ``nosharing''.
    15541564
    15551565
    15561566\subsection{Test: Append}
    15571567
    1558 These tests measure the speed of appending strings from the corpus onto a larger, growing string.  They show \CFA performing comparably to \CC overall, though with reduced penalties for simple API misuses for which \CC programmers may not know to watch out.
    1559 
     1568These tests measure the speed of appending strings from the corpus onto a larger, growing string.
     1569They show \CFA performing comparably to \CC overall, though with penalties for simple API misuses.
    15601570The basic harness is:
    1561 \begin{cquote}
    1562 \setlength{\tabcolsep}{20pt}
    1563 \begin{cfa}
    1564 START_TIMER
    1565 for ( ... ) {
    1566         string_res accum;
    1567         for ( i; 100 ) {
    1568                 accum += corpus[ f(i) ]; // importing from char * here
    1569                 COUNT_ONE_OP_DONE
     1571\begin{cfa}
     1572// set alarm duration
     1573for ( ... ) { $\C[1.5in]{// loop for duration}$
     1574        for ( i; N ) { $\C{// perform multiple appends (concatenations)}$
     1575                accum += corpus[ f( i ) ];
    15701576        }
     1577        count += N; $\C{// count number of appends}\CRT$
    15711578}
    1572 STOP_TIMER
    1573 \end{cfa}
    1574 \end{cquote}
    1575 The harness's outer loop executes until a sample-worthy amount of execution has happened.
    1576 The inner loop builds up the desired-length string with successive appends, before the outer makes it start over from a blank accumulator.
    1577 Each harness run targets a specific (mean) corpus string length and produces one data point on the result graph.
    1578 
     1579\end{cfa}
     1580The harness's outer loop executes for the experiment duration.
     1581The string is reset to empty before appending (not shown).
     1582The inner loop builds up a growing-length string with successive appends.
     1583Each run targets a specific (mean) corpus string length and produces one data point on the result graph.
    15791584Three specific comparisons are made with this harness.
    15801585Each picks its own independent-variable basis of comparison.
    1581 
    1582 All three comparisons use the varying-from-1 corpus construction, \ie they allow the STL to show its advantage from small-string optimization.
     1586All three comparisons use the varying-from-1 corpus construction, \ie they allow the STL to show its advantage for SSO.
    15831587
    15841588
    15851589\subsubsection{Fresh vs Reuse in \CC, Emulation Baseline}
    15861590
    1587 The first experiment compares \CFA with \CC, with \CFA operating in nosharing mode (and \CC having no other mode).
    1588 This experiment simply baselines how \CFA modestly lags \CC's optimization/tuning level generally, yet reproduces a coarser phenomenon.
    1589 
    1590 This experiment also introduces the first \CC coding pitfall, which the next experiment will show is helped by turning on \CFA sharing.  By this pitfall, a \CC programmer must pay attention to string variable reuse.
    1591 
    1592 \begin{cquote}
    1593 \setlength{\tabcolsep}{20pt}
     1591The first experiment compares \CFA with \CC, with \CFA operating in nosharing mode and \CC having no other mode, hence both string package are using @malloc@/@free@.
     1592% This experiment establishes a baseline for other experiments.
     1593This experiment also introduces the first \CC coding pitfall, which the next experiment shows is helped by turning on \CFA sharing.
     1594% This pitfall shows, a \CC programmer must pay attention to string variable reuse.
     1595In the following, both programs are doing the same thing: start with @accum@ empty and build it up by appending @N@ strings (type @string@ in \CC and the faster @string_res@ in \CFA).
     1596\begin{cquote}
     1597\setlength{\tabcolsep}{40pt}
    15941598\begin{tabular}{@{}ll@{}}
    15951599% \multicolumn{1}{c}{\textbf{fresh}} & \multicolumn{1}{c}{\textbf{reuse}} \\
     
    15971601
    15981602for ( ... ) {
    1599         @string_res accum;@       // fresh
    1600         for ( ... )
    1601                 accum @+=@ ...
     1603        @string_res accum;@     $\C[1.5in]{// fresh}$
     1604        for ( N )
     1605                accum @+=@ ...  $\C{// append}\CRT$
    16021606}
    16031607\end{cfa}
     
    16061610string_res accum;
    16071611for ( ... ) {
    1608         @accum = "";@  $\C[1in]{// reuse\CRT}$
    1609         for ( ... )
    1610                 accum @+=@ ...
     1612        @accum = "";@  $\C[1.5in]{// reuse}$
     1613        for ( N )
     1614                accum @+=@ ...  $\C{// append}\CRT$
    16111615}
    16121616\end{cfa}
    16131617\end{tabular}
    16141618\end{cquote}
    1615 
    1616 Both programs are doing the same thing: start with @x@ empty and build it up by appending the same chunks.
    1617 A programmer should not have to consider this difference.
    1618 But from under the covers, each string being an individual allocation leaks through.
    1619 While the inner loop is appending text to an @x@ that had not yet grown to have a large capacity, the program is, naturally, paying to extend the variable-length allocation, occasionally.
    1620 This capacity stretching is a sticky property that survives assigning a (short, empty-string) value into an existing initialization.
    1621 So, the ``reuse'' version benefits from not growing the allocation on subsequent runs of the inner loop.
    1622 Yet, the ``fresh'' version is constantly restarting from a small buffer.
     1619The difference is creating a new or reusing an existing string variable.
     1620The pitfall is that most programmers do not consider this difference.
     1621However, creating a new variable implies deallocating the previous string storage and allocating new empty storage.
     1622As the string grows, further deallocations/allocations are required to release the previous and extend the current string storage.
     1623So, the fresh version is constantly restarting with zero string storage, while the reuse version benefits from having its prior large storage from the last append sequence.
    16231624
    16241625\begin{figure}
     
    16261627        \includegraphics{plot-string-peq-cppemu.pdf}
    16271628%       \includegraphics[width=\textwidth]{string-graph-peq-cppemu.png}
    1628         \caption{Fresh vs Reuse in \CC, Emulation Baseline.  Average time per iteration with one \lstinline{x += y} invocation (lower is better).  Comparing \CFA's STL emulation mode with STL implementations, and comparing the ``fresh'' with ``reused'' reset styles.}
     1629        \caption{Fresh vs Reuse in \CC, Emulation Baseline.
     1630        Average time per iteration with one \lstinline{x += y} invocation (lower is better).
     1631        Comparing \CFA's STL emulation mode with STL implementations, and comparing the fresh with reused reset styles.}
    16291632        \label{fig:string-graph-peq-cppemu}
    1630 \end{figure}
    1631 
    1632 \VRef[Figure]{fig:string-graph-peq-cppemu} shows the resulting performance.
    1633 The fresh \vs reuse penalty is the dominant difference.
    1634 The cost is 40\% averaged over the cases shown and minimally 24\%.
    1635 It shows up consistently on both the \CFA and STL implementations, and this cost is more prominent with larger strings.
    1636 
    1637 The lesser \CFA \vs STL difference shows \CFA reproducing STL's performance, up to a 15\% penalty averaged over the cases shown, diminishing with larger strings, and 50\% in the worst case.
    1638 This penalty characterizes implementation fine tuning done with STL and not done yet done with \CFA.
    1639 
    1640 
    1641 \subsubsection{\CFA's Fresh-Reuse Compromise}
    1642 
    1643 This comparison has the same setup as the last one, except that the \CFA implementation is switched to use its sharing mode.  The outcome is that the fresh/reuse difference vanishes in \CFA, with \CFA consistently delivering performance that compromises between the two \CC cases.
    1644 
    1645 \begin{figure}
    1646 \centering
    1647         \includegraphics{string-graph-peq-sharing.pdf}
     1633        \bigskip
     1634        \bigskip
     1635        \includegraphics{plot-string-peq-sharing.pdf}
    16481636%       \includegraphics[width=\textwidth]{string-graph-peq-sharing.png}
    1649         \caption{\CFA Compromise for Fresh \vs Reuse.  Average time per iteration with one \lstinline{x += y} invocation (lower is better).  Comparing \CFA's sharing mode with STL, and comparing the ``fresh'' with ``reused'' reset styles.  The \CC results are repeated from \ref{fig:string-graph-peq-cppemu}.}
     1637        \caption{\CFA Compromise for Fresh \vs Reuse.
     1638        Average time per iteration with one \lstinline{x += y} invocation (lower is better).
     1639        Comparing \CFA's sharing mode with STL, and comparing the fresh with reused reset styles.
     1640        The \CC results are repeated from \VRef[Figure]{fig:string-graph-peq-cppemu}.}
    16501641        \label{fig:string-graph-peq-sharing}
    16511642\end{figure}
    16521643
    1653 \VRef[Figure]{fig:string-graph-peq-sharing} has the result.
    1654 At append lengths 5 and above, \CFA not only splits the two STL cases, but its slowdown of 16\% over STL with user-managed reuse is close to the baseline \CFA-v-STL implementation difference seen with \CFA in STL-emulation mode.
    1655 
    1656 
    1657 \subsubsection{\CFA's low overhead for misusing \lstinline{+}}
    1658 
    1659 A further pitfall occurs when the user writes @x = x + y@, rather than @x += y@.  Again, they are logically equivalent.
     1644\VRef[Figure]{fig:string-graph-peq-cppemu} shows the resulting performance.
     1645The two fresh (solid) lines and the two reuse (dash) lines are identical, except for lengths $\le$10, where the \CC SSO has a 40\% average and minimally 24\% advantage.
     1646The gap between the fresh and reuse lines is the removal of the dynamic memory allocates and reuse of prior storage, \eg 100M allocations for fresh \vs 100 allocations for reuse across all experiments.
     1647While allocation reduction is huge, data copying dominates the cost, so the lines are still reasonably close together.
     1648
     1649
     1650\subsubsection{\CFA's Sharing Mode}
     1651
     1652This comparison is the same as the last one, except the \CFA implementation is using sharing mode.
     1653Hence, both \CFA's fresh and reuse versions have no memory allocations, and as before, only for reuse does \CC have no memory allocations.
     1654\VRef[Figure]{fig:string-graph-peq-sharing} shows the resulting performance.
     1655For fresh at append lengths 5 and above, \CFA is now closer to the \CC reuse performance, because of removing the dynamic allocations.
     1656However, for reuse, \CFA has slowed down slightly, to performance matching the new fresh version, as the two versions are now implemented virtually the same.
     1657The reason for the \CFA reuse slow-down is the overhead of managing the sharing scheme (primarily maintaining the list of handles), without gaining any benefit.
     1658
     1659\begin{comment}
     1660FIND A HOME!!!
     1661The potential benefits of the sharing scheme do not give \CFA an edge over \CC when appending onto a reused string, though the first one helps \CFA win at going onto a fresh string.  These abilities are:
     1662\begin{itemize}
     1663\item
     1664To grow a text allocation repeatedly without copying it elsewhere.
     1665This ability is enabled by \CFA's most-recently modified string being located immediately before the text buffer's \emph{shared} bump-pointer area, \ie often a very large greenfield, relative to the \emph{individual} string being grown.
     1666With \CC-reuse, this benefit is already reaped by the user's reuse of a pre-stretched allocation.
     1667Yet \CC-fresh pays the higher cost because its room to grow for free is at most a constant times the original string's length.
     1668\item
     1669To share an individual text allocation across multiple related strings.
     1670This ability is not applicable to appending with @+=@.
     1671It in play in [xref sub-experiment pta] and [xref experiment pbv].
     1672\item
     1673To share a text arena across unrelated strings, sourcing disparate allocations from a common place.
     1674That is, always allocating from a bump pointer, and never maintaining free lists.
     1675This ability is not relevant to running any append scenario on \CFA with sharing, because appending modifies an existing allocation and is not driving several allocations.
     1676This ability is assessed in [xref experiment allocn].
     1677\end{itemize}
     1678This cost, of slowing down append-with-reuse, is \CFA paying the piper for other scenarios done well.
     1679\CFA prioritizes the fresh use case because it is more natural.
     1680The \emph{user-invoked} reuse scheme is an unnatural programming act because it deliberately misuses lexical scope: a variable (@accum@) gets its lifetime extended beyond the scope in which it is used.
     1681
     1682A \CFA user needing the best performance on an append scenario can still access the \CC-like speed by invoking noshare.
     1683This (indirect) resource management is memory-safe, as compared to that required in \CC to use @string&@, where knowledge of another string's lifetime comes into play.
     1684This abstraction opt-out is also different from invoking the LL API-level option.
     1685In fact, these considerations are orthogonal.
     1686But the key difference is that invoking the LL API would be a temporary measure, to use a workaround of a known \CFA language issue; choosing to exempt a string from sharing is a permanent act of program tuning.
     1687Beyond these comparisons, opting for noshare actually provides program ``eye candy,'' indicating that under-the-hood thinking is becoming relevant here.
     1688\end{comment}
     1689
     1690
     1691\subsubsection{Misusing Concatenation}
     1692
     1693A further pitfall occurs writing the apparently equivalent @x = x + y@ \vs @x += y@.
    16601694For numeric types, the generated code is equivalent, giving identical performance.
    16611695However, for string types there can be a significant difference.
    1662 This pitfall is a particularly likely hazard for beginners.
     1696This pitfall is a particularly likely for beginners.
    16631697
    16641698In earlier experiments, the choice of \CFA API among HL and LL had no impact on the functionality being tested.
     
    17081742\end{tabular}
    17091743\end{cquote}
    1710 Note that this ``Goal'' code functions today in HL.
     1744Note, the goal code functions today in HL but with slower performance.
    17111745
    17121746\begin{figure}
    17131747\centering
    1714         \includegraphics{string-graph-pta-sharing.pdf}
     1748        \includegraphics{plot-string-pta-sharing.pdf}
    17151749%       \includegraphics[width=\textwidth]{string-graph-pta-sharing.png}
    1716         \caption{CFA's low overhead for misusing \lstinline{+}.  Average time per iteration with one \lstinline{x += y} invocation (lower is better). Comparing \CFA (having implicit sharing activated) with STL, and comparing the \lstinline{+}-then-\lstinline{=} with the \lstinline{+=} append styles.  The \lstinline{+=} results are repeated from \VRef[Figure]{fig:string-graph-peq-sharing}.}
     1750        \caption{CFA's low overhead for misusing concatenation.  Average time per iteration with one \lstinline{x += y} invocation (lower is better). Comparing \CFA (having implicit sharing activated) with STL, and comparing the \lstinline{+}-then-\lstinline{=} with the \lstinline{+=} append styles.  The \lstinline{+=} results are repeated from \VRef[Figure]{fig:string-graph-peq-sharing}.}
    17171751        \label{fig:string-graph-pta-sharing}
    17181752\end{figure}
    17191753
    1720 \VRef[Figure]{fig:string-graph-pta-sharing} gives the outcome.  The STL's penalty is $8 \times$ while \CFA's is only $2 \times$, averaged across the cases shown here.
     1754\VRef[Figure]{fig:string-graph-pta-sharing} gives the outcome, where the Y-axis is log scale because of the large differences.
     1755The STL's penalty is $8 \times$ while \CFA's is only $2 \times$, averaged across the cases shown here.
    17211756Moreover, the STL's gap increases with string size, while \CFA's converges.
    17221757So again, \CFA helps users who just want to treat strings as values, and not think about the resource management under the covers.
    17231758
    1724 While not a design goal, and not graphed out, \CFA in STL-emulation mode heppened to outperform STL in this case.  User-managed allocation reuse did not affect either implementation in this case; only ``fresh'' results are shown.
    1725 
    1726 
    1727 \subsection{Test: Pass argument}
     1759While not a design goal, and not graphed, \CFA in STL-emulation mode outperformed STL in this case.
     1760User-managed allocation reuse did not affect either implementation in this case; only ``fresh'' results are shown.
     1761
     1762
     1763\subsection{Test: Pass Argument}
    17281764
    17291765STL has a penalty for passing a string by value, which forces users to think about memory management when communicating values with a function.
    1730 The key \CFA value-add is that a user can think of a string simply as a value; this test shows that \CC charges a stiff penalty for thining this way, while \CFA does not.
     1766The key \CFA value-add is that a user can think of a string simply as a value; this test shows that \CC charges a stiff penalty for thinking this way, while \CFA does not.
    17311767This test illustrates a main advantage of the \CFA sharing algorithm (in one case).
    1732 It shows STL's small-string optimization providing a successful mitigation (in the other case).
     1768It shows STL's SSO providing a successful mitigation (in the other case).
    17331769
    17341770The basic operation considered is:
     
    17491785
    17501786}
    1751 START_TIMER
    1752 for ( i; ... ) {
    1753         helper( corpus[ f(i) ] ); // imported from char * previously
    1754         COUNT_ONE_OP_DONE
     1787for ( ... ) { // loop for duration
     1788        helper( corpus[ f( i ) ] );
     1789        count += 1;
    17551790}
    1756 STOP_TIMER
    17571791\end{cfa}
    17581792&
     
    17611795        string_res q = { qref, COPY_VALUE };
    17621796}
    1763 // rest same, elided
    1764 
    1765 
    1766 
    1767 
    1768 
    1769 \end{cfa}
    1770 \end{tabular}
    1771 \end{cquote}
    1772 The Goal (HL) version gives the simplest sketch of the test harness.
    1773 It uses a single level of looping.
    1774 Each iteration uses a corpus item as the argument to a function call.
     1797
     1798
     1799
     1800
     1801\end{cfa}
     1802\end{tabular}
     1803\end{cquote}
     1804The goal (HL) version gives the modified test harness, with a single loop.
     1805Each iteration uses a corpus item as the argument to the function call.
    17751806These corpus items were imported to the string heap before beginning the timed run.
    1776 
    17771807
    17781808\begin{figure}
    17791809\centering
    1780         \includegraphics{string-graph-pbv.pdf}
     1810        \includegraphics{plot-string-pbv.pdf}
    17811811%       \includegraphics[width=\textwidth]{string-graph-pbv.png}
     1812        \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}X >{\centering\arraybackslash}X} (a) & (b) \end{tabularx}
    17821813        \caption{Average time per iteration (lower is better) with one call to a function that takes a by-value string argument, comparing \CFA (having implicit sharing activated) with STL.
    1783 (a) With \emph{Varying-from-1} corpus construction, in which the STL-only benefit of small-string optimization occurs, in varying degrees, at all string sizes.
    1784 (b) With \emph{Fixed-size} corpus construction, in which this benefit applies exactly to strings with length below 16.
    1785 [TODO: show version (b)]}
     1814(a) With \emph{Varying-from-1} corpus construction, in which the STL-only benefit of SSO optimization occurs, in varying degrees, at all string sizes.
     1815(b) With \emph{Fixed-size} corpus construction, in which this benefit applies exactly to strings with length below 16.}
    17861816        \label{fig:string-graph-pbv}
    17871817\end{figure}
    17881818
    1789 
    17901819\VRef[Figure]{fig:string-graph-pbv} shows the costs for calling a function that receives a string argument by value.
    1791 STL's performance worsens as string length increases, while \CFA has the same performance at all sizes.
    1792 
     1820STL's performance worsens uniformly as string length increases, while \CFA has the same performance at all sizes.
     1821Although the STL is better than \CFA until string length 10 because of the SSO.
    17931822While improved, the \CFA cost to pass a string is still nontrivial.
    17941823The contributor is adding and removing the callee's string handle from the global list.
    1795 This cost is $1.5 \times$ to $2 \times$ over STL's when small-string optimization applies, though this cost should be avoidable in the same case, upon a \CFA realization of this optimization.
    1796 At the larger sizes, when STL has to manage storage for the string, STL runs more than $3 \times$ slower, mainly due to time spent in the general-purpose memory allocator.
    1797 \PAB{Need to check that.  Expecting copying to dominate.}
     1824This cost is $1.5 \times$ to $2 \times$ over STL's when SSO applies, but is avoidable once \CFA realizes this optimization.
     1825At the larger sizes, the STL runs more than $3 \times$ slower, because it has to allocation/deallocate storage for the parameter and copy the argument string to the parameter.
     1826If the \CC string is passed by reference, the results are better and flat across string lengths like \CFA.
    17981827
    17991828
     
    18051834
    18061835A garbage collector, afforded the freedom of managed memory (where it knows about all the pointers and is allowed to modify them), often runs faster than malloc-free in an amortized analysis, even though it must occasionally stop to collect.
    1807 The sppedup happens because GC is able to use its collection time to move objects.
    1808 (In the case of the mini-allocator powering the \CFA string library, objects are runs of text.)  Moving objects lets fresh allocations consume from a large contiguous store of available memory; the ``bump pointer'' bookkeeping for such a scheme is very light.
     1836The speedup happens because GC is able to use its collection time to move objects.
     1837(In the case of the mini-allocator powering the \CFA string library, objects are runs of text.)
     1838Moving objects lets fresh allocations consume from a large contiguous store of available memory; the ``bump pointer'' bookkeeping for such a scheme is very light.
    18091839A malloc-free implementation without the freedom to move objects must, in the general case, allocate in the spaces between existing objects; doing so entails the heavier bookkeeping of maintaining a linked structure of freed allocations and/or coalescing freed allocations.
    18101840
     
    18621892\begin{figure}
    18631893\centering
    1864   \includegraphics{string-graph-allocn.pdf}
     1894  \includegraphics{plot-string-allocn.pdf}
    18651895% \includegraphics[width=\textwidth]{string-graph-allocn.png}
    18661896  \caption{Space and time performance, under varying fraction-live targets, for the five string lengths shown, at \emph{Fixed-size} corpus construction.
Note: See TracChangeset for help on using the changeset viewer.