-              r41c3e46
+              rfc8ec54
 % Instead, each \CC string is null terminated just in case it might be needed for this purpose.
 Providing this backwards compatibility with C has a ubiquitous performance and storage cost.
+\section{\CFA \lstinline{string} type}
+\label{s:stringType}
+The \CFA string type is for manipulation of dynamically-sized character-strings versus C @char *@ type for manipulation of statically-sized null-terminated character-strings.
+Hence, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
+As a result, a @string@ declaration does not specify a maximum length, whereas a C string must.
+The maximum storage for a \CFA @string@ value is bounded by the range of @size_t@, which is $2^{32}$ or $2^{64}$ characters on 32-bit or 64-bit platforms, respectively.
+A \CFA string manages its length separately from the string, so there is no null (@'\0'@) terminating value at the end of a string value.
+Hence, a \CFA string cannot be passed to a C string manipulation routine, such as @strcat@.
+Like C strings, the characters in a @string@ are numbered starting from 0.
+The following operations have been defined to manipulate an instance of type @string@.
+The discussion assumes the following declarations and assignment statements are executed.
+\begin{cfa}
+#include @<string.hfa>@
+@string@ s, name, digit, alpha, punctuation, ifstmt;
+int i;
+name  = "MIKE";
+digit  = "0123456789";
+punctuation = "().,";
+ifstmt = "IF (A > B) {";
+\end{cfa}
+Note, the include file @string.hfa@ is required to access type @string@.
+\subsection{Implicit String Conversions}
+The ability to convert from internal (machine) to external (human) format is useful in situations other than I/O.
+Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signedness and size variations, implicitly convert to type @string@.
+\VRef[Figure]{f:ImplicitConversionsString} shows examples of implicit conversions.
+Conversions can be explicitly specified using a compound literal:
+\begin{cfa}
+s = (string){ "abc" };                          $\C{// converts char * to string}$
+s = (string){ 5 };                                      $\C{// converts int to string}$
+s = (string){ 5.5 };                            $\C{// converts double to string}$
+\end{cfa}
+Conversions from @string@ to @char *@ attempt to be safe:
+either by requiring the maximum length of the @char *@ storage (@strncpy@) or allocating the @char *@ storage for the string characters (ownership), meaning the programmer must free the storage.
+As well, a C string is always null terminated, implying a minimum size of 1 character.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+string s = "abcde";
+char cs[3];
+strncpy( cs, s, sizeof(cs) );
+char * cp = s;
+delete( cp );
+cp = s + ' ' + s;
+delete( cp );
+\end{cfa}
+&
+\begin{cfa}
+"abcde"
+
+"ab\0", in place
+"abcde\0", malloc
+
+"abcde abcde\0", malloc
+\end{cfa}
+\end{tabular}
+\end{cquote}
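+The bounded conversion can be pictured in \CC terms.
+The following is a minimal sketch, not the \CFA implementation: the helper name @copy_bounded@ is hypothetical, and it assumes the copy always reserves the last byte of the buffer for the null terminator, which matches the @"ab\0"@ result shown for a 3-byte buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy at most n-1 characters of s into dst and always
// null terminate, so dst is a valid C string whenever n >= 1.
inline char * copy_bounded( char * dst, const std::string & s, std::size_t n ) {
	if ( n == 0 ) return dst;                        // no room, not even for '\0'
	std::size_t len = std::min( s.size(), n - 1 );  // clip to the buffer size
	std::memcpy( dst, s.data(), len );
	dst[len] = '\0';                                 // minimum size of 1 character
	return dst;
}
```

+Note that the C library @strncpy@ does not null terminate when the source is too long, so this sketch only models the safe behaviour described in the text.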
+\begin{figure}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+//      string s = 5;
+        string s;
+        // conversion of char and char * to string
+        s = 'x';
+        s = "abc";
+        char cs[5] = "abc";
+        s = cs;
+        // conversion of integral, floating-point, and complex to string
+        s = 45hh;
+        s = 45h;
+        s = -(ssize_t)MAX - 1;
+        s = (size_t)MAX;
+        s = 5.5;
+        s = 5.5L;
+        s = 5.5+3.4i;
+        s = 5.5L+3.4Li;
+\end{cfa}
+&
+\begin{cfa}
+"x"
+"abc"
+"abc"
+"45"
+"45"
+"-9223372036854775808"
+"18446744073709551615"
+"5.5"
+"5.5"
+"5.5+3.4i"
+"5.5+3.4i"
+\end{cfa}
+\end{tabular}
+\caption{Implicit Conversions to String}
+\label{f:ImplicitConversionsString}
+\end{figure}
+\subsection{Length}
+The @len@ operation returns the length of a string using postfix call.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+const char * cs = "abc";
+i = ""`len;
+i = "abc"`len;
+i = cs`len;
+i = name`len;
+\end{cfa}
+&
+\begin{cfa}
+
+0
+3
+3
+4
+\end{cfa}
+\end{tabular}
+\end{cquote}
+\subsection{Comparison Operators}
+The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare strings using lexicographical ordering, where a string that is a proper prefix of a longer string is less than the longer string.
+C strings use function @strcmp@, as the relational/equality operators compare C string pointers not their values, which does not match normal programmer expectation.
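+The ordering rule can be checked against the STL, whose @std::string@ relational operators are also lexicographic.
+The following is a small sketch with hypothetical helper names; it only contrasts value comparison with the @strcmp@ call needed for C strings and says nothing about how \CFA implements its operators.

```cpp
#include <cstring>
#include <string>

// Value comparison: lexicographic order over the characters.
inline bool value_less( const std::string & a, const std::string & b ) {
	return a < b;
}

// C-string equivalent: strcmp compares contents, whereas applying < or ==
// directly to char * operands compares the addresses.
inline bool cstr_less( const char * a, const char * b ) {
	return std::strcmp( a, b ) < 0;
}
```

+For example, @"abc"@ is less than @"abcd"@ because it is a proper prefix, but @"b"@ is greater than @"abc"@ even though it is shorter.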
+\subsection{Concatenation}
+The binary operators @+@ and @+=@ concatenate two strings, creating the sum of the strings.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = name + ' ' + digit;
+s += name;
+s = s + 'a' + 'b';
+s = s + "a" + "abc";
+s = 'a' + 'b' + s;
+s = "a" + "abc" + s;
+\end{cfa}
+&
+\begin{cfa}
+"MIKE 0123456789"
+"MIKE 0123456789MIKE"
+$\CC$ unsupported
+$\CC$ unsupported
+\end{cfa}
+\end{tabular}
+\end{cquote}
+The \CFA type-system allows full commutativity with character and C strings;
+\CC does not.
+\subsection{Repetition}
+The binary operators @*@ and @*=@ repeat a string $N$ times.
+If $N = 0$, a zero-length string, @""@, is returned.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = 'x' * 3;
+s = "abc" * 3;
+s = (name + ' ') * 3;
+\end{cfa}
+&
+\begin{cfa}
+xxx
+abcabcabc
+MIKE MIKE MIKE
+\end{cfa}
+\end{tabular}
+\end{cquote}
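+For comparison, the STL has no repetition operator, so the same effect needs a loop.
+The following is a minimal sketch in \CC; the function name @repeat@ is hypothetical and the code is only a model of the behaviour described here, including the $N = 0$ case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper modelling s * n: n copies of s; n == 0 yields "".
inline std::string repeat( const std::string & s, std::size_t n ) {
	std::string result;
	result.reserve( s.size() * n );                  // request the storage up front
	for ( std::size_t i = 0; i < n; i += 1 ) result += s;
	return result;
}
```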
+\subsection{Substring}
+The substring operation returns a subset of the string starting at a position in the string and traversing a length.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = name( 2, 2 );
+s = name( 3, -2 );
+s = name( 2, 8 );
+s = name( 0, -1 );
+s = name( -1, -1 );
+s = name( -3 );
+\end{cfa}
+&
+\begin{cfa}
+"KE"
+"IK", length is opposite direction
+"KE", length is clipped to 2
+"", beyond string so clipped to null
+"K", start $and$ length are negative
+"IKE", to end of string
+\end{cfa}
+\end{tabular}
+\end{cquote}
+A negative starting position is a specification from the right end of the string.
+A negative length means that characters are selected in the opposite (right to left) direction from the starting position.
+If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string.
+If the substring request is completely outside of the original string, a null string located at the end of the original string is returned.
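+These rules can be modelled in a few lines of \CC.
+The following sketch is an interpretation inferred from the six examples on @name@, not the \CFA implementation: the function name @substring@ is hypothetical, positions are assumed to be 0-based, and a negative length is assumed to select the characters immediately before the starting position.

```cpp
#include <algorithm>
#include <string>

// Hypothetical model of name( start, len ) with 0-based positions:
//  - a negative start counts from the right end of the string
//  - a negative len selects the characters immediately before start
//  - the selected range is clipped to the bounds of the string
inline std::string substring( const std::string & s, long start, long len ) {
	const long size = static_cast<long>( s.size() );
	if ( start < 0 ) start += size;                  // position from the right end
	long lo = len < 0 ? start + len : start;         // inclusive lower bound
	long hi = len < 0 ? start : start + len;         // exclusive upper bound
	lo = std::min( std::max( lo, 0L ), size );       // clip to [0, size]
	hi = std::min( std::max( hi, 0L ), size );
	if ( lo >= hi ) return "";                       // request outside the string
	return s.substr( lo, hi - lo );
}

// Start-only form: from start to the end of the string.
inline std::string substring( const std::string & s, long start ) {
	return substring( s, start, static_cast<long>( s.size() ) );
}
```

+Under these assumptions, the sketch reproduces the results listed for @name@, \eg @substring( "MIKE", 3, -2 )@ selects positions 1 and 2.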
+The substring operation can also appear on the left hand side of the assignment operator.
+The substring is replaced by the value on the right hand side of the assignment.
+The length of the right-hand-side value may be shorter, the same length, or longer than the length of the substring that is selected on the left hand side of the assignment.
+\begin{cfa}
+digit( 3, 3 ) = "";                             $\C{// digit is assigned "0156789"}$
+digit( 4, 3 ) = "xyz";                          $\C{// digit is assigned "015xyz9"}$
+digit( 7, 0 ) = "***";                          $\C{// digit is assigned "015xyz***9"}$
+digit(-4, 3 ) = "$\tt\$\$\$$";          $\C{// digit is assigned "015xyz\$\$\$9"}$
+\end{cfa}
+A substring is treated as a pointer into the base (substringed) string rather than creating a copy of the subtext.
+As with all pointers, if the item they are pointing at is changed, then the pointer is referring to the changed item.
+Pointers to the result value of a substring operation are defined to always start at the same location in their base string as long as that starting location exists, independent of changes to themselves or the base string.
+However, if the base string value changes, this may affect the values of one or more of the substrings to that base string.
+If the base string value shortens so that its end is before the starting location of a substring, resulting in the substring starting location disappearing, the substring becomes a null string located at the end of the base string.
+The following example illustrates passing the results of substring operations by reference and by value to a subprogram.
+Notice the side-effects to other reference parameters as one is modified.
+\begin{cfa}
+main() {
+        string x = "xxxxxxxxxxxxx";
+        test( x, x(1,3), x(3,3), x(5,5), x(9,5), x(9,5) );
+}
+// x, a, b, c, & d are substring results passed by reference
+// e is a substring result passed by value
+void test(string &x, string &a, string &b, string &c, string &d, string e) {
+                                                                        $\C{//   x                                a               b               c               d               e}$
+        a( 1, 2 ) = "aaa";                              $\C{// aaaxxxxxxxxxxx   aaax    axx             xxxxx   xxxxx   xxxxx}$
+        b( 2, 12 ) = "bbb";                             $\C{// aaabbbxxxxxxxxx  aaab    abbb    bbxxx   xxxxx   xxxxx}$
+        c( 4, 5 ) = "ccc";                              $\C{// aaabbbxcccxxxxxx aaab    abbb    bbxccc  ccxxx   xxxxx}$
+        c = "yyy";                                              $\C{// aaabyyyxxxxxx    aaab    abyy    yyy             xxxxx   xxxxx}$
+        d( 1, 3 ) = "ddd";                              $\C{// aaabyyyxdddxx    aaab    abyy    yyy             dddxx   xxxxx}$
+        e( 1, 3 ) = "eee";                              $\C{// aaabyyyxdddxx    aaab    abyy    yyy             dddxx   eeexx}$
+        x = e;                                                  $\C{// eeexx                    eeex    exx             x                               eeexx}$
+}
+\end{cfa}
+There is an assignment form of substring in which only the starting position is specified and the length is assumed to be the remainder of the string.
+\begin{cfa}
+string operator () (int start);
+\end{cfa}
+For example:
+\begin{cfa}
+s = name( 2 );                                          $\C{// s is assigned "ETER"}$
+name( 2 ) = "IPER";                             $\C{// name is assigned "PIPER"}$
+\end{cfa}
+It is also possible to substring using a string as the index for selecting the substring portion of the string.
+\begin{cfa}
+string operator () (const string &index);
+\end{cfa}
+For example:
+\begin{cfa}[mathescape=false]
+digit( "xyz$\$\$\$$" ) = "678";         $\C{// digit is assigned "0156789"}$
+digit( "234") = "***";                          $\C{// digit is assigned "0156789***"}$
+\end{cfa}
+\subsection{Searching}
+The @index@ operation
+\begin{cfa}
+int index( const string &key, int start = 1, occurrence occ = first );
+\end{cfa}
+returns the position of the first or last occurrence of the @key@ (depending on the occurrence indicator @occ@ that is either @first@ or @last@) in the current string starting the search at position @start@.
+If the @key@ does not appear in the current string, the length of the current string plus one is returned.
+%If the @key@ has zero length, the value 1 is returned regardless of what the current string contains.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+i = digit.index( "567" );                       $\C{// i is assigned 3}$
+i = digit.index( "567", 7 );            $\C{// i is assigned 11}$
+i = digit.index( "567", -1, last );     $\C{// i is assigned 3}$
+i = name.index( "E", 5, last ); $\C{// i is assigned 4}$
+\end{cfa}
+The next two string operations test a string to see if it is or is not composed completely of a particular class of characters.
+For example, are the characters of a string all alphabetic or all numeric?
+Using these operations is a two-step process.
+First, it is necessary to create an instance of type @strmask@ and initialize it to a string containing the characters of the particular character class, as in:
+\begin{cfa}
+strmask digitmask = digit;
+strmask alphamask = string( "abcdefghijklmnopqrstuvwxyz" );
+\end{cfa}
+Second, the character mask is used in the functions @include@ and @exclude@ to check a string for compliance of its characters with the characters indicated by the mask.
+The @include@ operation
+\begin{cfa}
+int include( const strmask & mask, int start = 1, occurrence occ = first );
+\end{cfa}
+returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does not appear in the @mask@ starting the search at position @start@;
+hence it skips over characters in the current string that are included (in) the @mask@.
+The characters in the current string do not have to be in the same order as the @mask@.
+If all the characters in the current string appear in the @mask@, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+i = name.include( digitmask );          $\C{// i is assigned 1}$
+i = name.include( alphamask );          $\C{// i is assigned 6}$
+\end{cfa}
+The @exclude@ operation
+\begin{cfa}
+int exclude( string &mask, int start = 1, occurrence occ = first );
+\end{cfa}
+returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does appear in the @mask@ string starting the search at position @start@;
+hence it skips over characters in the current string that are excluded from (not in) the @mask@ string.
+The characters in the current string do not have to be in the same order as the @mask@ string.
+If all the characters in the current string do NOT appear in the @mask@ string, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+i = name.exclude( digitmask );          $\C{// i is assigned 6}$
+i = ifstmt.exclude( strmask( punctuation ) ); $\C{// i is assigned 4}$
+\end{cfa}
+The @includeStr@ operation:
+\begin{cfa}
+string includeStr( strmask &mask, int start = 1, occurrence occ = first );
+\end{cfa}
+returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that ARE included in the @mask@ string starting the search at position @start@.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+s = name.includeStr( alphamask );       $\C{// s is assigned "MIKE"}$
+s = ifstmt.includeStr( alphamask );     $\C{// s is assigned "IF"}$
+s = name.includeStr( digitmask );       $\C{// s is assigned ""}$
+\end{cfa}
+The @excludeStr@ operation:
+\begin{cfa}
+string excludeStr( strmask &mask, int start = 1, occurrence occ = first );
+\end{cfa}
+returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that are excluded from (NOT in) the @mask@ string starting the search at position @start@.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+s = name.excludeStr( digitmask);        $\C{// s is assigned "MIKE"}$
+s = ifstmt.excludeStr( strmask( punctuation ) ); $\C{// s is assigned "IF "}$
+s = name.excludeStr( alphamask);        $\C{// s is assigned ""}$
+\end{cfa}
+\subsection{Miscellaneous}
+The @trim@ operation
+\begin{cfa}
+string trim( string &mask, occurrence occ = first );
+\end{cfa}
+returns a string in which the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) that ARE included in the @mask@ is removed.
+\begin{cfa}
+// remove leading blanks
+s = string( "   ABC" ).trim( " " );     $\C{// s is assigned "ABC"}$
+// remove trailing blanks
+s = string( "ABC   " ).trim( " ", last ); $\C{// s is assigned "ABC"}$
+\end{cfa}
+The @translate@ operation
+\begin{cfa}
+string translate( string &from, string &to );
+\end{cfa}
+returns a string that is the same length as the original string in which all occurrences of the characters that appear in the @from@ string have been translated into their corresponding character in the @to@ string.
+Translation is done on a character by character basis between the @from@ and @to@ strings; hence these two strings must be the same length.
+If a character in the original string does not appear in the @from@ string, then it simply appears as is in the resulting string.
+\begin{cfa}
+// upper to lower case
+name = name.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
+                        // name is assigned "mike"
+s = ifstmt.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
+                        // s is assigned "if (a > b) {"
+// lower to upper case
+name = name.translate( "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" );
+                        // name is assigned "MIKE"
+\end{cfa}
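+The character-by-character mapping can be sketched in \CC as follows.
+This is an illustrative model only: the free function @translate@ and its signature are assumptions, and the equal-length requirement on @from@ and @to@ is enforced here with an assertion, whereas the text does not say how \CFA reports a violation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of translate: each character of s that occurs in from
// is replaced by the character at the same position in to; all other
// characters are copied unchanged, so the result has the same length as s.
inline std::string translate( const std::string & s, const std::string & from, const std::string & to ) {
	assert( from.size() == to.size() );              // precondition stated in the text
	std::string result = s;
	for ( char & c : result ) {
		std::size_t pos = from.find( c );            // first occurrence in from
		if ( pos != std::string::npos ) c = to[pos];
	}
	return result;
}
```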
+The @replace@ operation
+\begin{cfa}
+string replace( string &from, string &to );
+\end{cfa}
+returns a string in which all occurrences of the @from@ string in the current string have been replaced by the @to@ string.
+\begin{cfa}
+s = name.replace( "E", "XX" );          $\C{// s is assigned "MIKXX"}$
+\end{cfa}
+The replacement is done left-to-right.
+When an instance of the @from@ string is found and changed to the @to@ string, it is NOT examined again for further replacement.
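+The left-to-right, no-rescan rule matters when the @to@ string contains the @from@ string.
+The following \CC sketch models that rule; the function name @replace_all@ is hypothetical, and returning the input unchanged for an empty @from@ string is an assumption, because the text does not cover that case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of replace: scan left to right, and after each match
// continue scanning AFTER the matched text, so inserted text is never rescanned.
inline std::string replace_all( const std::string & s, const std::string & from, const std::string & to ) {
	if ( from.empty() ) return s;                    // assumption: nothing to search for
	std::string result;
	std::size_t pos = 0;
	for ( ;; ) {
		std::size_t hit = s.find( from, pos );
		if ( hit == std::string::npos ) break;
		result.append( s, pos, hit - pos );          // text before the match
		result += to;                                // replacement text
		pos = hit + from.size();                     // resume after the match
	}
	result.append( s, pos, std::string::npos );      // remaining tail
	return result;
}
```

+For example, replacing @"a"@ by @"aa"@ in @"aaa"@ terminates with @"aaaaaa"@ because the inserted characters are not examined again.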
+\subsection{Returning N+1 on Failure}
+Any of the string search routines can fail at some point during the search.
+When this happens it is necessary to return indicating the failure.
+Many string types in other languages use some special value to indicate the failure.
+This value is often 0 or -1 (PL/I returns 0).
+This section argues that a value of N+1, where N is the length of the base string in the search, is a more useful value to return.
+The index-of function in APL returns N+1.
+Search failures are boundary situations that are often overlooked when designing a string type.
+The situation that can be optimized by returning N+1 is when a search is performed to find the starting location for a substring operation.
+For example, in a program that is extracting words from a text file, it is necessary to scan from left to right over whitespace until the first alphabetic character is found.
+\begin{cfa}
+line = line( line.exclude( alpha ) );
+\end{cfa}
+If a text line contains only whitespace, the exclude operation fails to find an alphabetic character.
+If @exclude@ returns 0 or -1, the result of the substring operation is unclear.
+Most string types generate an error, or clip the starting value to 1, resulting in the entire whitespace string being selected.
+If @exclude@ returns N+1, the starting position for the substring operation is beyond the end of the string leaving a null string.
+The same situation occurs when scanning off a word.
+\begin{cfa}
+start = line.include(alpha);
+word = line(1, start - 1);
+\end{cfa}
+If the entire line is composed of a word, the include operation fails to find a non-alphabetic character.
+In general, returning 0 or -1 is not an appropriate starting position for the substring, which must substring off the word leaving a null string.
+However, returning N+1 will substring off the word leaving a null string.
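+The argument can be made concrete with a small \CC model.
+The following sketch is hypothetical throughout: @exclude@, @include@, @from@ and @first_word@ are free-function stand-ins using 1-based positions, built on the STL @find_first_of@ and @find_first_not_of@ members, and they only illustrate why N+1 composes with substring without special cases.

```cpp
#include <cstddef>
#include <string>

// Hypothetical 1-based analogue of exclude: position of the first character
// of s that appears in mask, or s.size() + 1 if there is none.
inline std::size_t exclude( const std::string & s, const std::string & mask ) {
	std::size_t pos = s.find_first_of( mask );
	return pos == std::string::npos ? s.size() + 1 : pos + 1;
}

// Hypothetical 1-based analogue of include: position of the first character
// of s that does NOT appear in mask, or s.size() + 1 if there is none.
inline std::size_t include( const std::string & s, const std::string & mask ) {
	std::size_t pos = s.find_first_not_of( mask );
	return pos == std::string::npos ? s.size() + 1 : pos + 1;
}

// 1-based substring from start (>= 1) to the end; start == N+1 yields "".
inline std::string from( const std::string & s, std::size_t start ) {
	return s.substr( start - 1 );
}

// Skip leading non-alphabetic characters, then scan off the first word.
// No failure test is needed: N+1 selects the empty or the whole string.
inline std::string first_word( const std::string & line, const std::string & alpha ) {
	std::string rest = from( line, exclude( line, alpha ) );
	std::size_t end = include( rest, alpha );
	return rest.substr( 0, end - 1 );
}
```

+With a line of only blanks, @exclude@ returns N+1 and the remainder is the null string; with a line that is a single word, @include@ returns N+1 and the whole line is selected.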
+\subsection{C Compatibility}
+To ease conversion from C to \CFA, there are companion @string@ routines for C strings.
+\VRef[Table]{t:CompanionStringRoutines} shows the C routines on the left that also work with @string@ and the rough equivalent @string@ operation on the right.
+Hence, it is possible to directly convert a block of C string operations into @string@ just by changing the type of the variables.
+\begin{table}
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\multicolumn{1}{c|}{\lstinline{char []}}        & \multicolumn{1}{c}{\lstinline{string}}        \\
+\hline
+@strcpy@, @strncpy@             & @=@                                                                   \\
+@strcat@, @strncat@             & @+@                                                                   \\
+@strcmp@, @strncmp@             & @==@, @!=@, @<@, @<=@, @>@, @>=@              \\
+@strlen@                                & @size@                                                                \\
+@[]@                                    & @[]@                                                                  \\
+@strstr@                                & @find@                                                                \\
+@strcspn@                               & @find_first_of@, @find_last_of@               \\
+@strspn@                                & @find_first_not_of@, @find_last_not_of@
+\end{tabular}
+\end{cquote}
+\caption{Companion Routines for \CFA \lstinline{string} to C Strings}
+\label{t:CompanionStringRoutines}
+\end{table}
+For example, this block of C code can be converted to \CFA by simply changing the type of variable @s@ from @char []@ to @string@.
+\begin{cfa}
+        char s[32];
+        //string s;
+        strcpy( s, "abc" );                             PRINT( %s, s );
+        strncpy( s, "abcdef", 3 );              PRINT( %s, s );
+        strcat( s, "xyz" );                             PRINT( %s, s );
+        strncat( s, "uvwxyz", 3 );              PRINT( %s, s );
+        PRINT( %zd, strlen( s ) );
+        PRINT( %c, s[3] );
+        PRINT( %s, strstr( s, "yzu" ) ) ;
+        PRINT( %s, strstr( s, 'y' ) ) ;
+\end{cfa}
+However, the conversion fails with I/O: @printf@ cannot print a @string@ using format code @%s@ because \CFA strings are not null terminated.
+\subsection{Input/Output Operators}
+Both the \CC operators @<<@ and @>>@ are defined on type @string@.
+However, input of a string value is different from input of a @char *@ value.
+When a string value is read, \emph{all} input characters from the current point in the input stream to either the end of line (@'\n'@) or the end of file are read.
+\section{Implementation Details}
 While \VRef[Figure]{f:StrApiCompare} emphasizes cross-language similarities, it elides many specific operational differences.
 …
 \subsection{Methodology}
+These tests use randomly generated varying-length strings (string content is immaterial).
+A collection of such fragments is a \emph{corpus}.
+The mean length of a fragment from a corpus is a typical explanatory variable.
+Such a length is used in one of three modes:
+These tests use a \emph{corpus} of strings (string content is immaterial).
+For varying-length strings, the mean length comes from a geometric distribution, which implies that lengths much longer than the mean occur frequently.
+The string sizes are:
 \begin{description}
         \item [Fixed-size] means all string fragments are of the stated size.
         \item [Varying from 1] means string lengths are drawn from a geometric distribution with the stated mean, and all lengths occur.
         \item [Varying from 16] means string lengths are drawn from a geometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16.
+        \item [Fixed-size] means all string lengths are of the stated size.
+        \item [Varying from 1 to N] means the string lengths are drawn from the geometric distribution with a stated mean and all lengths occur.
+        \item [Varying from 16 to N] means string lengths are drawn from the geometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16.
 \end{description}
+The geometric distribution implies that lengths much longer than the mean occur frequently.
+The special treatment of length 16 deals with comparison to STL, given that STL has a short-string optimization (see [TODO: write and cross-ref future-work SSO]), currently not implemented in \CFA.
+The means for the geometric distribution are the X-axis values in experiments.
+The special treatment of length 16 deals with the short-string optimization (SSO) in STL @string@, currently not implemented in \CFA.
+When an STL string can fit into a heap pointer, the optimization uses the pointer storage to eliminate using the heap.
+\begin{c++}
+class string {
+        union {
+                struct { $\C{// long string, string storage in heap}$
+                        size_t size;
+                        char * strptr;
+                } lstr;
+                char sstr[sizeof(lstr)]; $\C{// short string 8-16 characters, in situ}$
+        };
+        bool tag; $\C{// string kind, short or long}$
+        ... $\C{// other storage}$
+};
+\end{c++}
 When success is illustrated, notwithstanding SSO, a fixed-size or from-16 distribution ensures that extra-optimized cases are not part of the mix on the STL side.
 In all experiments that use a corpus, its text is generated and loaded into the SUT before the timed phase begins.
+In all experiments that use a corpus, its text is generated and loaded into the system under test before the timed phase begins.
 To discuss: vocabulary for reused case variables
 …
 To discuss: hardware and such
+To discuss: memory allocator
+To ensure comparable results, a common memory allocator is used for \CFA and \CC.
+The llheap allocator~\cite{Zulfiqar22} is embedded into \CFA and is used standalone with \CC.
 \subsection{Test: Append}
+This test measures the speed of appending fragments of text onto a growing string.
+Its subcases include both \CFA being similar to STL and their designs offering a tradeoff.
+One experimental variable is logically equivalent operations such as @a = a + b@ \vs @a += b@.
+This test measures the speed of appending randomly-sized text onto a growing string.
+\begin{cquote}
+\setlength{\tabcolsep}{20pt}
+\begin{tabular}{@{}ll@{}}
+% \multicolumn{1}{c}{\textbf{fresh}} & \multicolumn{1}{c}{\textbf{reuse}} \\
+\begin{cfa}
+for ( ... ) {
+        @string x;@       // fresh
+        for ( ... )
+                x @+=@ ...
+}
+\end{cfa}
+&
+\begin{cfa}
+string x;
+for ( ... ) {
+        @x = "";@  $\C[1in]{// reuse}$
+        for ( ... )
+                x @+=@ ...  $\C{// append, alternative x = x + ...}\CRT$
+}
+\end{cfa}
+\end{tabular}
+\end{cquote}
+The benchmark's outer loop executes ``until a sample-worthy amount of execution has happened'' and an inner loop for ``building up the desired-length string.''
+Its subcases include:
+\begin{enumerate}[leftmargin=*]
+\item
+\CFA nosharing/sharing \vs \CC nosharing.
+\item
+Difference between the logically equivalent operations @x += ...@ \vs @x = x + ...@.
 For numeric types, the generated code is equivalent, giving identical performance.
 However, for string types there can be a significant difference in performance, especially if this code appears in a loop iterating a large number of times.
 This difference might not be intuitive to beginners.
+Another experimental variable is whether the user's logical allocation is fresh \vs reused.
+Here, \emph{reusing a logical allocation} means that the program variable, into which the user is concatenating, previously held a long string:
+\begin{cquote}
+\setlength{\tabcolsep}{20pt}
+\begin{tabular}{@{}ll@{}}
+\multicolumn{1}{c}{\textbf{fresh}} & \multicolumn{1}{c}{\textbf{reuse}} \\
+\begin{cfa}
+for ( ... ) {
+        @string x;@
+        for ( ... )
+                x += ...
+}
+\end{cfa}
+&
+\begin{cfa}
+string x;
+for ( ... ) {
+        @x = "";@
+        for ( ... )
+                x += ...
+}
+\end{cfa}
+\end{tabular}
+\end{cquote}
+All benchmark drivers have an outer loop for ``until a sample-worthy amount of execution has happened'' and an inner loop for ``building up the desired-length string.''
+\item
+Coding practice where the user's logical allocation is fresh \vs reused.
+Here, \emph{reusing a logical allocation} means that the program variable, into which the user is concatenating, previously held a long string.
 In general, a user should not have to care about this difference, yet the STL performs differently in these cases.
 Furthermore, if a routine takes a string by reference, it cannot use the fresh approach.
 Concretely, both cases incur the cost of copying characters into the target string, but only the allocation-fresh case incurs a further reallocation cost, which is generally paid at points of doubling the length.
 For the STL, this cost includes obtaining a fresh buffer from the memory allocator and copying older characters into the new buffer, while \CFA-sharing hides such a cost entirely.
+The fresh \vs reuse distinction is only relevant in the \emph{append} tests.
+%The fresh \vs reuse distinction is only relevant in the \emph{append} tests.
+\end{enumerate}
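+The fresh and reuse coding practices can be written against STL @std::string@ as follows.
+This is only a sketch of the two loop shapes, not the benchmark driver: the function names, the @corpus@ vector, and the returned character count are hypothetical, and the timing harness is omitted.
+Whether @x = ""@ keeps the previously acquired buffer is up to the library implementation, which is the effect the reuse case is intended to expose.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Fresh: a new string object is declared for each outer iteration, so every
// build-up starts from an empty buffer and pays for growth again.
inline std::size_t append_fresh( const std::vector<std::string> & corpus, std::size_t rounds ) {
	std::size_t total = 0;
	for ( std::size_t r = 0; r < rounds; r += 1 ) {
		std::string x;                               // fresh logical allocation
		for ( const std::string & frag : corpus ) x += frag;
		total += x.size();
	}
	return total;
}

// Reuse: one string object is hoisted out of the loop and reset, so it may
// still own the buffer that held the previous long string.
inline std::size_t append_reuse( const std::vector<std::string> & corpus, std::size_t rounds ) {
	std::size_t total = 0;
	std::string x;
	for ( std::size_t r = 0; r < rounds; r += 1 ) {
		x = "";                                      // reuse logical allocation
		for ( const std::string & frag : corpus ) x += frag;
		total += x.size();
	}
	return total;
}
```

+Both functions build the same strings, so any timing difference between them comes from storage management rather than from the characters copied.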
 \begin{figure}
 …
 \end{figure}
 The \emph{append} tests use the varying-from-1 corpus construction, \ie they assume the STL's advantage of small-string optimization.
+This test uses the varying-from-1 corpus construction, \ie it assumes the STL's advantage of small-string optimization.
 \PAB{To discuss: any other case variables introduced in the performance intro}
 \VRef[Figure]{fig:string-graph-peq-cppemu} shows this behaviour, by the STL and by \CFA in STL emulation mode.
 \CFA reproduces STL's performance, up to a 15\% penalty averaged over the cases shown, diminishing with larger strings, and 50\% in the worst case.
 This penalty characterizes the amount of implementation fine tuning done with STL and not done with \CFA in present state.
+\PAB{The larger inherent penalty, for a user mismanaging reuse, is 40\% averaged over the cases shown, is minimally 24\%, shows up consistently between the STL and \CFA implementations, and increases with larger strings.}
+There is a larger penalty for redeclaring the string each loop iteration (fresh) versus hoisting it out of the loop and resetting it to the null string (reuse).
+The cost is 40\% averaged over the cases shown and minimally 24\%; it shows up consistently in both the \CFA and STL implementations and increases with larger strings.
 \begin{figure}
 …
 \subsubsection{Test: Pass argument}
+To have introduced:  STL string library forces users to think about memory management when communicating values across a function call
+STL has a penalty for passing a string by value, which indirectly forces users to think about memory management when communicating values to a function.
 \begin{cfa}
 void foo( string s );
 …
 foo( s );
 \end{cfa}
-STL charges a prohibitive penalty for passing a string by value.
 With implicit sharing active, \CFA treats this operation as normal and supported.
 This test illustrates a main advantage of the \CFA sharing algorithm.
 …
+}
 \end{lstlisting}
-\section{String I/O}


Changeset fc8ec54 for doc


doc/theses/mike_brooks_MMath/string.tex
