Changeset fc8ec54 for doc/theses
- Timestamp:
- Mar 31, 2025, 9:33:29 PM (6 months ago)
- Branches:
- master
- Children:
- 7e8db9e
- Parents:
- 41c3e46
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/string.tex
r41c3e46 rfc8ec54 48 48 % Instead, each \CC string is null terminated just in case it might be needed for this purpose. 49 49 Providing this backwards compatibility with C has a ubiquitous performance and storage cost. 50 51 52 \section{\CFA \lstinline{string} type} 53 \label{s:stringType} 54 55 The \CFA string type is for manipulation of dynamically-sized character-strings versus C @char *@ type for manipulation of statically-sized null-terminated character-strings. 56 Hence, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time. 57 As a result, a @string@ declaration does not specify a maximum length, where a C string must. 58 The maximum storage for a \CFA @string@ value is @size_t@ characters, which is $2^{32}$ or $2^{64}$ respectively. 59 A \CFA string manages its length separately from the string, so there is no null (@'\0'@) terminating value at the end of a string value. 60 Hence, a \CFA string cannot be passed to a C string manipulation routine, such as @strcat@. 61 Like C strings, the characters in a @string@ are numbered starting from 0. 62 63 The following operations have been defined to manipulate an instance of type @string@. 64 The discussion assumes the following declarations and assignment statements are executed. 65 \begin{cfa} 66 #include @<string.hfa>@ 67 @string@ s, name, digit, alpha, punctuation, ifstmt; 68 int i; 69 name = "MIKE"; 70 digit = "0123456789"; 71 punctuation = "().,"; 72 ifstmt = "IF (A > B) {"; 73 \end{cfa} 74 Note, the include file @string.hfa@ to access type @string@. 75 76 77 \subsection{Implicit String Conversions} 78 79 The ability to convert from internal (machine) to external (human) format is useful in situations other than I/O. 80 Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signness and size variations, implicitly convert to type @string@. 81 \VRef[Figure]{f:ImplicitConversionsString} shows examples of implicit conversions. 82 Conversions can be explicitly specified using a compound literal: 83 \begin{cfa} 84 s = (string){ "abc" }; $\C{// converts char * to string}$ 85 s = (string){ 5 }; $\C{// converts int to string}$ 86 s = (string){ 5.5 }; $\C{// converts double to string}$ 87 \end{cfa} 88 Conversions from @string@ to @char *@, attempt to be safe: 89 either by requiring the maximum length of the @char *@ storage (@strncpy@) or allocating the @char *@ storage for the string characters (ownership), meaning the programmer must free the storage. 90 As well, a C string is always null terminates, implying a minimum size of 1 character. 91 \begin{cquote} 92 \setlength{\tabcolsep}{15pt} 93 \begin{tabular}{@{}l|l@{}} 94 \begin{cfa} 95 string s = "abcde"; 96 char cs[3]; 97 strncpy( cs, s, sizeof(cs) ); 98 char * cp = s; 99 delete( cp ); 100 cp = s + ' ' + s; 101 delete( cp ); 102 \end{cfa} 103 & 104 \begin{cfa} 105 "abcde" 106 107 "ab\0", in place 108 "abcde\0", malloc 109 110 "abcde abcde\0", malloc 111 112 \end{cfa} 113 \end{tabular} 114 \end{cquote} 115 116 \begin{figure} 117 \begin{tabular}{@{}l|l@{}} 118 \setlength{\tabcolsep}{15pt} 119 \begin{cfa} 120 // string s = 5; 121 string s; 122 // conversion of char and char * to string 123 s = 'x'; 124 s = "abc"; 125 char cs[5] = "abc"; 126 s = cs; 127 // conversion of integral, floating-point, and complex to string 128 s = 45hh; 129 s = 45h; 130 s = -(ssize_t)MAX - 1; 131 s = (size_t)MAX; 132 s = 5.5; 133 s = 5.5L; 134 s = 5.5+3.4i; 135 s = 5.5L+3.4Li; 136 \end{cfa} 137 & 138 \begin{cfa} 139 140 141 142 "x" 143 "abc" 144 145 "abc" 146 147 "45" 148 "45" 149 "-9223372036854775808" 150 "18446744073709551615" 151 "5.5" 152 "5.5" 153 "5.5+3.4i" 154 "5.5+3.4i" 155 \end{cfa} 156 \end{tabular} 157 \caption{Implicit Conversions to String} 158 \label{f:ImplicitConversionsString} 159 \end{figure} 160 161 162 \subsection{Length} 163 164 The @len@ operation returns the length of a string using prefix call. 165 \begin{cquote} 166 \setlength{\tabcolsep}{15pt} 167 \begin{tabular}{@{}l|l@{}} 168 \begin{cfa} 169 const char * cs = "abc"; 170 i = ""`len; 171 i = "abc"`len; 172 i = cs`len; 173 i = name`len; 174 \end{cfa} 175 & 176 \begin{cfa} 177 178 0 179 3 180 3 181 4 182 \end{cfa} 183 \end{tabular} 184 \end{cquote} 185 186 187 \subsection{Comparison Operators} 188 189 The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare strings using lexicographical ordering, where longer strings are greater than shorter strings. 190 C strings use function @strcmp@, as the relational/equality operators compare C string pointers not their values, which does not match normal programmer expectation. 191 192 193 \subsection{Concatenation} 194 195 The binary operators @+@ and @+=@ concatenate two strings, creating the sum of the strings. 196 \begin{cquote} 197 \setlength{\tabcolsep}{15pt} 198 \begin{tabular}{@{}l|l@{}} 199 \begin{cfa} 200 s = name + ' ' + digit; 201 s += name; 202 s = s + 'a' + 'b'; 203 s = s + "a" + "abc"; 204 s = 'a' + 'b' + s; 205 s = "a" + "abc" + s; 206 \end{cfa} 207 & 208 \begin{cfa} 209 "MIKE 0123456789" 210 "MIKE 0123456789MIKE" 211 212 213 $\CC$ unsupported 214 $\CC$ unsupported 215 \end{cfa} 216 \end{tabular} 217 \end{cquote} 218 The \CFA type-system allows full commutativity with character and C strings; 219 \CC does not. 220 221 222 \subsection{Repetition} 223 224 The binary operators @*@ and @*=@ repeat a string $N$ times. 225 If $N = 0$, a zero length string, @""@ is returned. 226 \begin{cquote} 227 \setlength{\tabcolsep}{15pt} 228 \begin{tabular}{@{}l|l@{}} 229 \begin{cfa} 230 s = 'x' * 3; 231 s = "abc" * 3; 232 s = (name + ' ') * 3; 233 \end{cfa} 234 & 235 \begin{cfa} 236 xxx 237 abcabcabc 238 MIKE MIKE MIKE 239 \end{cfa} 240 \end{tabular} 241 \end{cquote} 242 243 244 \subsection{Substring} 245 The substring operation returns a subset of the string starting at a position in the string and traversing a length. 246 \begin{cquote} 247 \setlength{\tabcolsep}{15pt} 248 \begin{tabular}{@{}l|l@{}} 249 \begin{cfa} 250 s = name( 2, 2 ); 251 s = name( 3, -2 ); 252 s = name( 2, 8 ); 253 s = name( 0, -1 ); 254 s = name( -1, -1 ); 255 s = name( -3 ); 256 \end{cfa} 257 & 258 \begin{cfa} 259 "KE" 260 "IK", length is opposite direction 261 "KE", length is clipped to 2 262 "", beyond string so clipped to null 263 "K", start $and$ length are negative 264 "IKE", to end of string 265 \end{cfa} 266 \end{tabular} 267 \end{cquote} 268 A negative starting position is a specification from the right end of the string. 269 A negative length means that characters are selected in the opposite (right to left) direction from the starting position. 270 If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string. 271 If the substring request is completely outside of the original string, a null string located at the end of the original string is returned. 272 273 The substring operation can also appear on the left hand side of the assignment operator. 274 The substring is replaced by the value on the right hand side of the assignment. 275 The length of the right-hand-side value may be shorter, the same length, or longer than the length of the substring that is selected on the left hand side of the assignment. 276 \begin{cfa} 277 digit( 3, 3 ) = ""; $\C{// digit is assigned "0156789"}$ 278 digit( 4, 3 ) = "xyz"; $\C{// digit is assigned "015xyz9"}$ 279 digit( 7, 0 ) = "***"; $\C{// digit is assigned "015xyz***9"}$ 280 digit(-4, 3 ) = "$\tt\$\$\$$"; $\C{// digit is assigned "015xyz\$\$\$9"}$ 281 \end{cfa} 282 A substring is treated as a pointer into the base (substringed) string rather than creating a copy of the subtext. 283 As with all pointers, if the item they are pointing at is changed, then the pointer is referring to the changed item. 284 Pointers to the result value of a substring operation are defined to always start at the same location in their base string as long as that starting location exists, independent of changes to themselves or the base string. 285 However, if the base string value changes, this may affect the values of one or more of the substrings to that base string. 286 If the base string value shortens so that its end is before the starting location of a substring, resulting in the substring starting location disappearing, the substring becomes a null string located at the end of the base string. 287 288 The following example illustrates passing the results of substring operations by reference and by value to a subprogram. 289 Notice the side-effects to other reference parameters as one is modified. 290 \begin{cfa} 291 main() { 292 string x = "xxxxxxxxxxxxx"; 293 test( x, x(1,3), x(3,3), x(5,5), x(9,5), x(9,5) ); 294 } 295 296 // x, a, b, c, & d are substring results passed by reference 297 // e is a substring result passed by value 298 void test(string &x, string &a, string &b, string &c, string &d, string e) { 299 $\C{// x a b c d e}$ 300 a( 1, 2 ) = "aaa"; $\C{// aaaxxxxxxxxxxx aaax axx xxxxx xxxxx xxxxx}$ 301 b( 2, 12 ) = "bbb"; $\C{// aaabbbxxxxxxxxx aaab abbb bbxxx xxxxx xxxxx}$ 302 c( 4, 5 ) = "ccc"; $\C{// aaabbbxcccxxxxxx aaab abbb bbxccc ccxxx xxxxx}$ 303 c = "yyy"; $\C{// aaabyyyxxxxxx aaab abyy yyy xxxxx xxxxx}$ 304 d( 1, 3 ) = "ddd"; $\C{// aaabyyyxdddxx aaab abyy yyy dddxx xxxxx}$ 305 e( 1, 3 ) = "eee"; $\C{// aaabyyyxdddxx aaab abyy yyy dddxx eeexx}$ 306 x = e; $\C{// eeexx eeex exx x eeexx}$ 307 } 308 \end{cfa} 309 310 There is an assignment form of substring in which only the starting position is specified and the length is assumed to be the remainder of the string. 311 \begin{cfa} 312 string operator () (int start); 313 \end{cfa} 314 For example: 315 \begin{cfa} 316 s = name( 2 ); $\C{// s is assigned "ETER"}$ 317 name( 2 ) = "IPER"; $\C{// name is assigned "PIPER"}$ 318 \end{cfa} 319 It is also possible to substring using a string as the index for selecting the substring portion of the string. 320 \begin{cfa} 321 string operator () (const string &index); 322 \end{cfa} 323 For example: 324 \begin{cfa}[mathescape=false] 325 digit( "xyz$\$\$\$$" ) = "678"; $\C{// digit is assigned "0156789"}$ 326 digit( "234") = "***"; $\C{// digit is assigned "0156789***"}$ 327 \end{cfa} 328 329 330 \subsection{Searching} 331 332 The @index@ operation 333 \begin{cfa} 334 int index( const string &key, int start = 1, occurrence occ = first ); 335 \end{cfa} 336 returns the position of the first or last occurrence of the @key@ (depending on the occurrence indicator @occ@ that is either @first@ or @last@) in the current string starting the search at position @start@. 337 If the @key@ does not appear in the current string, the length of the current string plus one is returned. 338 %If the @key@ has zero length, the value 1 is returned regardless of what the current string contains. 339 A negative starting position is a specification from the right end of the string. 340 \begin{cfa} 341 i = digit.index( "567" ); $\C{// i is assigned 3}$ 342 i = digit.index( "567", 7 ); $\C{// i is assigned 11}$ 343 i = digit.index( "567", -1, last ); $\C{// i is assigned 3}$ 344 i = name.index( "E", 5, last ); $\C{// i is assigned 4}$ 345 \end{cfa} 346 347 The next two string operations test a string to see if it is or is not composed completely of a particular class of characters. 348 For example, are the characters of a string all alphabetic or all numeric? 349 Use of these operations involves a two step operation. 350 First, it is necessary to create an instance of type @strmask@ and initialize it to a string containing the characters of the particular character class, as in: 351 \begin{cfa} 352 strmask digitmask = digit; 353 strmask alphamask = string( "abcdefghijklmnopqrstuvwxyz" ); 354 \end{cfa} 355 Second, the character mask is used in the functions @include@ and @exclude@ to check a string for compliance of its characters with the characters indicated by the mask. 356 357 The @include@ operation 358 \begin{cfa} 359 int include( const strmask &, int = 1, occurrence occ = first ); 360 \end{cfa} 361 returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does not appear in the @mask@ starting the search at position @start@; 362 hence it skips over characters in the current string that are included (in) the @mask@. 363 The characters in the current string do not have to be in the same order as the @mask@. 364 If all the characters in the current string appear in the @mask@, the length of the current string plus one is returned, regardless of which occurrence is being searched for. 365 A negative starting position is a specification from the right end of the string. 366 \begin{cfa} 367 i = name.include( digitmask ); $\C{// i is assigned 1}$ 368 i = name.include( alphamask ); $\C{// i is assigned 6}$ 369 \end{cfa} 370 371 The @exclude@ operation 372 \begin{cfa} 373 int exclude( string &mask, int start = 1, occurrence occ = first ) 374 \end{cfa} 375 returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does appear in the @mask@ string starting the search at position @start@; 376 hence it skips over characters in the current string that are excluded from (not in) in the @mask@ string. 377 The characters in the current string do not have to be in the same order as the @mask@ string. 378 If all the characters in the current string do NOT appear in the @mask@ string, the length of the current string plus one is returned, regardless of which occurrence is being searched for. 379 A negative starting position is a specification from the right end of the string. 380 \begin{cfa} 381 i = name.exclude( digitmask ); $\C{// i is assigned 6}$ 382 i = ifstmt.exclude( strmask( punctuation ) ); $\C{// i is assigned 4}$ 383 \end{cfa} 384 385 The @includeStr@ operation: 386 \begin{cfa} 387 string includeStr( strmask &mask, int start = 1, occurrence occ = first ) 388 \end{cfa} 389 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that ARE included in the @mask@ string starting the search at position @start@. 390 A negative starting position is a specification from the right end of the string. 391 \begin{cfa} 392 s = name.includeStr( alphamask ); $\C{// s is assigned "MIKE"}$ 393 s = ifstmt.includeStr( alphamask ); $\C{// s is assigned "IF"}$ 394 s = name.includeStr( digitmask ); $\C{// s is assigned ""}$ 395 \end{cfa} 396 397 The @excludeStr@ operation: 398 \begin{cfa} 399 string excludeStr( strmask &mask, int start = 1, occurrence = first ) 400 \end{cfa} 401 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that are excluded (NOT) in the @mask@ string starting the search at position @start@. 402 A negative starting position is a specification from the right end of the string. 403 \begin{cfa} 404 s = name.excludeStr( digitmask); $\C{// s is assigned "MIKE"}$ 405 s = ifstmt.excludeStr( strmask( punctuation ) ); $\C{// s is assigned "IF "}$ 406 s = name.excludeStr( alphamask); $\C{// s is assigned ""}$ 407 \end{cfa} 408 409 410 \subsection{Miscellaneous} 411 412 The @trim@ operation 413 \begin{cfa} 414 string trim( string &mask, occurrence occ = first ) 415 \end{cfa} 416 returns a string in that is the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) which ARE included in the @mask@ are removed. 417 \begin{cfa} 418 // remove leading blanks 419 s = string( " ABC" ).trim( " " ); $\C{// s is assigned "ABC",}$ 420 // remove trailing blanks 421 s = string( "ABC " ).trim( " ", last ); $\C{// s is assigned "ABC",}$ 422 \end{cfa} 423 424 The @translate@ operation 425 \begin{cfa} 426 string translate( string &from, string &to ) 427 \end{cfa} 428 returns a string that is the same length as the original string in which all occurrences of the characters that appear in the @from@ string have been translated into their corresponding character in the @to@ string. 429 Translation is done on a character by character basis between the @from@ and @to@ strings; hence these two strings must be the same length. 430 If a character in the original string does not appear in the @from@ string, then it simply appears as is in the resulting string. 431 \begin{cfa} 432 // upper to lower case 433 name = name.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" ); 434 // name is assigned "name" 435 s = ifstmt.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" ); 436 // ifstmt is assigned "if (a > b) {" 437 // lower to upper case 438 name = name.translate( "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ); 439 // name is assigned "MIKE" 440 \end{cfa} 441 442 The @replace@ operation 443 \begin{cfa} 444 string replace( string &from, string &to ) 445 \end{cfa} 446 returns a string in which all occurrences of the @from@ string in the current string have been replaced by the @to@ string. 447 \begin{cfa} 448 s = name.replace( "E", "XX" ); $\C{// s is assigned "PXXTXXR"}$ 449 \end{cfa} 450 The replacement is done left-to-right. 451 When an instance of the @from@ string is found and changed to the @to@ string, it is NOT examined again for further replacement. 452 453 \subsection{Returning N+1 on Failure} 454 455 Any of the string search routines can fail at some point during the search. 456 When this happens it is necessary to return indicating the failure. 457 Many string types in other languages use some special value to indicate the failure. 458 This value is often 0 or -1 (PL/I returns 0). 459 This section argues that a value of N+1, where N is the length of the base string in the search, is a more useful value to return. 460 The index-of function in APL returns N+1. 461 These are the boundary situations and are often overlooked when designing a string type. 462 463 The situation that can be optimized by returning N+1 is when a search is performed to find the starting location for a substring operation. 464 For example, in a program that is extracting words from a text file, it is necessary to scan from left to right over whitespace until the first alphabetic character is found. 465 \begin{cfa} 466 line = line( line.exclude( alpha ) ); 467 \end{cfa} 468 If a text line contains all whitespaces, the exclude operation fails to find an alphabetic character. 469 If @exclude@ returns 0 or -1, the result of the substring operation is unclear. 470 Most string types generate an error, or clip the starting value to 1, resulting in the entire whitespace string being selected. 471 If @exclude@ returns N+1, the starting position for the substring operation is beyond the end of the string leaving a null string. 472 473 The same situation occurs when scanning off a word. 474 \begin{cfa} 475 start = line.include(alpha); 476 word = line(1, start - 1); 477 \end{cfa} 478 If the entire line is composed of a word, the include operation will fail to find a non-alphabetic character. 479 In general, returning 0 or -1 is not an appropriate starting position for the substring, which must substring off the word leaving a null string. 480 However, returning N+1 will substring off the word leaving a null string. 481 482 483 \subsection{C Compatibility} 484 485 To ease conversion from C to \CFA, there are companion @string@ routines for C strings. 486 \VRef[Table]{t:CompanionStringRoutines} shows the C routines on the left that also work with @string@ and the rough equivalent @string@ opeation of the right. 487 Hence, it is possible to directly convert a block of C string operations into @string@ just by changing the 488 489 \begin{table} 490 \begin{cquote} 491 \begin{tabular}{@{}l|l@{}} 492 \multicolumn{1}{c|}{\lstinline{char []}} & \multicolumn{1}{c}{\lstinline{string}} \\ 493 \hline 494 @strcpy@, @strncpy@ & @=@ \\ 495 @strcat@, @strncat@ & @+@ \\ 496 @strcmp@, @strncmp@ & @==@, @!=@, @<@, @<=@, @>@, @>=@ \\ 497 @strlen@ & @size@ \\ 498 @[]@ & @[]@ \\ 499 @strstr@ & @find@ \\ 500 @strcspn@ & @find_first_of@, @find_last_of@ \\ 501 @strspc@ & @find_fist_not_of@, @find_last_not_of@ 502 \end{tabular} 503 \end{cquote} 504 \caption{Companion Routines for \CFA \lstinline{string} to C Strings} 505 \label{t:CompanionStringRoutines} 506 \end{table} 507 508 For example, this block of C code can be converted to \CFA by simply changing the type of variable @s@ from @char []@ to @string@. 509 \begin{cfa} 510 char s[32]; 511 //string s; 512 strcpy( s, "abc" ); PRINT( %s, s ); 513 strncpy( s, "abcdef", 3 ); PRINT( %s, s ); 514 strcat( s, "xyz" ); PRINT( %s, s ); 515 strncat( s, "uvwxyz", 3 ); PRINT( %s, s ); 516 PRINT( %zd, strlen( s ) ); 517 PRINT( %c, s[3] ); 518 PRINT( %s, strstr( s, "yzu" ) ) ; 519 PRINT( %s, strstr( s, 'y' ) ) ; 520 \end{cfa} 521 However, the conversion fails with I/O because @printf@ cannot print a @string@ using format code @%s@ because \CFA strings are not null terminated. 522 523 524 \subsection{Input/Output Operators} 525 526 Both the \CC operators @<<@ and @>>@ are defined on type @string@. 527 However, input of a string value is different from input of a @char *@ value. 528 When a string value is read, \emph{all} input characters from the current point in the input stream to either the end of line (@'\n'@) or the end of file are read. 529 530 531 \section{Implementation Details} 50 532 51 533 While \VRef[Figure]{f:StrApiCompare} emphasizes cross-language similarities, it elides many specific operational differences. … … 532 1014 \subsection{Methodology} 533 1015 534 These tests use randomly generated varying-length strings (string content is immaterial). 535 A collection of such fragments is a \emph{corpus}. 536 The mean length of a fragment from a corpus is a typical explanatory variable. 537 Such a length is used in one of three modes: 1016 These tests use a \emph{corpus} of strings (string content is immaterial). 1017 For varying-length strings, the mean length comes from a geometric distribution, which implies that lengths much longer than the mean occur frequently. 1018 The string sizes are: 538 1019 \begin{description} 539 \item [Fixed-size] means all string fragments are of the stated size.540 \item [Varying from 1 ] means string lengths are drawn from a geometric distribution with the stated mean,and all lengths occur.541 \item [Varying from 16 ] means string lengths are drawn from ageometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16.1020 \item [Fixed-size] all string lengths are of the stated size. 1021 \item [Varying from 1 to N] means the string lengths are drawn from the geometric distribution with a stated mean and all lengths occur. 1022 \item [Varying from 16 to N] means string lengths are drawn from the geometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16. 542 1023 \end{description} 543 The geometric distribution implies that lengths much longer than the mean occur frequently. 544 The special treatment of length 16 deals with comparison to STL, given that STL has a short-string optimization (see [TODO: write and cross-ref future-work SSO]), currently not implemented in \CFA. 1024 The means for the geometric distribution are the X-axis values in experiments. 1025 The special treatment of length 16 deals with the short-string optimization (SSO) in STL @string@, currently not implemented in \CFA. 1026 When an STL string can fit into a heap pointer, the optimization uses the pointer storage to eliminate using the heap. 1027 \begin{c++} 1028 class string { 1029 union { 1030 struct { $\C{// long string, string storage in heap}$ 1031 size_t size; 1032 char * strptr; 1033 } lstr; 1034 char sstr[sizeof(lstr)]; $\C{// short string 8-16 characters, in situ}$ 1035 }; 1036 bool tag; $\C{// string kind, short or long}$ 1037 ... $\C{// other storage}$ 1038 }; 1039 \end{c++} 1040 545 1041 When success is illustrated, notwithstanding SSO, a fixed-size or from-16 distribution ensures that extra-optimized cases are not part of the mix on the STL side. 546 In all experiments that use a corpus, its text is generated and loaded into the SUTbefore the timed phase begins.1042 In all experiments that use a corpus, its text is generated and loaded into the system under test before the timed phase begins. 547 1043 548 1044 To discuss: vocabulary for reused case variables … … 552 1048 To discuss: hardware and such 553 1049 554 To discuss: memory allocator 1050 To ensure comparable results, a common memory allocator is used for \CFA and \CC. 1051 The llheap allocator~\cite{Zulfiqar22} is embedded into \CFA and is used standalone with \CC. 555 1052 556 1053 557 1054 \subsection{Test: Append} 558 1055 559 This test measures the speed of appending fragments of text onto a growing string. 560 Its subcases include both \CFA being similar to STL and their designs offering a tradeoff. 561 562 One experimental variable is logically equivalent operations such as @a = a + b@ \vs @a += b@. 1056 This test measures the speed of appending randomly-size text onto a growing string. 1057 \begin{cquote} 1058 \setlength{\tabcolsep}{20pt} 1059 \begin{tabular}{@{}ll@{}} 1060 % \multicolumn{1}{c}{\textbf{fresh}} & \multicolumn{1}{c}{\textbf{reuse}} \\ 1061 \begin{cfa} 1062 1063 for ( ... ) { 1064 @string x;@ // fresh 1065 for ( ... ) 1066 x @+=@ ... 1067 } 1068 \end{cfa} 1069 & 1070 \begin{cfa} 1071 string x; 1072 for ( ... ) { 1073 @x = "";@ $\C[1in]{// reuse}$ 1074 for ( ... ) 1075 x @+=@ ... $\C{// append, alternative x = x + ...}\CRT$ 1076 } 1077 \end{cfa} 1078 \end{tabular} 1079 \end{cquote} 1080 The benchmark's outer loop executes ``until a sample-worthy amount of execution has happened'' and an inner loop for ``building up the desired-length string.'' 1081 Its subcases include, 1082 \begin{enumerate}[leftmargin=*] 1083 \item 1084 \CFA nosharing/sharing \vs \CC nosharing. 1085 \item 1086 Difference between the logically equivalent operations @x += ...@ \vs @x = x + ...@. 563 1087 For numeric types, the generated code is equivalence, giving identical performance. 564 1088 However, for string types there can be a significant difference in performance, especially if this code appears in a loop iterating a large number of times. 565 1089 This difference might not be intuitive to beginners. 566 567 Another experimental variable is whether the user's logical allocation is fresh \vs reused. 568 Here, \emph{reusing a logical allocation}, means that the program variable, into which the user is concatenating, previously held a long string: 569 \begin{cquote} 570 \setlength{\tabcolsep}{20pt} 571 \begin{tabular}{@{}ll@{}} 572 \multicolumn{1}{c}{\textbf{fresh}} & \multicolumn{1}{c}{\textbf{reuse}} \\ 573 \begin{cfa} 574 575 for ( ... ) { 576 @string x;@ 577 for ( ... ) 578 x += ... 579 } 580 \end{cfa} 581 & 582 \begin{cfa} 583 string x; 584 for ( ... ) { 585 @x = "";@ 586 for ( ... ) 587 x += ... 588 } 589 \end{cfa} 590 \end{tabular} 591 \end{cquote} 592 593 All benchmark drivers have an outer loop for ``until a sample-worthy amount of execution has happened'' and an inner loop for ``building up the desired-length string.'' 1090 \item 1091 Coding practice where the user's logical allocation is fresh \vs reused. 1092 Here, \emph{reusing a logical allocation}, means that the program variable, into which the user is concatenating, previously held a long string. 594 1093 In general, a user should not have to care about this difference, yet the STL performs differently in these cases. 595 1094 Furthermore, if a routine takes a string by reference, if cannot use the fresh approach. 596 1095 Concretely, both cases incur the cost of copying characters into the target string, but only the allocation-fresh case incurs a further reallocation cost, which is generally paid at points of doubling the length. 597 1096 For the STL, this cost includes obtaining a fresh buffer from the memory allocator and copying older characters into the new buffer, while \CFA-sharing hides such a cost entirely. 598 The fresh \vs reuse distinction is only relevant in the \emph{append} tests. 1097 %The fresh \vs reuse distinction is only relevant in the \emph{append} tests. 1098 \end{enumerate} 599 1099 600 1100 \begin{figure} … … 606 1106 \end{figure} 607 1107 608 Th e \emph{append} tests use the varying-from-1 corpus construction, \ie they assumethe STL's advantage of small-string optimization.1108 This tests use the varying-from-1 corpus construction, \ie it assumes the STL's advantage of small-string optimization. 609 1109 \PAB{To discuss: any other case variables introduced in the performance intro} 610 1110 \VRef[Figure]{fig:string-graph-peq-cppemu} shows this behaviour, by the STL and by \CFA in STL emulation mode. 611 1111 \CFA reproduces STL's performance, up to a 15\% penalty averaged over the cases shown, diminishing with larger strings, and 50\% in the worst case. 612 1112 This penalty characterizes the amount of implementation fine tuning done with STL and not done with \CFA in present state. 613 \PAB{The larger inherent penalty, for a user mismanaging reuse, is 40\% averaged over the cases shown, is minimally 24\%, shows up consistently between the STL and \CFA implementations, and increases with larger strings.} 1113 There is a larger penalty for redeclaring the string each loop iteration (fresh) versus hosting it out of the loop and reseting it to the null string (reuse). 1114 The cost is 40\% averaged over the cases shown and minimally 24\%, and shows up consistently between the \CFA and STL implementations, and increases with larger strings. 614 1115 615 1116 \begin{figure} … … 640 1141 \subsubsection{Test: Pass argument} 641 1142 642 To have introduced: STL string library forces users to think about memory management when communicating values across a function call 1143 STL has a penalty for passing a string by value, which indirectly forces users to think about memory management when communicating values to a function. 643 1144 \begin{cfa} 644 1145 void foo( string s ); … … 646 1147 foo( s ); 647 1148 \end{cfa} 648 STL charges a prohibitive penalty for passing a string by value.649 1149 With implicit sharing active, \CFA treats this operation as normal and supported. 650 1150 This test illustrates a main advantage of the \CFA sharing algorithm. … … 742 1242 } 743 1243 \end{lstlisting} 744 745 \section{String I/O}
Note:
See TracChangeset
for help on using the changeset viewer.