Changeset 931f1b4 for doc/theses/mike_brooks_MMath/string.tex
- Timestamp:
- Apr 10, 2025, 7:31:11 AM (5 months ago)
- Branches:
- master
- Children:
- 9018dcf
- Parents:
- d914750
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/string.tex
rd914750 r931f1b4 7 7 8 8 \section{String Operations} 9 10 % https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(string_functions) 9 11 10 12 \VRef[Figure]{f:StrApiCompare} shows a general comparison of string APIs for C, \CC, Java and \CFA. … … 23 25 @strlen@ & @length@, @size@ & @length@ & @size@ \\ 24 26 @[ ]@ & @[ ]@ & @charAt@ & @[ ]@ \\ 25 @strncpy@ & @substr@ & @substring@ & @( )@ \\26 @strncpy@ & @replace@ & @replace@ & @ =@ \emph{(on a substring)}\\27 @strncpy@ & @substr@ & @substring@ & @( )@ RHS @=@ \\ 28 @strncpy@ & @replace@ & @replace@ & @( )@ LHS @=@ \\ 27 29 @strstr@ & @find@ & @indexOf@ & @find@ \\ 28 30 @strcspn@ & @find_first_of@ & @matches@ & @include@ \\ … … 62 64 \begin{cquote} 63 65 \sf 64 \begin{tabular}{@{}rrrr rl@{}}65 \small\tt a & \small\tt b & \small\tt c & \small\tt d & \small\tt e\\66 \begin{tabular}{@{}rrrrll@{}} 67 \small\tt "a & \small\tt b & \small\tt c & \small\tt d & \small\tt e" \\ 66 68 0 & 1 & 2 & 3 & 4 & left to right index \\ 67 69 -5 & -4 & -3 & -2 & -1 & right to left index … … 72 74 \begin{cfa} 73 75 #include @<string.hfa>@ 74 @string@ s = "abcde", name = "MIKE", digit , alpha, punctuation, ifstmt;76 @string@ s = "abcde", name = "MIKE", digit = "0123456789"; 75 77 const char cs[] = "abc"; 76 78 int i; 77 digit = "0123456789"; 78 punctuation = "().,"; 79 ifstmt = "IF (A > B) {"; 80 \end{cfa} 81 Note, the include file @string.hfa@ to access type @string@. 79 \end{cfa} 80 Note, the include file @<string.hfa>@ to access type @string@. 82 81 83 82 … … 394 393 Extending the pattern to a regular expression is a possible extension. 395 394 396 397 \subsection{Searching} 398 399 The @index@ operation 400 \begin{cfa} 401 int index( const string & key, int start = 1, occurrence occ = first ); 402 \end{cfa} 403 returns the position of the first or last occurrence of the @key@ (depending on the occurrence indicator @occ@ that is either @first@ or @last@) in the current string starting the search at position @start@. 404 If the @key@ does not appear in the current string, the length of the current string plus one is returned. 405 %If the @key@ has zero length, the value 1 is returned regardless of what the current string contains. 406 A negative starting position is a specification from the right end of the string. 395 The replace operation returns a string in which all occurrences of a substring are replaced by another string. 407 396 \begin{cquote} 408 397 \setlength{\tabcolsep}{15pt} 409 398 \begin{tabular}{@{}l|l@{}} 410 399 \begin{cfa} 411 i = find( digit, "567" ); 412 i = find( digit, "567", 7 ); 413 i = digit.index( "567", -1, last ); 414 i = name.index( "E", 5, last ); 415 \end{cfa} 416 & 417 \begin{cfa} 400 s = replace( "PETER", "E", "XX" ); 401 s = replace( "PETER", "ET", "XX" ); 402 s = replace( "PETER", "W", "XX" ); 403 \end{cfa} 404 & 405 \begin{cfa} 406 "PXXTXXR" 407 "PXXER" 408 "PETER" 409 \end{cfa} 410 \end{tabular} 411 \end{cquote} 412 The replacement is done left-to-right and substituted text is not examined for replacement. 413 414 415 \subsection{Searching} 416 417 The find operation returns the position of the first occurrence of a key string in a string. 418 If the key does not appear in the current string, the length of the current string plus one is returned. 419 \begin{cquote} 420 \setlength{\tabcolsep}{15pt} 421 \begin{tabular}{@{}l|l@{}} 422 \begin{cfa} 423 i = find( digit, '3' ); 424 i = "45" ^ digit; // python style "45" in digit 425 string x = "567"; 426 i = find( digit, x ); 427 \end{cfa} 428 & 429 \begin{cfa} 430 3 431 4 432 418 433 5 419 10 420 421 422 \end{cfa} 423 \end{tabular} 424 \end{cquote} 425 The next two string operations test a string to see if it is or is not composed completely of a particular class of characters. 426 For example, are the characters of a string all alphabetic or all numeric? 427 Use of these operations involves a two step operation. 428 First, it is necessary to create an instance of type @strmask@ and initialize it to a string containing the characters of the particular character class, as in: 429 \begin{cfa} 430 strmask digitmask = digit; 431 strmask alphamask = string( "abcdefghijklmnopqrstuvwxyz" ); 432 \end{cfa} 433 Second, the character mask is used in the functions @include@ and @exclude@ to check a string for compliance of its characters with the characters indicated by the mask. 434 435 The @include@ operation 436 \begin{cfa} 437 int include( const strmask &, int = 1, occurrence occ = first ); 438 \end{cfa} 439 returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does not appear in the @mask@ starting the search at position @start@; 440 hence it skips over characters in the current string that are included (in) the @mask@. 441 The characters in the current string do not have to be in the same order as the @mask@. 442 If all the characters in the current string appear in the @mask@, the length of the current string plus one is returned, regardless of which occurrence is being searched for. 443 A negative starting position is a specification from the right end of the string. 444 \begin{cfa} 445 i = name.include( digitmask ); $\C{// i is assigned 1}$ 446 i = name.include( alphamask ); $\C{// i is assigned 6}$ 447 \end{cfa} 448 449 The @exclude@ operation 450 \begin{cfa} 451 int exclude( string &mask, int start = 1, occurrence occ = first ) 452 \end{cfa} 453 returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does appear in the @mask@ string starting the search at position @start@; 454 hence it skips over characters in the current string that are excluded from (not in) in the @mask@ string. 455 The characters in the current string do not have to be in the same order as the @mask@ string. 456 If all the characters in the current string do NOT appear in the @mask@ string, the length of the current string plus one is returned, regardless of which occurrence is being searched for. 457 A negative starting position is a specification from the right end of the string. 458 \begin{cfa} 459 i = name.exclude( digitmask ); $\C{// i is assigned 6}$ 460 i = ifstmt.exclude( strmask( punctuation ) ); $\C{// i is assigned 4}$ 461 \end{cfa} 462 463 The @includeStr@ operation: 464 \begin{cfa} 465 string includeStr( strmask &mask, int start = 1, occurrence occ = first ) 466 \end{cfa} 467 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that ARE included in the @mask@ string starting the search at position @start@. 468 A negative starting position is a specification from the right end of the string. 469 \begin{cfa} 470 s = name.includeStr( alphamask ); $\C{// s is assigned "MIKE"}$ 471 s = ifstmt.includeStr( alphamask ); $\C{// s is assigned "IF"}$ 472 s = name.includeStr( digitmask ); $\C{// s is assigned ""}$ 473 \end{cfa} 474 475 The @excludeStr@ operation: 476 \begin{cfa} 477 string excludeStr( strmask &mask, int start = 1, occurrence = first ) 478 \end{cfa} 479 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that are excluded (NOT) in the @mask@ string starting the search at position @start@. 480 A negative starting position is a specification from the right end of the string. 481 \begin{cfa} 482 s = name.excludeStr( digitmask); $\C{// s is assigned "MIKE"}$ 483 s = ifstmt.excludeStr( strmask( punctuation ) ); $\C{// s is assigned "IF "}$ 484 s = name.excludeStr( alphamask); $\C{// s is assigned ""}$ 485 \end{cfa} 486 487 488 \subsection{Miscellaneous} 489 490 The @trim@ operation 491 \begin{cfa} 492 string trim( string &mask, occurrence occ = first ) 493 \end{cfa} 494 returns a string in that is the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) which ARE included in the @mask@ are removed. 495 \begin{cfa} 496 // remove leading blanks 497 s = string( " ABC" ).trim( " " ); $\C{// s is assigned "ABC",}$ 498 // remove trailing blanks 499 s = string( "ABC " ).trim( " ", last ); $\C{// s is assigned "ABC",}$ 500 \end{cfa} 501 502 The @translate@ operation 503 \begin{cfa} 504 string translate( string &from, string &to ) 505 \end{cfa} 506 returns a string that is the same length as the original string in which all occurrences of the characters that appear in the @from@ string have been translated into their corresponding character in the @to@ string. 507 Translation is done on a character by character basis between the @from@ and @to@ strings; hence these two strings must be the same length. 508 If a character in the original string does not appear in the @from@ string, then it simply appears as is in the resulting string. 509 \begin{cfa} 510 // upper to lower case 511 name = name.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" ); 512 // name is assigned "name" 513 s = ifstmt.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" ); 514 // ifstmt is assigned "if (a > b) {" 515 // lower to upper case 516 name = name.translate( "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ); 517 // name is assigned "MIKE" 518 \end{cfa} 519 520 The @replace@ operation 521 \begin{cfa} 522 string replace( string &from, string &to ) 523 \end{cfa} 524 returns a string in which all occurrences of the @from@ string in the current string have been replaced by the @to@ string. 525 \begin{cfa} 526 s = name.replace( "E", "XX" ); $\C{// s is assigned "PXXTXXR"}$ 527 \end{cfa} 528 The replacement is done left-to-right. 529 When an instance of the @from@ string is found and changed to the @to@ string, it is NOT examined again for further replacement. 434 \end{cfa} 435 \end{tabular} 436 \end{cquote} 437 The character-class operations indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc. 438 \begin{cquote} 439 \setlength{\tabcolsep}{15pt} 440 \begin{tabular}{@{}l|l@{}} 441 \begin{cfa} 442 charclass vowels{ "aeiouy" }; 443 i = include( "aaeiuyoo", vowels ); 444 i = include( "aabiuyoo", vowels ); 445 \end{cfa} 446 & 447 \begin{cfa} 448 449 8 // compliant 450 2 // b non-compliant 451 \end{cfa} 452 \end{tabular} 453 \end{cquote} 454 @vowels@ defines a character class and function @include@ checks if all characters in the string are included in the class (compliance). 455 The position of the last character plus 1 is return if the string is compliant or the position of the first non-compliant character. 456 There is no relationship between the order of characters in the two strings. 457 Function @exclude@ is the reverse of @include@, checking if all characters in the string are excluded from the class (compliance). 458 \begin{cquote} 459 \setlength{\tabcolsep}{15pt} 460 \begin{tabular}{@{}l|l@{}} 461 \begin{cfa} 462 i = exclude( "cdbfghmk", vowels ); 463 i = exclude( "cdyfghmk", vowels ); 464 \end{cfa} 465 & 466 \begin{cfa} 467 8 // compliant 468 2 // y non-compliant 469 \end{cfa} 470 \end{tabular} 471 \end{cquote} 472 Both forms can return the longest substring of compliant characters. 473 \begin{cquote} 474 \setlength{\tabcolsep}{15pt} 475 \begin{tabular}{@{}l|l@{}} 476 \begin{cfa} 477 s = include( "aaeiuyoo", vowels ); 478 s = include( "aabiuyoo", vowels ); 479 s = exclude( "cdbfghmk", vowels ); 480 s = exclude( "cdyfghmk", vowels ); 481 \end{cfa} 482 & 483 \begin{cfa} 484 "aaeiuyoo" 485 "aa" 486 "cdbfghmk" 487 "cd" 488 \end{cfa} 489 \end{tabular} 490 \end{cquote} 491 492 The test operation checks if each character in a string is in one of the C character classes. 493 \begin{cquote} 494 \setlength{\tabcolsep}{15pt} 495 \begin{tabular}{@{}l|l@{}} 496 \begin{cfa} 497 i = test( "1FeC34aB", @isxdigit@ ); 498 i = test( ".,;'!\"", @ispunct@ ); 499 i = test( "XXXx", @isupper@ ); 500 \end{cfa} 501 & 502 \begin{cfa} 503 8 // compliant 504 6 // compliant 505 3 // non-compliant 506 \end{cfa} 507 \end{tabular} 508 \end{cquote} 509 The position of the last character plus 1 is return if the string is compliant or the position of the first non-compliant character. 510 511 Combining substring and search allows actions like trimming whitespace from the start of a line. 512 \begin{cquote} 513 \setlength{\tabcolsep}{15pt} 514 \begin{tabular}{@{}l|l@{}} 515 \begin{cfa} 516 string line = " \t xxx yyy zzz"; 517 string trim = line( test( line, isspace ) ); 518 \end{cfa} 519 & 520 \begin{cfa} 521 522 "xxx yyy zzz" 523 \end{cfa} 524 \end{tabular} 525 \end{cquote} 526 527 The translate operation returns a string with each character transformed by one of the C character transformation functions. 528 \begin{cquote} 529 \setlength{\tabcolsep}{15pt} 530 \begin{tabular}{@{}l|l@{}} 531 \begin{cfa} 532 s = translate( "abc", @toupper@ ); 533 s = translate( "ABC", @tolower@ ); 534 int tospace( int c ) { return isspace( c ) ? ' ' : c; } 535 s = translate( "X X\tX\nX", @tospace@ ); 536 \end{cfa} 537 & 538 \begin{cfa} 539 "ABC" 540 "abc" 541 542 "X X X X" 543 \end{cfa} 544 \end{tabular} 545 \end{cquote} 530 546 531 547
Note:
See TracChangeset
for help on using the changeset viewer.