Ignore:
Timestamp:
Apr 10, 2025, 7:31:11 AM (5 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
9018dcf
Parents:
d914750
Message:

more proofreading string chapter

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/string.tex

    rd914750 r931f1b4  
    77
    88\section{String Operations}
     9
     10% https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(string_functions)
    911
    1012\VRef[Figure]{f:StrApiCompare} shows a general comparison of string APIs for C, \CC, Java and \CFA.
     
    2325@strlen@                                & @length@, @size@              & @length@                      & @size@        \\
    2426@[ ]@                                   & @[ ]@                                 & @charAt@          & @[ ]@     \\
    25 @strncpy@                               & @substr@                              & @substring@       & @( )@     \\
    26 @strncpy@                               & @replace@                             & @replace@         & @=@ \emph{(on a substring)}\\
     27@strncpy@                               & @substr@                              & @substring@       & @( )@ RHS @=@     \\
     28@strncpy@                               & @replace@                             & @replace@         & @( )@ LHS @=@ \\
    2729@strstr@                                & @find@                                & @indexOf@         & @find@ \\
    2830@strcspn@                               & @find_first_of@               & @matches@         & @include@ \\
     
    6264\begin{cquote}
    6365\sf
    64 \begin{tabular}{@{}rrrrrl@{}}
    65 \small\tt a & \small\tt b & \small\tt c & \small\tt d & \small\tt e \\
     66\begin{tabular}{@{}rrrrll@{}}
     67\small\tt "a & \small\tt b & \small\tt c & \small\tt d & \small\tt e" \\
    66680 & 1 & 2 & 3 & 4 & left to right index \\
    6769-5 & -4 & -3 & -2 & -1 & right to left index
     
    7274\begin{cfa}
    7375#include @<string.hfa>@
    74 @string@ s = "abcde", name = "MIKE", digit, alpha, punctuation, ifstmt;
     76@string@ s = "abcde", name = "MIKE", digit = "0123456789";
    7577const char cs[] = "abc";
    7678int i;
    77 digit  = "0123456789";
    78 punctuation = "().,";
    79 ifstmt = "IF (A > B) {";
    80 \end{cfa}
    81 Note, the include file @string.hfa@ to access type @string@.
     79\end{cfa}
     80Note, the include file @<string.hfa>@ to access type @string@.
    8281
    8382
     
    394393Extending the pattern to a regular expression is a possible extension.
    395394
    396 
    397 \subsection{Searching}
    398 
    399 The @index@ operation
    400 \begin{cfa}
    401 int index( const string & key, int start = 1, occurrence occ = first );
    402 \end{cfa}
    403 returns the position of the first or last occurrence of the @key@ (depending on the occurrence indicator @occ@ that is either @first@ or @last@) in the current string starting the search at position @start@.
    404 If the @key@ does not appear in the current string, the length of the current string plus one is returned.
    405 %If the @key@ has zero length, the value 1 is returned regardless of what the current string contains.
    406 A negative starting position is a specification from the right end of the string.
     395The replace operation returns a string in which all occurrences of a substring are replaced by another string.
    407396\begin{cquote}
    408397\setlength{\tabcolsep}{15pt}
    409398\begin{tabular}{@{}l|l@{}}
    410399\begin{cfa}
    411 i = find( digit, "567" );
    412 i = find( digit, "567", 7 );
    413 i = digit.index( "567", -1, last );
    414 i = name.index( "E", 5, last );
    415 \end{cfa}
    416 &
    417 \begin{cfa}
     400s = replace( "PETER", "E", "XX" );
     401s = replace( "PETER", "ET", "XX" );
     402s = replace( "PETER", "W", "XX" );
     403\end{cfa}
     404&
     405\begin{cfa}
     406"PXXTXXR"
     407"PXXER"
     408"PETER"
     409\end{cfa}
     410\end{tabular}
     411\end{cquote}
     412The replacement is done left-to-right and substituted text is not examined for replacement.
     413
     414
     415\subsection{Searching}
     416
     417The find operation returns the position of the first occurrence of a key string in a string.
     418If the key does not appear in the current string, the length of the current string plus one is returned.
     419\begin{cquote}
     420\setlength{\tabcolsep}{15pt}
     421\begin{tabular}{@{}l|l@{}}
     422\begin{cfa}
     423i = find( digit, '3' );
     424i = "45" ^ digit; // python style "45" in digit
     425string x = "567";
     426i = find( digit, x );
     427\end{cfa}
     428&
     429\begin{cfa}
     4303
     4314
     432
    4184335
    419 10
    420 
    421 
    422 \end{cfa}
    423 \end{tabular}
    424 \end{cquote}
    425 The next two string operations test a string to see if it is or is not composed completely of a particular class of characters.
    426 For example, are the characters of a string all alphabetic or all numeric?
    427 Use of these operations involves a two step operation.
    428 First, it is necessary to create an instance of type @strmask@ and initialize it to a string containing the characters of the particular character class, as in:
    429 \begin{cfa}
    430 strmask digitmask = digit;
    431 strmask alphamask = string( "abcdefghijklmnopqrstuvwxyz" );
    432 \end{cfa}
    433 Second, the character mask is used in the functions @include@ and @exclude@ to check a string for compliance of its characters with the characters indicated by the mask.
    434 
    435 The @include@ operation
    436 \begin{cfa}
    437 int include( const strmask &, int = 1, occurrence occ = first );
    438 \end{cfa}
    439 returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does not appear in the @mask@ starting the search at position @start@;
    440 hence it skips over characters in the current string that are included (in) the @mask@.
    441 The characters in the current string do not have to be in the same order as the @mask@.
    442 If all the characters in the current string appear in the @mask@, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
    443 A negative starting position is a specification from the right end of the string.
    444 \begin{cfa}
    445 i = name.include( digitmask );          $\C{// i is assigned 1}$
    446 i = name.include( alphamask );          $\C{// i is assigned 6}$
    447 \end{cfa}
    448 
    449 The @exclude@ operation
    450 \begin{cfa}
    451 int exclude( string &mask, int start = 1, occurrence occ = first )
    452 \end{cfa}
    453 returns the position of the first or last character (depending on the occurrence indicator, which is either @first@ or @last@) in the current string that does appear in the @mask@ string starting the search at position @start@;
    454 hence it skips over characters in the current string that are excluded from (not in) in the @mask@ string.
    455 The characters in the current string do not have to be in the same order as the @mask@ string.
    456 If all the characters in the current string do NOT appear in the @mask@ string, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
    457 A negative starting position is a specification from the right end of the string.
    458 \begin{cfa}
    459 i = name.exclude( digitmask );          $\C{// i is assigned 6}$
    460 i = ifstmt.exclude( strmask( punctuation ) ); $\C{// i is assigned 4}$
    461 \end{cfa}
    462 
    463 The @includeStr@ operation:
    464 \begin{cfa}
    465 string includeStr( strmask &mask, int start = 1, occurrence occ = first )
    466 \end{cfa}
    467 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that ARE included in the @mask@ string starting the search at position @start@.
    468 A negative starting position is a specification from the right end of the string.
    469 \begin{cfa}
    470 s = name.includeStr( alphamask );       $\C{// s is assigned "MIKE"}$
    471 s = ifstmt.includeStr( alphamask );     $\C{// s is assigned "IF"}$
    472 s = name.includeStr( digitmask );       $\C{// s is assigned ""}$
    473 \end{cfa}
    474 
    475 The @excludeStr@ operation:
    476 \begin{cfa}
    477 string excludeStr( strmask &mask, int start = 1, occurrence = first )
    478 \end{cfa}
    479 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) of the current string that are excluded (NOT) in the @mask@ string starting the search at position @start@.
    480 A negative starting position is a specification from the right end of the string.
    481 \begin{cfa}
    482 s = name.excludeStr( digitmask);        $\C{// s is assigned "MIKE"}$
    483 s = ifstmt.excludeStr( strmask( punctuation ) ); $\C{// s is assigned "IF "}$
    484 s = name.excludeStr( alphamask);        $\C{// s is assigned ""}$
    485 \end{cfa}
    486 
    487 
    488 \subsection{Miscellaneous}
    489 
    490 The @trim@ operation
    491 \begin{cfa}
    492 string trim( string &mask, occurrence occ = first )
    493 \end{cfa}
    494 returns a string in that is the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either @first@ or @last@) which ARE included in the @mask@ are removed.
    495 \begin{cfa}
    496 // remove leading blanks
    497 s = string( "   ABC" ).trim( " " );     $\C{// s is assigned "ABC",}$
    498 // remove trailing blanks
    499 s = string( "ABC   " ).trim( " ", last ); $\C{// s is assigned "ABC",}$
    500 \end{cfa}
    501 
    502 The @translate@ operation
    503 \begin{cfa}
    504 string translate( string &from, string &to )
    505 \end{cfa}
    506 returns a string that is the same length as the original string in which all occurrences of the characters that appear in the @from@ string have been translated into their corresponding character in the @to@ string.
    507 Translation is done on a character by character basis between the @from@ and @to@ strings; hence these two strings must be the same length.
    508 If a character in the original string does not appear in the @from@ string, then it simply appears as is in the resulting string.
    509 \begin{cfa}
    510 // upper to lower case
    511 name = name.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
    512                         // name is assigned "name"
    513 s = ifstmt.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
    514                         // ifstmt is assigned "if (a > b) {"
    515 // lower to upper case
    516 name = name.translate( "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" );
    517                         // name is assigned "MIKE"
    518 \end{cfa}
    519 
    520 The @replace@ operation
    521 \begin{cfa}
    522 string replace( string &from, string &to )
    523 \end{cfa}
    524 returns a string in which all occurrences of the @from@ string in the current string have been replaced by the @to@ string.
    525 \begin{cfa}
    526 s = name.replace( "E", "XX" );          $\C{// s is assigned "PXXTXXR"}$
    527 \end{cfa}
    528 The replacement is done left-to-right.
    529 When an instance of the @from@ string is found and changed to the @to@ string, it is NOT examined again for further replacement.
     434\end{cfa}
     435\end{tabular}
     436\end{cquote}
     437The character-class operations indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
     438\begin{cquote}
     439\setlength{\tabcolsep}{15pt}
     440\begin{tabular}{@{}l|l@{}}
     441\begin{cfa}
     442charclass vowels{ "aeiouy" };
     443i = include( "aaeiuyoo", vowels );
     444i = include( "aabiuyoo", vowels );
     445\end{cfa}
     446&
     447\begin{cfa}
     448
     4498  // compliant
     4502  // b non-compliant
     451\end{cfa}
     452\end{tabular}
     453\end{cquote}
     454@vowels@ defines a character class and function @include@ checks if all characters in the string are included in the class (compliance).
     455The position of the last character plus 1 is return if the string is compliant or the position of the first non-compliant character.
     456There is no relationship between the order of characters in the two strings.
     457Function @exclude@ is the reverse of @include@, checking if all characters in the string are excluded from the class (compliance).
     458\begin{cquote}
     459\setlength{\tabcolsep}{15pt}
     460\begin{tabular}{@{}l|l@{}}
     461\begin{cfa}
     462i = exclude( "cdbfghmk", vowels );
     463i = exclude( "cdyfghmk", vowels );
     464\end{cfa}
     465&
     466\begin{cfa}
     4678  // compliant
     4682  // y non-compliant
     469\end{cfa}
     470\end{tabular}
     471\end{cquote}
     472Both forms can return the longest substring of compliant characters.
     473\begin{cquote}
     474\setlength{\tabcolsep}{15pt}
     475\begin{tabular}{@{}l|l@{}}
     476\begin{cfa}
     477s = include( "aaeiuyoo", vowels );
     478s = include( "aabiuyoo", vowels );
     479s = exclude( "cdbfghmk", vowels );
     480s = exclude( "cdyfghmk", vowels );
     481\end{cfa}
     482&
     483\begin{cfa}
     484"aaeiuyoo"
     485"aa"
     486"cdbfghmk"
     487"cd"
     488\end{cfa}
     489\end{tabular}
     490\end{cquote}
     491
     492The test operation checks if each character in a string is in one of the C character classes.
     493\begin{cquote}
     494\setlength{\tabcolsep}{15pt}
     495\begin{tabular}{@{}l|l@{}}
     496\begin{cfa}
     497i = test( "1FeC34aB", @isxdigit@ );
     498i = test( ".,;'!\"", @ispunct@ );
     499i = test( "XXXx", @isupper@ );
     500\end{cfa}
     501&
     502\begin{cfa}
     5038   // compliant
     5046   // compliant
     5053   // non-compliant
     506\end{cfa}
     507\end{tabular}
     508\end{cquote}
     509The position of the last character plus 1 is return if the string is compliant or the position of the first non-compliant character.
     510
     511Combining substring and search allows actions like trimming whitespace from the start of a line.
     512\begin{cquote}
     513\setlength{\tabcolsep}{15pt}
     514\begin{tabular}{@{}l|l@{}}
     515\begin{cfa}
     516string line = "  \t  xxx yyy zzz";
     517string trim = line( test( line, isspace ) );
     518\end{cfa}
     519&
     520\begin{cfa}
     521
     522"xxx yyy zzz"
     523\end{cfa}
     524\end{tabular}
     525\end{cquote}
     526
     527The translate operation returns a string with each character transformed by one of the C character transformation functions.
     528\begin{cquote}
     529\setlength{\tabcolsep}{15pt}
     530\begin{tabular}{@{}l|l@{}}
     531\begin{cfa}
     532s = translate( "abc", @toupper@ );
     533s = translate( "ABC", @tolower@ );
     534int tospace( int c ) { return isspace( c ) ? ' ' : c; }
     535s = translate( "X X\tX\nX", @tospace@ );
     536\end{cfa}
     537&
     538\begin{cfa}
     539"ABC"
     540"abc"
     541
     542"X X X X"
     543\end{cfa}
     544\end{tabular}
     545\end{cquote}
    530546
    531547
Note: See TracChangeset for help on using the changeset viewer.