Changeset 780727f


Timestamp:
Sep 15, 2025, 9:20:53 PM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
4b465445
Parents:
8317671
Message:

harmonize user manual string discussion with string chapter

File:
1 edited

  • doc/theses/mike_brooks_MMath/string.tex

    r8317671 r780727f  
    5757
    5858The \CFA string type is for manipulation of dynamically-sized character-strings versus C @char *@ type for manipulation of statically-sized null-terminated character-strings.
    59 Hence, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
    60 As a result, a @string@ declaration does not specify a maximum length, where a C string must.
     59Therefore, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
     60As a result, a @string@ declaration does not specify a maximum length, whereas a C string array does.
     61For \CFA, as a @string@ dynamically grows and shrinks in size, so does its underlying storage.
     62For C, a string's length can grow and shrink, but its underlying storage cannot.
    6163The maximum storage for a \CFA @string@ value is bounded by the maximum value of @size_t@, \ie, $2^{32}-1$ or $2^{64}-1$ characters on 32-bit or 64-bit architectures, respectively.
    6264A \CFA string manages its length separately from the string, so there is no null (@'\0'@) terminating value at the end of a string value.
     
    8688Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signedness and size variations, implicitly convert to type @string@ (as in Java).
    8789\begin{cquote}
    88 \setlength{\tabcolsep}{15pt}
    8990\begin{tabular}{@{}l|ll|l@{}}
    9091\begin{cfa}
    91 string s;
     92string s = 5;
    9293s = 'x';
    9394s = "abc";
    94 s = cs;
    95 s = 45hh;
    96 s = 45h;
    97 \end{cfa}
    98 &
    99 \begin{cfa}
    100 
     95s = 42hh;               /* signed char */
     96s = 42h;                /* short int */
     97s = 0xff;
     98\end{cfa}
     99&
     100\begin{cfa}
     101"5"
    101102"x"
    102103"abc"
    103 "abc"
    104 "45"
    105 "45"
    106 \end{cfa}
    107 &
    108 \begin{cfa}
    109         s = (ssize_t)MIN;
    110         s = (size_t)MAX;
    111         s = 5.5;
    112         s = 5.5L;
    113         s = 5.5+3.4i;
    114         s = 5.5L+3.4Li;
     104"42"
     105"42"
     106"255"
     107\end{cfa}
     108&
     109\begin{cfa}
     110s = (ssize_t)MIN;
     111s = (size_t)MAX;
     112s = 5.5;
     113s = 5.5L;
     114s = 5.5+3.4i;
     115s = 5.5L+3.4Li;
    115116\end{cfa}
    116117&
     
    127128Conversions can be explicitly specified using a compound literal.
    128129\begin{cfa}
    129 s = (string){ "abc" };                          $\C{// converts char * to string}$
    130 s = (string){ 5 };                                      $\C{// converts int to string}$
    131 s = (string){ 5.5 };                            $\C{// converts double to string}$
    132 \end{cfa}
    133 
    134 Conversions from @string@ to @char *@ attempt to be safe:
    135 either by requiring the maximum length of the @char *@ storage (@strncpy@) or allocating the @char *@ storage for the string characters (ownership), meaning the programmer must free the storage.
    136 Note, a C string is always null terminated, implying a minimum size of 1 character.
    137 \begin{cquote}
    138 \setlength{\tabcolsep}{15pt}
    139 \begin{tabular}{@{}l|l@{}}
    140 \begin{cfa}
     130s = (string){ 5 };    s = (string){ "abc" };   s = (string){ 5.5 };
     131\end{cfa}
     132
     133Conversions from @string@ to @char *@ attempt to be safe.
     134The @strncpy@ conversion requires the maximum length for the pointer's target buffer.
     135The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it.
     136Note, a C string is always null terminated, implying storage is always necessary for the null.
     137\begin{cquote}
     138\begin{tabular}{@{}l|l@{}}
     139\begin{cfa}
     140string s = "abcde";
     141char cs[4];
    141142strncpy( cs, s, sizeof(cs) );
    142 char * cp = s;
     143char * cp = s;          // ownership
    143144delete( cp );
    144 cp = s + ' ' + s;
     145cp = s + ' ' + s;       // ownership
    145146delete( cp );
    146147\end{cfa}
    147148&
    148149\begin{cfa}
     150
     151
    149152"abc\0", in place
    150153"abcde\0", malloc
    151 ownership
     154
    152155"abcde abcde\0", malloc
    153 ownership
     156
    154157\end{cfa}
    155158\end{tabular}
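For contrast with the Cforall strncpy conversion, which leaves "abc\0" in a 4-character buffer, the C library strncpy writes no terminating null when the source has at least as many characters as the count, so C code must terminate the buffer explicitly. The sketch below assumes a hypothetical helper, copy_truncate, which is neither a C library nor a Cforall function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a C library or Cforall function): copy at most
   n - 1 characters of src into dst and always write the terminating null. */
void copy_truncate( char * dst, const char * src, size_t n ) {
	assert( n > 0 );                       /* the buffer must hold at least the null */
	strncpy( dst, src, n - 1 );            /* alone, this may leave dst unterminated */
	dst[n - 1] = '\0';                     /* guarantee null termination */
}

/* Usage mirroring the Cforall example: a 4-character buffer receives "abc\0". */
void copy_truncate_example( void ) {
	char cs[4];
	copy_truncate( cs, "abcde", sizeof(cs) );
	assert( strcmp( cs, "abc" ) == 0 );
}
```

Reserving the last byte for the null is the same size rule the Cforall conversion applies: storage is always needed for the terminator.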
     
    162165For compatibility, @strlen@ also works with \CFA strings.
    163166\begin{cquote}
    164 \setlength{\tabcolsep}{15pt}
    165167\begin{tabular}{@{}l|l@{}}
    166168\begin{cfa}
     
    187189\subsection{Comparison Operators}
    188190
    189 The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare \CFA string values using lexicographical ordering, where longer strings are greater than shorter strings.
     191The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare \CFA strings using lexicographical ordering, where a string is greater than any proper prefix of itself (\eg, @"abc" > "ab"@).
    190192In C, these operators compare the C string pointers, not their values, which does not match programmer expectation.
    191193C strings use function @strcmp@ to lexicographically compare the string value.
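As a minimal C illustration of this point, two distinct arrays holding the same characters compare unequal with == because their addresses differ, whereas strcmp compares their values. The last two assertions show that strcmp orders a proper prefix first and otherwise decides on the first differing character (assuming an ASCII-like character set); the function name is illustrative.

```c
#include <assert.h>
#include <string.h>

/* In C, == on two char pointers compares addresses, not characters;
   strcmp compares the characters lexicographically, returning 0 for equal
   values, a negative value if the first sorts before the second, and a
   positive value otherwise. */
void compare_example( void ) {
	char x[] = "abc", y[] = "abc";         /* two distinct arrays with the same value */
	assert( x != y );                      /* different addresses */
	assert( strcmp( x, y ) == 0 );         /* same value */
	assert( strcmp( "ab", "abc" ) < 0 );   /* a proper prefix sorts first */
	assert( strcmp( "b", "abc" ) > 0 );    /* first differing character decides */
}
```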
     
    196198
    197199The binary operators @+@ and @+=@ concatenate C @char@, @char *@ and \CFA strings, creating the sum of the characters.
    198 \par\noindent
     200\begin{cquote}
    199201\begin{tabular}{@{}l|l@{\hspace{15pt}}l|l@{\hspace{15pt}}l|l@{}}
    200202\begin{cfa}
     
    246248\end{cfa}
    247249\end{tabular}
    248 \par\noindent
     250\end{cquote}
    249251However, including @<string.hfa>@ can result in ambiguous uses of the overloaded @+@ operator.\footnote{Combining multiple packages in any programming language can result in name clashes or ambiguities.}
    250 While subtracting characters or pointers has a low-level use-case
    251 \begin{cfa}
    252 ch - '0'    $\C[2in]{// find character offset}$
    253 cs - cs2;  $\C{// find pointer offset}\CRT$
     252For example, subtracting characters or pointers has valid use-cases:
     253\begin{cfa}
     254ch - '0'        $\C[2in]{// find character offset}$
     255cs - cs2;       $\C{// find pointer offset}\CRT$
    254256\end{cfa}
    255257Addition, however, is less obvious:
    256258\begin{cfa}
    257 ch + 'b'    $\C[2in]{// add character values}$
    258 cs + 'a';  $\C{// move pointer cs['a']}\CRT$
     259ch + 'b'        $\C[2in]{// add character values}$
     260cs + 'a';       $\C{// address of cs['a']}\CRT$
    259261\end{cfa}
    260262There are legitimate use cases for arithmetic with @signed@/@unsigned@ characters (bytes), and these types are treated differently from @char@ in \CC and \CFA.
     
    262264Similarly, it is impossible to restrict or remove addition on type @char *@ because (unfortunately) subscripting is defined by it: @cs['a']@ and @'a'[cs]@ both mean @*(cs + 'a')@.
    263265
    264 The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ (variables are the same as constants) work correctly.
     266The prior \CFA concatenation examples show that complex mixed-mode interactions among @char@, @char *@, and @string@ constants work correctly (variables behave the same).
    265267The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs.
    266268Hence, the type system correctly handles all uses of addition (explicit or implicit) for @char *@.
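The pointer and character arithmetic discussed here can be checked directly in C. This sketch relies only on the standard definition of subscripting, a[i] meaning *(a + i), and on the guarantee that the digits '0' to '9' are contiguous; the function names are illustrative.

```c
#include <assert.h>

/* Subscripting is defined through pointer addition: a[i] means *(a + i).
   Hence cs + 'a' is the address of cs['a'], and 'a'[cs] is the same element. */
void subscript_example( void ) {
	static char cs[256];                   /* large enough for any 8-bit index such as 'a' */
	cs['a'] = 'x';
	assert( cs + 'a' == &cs['a'] );        /* pointer addition */
	assert( &'a'[cs] == &cs['a'] );        /* commuted subscript names the same element */
	assert( *(cs + 'a') == 'x' );
}

/* Character subtraction: C guarantees '0'..'9' are contiguous,
   so ch - '0' is the value of a decimal digit character. */
int digit_value( char ch ) { return ch - '0'; }
```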
     
    270272Only @char@ addition can result in ambiguities, and only when there is no left-hand information.
    271273\begin{cfa}
    272 ch = ch + 'b'; $\C[2in]{// LHS disambiguate, add character values}$
    273 s = 'a' + 'b'; $\C{// LHS disambiguate, concatenate characters}$
     274ch = ch + 'b';          $\C[2in]{// LHS disambiguate, add character values}$
     275s = 'a' + 'b';          $\C{// LHS disambiguate, concatenate characters}$
    274276printf( "%c\n", @'a' + 'b'@ ); $\C{// no LHS information, ambiguous}$
    275277printf( "%c\n", @(return char)@('a' + 'b') ); $\C{// disambiguate with ascription cast}\CRT$
     
    277279The ascription cast, @(return T)@, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion).
    278280Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator @+@ for @string@ types is not a problem.
    279 Note, other programming languages that repurpose @+@ for concatenation, could have similar ambiguity issues.
     281Note, other programming languages that repurpose @+@ for concatenation can have similar ambiguity issues.
    280282
    281283Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution.
     
    297299If $N = 0$, a zero length string, @""@, is returned.
    298300\begin{cquote}
    299 \setlength{\tabcolsep}{15pt}
    300301\begin{tabular}{@{}l|l@{}}
    301302\begin{cfa}
     
    303304s = 'x' * 3;
    304305s = "abc" * 3;
    305 s = (name + ' ') * 3;
    306 \end{cfa}
    307 &
    308 \begin{cfa}
    309 "
     306s = ("MIKE" + ' ') * 3;
     307\end{cfa}
     308&
     309\begin{cfa}
     310""
    310311"xxx"
    311312"abcabcabc"
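C has no repetition operator, so the Cforall s * N form corresponds to an explicit loop. A minimal C sketch, where repeat_copy is a hypothetical helper returning heap storage that the caller must free (following the ownership convention of the char * conversions); N = 0 yields the empty string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of repetition (s * n): return a malloc'd string
   holding n copies of s, or "" when n == 0; the caller must free the result.
   Overflow of len * n is not checked in this sketch. */
char * repeat_copy( const char * s, size_t n ) {
	size_t len = strlen( s );
	char * r = malloc( len * n + 1 );      /* +1 for the terminating null */
	if ( r == NULL ) return NULL;
	for ( size_t i = 0; i < n; i += 1 ) memcpy( r + i * len, s, len );
	r[len * n] = '\0';
	return r;
}

void repeat_example( void ) {
	char * r = repeat_copy( "abc", 3 );
	assert( r != NULL && strcmp( r, "abcabcabc" ) == 0 );
	free( r );
}
```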
     
    315316\end{cquote}
    316317Like concatenation, there is a potential ambiguity with multiplication of characters;
    317 multiplication for pointers does not exist in C.
    318 \begin{cfa}
    319 ch = ch * 3; $\C[2in]{// LHS disambiguate, multiply character values}$
    320 s = 'a' * 3; $\C{// LHS disambiguate, concatenate characters}$
     318multiplication of pointers does not exist in C.
     319\begin{cfa}
     320ch = ch * 3;            $\C[2in]{// LHS disambiguate, multiply character values}$
     321s = 'a' * 3;            $\C{// LHS disambiguate, concatenate characters}$
    321322printf( "%c\n", @'a' * 3@ ); $\C{// no LHS information, ambiguous}$
    322323printf( "%c\n", @(return char)@('a' * 3) ); $\C{// disambiguate with ascription cast}\CRT$
     
    326327
    327328\subsection{Substring}
    328 The substring operation returns a subset of a string starting at a position in the string and traversing a length or matching a pattern string.
     329
     330The substring operation returns a subset of a string starting at a position in the string and traversing a length, or matching a pattern string.
    329331\begin{cquote}
    330332\setlength{\tabcolsep}{10pt}
    331333\begin{tabular}{@{}l|ll|l@{}}
    332 \multicolumn{2}{c}{\textbf{length}} & \multicolumn{2}{c}{\textbf{pattern}} \\
    333 \begin{cfa}
    334 s = name( 2, 2 );
    335 s = name( 3, -2 );
    336 s = name( 2, 8 );
    337 s = name( 0, -1 );
    338 s = name( -1, -1 );
     334\multicolumn{2}{@{}c}{\textbf{length}} & \multicolumn{2}{c@{}}{\textbf{pattern}} \\
     335\multicolumn{4}{@{}l}{\lstinline{string name = "PETER"}} \\
     336\begin{cfa}
     337s = name( 0, 4 );
     338s = name( 1, 4 );
     339s = name( 2, 4 );
     340s = name( 4, -2 );
     341s = name( 8, 2 );
     342s = name( 0, -2 );
     343s = name( -1, -2 );
    339344s = name( -3 );
    340345\end{cfa}
    341346&
    342347\begin{cfa}
    343 "KE"
    344 "IK"
    345 "KE", clip length to 2
    346 "", beyond string clip to null
    347 "K"
    348 "IKE", to end of string
    349 \end{cfa}
    350 &
    351 \begin{cfa}
    352 s = name( "IK" );
     348"PETE"
     349"ETER"
     350"TER"   // clip length to 3
     351"ER"
     352""                 // beyond string to right, clip to null
     353""                 // beyond string to left, clip to null
     354"ER"
     355"TER"   // to end of string
     356\end{cfa}
     357&
     358\begin{cfa}
     359s = name( "ET" );
    353360s = name( "WW" );
    354361
     
    356363
    357364
    358 \end{cfa}
    359 &
    360 \begin{cfa}
    361 "IK"
    362 ""
    363 
    364 
    365 
    366 
    367 \end{cfa}
    368 \end{tabular}
    369 \end{cquote}
    370 A negative starting position is a specification from the right end of the string.
     365
     366
     367\end{cfa}
     368&
     369\begin{cfa}
     370"ET"
     371""  // does not occur
     372
     373
     374
     375
     376
     377
     378\end{cfa}
     379\end{tabular}
     380\end{cquote}
     381For the length form, a negative starting position is a specification from the right end of the string.
    371382A negative length means that characters are selected in the opposite (right to left) direction from the starting position.
    372383If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string.
    373384If the substring request is completely outside of the original string, a null string is returned.
    374 The pattern-form either returns the pattern string is the pattern matches or a null string if the pattern does not match.
     385The pattern form returns the pattern string if the pattern matches, or a null string if it does not.
    375386The usefulness of this mechanism is discussed next.
    376387
     
    379390Hence, the left string may decrease, stay the same, or increase in length.
    380391\begin{cquote}
    381 \setlength{\tabcolsep}{15pt}
    382392\begin{tabular}{@{}l|l@{}}
    383393\begin{cfa}[escapechar={}]
     
    398408\end{tabular}
    399409\end{cquote}
    400 Now pattern matching is useful on the left-hand side of assignment.
    401 \begin{cquote}
    402 \setlength{\tabcolsep}{15pt}
     410Now substring pattern matching is useful on the left-hand side of assignment.
     411\begin{cquote}
    403412\begin{tabular}{@{}l|l@{}}
    404413\begin{cfa}[escapechar={}]
     
    415424Generalizing the pattern to a regular expression is a possible extension.
    416425
    417 The replace operation extensions substring to substitute all occurrences.
    418 \begin{cquote}
    419 \setlength{\tabcolsep}{15pt}
     426The replace operation extends substring to substitute all occurrences.
     427\begin{cquote}
    420428\begin{tabular}{@{}l|l@{}}
    421429\begin{cfa}
     
    437445\subsection{Searching}
    438446
    439 The find operation returns the position of the first occurrence of a key in a string.
     447The @find@ operation returns the position of the first occurrence of a key in a string.
    440448If the key does not appear in the string, the length of the string is returned.
    441449\begin{cquote}
    442 \setlength{\tabcolsep}{15pt}
    443450\begin{tabular}{@{}l|l@{}}
    444451\begin{cfa}
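C offers two related conventions: strstr returns NULL when the key is absent, while strcspn returns the string length when none of the characters in its set appear. A minimal C sketch of the find convention, assuming zero-origin positions (as in the substring examples); find_pos is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C analogue of find: return the zero-origin position of the
   first occurrence of key in s, or strlen( s ) when key does not occur. */
size_t find_pos( const char * s, const char * key ) {
	const char * p = strstr( s, key );     /* NULL when key is absent */
	return p != NULL ? (size_t)( p - s ) : strlen( s );
}

void find_example( void ) {
	assert( find_pos( "abcde", "cd" ) == 2 );
	assert( find_pos( "abcde", "x" ) == 5 );  /* not found: the length */
	assert( strcspn( "abcde", "x" ) == 5 );   /* same convention for a character set */
}
```

Returning the length rather than a sentinel keeps the result usable directly as a substring position.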
     
    458465A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
    459466\begin{cquote}
    460 \setlength{\tabcolsep}{15pt}
    461467\begin{tabular}{@{}l|l@{}}
    462468\begin{cfa}
     
    478484Function @exclude@ is the reverse of @include@, checking if all characters in the string are excluded from the class (compliance).
    479485\begin{cquote}
    480 \setlength{\tabcolsep}{15pt}
    481486\begin{tabular}{@{}l|l@{}}
    482487\begin{cfa}
     
    493498Both forms can return the longest substring of compliant characters.
    494499\begin{cquote}
    495 \setlength{\tabcolsep}{15pt}
    496500\begin{tabular}{@{}l|l@{}}
    497501\begin{cfa}
     
    513517There are also versions of @include@ and @exclude@, returning a position or string, that take a validation function, such as one of the C character-class functions.\footnote{It is part of the heritage of C that these functions take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.}
    514518\begin{cquote}
    515 \setlength{\tabcolsep}{15pt}
    516519\begin{tabular}{@{}l|l@{}}
    517520\begin{cfa}
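The validation-function form can be sketched in C by passing a <ctype.h> classifier such as isalpha directly, which is why the int parameter and result matter for the function type. This is a hypothetical analogue (include_pos is not a Cforall or C library function), assuming the position form returns the index of the first non-compliant character.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical C analogue of include with a validation function: return the
   position of the first character for which pred is false, which is
   strlen( s ) when every character complies. pred has the type of the
   <ctype.h> functions, int (*)( int ), hence int rather than bool. */
size_t include_pos( const char * s, int (*pred)( int ) ) {
	size_t i = 0;
	/* <ctype.h> functions require the argument as an unsigned char value (or EOF) */
	while ( s[i] != '\0' && pred( (unsigned char)s[i] ) ) i += 1;
	return i;
}

void include_example( void ) {
	assert( include_pos( "abc123", isalpha ) == 3 );
	assert( include_pos( "123", isdigit ) == 3 );  /* all compliant: the length */
}
```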
     
    533536The translate operation returns a string with each character transformed by one of the C character transformation functions.
    534537\begin{cquote}
    535 \setlength{\tabcolsep}{15pt}
    536538\begin{tabular}{@{}l|l@{}}
    537539\begin{cfa}
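A C analogue of translate applies a transformation function such as toupper or tolower to each character of a copy. This sketch is hypothetical (translate_copy is an illustrative name) and allocates the result, so the caller must free it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of translate: return a new malloc'd string with each
   character transformed by f, such as toupper or tolower. The caller owns the
   result and must free it; NULL is returned if allocation fails. */
char * translate_copy( const char * s, int (*f)( int ) ) {
	size_t len = strlen( s );
	char * r = malloc( len + 1 );          /* +1 for the terminating null */
	if ( r == NULL ) return NULL;
	for ( size_t i = 0; i < len; i += 1 ) r[i] = (char)f( (unsigned char)s[i] );
	r[len] = '\0';
	return r;
}

void translate_example( void ) {
	char * u = translate_copy( "Peter 42", toupper );
	assert( u != NULL && strcmp( u, "PETER 42" ) == 0 );  /* non-letters unchanged */
	free( u );
}
```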
     
    580582\begin{figure}
    581583\begin{cquote}
    582 \setlength{\tabcolsep}{15pt}
    583584\begin{tabular}{@{}l|l@{}}
    584585\multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\
     
    626627Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type @char *@ to @string@.
    627628\begin{cquote}
    628 \setlength{\tabcolsep}{15pt}
    629629\begin{tabular}{@{}ll@{}}
    630630\begin{cfa}
     
    659659The \CC manipulators are @setw@, and its associated width controls @left@, @right@ and @setfill@.
    660660\begin{cquote}
    661 \setlength{\tabcolsep}{15pt}
    662661\begin{tabular}{@{}l|l@{}}
    663662\begin{c++}
     
    677676The \CFA manipulators are @bin@, @oct@, @hex@, and @wd@ with its associated width control @left@.
    678677\begin{cquote}
    679 \setlength{\tabcolsep}{15pt}
    680678\begin{tabular}{@{}l|l@{}}
    681679\begin{cfa}
     
    706704Reading into a @char@ is safe as the size is 1; reading into a @char *@ is unsafe without using @setw@ to constrain the length (which includes @'\0'@); and reading into a @string@ is safe as it grows dynamically as characters are read.
    707705\begin{cquote}
    708 \setlength{\tabcolsep}{15pt}
    709706\begin{tabular}{@{}l|l@{}}
    710707\begin{c++}
     
    771768\CC modifies the mutable receiver object, replacing by position (zero origin) and length.
    772769\begin{cquote}
    773 \setlength{\tabcolsep}{15pt}
    774770\begin{tabular}{@{}l|l@{}}
    775771\begin{c++}
     
    787783\label{p:JavaReplace}
    788784\begin{cquote}
    789 \setlength{\tabcolsep}{15pt}
    790785\begin{tabular}{@{}l|l@{}}
    791786\begin{java}
     
    802797Java also provides a mutable @StringBuffer@, replacing by position (zero origin) and length.
    803798\begin{cquote}
    804 \setlength{\tabcolsep}{15pt}
    805799\begin{tabular}{@{}l|l@{}}
    806800\begin{java}
     
    12651259The common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the present \CFA lowering.
    12661260\begin{cquote}
    1267 \setlength{\tabcolsep}{15pt}
    12681261\begin{tabular}{@{}l|l@{}}
    12691262\begin{cfa}
     
    13661359Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases need revisions.
    13671360\begin{cquote}
    1368 \setlength{\tabcolsep}{15pt}
    13691361\begin{tabular}{ll}
    13701362HL & LL \\