Changeset 829a955 for doc/user


Timestamp:
Sep 15, 2025, 5:11:15 PM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
8317671
Parents:
fae93a40
Message:

update strings, update for-control and string documentation

File:
1 edited

  • doc/user/user.tex

    rfae93a40 r829a955  
    1111%% Created On       : Wed Apr  6 14:53:29 2016
    1212%% Last Modified By : Peter A. Buhr
    13 %% Last Modified On : Mon Apr 14 20:53:55 2025
    14 %% Update Count     : 7065
     13%% Last Modified On : Mon Sep 15 17:06:25 2025
     14%% Update Count     : 7216
    1515%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    1616
     
    6262\setlength{\topmargin}{-0.45in}                                                 % move running title into header
    6363\setlength{\headsep}{0.25in}
     64\setlength{\tabcolsep}{15pt}
    6465
    6566%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    703704In addition, inclusive ranges are allowed using symbol ©~© to specify a contiguous set of case values, both positive and negative.
    704705\begin{cquote}
    705 \setlength{\tabcolsep}{15pt}
    706706\begin{tabular}{@{}llll@{}}
    707707\multicolumn{1}{c}{\textbf{C}}  & \multicolumn{1}{c}{\textbf{\CFA}}     & \multicolumn{1}{c}{\textbf{©gcc©}}    \\
     
    10051005\end{tabular}
    10061006\end{cquote}
    1007 The target label must be below the \Indexc{fallthrough} and may not be nested in a control structure, and
    1008 the target label must be at the same or higher level as the containing \Indexc{case} clause and located at
    1009 the same level as a ©case© clause; the target label may be case \Indexc{default}, but only associated
    1010 with the current \Indexc{switch}/\Indexc{choose} statement.
     1007The target label must be below the \Indexc{fallthrough}, may not be nested in a control structure, and must be at the same or higher level as the containing \Indexc{case} clause, located at the same level as a ©case© clause;
     1008the target label may be case \Indexc{default}, but only the \Indexc{default} of the current \Indexc{switch}/\Indexc{choose} statement.
    10111009
    10121010
    10131011\subsection{Loop Control}
    10141012
     1013\CFA condenses writing loops to facilitate coding speed and safety.
     1014
     1015To simplify creating an infinite loop, the \Indexc{for}, \Indexc{while}, and \Indexc{do} loop-predicate\index{loop predicate} is extended with an empty conditional, meaning a comparison value of ©1© (true).
     1016\begin{cfa}
     1017while ( )                               §\C{// while ( true )}§
     1018for ( )                                 §\C{// for ( ; true; )}§
     1019do ... while ( )                §\C{// do ... while ( true )}§
     1020\end{cfa}
     1021
    10151022Looping a predefined number of times, possibly with a loop index, occurs frequently.
    1016 \CFA condenses writing loops to facilitate coding speed and safety.
    1017 
    1018 \Indexc{for}, \Indexc{while}, and \Indexc{do} loop-control\index{loop control} are extended with an empty conditional, meaning a comparison value of ©1© (true).
    1019 \begin{cfa}
    1020 while ( ®/* empty */®  )                                §\C{// while ( true )}§
    1021 for ( ®/* empty */®  )                                  §\C{// for ( ; true; )}§
    1022 do ... while ( ®/* empty */®  )                 §\C{// do ... while ( true )}§
    1023 \end{cfa}
    1024 
    10251023The ©for© control\index{for control}, \ie ©for ( /* control */ )©, is extended with a range and step.
    10261024A range is a set of values defined by an optional low value (defaults to 0), tilde, and high value, ©L ~ H©, with an optional step ©~ S© (defaults to 1), which means an ascending set of values from ©L© to ©H© in positive steps of ©S©.
     
    10311029\end{cfa}
    10321030\R{Warning}: A range in descending order, \eg ©5 ~ -3© is the null (empty) set, \ie no values in the set.
    1033 \R{Warning}: A ©0© or negative step is undefined.
    1034 Note, the order of values in a set may not be the order the values are presented during looping.
     1031As well, a ©0© or negative step is undefined.
    10351032
    10361033The range character, ©'~'©, is decorated on the left and right to control how the set values are presented in the loop body.
     
    10421039-8 ®§\Sp§®~ -2                                                  §\C{// ascending, no prefix}§
    104310400 ®+®~ 5                                                                §\C{// ascending, prefix}§
    1044 -3 ®-®~ 3                                                               §\C{// descending
     1041-3 ®-®~ 3                                                               §\C{// descending, prefix}§
    10451042\end{cfa}
    10461043For descending iteration, the ©L© and ©H© values are \emph{implicitly} switched, and the increment/decrement for ©S© is toggled.
     1044Hence, the order of values in a set may not be the order the values are presented during looping.
    10471045When changing the iteration direction, this form is faster and safer, \ie the direction prefix can be added/removed without changing existing (correct) program text.
    10481046\R{Warning}: reversing the range endpoints for descending order results in an empty set.
     
    10581056\index{-\~}\index{descending exclusive range}
    10591057\index{-\~=}\index{descending inclusive range}
     1058
     1059\begin{comment}
     1060To simplify loop iteration a range is provided, from low to high, and a traversal direction, ascending (©+©) or descending (©-©).
     1061The following is the syntax for the loop range, where ©[©\,©]© means optional.
     1062\begin{cfa}[deletekeywords=default]
     1063[ ®index ;® ] [ [ ®min® (default 0) ] [ direction ®+®/®-® (default +) ] ®~® [ ®=® (include endpoint) ] ] ®max® [ ®~ increment® ]
     1064\end{cfa}
     1065For ©=©, the range includes the endpoint (©max©/©min©) depending on the direction (©+©/©-©).
     1066\end{comment}
    10601067
    10611068©for© control is formalized by the following regular expression:
     
    24222429\label{s:stringType}
    24232430
    2424 The \CFA \Indexc{string} type is for manipulation of dynamically-size character-strings versus C \Indexc{char *} type for manipulation of statically-size null-terminated character-strings.
    2425 That is, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
    2426 Hence, a ©string© declaration does not specify a maximum length;
    2427 as a string dynamically grows and shrinks in size, so does its underlying storage.
    2428 In contrast, a C string also dynamically grows and shrinks is size, but its underlying storage is fixed.
     2431A string is a sequence of symbols, where the form of a symbol can vary significantly: regular 7/8-bit ASCII/Latin-1, wide 2/4-byte Unicode, or variable-length UTF-8/16.
     2432A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes ©"xyz\n"©.
     2433Currently, \CFA strings only support regular characters.
     2434
     2435A string type is designed to operate on groups of characters for assigning, copying, scanning, and updating, rather than working with individual characters.
     2436The \CFA \Indexc{string} type is for manipulation of dynamically-sized strings versus C \Indexc{char *} type for manipulation of statically-sized null-terminated strings.
     2437Therefore, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
     2438As a result, a ©string© declaration does not specify a maximum length, whereas a C string array does.
     2439For \CFA, as a ©string© dynamically grows and shrinks in size, so does its underlying storage.
     2440For C, a string can also grow and shrink in size, but its underlying storage does not.
    24292441The maximum storage for a \CFA ©string© value is the maximum ©size_t© value in characters, which is $2^{32}-1$ or $2^{64}-1$ on 32-bit or 64-bit architectures, respectively.
    24302442A \CFA string manages its length separately from the string, so there is no null (©'\0'©) terminating value at the end of a string value.
    24312443Hence, a \CFA string cannot be passed to a C string manipulation routine, such as ©strcat©.
    2432 Like C strings, the characters in a ©string© are numbered starting from 0.
    2433 
    2434 The following operations have been defined to manipulate an instance of type ©string©.
    2435 The discussion assumes the following declarations and assignment statements are executed.
    2436 \begin{cfa}
    2437 #include ®<string.hfa>®
    2438 ®string® s, peter, digit, alpha, punctuation, ifstmt;
    2439 int i;
    2440 peter  = "PETER";
    2441 digit  = "0123456789";
    2442 punctuation = "().,";
    2443 ifstmt = "IF (A > B) {";
    2444 \end{cfa}
    2445 Note, the include file \Indexc{string.hfa} to access type ©string©.
    2446 
    2447 
    2448 \subsection{Implicit String Conversions}
    2449 
    2450 The types ©char©, ©char *©, ©int©, ©double©, ©_Complex©, including different signness and sizes, implicitly convert to type ©string©.
    2451 \VRef[Figure]{f:ImplicitConversionsString} shows examples of implicit conversions between C strings, integral, floating-point and complex types to ©string©.
    2452 A conversions can be explicitly specified:
    2453 \begin{cfa}
    2454 s = string( "abc" );                            §\C{// converts char * to string}§
    2455 s = string( 5 );                                        §\C{// converts int to string}§
    2456 s = string( 5.5 );                                      §\C{// converts double to string}§
    2457 \end{cfa}
    2458 All conversions from ©string© to ©char *©, attempt to be safe:
    2459 either by requiring the maximum length of the ©char *© storage (©strncpy©) or allocating the ©char *© storage for the string characters (ownership), meaning the programmer must free the storage.
    2460 As well, a string is always null terminates, implying a minimum size of 1 character.
     2444Like C strings, characters in a ©string© are numbered from the left starting at 0 (because subscripting is zero-origin), and in \CFA numbered from the right starting at -1.
    24612445\begin{cquote}
    2462 \begin{tabular}{@{}l@{\hspace{1.75in}}|@{\hspace{15pt}}l@{}}
    2463 \begin{cfa}
    2464 string s = "abcde";
    2465 char cs[3];
    2466 strncpy( cs, s, sizeof(cs) );           §\C{sout | cs;}§
    2467 char * cp = s;                                          §\C{sout | cp;}§
    2468 delete( cp );
    2469 cp = s + ' ' + s;                                       §\C{sout | cp;}§
    2470 delete( cp );
    2471 \end{cfa}
    2472 &
    2473 \begin{cfa}
    2474 
    2475 
    2476 ab
    2477 abcde
    2478 
    2479 abcde abcde
    2480 
    2481 \end{cfa}
     2446\rm
     2447\begin{tabular}{@{}rrrrll@{}}
     2448\small\tt "a & \small\tt b & \small\tt c & \small\tt d & \small\tt e" \\
     24490 & 1 & 2 & 3 & 4 & left to right index \\
     2450-5 & -4 & -3 & -2 & -1 & right to left index
    24822451\end{tabular}
    24832452\end{cquote}
    2484 
    2485 \begin{figure}
    2486 \begin{tabular}{@{}l@{\hspace{15pt}}|@{\hspace{15pt}}l@{}}
    2487 \begin{cfa}
    2488 //      string s = 5;                                   sout | s;
    2489         string s;
    2490         // conversion of char and char * to string
    2491         s = 'x';                                                §\C{sout | s;}§
    2492         s = "abc";                                              §\C{sout | s;}§
    2493         char cs[5] = "abc";
    2494         s = cs;                                                 §\C{sout | s;}§
    2495         // conversion of integral, floating-point, and complex to string
    2496         s = 45hh;                                               §\C{sout | s;}§
    2497         s = 45h;                                                §\C{sout | s;}§
    2498         s = -(ssize_t)MAX - 1;                  §\C{sout | s;}§
    2499         s = (size_t)MAX;                                §\C{sout | s;}§
    2500         s = 5.5;                                                §\C{sout | s;}§
    2501         s = 5.5L;                                               §\C{sout | s;}§
    2502         s = 5.5+3.4i;                                   §\C{sout | s;}§
    2503         s = 5.5L+3.4Li;                                 §\C{sout | s;}§
    2504 \end{cfa}
    2505 &
    2506 \begin{cfa}
    2507 
    2508 
    2509 
    2510 x
    2511 abc
    2512 
    2513 abc
    2514 
    2515 45
    2516 45
    2517 -9223372036854775808
    2518 18446744073709551615
    2519 5.5
    2520 5.5
    2521 5.5+3.4i
    2522 5.5+3.4i
     2453The include file \Indexc{string.hfa} is necessary to access type ©string©.
     2454
     2455
     2456\subsection{Implicit String Conversions}
     2457
     2458The ability to convert from internal (machine) to external (human) format is useful in situations other than I/O.
     2459Hence, the basic types ©char©, ©char *©, ©int©, ©double©, ©_Complex©, including any signedness and size variations, implicitly convert to type ©string© (as in Java).
     2460\begin{cquote}
     2461\begin{tabular}{@{}l|ll|l@{}}
     2462\begin{cfa}
     2463string s = 5;
     2464s = 'x';
     2465s = "abc";
     2466s = 42hh;               /* signed char */
     2467s = 42h;                /* short int */
     2468s = 0xff;
     2469\end{cfa}
     2470&
     2471\begin{cfa}
     2472"5"
     2473"x"
     2474"abc"
     2475"42"
     2476"42"
     2477"255"
     2478\end{cfa}
     2479&
     2480\begin{cfa}
     2481s = (ssize_t)MIN;
     2482s = (size_t)MAX;
     2483s = 5.5;
     2484s = 5.5L;
     2485s = 5.5+3.4i;
     2486s = 5.5L+3.4Li;
     2487\end{cfa}
     2488&
     2489\begin{cfa}
     2490"-9223372036854775808"
     2491"18446744073709551615"
     2492"5.5"
     2493"5.5"
     2494"5.5+3.4i"
     2495"5.5+3.4i"
    25232496\end{cfa}
    25242497\end{tabular}
    2525 \caption{Implicit Conversions to String}
    2526 \label{f:ImplicitConversionsString}
    2527 \end{figure}
    2528 
    2529 
    2530 \subsection{Size (length)}
    2531 
    2532 The ©size© operation returns the length of a string.
    2533 \begin{cfa}
    2534 i = size( "" );                                         §\C{// i is assigned 0}§
    2535 i = size( "abc" );                                      §\C{// i is assigned 3}§
    2536 i = size( peter );                                      §\C{// i is assigned 5}§
    2537 \end{cfa}
     2498\end{cquote}
     2499Conversions can be explicitly specified using a compound literal.
     2500\begin{cfa}
     2501s = (string){ 5 };    s = (string){ "abc" };   s = (string){ 5.5 };
     2502\end{cfa}
     2503
     2504Conversions from ©string© to ©char *© attempt to be safe.
     2505The ©strncpy© conversion requires the maximum length for the pointer's target buffer.
     2506The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it.
     2507Note, the resulting C string is always null terminated, so its storage must include room for the null character.
     2508\begin{cquote}
     2509\begin{tabular}{@{}l|l@{}}
     2510\begin{cfa}
     2511string s = "abcde";
     2512char cs[4];
     2513strncpy( cs, s, sizeof(cs) );
     2514char * cp = s;          // ownership
     2515delete( cp );
     2516cp = s + ' ' + s;       // ownership
     2517delete( cp );
     2518\end{cfa}
     2519&
     2520\begin{cfa}
     2521
     2522
     2523"abc\0", in place
     2524"abcde\0", malloc
     2525
     2526"abcde abcde\0", malloc
     2527
     2528\end{cfa}
     2529\end{tabular}
     2530\end{cquote}
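Standard C ©strncpy© does not write a null when the source is at least as long as the buffer, which is why a safe conversion must reserve one character for it. The following C sketch mimics the in-place ©"abc\0"© result above; the helper ©copy_bounded© is hypothetical and is not part of ©string.hfa©.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copy at most cap - 1 characters of src into dst and
// always write a terminating null, so dst is always a valid C string.
// Unlike standard strncpy, a long source is truncated AND terminated.
size_t copy_bounded( char * dst, const char * src, size_t cap ) {
    assert( dst != NULL && src != NULL && cap > 0 );   // need room for the null
    size_t n = strlen( src );
    if ( n > cap - 1 ) n = cap - 1;                    // truncate to fit
    memcpy( dst, src, n );
    dst[n] = '\0';
    return n;                                           // characters copied
}
```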
     2531
     2532
     2533\subsection{Length}
     2534
     2535The ©len© operation (short for ©strlen©) returns the length of a C or \CFA string.
     2536For compatibility, ©strlen© works with \CFA strings.
     2537\begin{cquote}
     2538\begin{tabular}{@{}l|l@{}}
     2539\begin{cfa}
     2540i = len( "" );
     2541i = len( "abc" );
     2542i = len( cs );
     2543i = strlen( cs );
     2544i = len( name );
     2545i = strlen( name );
     2546\end{cfa}
     2547&
     2548\begin{cfa}
     25490
     25503
     25513
     25523
     25534
     25544
     2555\end{cfa}
     2556\end{tabular}
     2557\end{cquote}
    25382558
    25392559
    25402560\subsection{Comparison Operators}
    25412561
    2542 The binary \Index{relational operator}s, ©<©, ©<=©, ©>©, ©>=©, and \Index{equality operator}s, ©==©, ©!=©, compare strings using lexicographical ordering, where longer strings are greater than shorter strings.
     2562The binary relational\index{string!relational operators}, \Indexc{<}, \Indexc{<=}, \Indexc{>}, \Indexc{>=}, and equality\index{string!equality operators}, \Indexc{==}, \Indexc{!=}, operators compare \CFA strings using lexicographical ordering: the first differing character decides, and if one string is a prefix of the other, the longer string is greater.
     2563In C, these operators compare the C string pointers, not their values, which does not match programmer expectation.
     2564C strings use function ©strcmp© to lexicographically compare the string value.
     2565Java has the same issue with ©==© and ©.equals©.
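A small C sketch of this pitfall, contrasting address comparison with ©strcmp©; the helper names ©same_pointer© and ©same_value© are hypothetical. The checks also illustrate the lexicographic rule: the first differing character decides, and a proper prefix compares less.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// In C, == on two char * values compares addresses, not characters.
bool same_pointer( const char * a, const char * b ) { return a == b; }

// strcmp compares characters lexicographically; 0 means equal values.
bool same_value( const char * a, const char * b ) {
    assert( a != NULL && b != NULL );
    return strcmp( a, b ) == 0;
}
```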
    25432566
    25442567
    25452568\subsection{Concatenation}
    25462569
    2547 The binary operators \Indexc{+} and \Indexc{+=} concatenate two strings, creating the sum of the strings.
    2548 \begin{cfa}
    2549 s = peter + ' ' + digit;                        §\C{// s is assigned "PETER 0123456789"}§
    2550 s += peter;                                                     §\C{// s is assigned "PETER 0123456789PETER"}§
    2551 \end{cfa}
     2570The binary operators \Indexc{+} and \Indexc{+=} concatenate C ©char©, ©char *© and \CFA strings, creating the sum of the characters.
     2571\begin{cquote}
     2572\begin{tabular}{@{}l|l@{\hspace{15pt}}l|l@{\hspace{15pt}}l|l@{}}
     2573\begin{cfa}
     2574s = "";
     2575s = 'a' + 'b';
     2576s = 'a' + "b";
     2577s = "a" + 'b';
     2578s = "a" + "b";
     2579\end{cfa}
     2580&
     2581\begin{cfa}
     2582
     2583"ab"
     2584"ab"
     2585"ab"
     2586"ab"
     2587\end{cfa}
     2588&
     2589\begin{cfa}
     2590s = "";
     2591s = 'a' + 'b' + s;
     2592s = 'a' + 'b' + s;
     2593s = 'a' + "b" + s;
     2594s = "a" + 'b' + s;
     2595\end{cfa}
     2596&
     2597\begin{cfa}
     2598
     2599"ab"
     2600"abab"
     2601"ababab"
     2602"abababab"
     2603\end{cfa}
     2604&
     2605\begin{cfa}
     2606s = "";
     2607s = s + 'a' + 'b';
     2608s = s + 'a' + "b";
     2609s = s + "a" + 'b';
     2610s = s + "a" + "b";
     2611\end{cfa}
     2612&
     2613\begin{cfa}
     2614
     2615"ab"
     2616"abab"
     2617"ababab"
     2618"abababab"
     2619\end{cfa}
     2620\end{tabular}
     2621\end{cquote}
     2622However, including ©<string.hfa>© can result in ambiguous uses of the overloaded ©+© operator.\footnote{Combining multiple packages in any programming language can result in name clashes or ambiguities.}
     2623For example, subtracting characters or pointers has valid use-cases:
     2624\begin{cfa}
     2625ch - '0'        §\C[2in]{// find character offset}§
     2626cs - cs2;       §\C{// find pointer offset}\CRT§
     2627\end{cfa}
     2628addition is less obvious:
     2629\begin{cfa}
     2630ch + 'b'        §\C[2in]{// add character values}§
     2631cs + 'a';       §\C{// move pointer cs['a']}\CRT§
     2632\end{cfa}
     2633There are legitimate use cases for arithmetic with ©signed©/©unsigned© characters (bytes), and these types are treated differently from ©char© in \CC and \CFA.
     2634However, backwards compatibility makes it impossible to restrict or remove addition on type ©char©.
     2635Similarly, it is impossible to restrict or remove addition on type ©char *© because (unfortunately) it is subscripting: ©cs + 'a'© is ©&cs['a']©, which can also be written ©&'a'[cs]©.
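The following C sketch illustrates the two legitimate uses described above: character subtraction yielding a digit offset, and character addition to a pointer acting as subscripting. The helper names are hypothetical; note that in C the expression ©'a' + 'b'© has type ©int©, not any string type.

```c
#include <assert.h>

// Subtracting characters yields a numeric offset; C guarantees that the
// digits '0' to '9' are contiguous, so this gives the digit's value.
int digit_value( char ch ) {
    assert( ch >= '0' && ch <= '9' );
    return ch - '0';
}

// Adding a character to a pointer is pointer arithmetic, i.e., subscripting:
// cs + 'a' is &cs['a'], and C also accepts the swapped form &'a'[cs].
char * offset_by_char( char * cs, char ch ) {
    return cs + ch;
}
```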
     2636
     2637The prior \CFA concatenation examples show that complex mixed-mode interactions among ©char©, ©char *©, and ©string© constants work correctly (the same is true for variables).
     2638The reason is that the \CFA type system handles this kind of overloading well, using the left-hand assignment type and conversion costs during expression resolution.
     2639Hence, the type system correctly handles all uses of addition (explicit or implicit) for ©char *©.
     2640\begin{cfa}
     2641printf( "%s %s %s %c %c\n", "abc", cs, cs + 3, cs['a'], 'a'[cs] );
     2642\end{cfa}
     2643Only ©char© addition can result in ambiguities, and only when there is no left-hand information.
     2644\begin{cfa}
     2645ch = ch + 'b';          §\C[2in]{// LHS disambiguate, add character values}§
     2646s = 'a' + 'b';          §\C{// LHS disambiguate, concatenate characters}§
     2647printf( "%c\n", ®'a' + 'b'® ); §\C{// no LHS information, ambiguous}§
     2648printf( "%c\n", ®(return char)®('a' + 'b') ); §\C{// disambiguate with ascription cast}\CRT§
     2649\end{cfa}
     2650The ascription cast, ©(return T)©, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion).
     2651Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator ©+© for ©string© types is not a problem.
     2652Note, other programming languages that repurpose ©+© for concatenation can have similar ambiguity issues.
     2653
     2654Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution.
     2655While it can special-case some combinations:
     2656\begin{C++}
     2657s = 'a' + s; §\C[2in]{// compiles in C++}§
     2658s = "a" + s;
     2659\end{C++}
     2660it cannot generalize to any number of steps:
     2661\begin{C++}
     2662s = 'a' + 'b' + s; §\C{// does not compile in C++}\CRT§
     2663s = "a" + "b" + s;
     2664\end{C++}
    25522665
    25532666
     
    25552668
    25562669The binary operators \Indexc{*} and \Indexc{*=} repeat a string $N$ times.
    2557 If $N = 0$, a zero length string, ©""© is returned.
    2558 \begin{cfa}
    2559 s = 'x' * 3;                            §\C{// s is assigned "PETER PETER PETER "}§
    2560 s = (peter + ' ') * 3;                          §\C{// s is assigned "PETER PETER PETER "}§
    2561 \end{cfa}
     2670If $N = 0$, a zero length string, ©""©, is returned.
     2671\begin{cquote}
     2672\begin{tabular}{@{}l|l@{}}
     2673\begin{cfa}
     2674s = 'x' * 0;
     2675s = 'x' * 3;
     2676s = "abc" * 3;
     2677s = ("Peter" + ' ') * 3;
     2678\end{cfa}
     2679&
     2680\begin{cfa}
     2681""
     2682"xxx"
     2683"abcabcabc"
     2684"Peter Peter Peter "
     2685\end{cfa}
     2686\end{tabular}
     2687\end{cquote}
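A C sketch of the repetition semantics, assuming the result is a fresh string holding $N$ copies of the operand; the helper ©repeat© is hypothetical, and the caller frees the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical sketch of s * n: a malloc'ed string holding n copies of s.
// n == 0 yields the empty string "". Overflow checks are omitted.
char * repeat( const char * s, size_t n ) {
    assert( s != NULL );
    size_t len = strlen( s );
    char * out = malloc( len * n + 1 );                 // +1 for the null
    if ( out == NULL ) return NULL;
    for ( size_t i = 0; i < n; i += 1 ) memcpy( out + i * len, s, len );
    out[len * n] = '\0';
    return out;
}
```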
     2688Like concatenation, there is a potential ambiguity with multiplication of characters;
     2689multiplication of pointers does not exist in C.
     2690\begin{cfa}
     2691ch = ch * 3;            §\C[2in]{// LHS disambiguate, multiply character values}§
     2692s = 'a' * 3;            §\C{// LHS disambiguate, concatenate characters}§
     2693printf( "%c\n", ®'a' * 3® ); §\C{// no LHS information, ambiguous}§
     2694printf( "%c\n", ®(return char)®('a' * 3) ); §\C{// disambiguate with ascription cast}\CRT§
     2695\end{cfa}
     2696Fortunately, character multiplication without LHS information is even rarer than addition, so repurposing the operator ©*© for ©string© types is not a problem.
    25622697
    25632698
    25642699\subsection{Substring}
    2565 The substring operation returns a subset of the string starting at a position in the string and traversing a length.
    2566 \begin{cfa}
    2567 s = peter( 2, 3 );                                      §\C{// s is assigned "ETE"}§
    2568 s = peter( 4, -3 );                                     §\C{// s is assigned "ETE", length is opposite direction}§
    2569 s = peter( 2, 8 );                                      §\C{// s is assigned "ETER", length is clipped to 4}§
    2570 s = peter( 0, -1 );                                     §\C{// s is assigned "", beyond string so clipped to null}§
    2571 s = peter(-1, -1 );                                     §\C{// s is assigned "R", start and length are negative}§
    2572 \end{cfa}
    2573 A negative starting position is a specification from the right end of the string.
     2700
     2701The substring operation returns a subset of a string starting at a position in the string and traversing a length, or matching a pattern string.
     2702\begin{cquote}
     2703\setlength{\tabcolsep}{10pt}
     2704\begin{tabular}{@{}l|ll|l@{}}
     2705\multicolumn{2}{@{}c}{\textbf{length}} & \multicolumn{2}{c@{}}{\textbf{pattern}} \\
     2706\multicolumn{4}{@{}l}{\lstinline{string name = "PETER"}} \\
     2707\begin{cfa}
     2708s = name( 0, 4 );
     2709s = name( 1, 4 );
     2710s = name( 2, 4 );
     2711s = name( 4, -2 );
     2712s = name( 8, 2 );
     2713s = name( 0, -2 );
     2714s = name( -1, -2 );
     2715s = name( -3 );
     2716\end{cfa}
     2717&
     2718\begin{cfa}
     2719"PETE"
     2720"ETER"
     2721"TER"   // clip length to 3
     2722"ER"
     2723""                 // beyond string to right, clip to null
     2724""                 // beyond string to left, clip to null
     2725"ER"
     2726"TER"   // to end of string
     2727\end{cfa}
     2728&
     2729\begin{cfa}
     2730s = name( "ET" );
     2731s = name( "WW" );
     2732
     2733
     2734
     2735
     2736
     2737
     2738\end{cfa}
     2739&
     2740\begin{cfa}
     2741"ET"
     2742""  // does not occur
     2743
     2744
     2745
     2746
     2747
     2748
     2749\end{cfa}
     2750\end{tabular}
     2751\end{cquote}
     2752For the length form, a negative starting position is a specification from the right end of the string.
    25742753A negative length means that characters are selected in the opposite (right to left) direction from the starting position.
    25752754If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string.
    2576 If the substring request is completely outside of the original string, a null string located at the end of the original string is returned.
    2577 The substring operation can also appear on the left hand side of the assignment operator.
    2578 The substring is replaced by the value on the right hand side of the assignment.
    2579 The length of the right-hand-side value may be shorter, the same length, or longer than the length of the substring that is selected on the left hand side of the assignment.
    2580 \begin{cfa}
    2581 digit( 3, 3 ) = "";                             §\C{// digit is assigned "0156789"}§
    2582 digit( 4, 3 ) = "xyz";                          §\C{// digit is assigned "015xyz9"}§
    2583 digit( 7, 0 ) = "***";                          §\C{// digit is assigned "015xyz***9"}§
    2584 digit(-4, 3 ) = "$$$";                          §\C{// digit is assigned "015xyz\$\$\$9"}§
    2585 \end{cfa}
     2755If the substring request is completely outside of the original string, a null string is returned.
     2756The pattern form returns the pattern string if the pattern occurs in the string, or a null string if it does not.
     2757The usefulness of this mechanism is discussed next.
     2758
     2759The substring operation can appear on the left side of assignment, where it defines a replacement substring.
     2760The right-hand string may be shorter than, the same length as, or longer than the selected left-hand substring.
     2761Hence, the left string may shrink, stay the same length, or grow.
     2762\begin{cquote}
     2763\begin{tabular}{@{}l|l@{}}
     2764\multicolumn{2}{@{}l}{\lstinline{string digit = "0123456789"}} \\
     2765\begin{cfa}[escapechar={}]
     2766digit( 3, 3 ) = "";
     2767digit( 4, 3 ) = "xyz";
     2768digit( 7, 0 ) = "***";
     2769digit(-4, 3 ) = "$$$";
     2770digit( 5 ) = "LLL";
     2771\end{cfa}
     2772&
     2773\begin{cfa}[escapechar={}]
     2774"0126789"
     2775"0126xyz"
     2776"0126xyz"
     2777"012$$$z"
     2778"012$$LLL"
     2779\end{cfa}
     2780\end{tabular}
     2781\end{cquote}
     2782Substring pattern matching is also useful on the left-hand side of assignment, where it selects the text to replace by content rather than by position.
     2783\begin{cquote}
     2784\begin{tabular}{@{}l|l@{}}
     2785\begin{cfa}[escapechar={}]
     2786digit( "$$" ) = "345";
     2787digit( "LLL") = "6789";
     2788\end{cfa}
     2789&
     2790\begin{cfa}
     2791"012345LLL"
     2792"0123456789"
     2793\end{cfa}
     2794\end{tabular}
     2795\end{cquote}
     2796The ©replace© operation extends substring to substitute all occurrences.
     2797\begin{cquote}
     2798\begin{tabular}{@{}l|l@{}}
     2799\begin{cfa}
     2800s = replace( "PETER", "E", "XX" );
     2801s = replace( "PETER", "ET", "XX" );
     2802s = replace( "PETER", "W", "XX" );
     2803\end{cfa}
     2804&
     2805\begin{cfa}
     2806"PXXTXXR"
     2807"PXXER"
     2808"PETER"
     2809\end{cfa}
     2810\end{tabular}
     2811\end{cquote}
     2812The replacement is done left-to-right, and substituted text is not re-examined for replacement.
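A C sketch of the left-to-right, non-overlapping semantics described above; ©replace_all© is a hypothetical helper, not the ©string.hfa© implementation. Because scanning resumes in the source after each match, replacing ©"a"© by ©"aa"© in ©"aaa"© gives ©"aaaaaa"© rather than looping forever.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical sketch: substitute every non-overlapping occurrence of key,
// scanning left to right and resuming after each match in the source, so
// substituted text is never re-examined. Returns a malloc'ed string the
// caller must free, or NULL on allocation failure.
char * replace_all( const char * src, const char * key, const char * rep ) {
    assert( src && key && rep && key[0] != '\0' );      // empty key not meaningful
    size_t klen = strlen( key ), rlen = strlen( rep ), count = 0;
    for ( const char * q = src; ( q = strstr( q, key ) ) != NULL; q += klen ) count += 1;
    char * out = malloc( strlen( src ) - count * klen + count * rlen + 1 );
    if ( out == NULL ) return NULL;
    char * o = out;
    const char * p = src;
    for ( const char * m; ( m = strstr( p, key ) ) != NULL; p = m + klen ) {
        memcpy( o, p, (size_t)( m - p ) ); o += m - p;   // text before the match
        memcpy( o, rep, rlen ); o += rlen;               // replacement text
    }
    strcpy( o, p );                                       // remaining tail and null
    return out;
}
```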
     2813
     2814
     2815\subsection{Searching}
     2816
     2817The ©find© operation returns the position of the first occurrence of a key in a string.
     2818If the key does not appear in the string, the length of the string is returned.
     2819\begin{cquote}
     2820\begin{tabular}{@{}l|l@{}}
     2821\multicolumn{2}{@{}l}{\lstinline{string digit = "0123456789"}} \\
     2822\begin{cfa}
     2823i = find( digit, '3' );
     2824i = find( digit, "45" );
     2825i = find( digit, "abc" );
     2826\end{cfa}
     2827&
     2828\begin{cfa}
     28293
     28304
     283110
     2832\end{cfa}
     2833\end{tabular}
     2834\end{cquote}
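A C sketch of this convention built on ©strstr©, which returns a null pointer on failure; the helper ©find_pos© is hypothetical. Returning the length instead of a sentinel such as -1 keeps the result a valid position in the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: position of the first occurrence of key in s,
// or strlen(s) when key does not occur.
size_t find_pos( const char * s, const char * key ) {
    assert( s != NULL && key != NULL );
    const char * m = strstr( s, key );
    return m != NULL ? (size_t)( m - s ) : strlen( s );
}
```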
     2835
     2836A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
     2837\begin{cquote}
     2838\begin{tabular}{@{}l|l@{}}
     2839\begin{cfa}
     2840charclass vowels{ "aeiouy" };
     2841i = include( "aaeiuyoo", vowels );
     2842i = include( "aabiuyoo", vowels );
     2843\end{cfa}
     2844&
     2845\begin{cfa}
     2846
     28478  // compliant
     28482  // b non-compliant
     2849\end{cfa}
     2850\end{tabular}
     2851\end{cquote}
     2852©vowels© defines a character class and function ©include© checks if all characters in the string appear in the class (compliance).
     2853If the string is compliant, its length (the position one past the last character) is returned; otherwise, the position of the first non-compliant character is returned.
     2854There is no relationship between the order of characters in the two strings.
     2855Function ©exclude© is the reverse of ©include©, checking if all characters in the string are excluded from the class (compliance).
     2856\begin{cquote}
     2857\begin{tabular}{@{}l|l@{}}
     2858\begin{cfa}
     2859i = exclude( "cdbfghmk", vowels );
     2860i = exclude( "cdyfghmk", vowels );
     2861\end{cfa}
     2862&
     2863\begin{cfa}
     28648  // compliant
     28652  // y non-compliant
     2866\end{cfa}
     2867\end{tabular}
     2868\end{cquote}
     2869Both forms can return the longest substring of compliant characters.
     2870\begin{cquote}
     2871\begin{tabular}{@{}l|l@{}}
     2872\begin{cfa}
     2873s = include( "aaeiuyoo", vowels );
     2874s = include( "aabiuyoo", vowels );
     2875s = exclude( "cdbfghmk", vowels );
     2876s = exclude( "cdyfghmk", vowels );
     2877\end{cfa}
     2878&
     2879\begin{cfa}
     2880"aaeiuyoo"
     2881"aa"
     2882"cdbfghmk"
     2883"cd"
     2884\end{cfa}
     2885\end{tabular}
     2886\end{cquote}
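To make the position semantics concrete, here is a minimal sketch in plain C. It is not the \CFA implementation. It assumes a character class is a C string of member characters. The hypothetical functions ©include_class© and ©exclude_class© use 0-origin positions and return the string length when the whole string is compliant. These two searches compute the same results as the standard C functions ©strspn© and ©strcspn©.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical C stand-in for a character class: membership via strchr.
// strchr( cls, '\0' ) would match the terminator, so exclude it explicitly.
static bool in_class( const char * cls, char c ) {
    return c != '\0' && strchr( cls, c ) != NULL;
}

// Position of the first character NOT in the class, or strlen( s ) if all are.
size_t include_class( const char * s, const char * cls ) {
    size_t i = 0;
    while ( s[i] != '\0' && in_class( cls, s[i] ) ) i += 1;
    return i;
}

// Position of the first character IN the class, or strlen( s ) if none are.
size_t exclude_class( const char * s, const char * cls ) {
    size_t i = 0;
    while ( s[i] != '\0' && ! in_class( cls, s[i] ) ) i += 1;
    return i;
}
```

The string-returning forms take the leading substring of the returned length. For example, ©include_class( "aabiuyoo", "aeiouy" )© returns 2, which selects ©"aa"©.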
     2887
     2888There are also versions of ©include© and ©exclude©, returning either a position or a string, that take a validation function, such as one of the C character-class functions.\footnote{It is part of the C heritage that these functions take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.}
     2889\begin{cquote}
     2890\begin{tabular}{@{}l|l@{}}
     2891\begin{cfa}
     2892i = include( "1FeC34aB", ®isxdigit® );
     2893i = include( ".,;'!\"", ®ispunct® );
     2894i = include( "XXXx", ®isupper® );
     2895\end{cfa}
     2896&
     2897\begin{cfa}
     28988   // compliant
     28996   // compliant
     29003   // non-compliant
     2901\end{cfa}
     2902\end{tabular}
     2903\end{cquote}
     2904These operations \emph{apply} the validation function to each character in turn: ©include© stops at the first character for which the function returns false, and ©exclude© stops at the first character for which it returns true.
     2905If the string is compliant, its length is returned; otherwise, the position of the first non-compliant character is returned.
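Here is a comparable C sketch of the function-based forms. Again, it is not the \CFA implementation. The hypothetical ©include_fn© and ©exclude_fn© accept any function of type ©int (*)( int )©, so the standard character-class functions can be passed directly. Each character is converted to ©unsigned char© before the call, as the C character-class functions require.

```c
#include <ctype.h>
#include <stddef.h>

// Position of the first character for which f returns 0, or the string
// length if f accepts every character.
size_t include_fn( const char * s, int (*f)( int ) ) {
    size_t i = 0;
    // <ctype.h> functions require EOF or a value representable as unsigned char.
    while ( s[i] != '\0' && f( (unsigned char)s[i] ) ) i += 1;
    return i;
}

// Reverse search: position of the first character for which f returns nonzero.
size_t exclude_fn( const char * s, int (*f)( int ) ) {
    size_t i = 0;
    while ( s[i] != '\0' && ! f( (unsigned char)s[i] ) ) i += 1;
    return i;
}
```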
     2906
     2907The ©translate© operation returns a string with each character transformed by a function, such as the C character-transformation functions ©toupper© and ©tolower©, or a user-defined function.
     2908\begin{cquote}
     2909\begin{tabular}{@{}l|l@{}}
     2910\begin{cfa}
     2911s = translate( "abc", ®toupper® );
     2912s = translate( "ABC", ®tolower® );
     2913int tospace( int c ) { return isspace( c ) ? ' ' : c; }
     2914s = translate( "X X\tX\nX", ®tospace® );
     2915\end{cfa}
     2916&
     2917\begin{cfa}
     2918"ABC"
     2919"abc"
     2920
     2921"X X X X"
     2922\end{cfa}
     2923\end{tabular}
     2924\end{cquote}
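For comparison, here is a minimal C sketch of the same per-character transformation. It assumes a heap-allocated result that the caller frees. The name ©translate_fn© is hypothetical and is not the \CFA function. The user-defined ©tospace© is the one from the example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Return a newly allocated copy of s with f applied to each character,
// or NULL if allocation fails; the caller frees the result.
char * translate_fn( const char * s, int (*f)( int ) ) {
    size_t n = strlen( s );
    char * r = malloc( n + 1 );
    if ( r == NULL ) return NULL;
    for ( size_t i = 0; i < n; i += 1 ) r[i] = (char)f( (unsigned char)s[i] );
    r[n] = '\0';
    return r;
}

// Map any whitespace character to a blank, leaving other characters unchanged.
int tospace( int c ) { return isspace( c ) ? ' ' : c; }
```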
     2925
     2926
     2927\subsection{Returning N on Search Failure}
     2928
     2929Some of the prior string operations are composite, \eg the operations returning the longest substring of compliant characters (©include©) are built from a search followed by a substring of the appropriate text.
     2930However, a string search can fail, and this alternate outcome must be reported, possibly by an exception.
     2931Many string libraries use a return code to indicate search failure, with a failure value of ©0© or ©-1© (PL/I~\cite{PLI} returns ©0©).
     2932This semantics leads to the following awkward pattern, which can appear many times in a string library or user code.
     2933\begin{cfa}
     2934i = exclude( s, alpha );
     2935if ( i != -1 ) return s( 0, i );
     2936else return "";
     2937\end{cfa}
     2938
     2939\CFA adopts a return code, but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1-origin indexing).
     2940This semantics allows many search and substring functions to be written without conditions, \eg:
     2941\begin{cfa}
     2942string include( const string & s, int (*f)( int ) ) { return ®s( 0, include( s, f ) )®; }
     2943string exclude( const string & s, int (*f)( int ) ) { return ®s( 0, exclude( s, f ) )®; }
     2944\end{cfa}
     2945In string systems with an $O(1)$ length operator, checking for failure is low cost.
     2946\begin{cfa}
     2947if ( include( line, alpha ) == len( line ) ) ... // not found, 0 origin
     2948\end{cfa}
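The same convention already exists in C: ©strspn© returns the string length when every character matches. The hypothetical helper ©leading_in© below is a sketch, not a \CFA function. It copies the leading run of class members with no failure test, in the same way as the ©include© definition above.

```c
#include <stdio.h>
#include <string.h>

// Copy the leading run of characters of s that appear in set into buf
// (capacity cap). strspn returns strlen( s ) when every character matches,
// so no "not found" branch is needed; snprintf truncates if buf is too small.
void leading_in( const char * s, const char * set, char * buf, size_t cap ) {
    size_t n = strspn( s, set );
    snprintf( buf, cap, "%.*s", (int)n, s );
}
```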
     2949\VRef[Figure]{f:ExtractingWordsText} compares \CC and \CFA string code for extracting words from a line of text, repeatedly removing non-word text and then a word until the line is empty.
     2950The \CFA code is simpler solely because of the choice for indicating search failure.
     2951(A simplification of the \CC version is to concatenate a sentinel character at the end of the line so the call to ©find_first_not_of© does not fail.)
     2952
     2953\begin{figure}
     2954\begin{cquote}
     2955\setlength{\tabcolsep}{15pt}
     2956\begin{tabular}{@{}l|l@{}}
     2957\multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\
     2958\begin{cfa}
     2959for ( ;; ) {
     2960        string::size_type posn = line.find_first_of( alpha );
     2961  if ( posn == string::npos ) break;
     2962        line = line.substr( posn );
     2963        posn = line.find_first_not_of( alpha );
     2964        if ( posn != string::npos ) {
     2965                cout << line.substr( 0, posn ) << endl;
     2966                line = line.substr( posn );
     2967        } else {
     2968                cout << line << endl;
     2969                line = "";
     2970        }
     2971}
     2972\end{cfa}
     2973&
     2974\begin{cfa}
     2975for () {
     2976        size_t posn = exclude( line, alpha );
     2977  if ( posn == len( line ) ) break;
     2978        line = line( posn );
     2979        posn = include( line, alpha );
     2980
     2981        sout | line( 0, posn );
     2982        line = line( posn );
     2983
     2984
     2985
     2986
     2987}
     2988\end{cfa}
     2989\end{tabular}
     2990\end{cquote}
     2991\caption{Extracting Words from Line of Text}
     2992\label{f:ExtractingWordsText}
     2993\end{figure}
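The \CFA loop in the figure can be mirrored in C because ©strcspn© and ©strspn© also return the remaining length when the scan reaches the end of the string. This sketch makes the following assumptions: ASCII letters are the word characters, each word is printed with a ©%.*s© precision instead of building substrings, and ©print_words© is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

// Word characters for this sketch (an assumption: ASCII letters only).
static const char alpha[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Print each word of line on its own line and return the number of words.
// Both scans return the remaining length at end of string, so the loop has
// no separate "not found" case beyond testing for the end of the line.
int print_words( const char * line ) {
    int count = 0;
    for ( ;; ) {
        line += strcspn( line, alpha );        // skip non-word text
        if ( *line == '\0' ) break;            // line exhausted
        size_t len = strspn( line, alpha );    // length of the word
        printf( "%.*s\n", (int)len, line );
        line += len;                           // remove the word
        count += 1;
    }
    return count;
}
```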
     2994
     2995
     2996\subsection{C Compatibility}
     2997
     2998To ease conversion from C to \CFA, \CFA provides ©string© overloads of the C string functions.
     2999Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type ©char *© (or ©char []©) to ©string©.
     3000\begin{cquote}
     3001\setlength{\tabcolsep}{15pt}
     3002\begin{tabular}{@{}ll@{}}
     3003\begin{cfa}
     3004char s[32];   // string s;
     3005strlen( s );
     3006strnlen( s, 3 );
     3007strcmp( s, "abc" );
     3008strncmp( s, "abc", 3 );
     3009\end{cfa}
     3010&
     3011\begin{cfa}
     3012
     3013strcpy( s, "abc" );
     3014strncpy( s, "abcdef", 3 );
     3015strcat( s, "xyz" );
     3016strncat( s, "uvwxyz", 3 );
     3017\end{cfa}
     3018\end{tabular}
     3019\end{cquote}
     3020However, the conversion does not extend to I/O: ©printf© cannot print a ©string© using format code ©%s© because \CFA strings are not null terminated.
     3021Nevertheless, this capability does provide a useful starting point for conversion to safer \CFA strings.
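A C program can still print text that is not null terminated by giving ©%s© a precision, which bounds the number of characters read. The sketch below illustrates the idea with a hypothetical length-counted ©struct lstring©. This struct is not the \CFA string representation.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical length-counted string: like a CFA string, its text is not
// null terminated (this is NOT the CFA representation).
struct lstring { const char * text; size_t len; };

// Format s into buf; the precision in "%.*s" stops after len characters
// instead of searching for a '\0' that may not exist.
int format_lstring( char * buf, size_t cap, struct lstring s ) {
    return snprintf( buf, cap, "%.*s", (int)s.len, s.text );
}
```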
     3022
     3023
     3024\subsection{I/O Operators}
     3025
     3026The ability to input and output strings is as essential as for any other type.
     3027The goal for character I/O is also to work with groups of characters rather than individual characters.
     3028A comparison with \CC string I/O is presented as a counterpoint to \CFA string I/O.
     3029
     3030The \CC output ©<<© and input ©>>© operators are defined on type ©string©.
     3031\CC output for ©char©, ©char *©, and ©string© is similar.
     3032The \CC manipulators are ©setw© and its associated width controls ©left©, ©right©, and ©setfill©.
     3033\begin{cquote}
     3034\setlength{\tabcolsep}{15pt}
     3035\begin{tabular}{@{}l|l@{}}
     3036\begin{C++}
     3037string s = "abc";
     3038cout << setw(10) << left << setfill( 'x' ) << s << endl;
     3039\end{C++}
     3040&
     3041\begin{C++}
     3042
     3043"abcxxxxxxx"
     3044\end{C++}
     3045\end{tabular}
     3046\end{cquote}
     3047
     3048The \CFA input/output operator ©|© is defined on type ©string©.
     3049\CFA output for ©char©, ©char *©, and ©string© is similar.
     3050The \CFA manipulators are ©bin©, ©oct©, ©hex©, and ©wd© with its associated width control ©left©.
     3051\begin{cquote}
     3052\setlength{\tabcolsep}{15pt}
     3053\begin{tabular}{@{}l|l@{}}
     3054\begin{cfa}
     3055string s = "abc";
     3056sout | bin( s ) | nl
     3057           | oct( s ) | nl
     3058           | hex( s ) | nl
     3059           | wd( 10, s ) | nl
     3060           | wd( 10, 2, s ) | nl
     3061           | left( wd( 10, s ) );
     3062\end{cfa}
     3063&
     3064\begin{cfa}
     3065
     3066"0b1100001 0b1100010 0b1100011"
     3067"0141 0142 0143"
     3068"0x61 0x62 0x63"
     3069"       abc"
     3070"        ab"
     3071"abc       "
     3072\end{cfa}
     3073\end{tabular}
     3074\end{cquote}
     3075The \CC ©setfill© manipulator has no \CFA counterpart, as it is not considered an important string manipulator.
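For comparison with C, the width examples above correspond closely to ©printf© conversions:

\begin{itemize}
\item a field width,
\item a precision that truncates,
\item the ©-© flag for left justification, and
\item the ©#© alternate form for the ©0© and ©0x© prefixes.
\end{itemize}

This correspondence is an illustration under these assumptions, not a description of how \CFA implements its manipulators. The names ©widths© and ©char_codes© are hypothetical. Binary output is omitted because ©%b© is only standard from C23.

```c
#include <stdio.h>
#include <string.h>

// wd( 10, s ) ~ "%10s", wd( 10, 2, s ) ~ "%10.2s" (precision truncates),
// left( wd( 10, s ) ) ~ "%-10s".
void widths( char out[3][16], const char * s ) {
    snprintf( out[0], 16, "%10s", s );
    snprintf( out[1], 16, "%10.2s", s );
    snprintf( out[2], 16, "%-10s", s );
}

// Blank-separated character codes: "%#o" adds a leading 0, "%#x" a leading 0x.
// Stops early (leaving truncated output) if buf runs out of room.
void char_codes( char * buf, size_t cap, const char * s, int hex ) {
    size_t used = 0;
    if ( cap == 0 ) return;
    buf[0] = '\0';
    for ( size_t i = 0; s[i] != '\0'; i += 1 ) {
        unsigned code = (unsigned char)s[i];
        const char * sep = i > 0 ? " " : "";
        int n = hex ? snprintf( buf + used, cap - used, "%s%#x", sep, code )
                    : snprintf( buf + used, cap - used, "%s%#o", sep, code );
        if ( n < 0 || (size_t)n >= cap - used ) break;
        used += (size_t)n;
    }
}
```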
     3076
     3077\CC input matching for ©char©, ©char *©, and ©string© is similar, where \emph{all} input characters are read from the current point in the input stream until the type size or format width is reached, or whitespace, end of line (©'\n'©), or end of file is encountered.
     3078The \CC manipulator is ©setw© to restrict the size.
     3079Reading into a ©char© is safe as the size is 1; reading into a ©char *© is unsafe without using ©setw© to constrain the length (which includes ©'\0'©); and reading into a ©string© is safe as it grows dynamically as characters are read.
     3080\begin{cquote}
     3081\setlength{\tabcolsep}{15pt}
     3082\begin{tabular}{@{}l|l@{}}
     3083\begin{C++}
     3084char ch, c[10];
     3085string s;
     3086cin >> ch >> setw( 5 ) >> c  >> s;
     3087®abcde   fg®
     3088\end{C++}
     3089&
     3090\begin{C++}
     3091
     3092
     3093'a' "bcde" "fg"
     3094
     3095\end{C++}
     3096\end{tabular}
     3097\end{cquote}
     3098Input text can be \emph{gulped}, including whitespace, from the current point to an arbitrary delimiter character using ©getline©.
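The \CC example has a direct C counterpart using ©sscanf©, where a field width limits ©%s© in the same way ©setw© does. One difference is that ©%c© does not skip leading whitespace, whereas ©>>© does, so this sketch uses ©" %c"©. The name ©read_three© is hypothetical.

```c
#include <stdio.h>
#include <string.h>

// C analogue of cin >> ch >> setw( 5 ) >> c >> s:
// " %c" skips whitespace then reads one character, "%4s" reads at most
// 4 characters (5 bytes with '\0'), "%31s" is bounded by s's 32 bytes.
// Returns the number of items converted.
int read_three( const char * input, char * ch, char c[10], char s[32] ) {
    return sscanf( input, " %c%4s%31s", ch, c, s );
}
```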
     3099
     3100The \CFA philosophy for input is that any constant expressible in C should also be readable as input.
     3101For example, the complex constant ©3.5+4.1i© can appear as input to a complex variable.
     3102\CFA input matching for ©char©, ©char *©, and ©string© is similar.
     3103C-strings may only be read with a width field, which should match the string size.
     3104Certain input manipulators support a scanset, which is a simple regular expression from ©scanf©.
     3105The \CFA manipulators for these types are ©wdi©,\footnote{Due to an overloading issue in the type-resolver, the input width name must be temporarily different from the output, \lstinline{wdi} versus \lstinline{wd}.} with its associated width control ©left©, and ©quote©, ©incl©, ©excl©, and ©getline©.
     3106\begin{cquote}
     3107\setlength{\tabcolsep}{10pt}
     3108\begin{tabular}{@{}l|l@{}}
     3109\begin{cfa}
     3110char ch, c[10];
     3111string s;
     3112sin | ch | wdi( 5, c ) | s;
     3113®abcde fg®
     3114sin | quote( ch ) | quote( wdi( sizeof(c), c ) ) | quote( s, '[', ']' ) | nl;
     3115®'a' "bcde" [fg]®
     3116sin | incl( "a-zA-Z0-9 ?!&\n", s ) | nl;
     3117®x?&000xyz TOM !.®
     3118sin | excl( "a-zA-Z0-9 ?!&\n", s );
     3119®<>{}{}STOP®
     3120\end{cfa}
     3121&
     3122\begin{C++}
     3123
     3124
     3125'a' "bcde" "fg"
     3126
     3127'a' "bcde" "fg"
     3128
     3129"x?&000xyz TOM !"
     3130
     3131"<>{}{}"
     3132
     3133\end{C++}
     3134\end{tabular}
     3135\end{cquote}
     3136Note the ability to read quoted strings, which may contain whitespace, to match the form of program string constants.
     3137An ©nl© at the end of an input statement skips the rest of the input line.
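The ©incl© and ©excl© scansets follow the C ©scanf© conversions ©%[...]© and ©%[^...]©. The sketch below reads the two example inputs with ©sscanf©. It assumes the C library treats ©a-z© in a scanset as a range. ISO C leaves this implementation-defined, but common libraries, including glibc, support it. The names ©read_incl© and ©read_excl© are hypothetical.

```c
#include <stdio.h>
#include <string.h>

// "%63[...]" reads the longest run of characters in the set, and
// "%63[^...]" the longest run NOT in it; 63 leaves room for '\0' in buf.
// Inside the brackets, blank and newline are literal set members.
int read_incl( const char * input, char buf[64] ) {
    return sscanf( input, "%63[a-zA-Z0-9 ?!&\n]", buf );
}

int read_excl( const char * input, char buf[64] ) {
    return sscanf( input, "%63[^a-zA-Z0-9 ?!&\n]", buf );
}
```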
     3138
     3139
     3140\begin{comment}
    25863141A substring is treated as a pointer into the base (substringed) string rather than creating a copy of the subtext.
    25873142As with all pointers, if the item they are pointing at is changed, then the pointer is referring to the changed item.
     
    26113166}
    26123167\end{cfa}
    2613 
    2614 There is an assignment form of substring in which only the starting position is specified and the length is assumed to be the remainder of the string.
    2615 \begin{cfa}
    2616 string operator () (int start);
    2617 \end{cfa}
    2618 For example:
    2619 \begin{cfa}
    2620 s = peter( 2 );                                         §\C{// s is assigned "ETER"}§
    2621 peter( 2 ) = "IPER";                            §\C{// peter is assigned "PIPER"}§
    2622 \end{cfa}
    2623 It is also possible to substring using a string as the index for selecting the substring portion of the string.
    2624 \begin{cfa}
    2625 string operator () (const string &index);
    2626 \end{cfa}
    2627 For example:
    2628 \begin{cfa}[mathescape=false]
    2629 digit( "xyz$$$" ) = "678";                      §\C{// digit is assigned "0156789"}§
    2630 digit( "234") = "***";                          §\C{// digit is assigned "0156789***"}§
    2631 \end{cfa}
    2632 
    2633 
    2634 \subsection{Searching}
    2635 
    2636 The ©index© operation
    2637 \begin{cfa}
    2638 int index( const string &key, int start = 1, occurrence occ = first );
    2639 \end{cfa}
    2640 returns the position of the first or last occurrence of the ©key© (depending on the occurrence indicator ©occ© that is either ©first© or ©last©) in the current string starting the search at position ©start©.
    2641 If the ©key© does not appear in the current string, the length of the current string plus one is returned.
    2642 %If the ©key© has zero length, the value 1 is returned regardless of what the current string contains.
    2643 A negative starting position is a specification from the right end of the string.
    2644 \begin{cfa}
    2645 i = digit.index( "567" );                       §\C{// i is assigned 3}§
    2646 i = digit.index( "567", 7 );            §\C{// i is assigned 11}§
    2647 i = digit.index( "567", -1, last );     §\C{// i is assigned 3}§
    2648 i = peter.index( "E", 5, last );        §\C{// i is assigned 4}§
    2649 \end{cfa}
    2650 
    2651 The next two string operations test a string to see if it is or is not composed completely of a particular class of characters.
    2652 For example, are the characters of a string all alphabetic or all numeric?
    2653 Use of these operations involves a two step operation.
    2654 First, it is necessary to create an instance of type ©strmask© and initialize it to a string containing the characters of the particular character class, as in:
    2655 \begin{cfa}
    2656 strmask digitmask = digit;
    2657 strmask alphamask = string( "abcdefghijklmnopqrstuvwxyz" );
    2658 \end{cfa}
    2659 Second, the character mask is used in the functions ©include© and ©exclude© to check a string for compliance of its characters with the characters indicated by the mask.
    2660 
    2661 The ©include© operation
    2662 \begin{cfa}
    2663 int include( const strmask &, int = 1, occurrence occ = first );
    2664 \end{cfa}
    2665 returns the position of the first or last character (depending on the occurrence indicator, which is either ©first© or ©last©) in the current string that does not appear in the ©mask© starting the search at position ©start©;
    2666 hence it skips over characters in the current string that are included (in) the ©mask©.
    2667 The characters in the current string do not have to be in the same order as the ©mask©.
    2668 If all the characters in the current string appear in the ©mask©, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
    2669 A negative starting position is a specification from the right end of the string.
    2670 \begin{cfa}
    2671 i = peter.include( digitmask );         §\C{// i is assigned 1}§
    2672 i = peter.include( alphamask );         §\C{// i is assigned 6}§
    2673 \end{cfa}
    2674 
    2675 The ©exclude© operation
    2676 \begin{cfa}
    2677 int exclude( string &mask, int start = 1, occurrence occ = first )
    2678 \end{cfa}
    2679 returns the position of the first or last character (depending on the occurrence indicator, which is either ©first© or ©last©) in the current string that does appear in the ©mask© string starting the search at position ©start©;
    2680 hence it skips over characters in the current string that are excluded from (not in) in the ©mask© string.
    2681 The characters in the current string do not have to be in the same order as the ©mask© string.
    2682 If all the characters in the current string do NOT appear in the ©mask© string, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
    2683 A negative starting position is a specification from the right end of the string.
    2684 \begin{cfa}
    2685 i = peter.exclude( digitmask );         §\C{// i is assigned 6}§
    2686 i = ifstmt.exclude( strmask( punctuation ) ); §\C{// i is assigned 4}§
    2687 \end{cfa}
    2688 
    2689 The ©includeStr© operation:
    2690 \begin{cfa}
    2691 string includeStr( strmask &mask, int start = 1, occurrence occ = first )
    2692 \end{cfa}
    2693 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) of the current string that ARE included in the ©mask© string starting the search at position ©start©.
    2694 A negative starting position is a specification from the right end of the string.
    2695 \begin{cfa}
    2696 s = peter.includeStr( alphamask );      §\C{// s is assigned "PETER"}§
    2697 s = ifstmt.includeStr( alphamask );     §\C{// s is assigned "IF"}§
    2698 s = peter.includeStr( digitmask );      §\C{// s is assigned ""}§
    2699 \end{cfa}
    2700 
    2701 The ©excludeStr© operation:
    2702 \begin{cfa}
    2703 string excludeStr( strmask &mask, int start = 1, occurrence = first )
    2704 \end{cfa}
    2705 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) of the current string that are excluded (NOT) in the ©mask© string starting the search at position ©start©.
    2706 A negative starting position is a specification from the right end of the string.
    2707 \begin{cfa}
    2708 s = peter.excludeStr( digitmask);       §\C{// s is assigned "PETER"}§
    2709 s = ifstmt.excludeStr( strmask( punctuation ) ); §\C{// s is assigned "IF "}§
    2710 s = peter.excludeStr( alphamask);       §\C{// s is assigned ""}§
    2711 \end{cfa}
    2712 
    2713 
    2714 \subsection{Miscellaneous}
    2715 
    2716 The ©trim© operation
    2717 \begin{cfa}
    2718 string trim( string &mask, occurrence occ = first )
    2719 \end{cfa}
    2720 returns a string in that is the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) which ARE included in the ©mask© are removed.
    2721 \begin{cfa}
    2722 // remove leading blanks
    2723 s = string( "   ABC" ).trim( " " );     §\C{// s is assigned "ABC",}§
    2724 // remove trailing blanks
    2725 s = string( "ABC   " ).trim( " ", last ); §\C{// s is assigned "ABC",}§
    2726 \end{cfa}
    2727 
    2728 The ©translate© operation
    2729 \begin{cfa}
    2730 string translate( string &from, string &to )
    2731 \end{cfa}
    2732 returns a string that is the same length as the original string in which all occurrences of the characters that appear in the ©from© string have been translated into their corresponding character in the ©to© string.
    2733 Translation is done on a character by character basis between the ©from© and ©to© strings; hence these two strings must be the same length.
    2734 If a character in the original string does not appear in the ©from© string, then it simply appears as is in the resulting string.
    2735 \begin{cfa}
    2736 // upper to lower case
    2737 peter = peter.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
    2738                         // peter is assigned "peter"
    2739 s = ifstmt.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
    2740                         // ifstmt is assigned "if (a > b) {"
    2741 // lower to upper case
    2742 peter = peter.translate( "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" );
    2743                         // peter is assigned "PETER"
    2744 \end{cfa}
    2745 
    2746 The ©replace© operation
    2747 \begin{cfa}
    2748 string replace( string &from, string &to )
    2749 \end{cfa}
    2750 returns a string in which all occurrences of the ©from© string in the current string have been replaced by the ©to© string.
    2751 \begin{cfa}
    2752 s = peter.replace( "E", "XX" );         §\C{// s is assigned "PXXTXXR"}§
    2753 \end{cfa}
    2754 The replacement is done left-to-right.
    2755 When an instance of the ©from© string is found and changed to the ©to© string, it is NOT examined again for further replacement.
    2756 
    2757 \subsection{Returning N+1 on Failure}
    2758 
    2759 Any of the string search routines can fail at some point during the search.
    2760 When this happens it is necessary to return indicating the failure.
    2761 Many string types in other languages use some special value to indicate the failure.
    2762 This value is often 0 or -1 (PL/I returns 0).
    2763 This section argues that a value of N+1, where N is the length of the base string in the search, is a more useful value to return.
    2764 The index-of function in APL returns N+1.
    2765 These are the boundary situations and are often overlooked when designing a string type.
    2766 
    2767 The situation that can be optimized by returning N+1 is when a search is performed to find the starting location for a substring operation.
    2768 For example, in a program that is extracting words from a text file, it is necessary to scan from left to right over whitespace until the first alphabetic character is found.
    2769 \begin{cfa}
    2770 line = line( line.exclude( alpha ) );
    2771 \end{cfa}
    2772 If a text line contains all whitespaces, the exclude operation fails to find an alphabetic character.
    2773 If ©exclude© returns 0 or -1, the result of the substring operation is unclear.
    2774 Most string types generate an error, or clip the starting value to 1, resulting in the entire whitespace string being selected.
    2775 If ©exclude© returns N+1, the starting position for the substring operation is beyond the end of the string leaving a null string.
    2776 
    2777 The same situation occurs when scanning off a word.
    2778 \begin{cfa}
    2779 start = line.include(alpha);
    2780 word = line(1, start - 1);
    2781 \end{cfa}
    2782 If the entire line is composed of a word, the include operation will  fail to find a non-alphabetic character.
    2783 In general, returning 0 or -1 is not an appropriate starting position for the substring, which must substring off the word leaving a null string.
    2784 However, returning N+1 will substring off the word leaving a null string.
    2785 
    2786 
    2787 \subsection{C Compatibility}
    2788 
    2789 To ease conversion from C to \CFA, there are companion ©string© routines for C strings.
     2790 \VRef[Table]{t:CompanionStringRoutines} shows the C routines on the left that also work with ©string© and the rough equivalent ©string© operation on the right.
    2791 Hence, it is possible to directly convert a block of C string operations into @string@ just by changing the
    2792 
    2793 \begin{table}
    2794 \begin{cquote}
    2795 \begin{tabular}{@{}l|l@{}}
    2796 \multicolumn{1}{c|}{©char []©}  & \multicolumn{1}{c}{©string©}  \\
    2797 \hline
    2798 ©strcpy©, ©strncpy©             & ©=©                                                                   \\
    2799 ©strcat©, ©strncat©             & ©+©                                                                   \\
    2800 ©strcmp©, ©strncmp©             & ©==©, ©!=©, ©<©, ©<=©, ©>©, ©>=©              \\
    2801 ©strlen©                                & ©size©                                                                \\
    2802 ©[]©                                    & ©[]©                                                                  \\
    2803 ©strstr©                                & ©find©                                                                \\
    2804 ©strcspn©                               & ©find_first_of©, ©find_last_of©               \\
     2805 ©strspn©                                & ©find_first_not_of©, ©find_last_not_of©
    2806 \end{tabular}
    2807 \end{cquote}
    2808 \caption{Companion Routines for \CFA \lstinline{string} to C Strings}
    2809 \label{t:CompanionStringRoutines}
    2810 \end{table}
    2811 
    2812 For example, this block of C code can be converted to \CFA by simply changing the type of variable ©s© from ©char []© to ©string©.
    2813 \begin{cfa}
    2814         char s[32];
    2815         //string s;
    2816         strcpy( s, "abc" );                             PRINT( %s, s );
    2817         strncpy( s, "abcdef", 3 );              PRINT( %s, s );
    2818         strcat( s, "xyz" );                             PRINT( %s, s );
    2819         strncat( s, "uvwxyz", 3 );              PRINT( %s, s );
    2820         PRINT( %zd, strlen( s ) );
    2821         PRINT( %c, s[3] );
    2822         PRINT( %s, strstr( s, "yzu" ) ) ;
    2823         PRINT( %s, strstr( s, 'y' ) ) ;
    2824 \end{cfa}
    2825 However, the conversion fails with I/O because ©printf© cannot print a ©string© using format code ©%s© because \CFA strings are not null terminated.
    2826 
    2827 
    2828 \subsection{Input/Output Operators}
    2829 
    2830 Both the \CC operators ©<<© and ©>>© are defined on type ©string©.
    2831 However, input of a string value is different from input of a ©char *© value.
    2832 When a string value is read, \emph{all} input characters from the current point in the input stream to either the end of line (©'\n'©) or the end of file are read.
     3168\end{comment}
    28333169
    28343170
     
    33403676allowable calls are:
    33413677\begin{cquote}
    3342 \setlength{\tabcolsep}{0.75in}
    33433678\begin{tabular}{@{}ll@{}}
    33443679\textbf{positional arguments} & \textbf{empty arguments} \\