- Timestamp: Sep 15, 2025, 5:11:15 PM
- Branches: master
- Children: 8317671
- Parents: fae93a40
- File: 1 edited

Legend: Unmodified, Added, Removed
doc/user/user.tex
```diff
  rfae93a40 r829a955
    11   11  %% Created On : Wed Apr 6 14:53:29 2016
    12   12  %% Last Modified By : Peter A. Buhr
-   13       %% Last Modified On : Mon Apr 14 20:53:55 2025
-   14       %% Update Count : 7065
+        13  %% Last Modified On : Mon Sep 15 17:06:25 2025
+        14  %% Update Count : 7216
    15   15  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    16   16
     …    …
    62   62  \setlength{\topmargin}{-0.45in} % move running title into header
    63   63  \setlength{\headsep}{0.25in}
+        64  \setlength{\tabcolsep}{15pt}
    64   65
    65   66  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     …    …
   703  704  In addition, inclusive ranges are allowed using symbol ©~© to specify a contiguous set of case values, both positive and negative.
   704  705  \begin{cquote}
-  705       \setlength{\tabcolsep}{15pt}
   706  706  \begin{tabular}{@{}llll@{}}
   707  707  \multicolumn{1}{c}{\textbf{C}} & \multicolumn{1}{c}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{©gcc©}} \\
     …    …
  1005 1005  \end{tabular}
  1006 1006  \end{cquote}
- 1007       The target label must be below the \Indexc{fallthrough} and may not be nested in a control structure, and
- 1008       the target label must be at the same or higher level as the containing \Indexc{case} clause and located at
- 1009       the same level as a ©case© clause; the target label may be case \Indexc{default}, but only associated
- 1010       with the current \Indexc{switch}/\Indexc{choose} statement.
+      1007  The target label must be below the \Indexc{fallthrough} and may not be nested in a control structure, and the target label must be at the same or higher level as the containing \Indexc{case} clause and located at the same level as a ©case© clause;
+      1008  the target label may be case \Indexc{default}, but only associated with the current \Indexc{switch}/\Indexc{choose} statement.
  1011 1009
  1012 1010
  1013 1011  \subsection{Loop Control}
  1014 1012
+      1013  \CFA condenses writing loops to facilitate coding speed and safety.
+      1014
+      1015  To simplify creating an infinite loop, the \Indexc{for}, \Indexc{while}, and \Indexc{do} loop-predicate\index{loop predicate} is extended with an empty conditional, meaning a comparison value of ©1© (true).
+      1016  \begin{cfa}
+      1017  while ( ) §\C{// while ( true )}§
+      1018  for ( ) §\C{// for ( ; true; )}§
+      1019  do ... while ( ) §\C{// do ... while ( true )}§
+      1020  \end{cfa}
+      1021
  1015 1022  Looping a predefined number of times, possibly with a loop index, occurs frequently.
- 1016       \CFA condenses writing loops to facilitate coding speed and safety.
- 1017
- 1018       \Indexc{for}, \Indexc{while}, and \Indexc{do} loop-control\index{loop control} are extended with an empty conditional, meaning a comparison value of ©1© (true).
- 1019       \begin{cfa}
- 1020       while ( ®/* empty */® ) §\C{// while ( true )}§
- 1021       for ( ®/* empty */® ) §\C{// for ( ; true; )}§
- 1022       do ... while ( ®/* empty */® ) §\C{// do ... while ( true )}§
- 1023       \end{cfa}
- 1024
  1025 1023  The ©for© control\index{for control}, \ie ©for ( /* control */ )©, is extended with a range and step.
  1026 1024  A range is a set of values defined by an optional low value (default to 0), tilde, and high value, ©L ~ H©, with an optional step ©~ S© (default to 1), which means an ascending set of values from ©L© to ©H© in positive steps of ©S©.
     …    …
  1031 1029  \end{cfa}
  1032 1030  \R{Warning}: A range in descending order, \eg ©5 ~ -3© is the null (empty) set, \ie no values in the set.
- 1033       \R{Warning}: A ©0© or negative step is undefined.
- 1034       Note, the order of values in a set may not be the order the values are presented during looping.
+      1031  As well, a ©0© or negative step is undefined.
  1035 1032
  1036 1033  The range character, ©'~'©, is decorated on the left and right to control how the set values are presented in the loop body.
```
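The C sketch below illustrates the ascending range `L ~ H ~ S` described in the Loop Control text above. It enumerates the values such a range would produce. It assumes, as the "exclusive range" index entries suggest, that `~` excludes the high value `H` (`~=` would include it). It also assumes `S` is positive, since a `0` or negative step is undefined. The helper `ascending_range` is hypothetical and is not part of CFA or its runtime.

```c
#include <stddef.h>

/* Hypothetical helper, not part of CFA: writes the values of the ascending
   range L ~ H ~ S into out[] (at most cap of them) and returns the count.
   Assumes ~ is half-open (H excluded) and S > 0; a 0 or negative step is
   undefined in CFA, so it is not handled here. */
static size_t ascending_range( int L, int H, int S, int out[], size_t cap ) {
    size_t n = 0;
    /* Roughly the C loop corresponding to a CFA "for ( i; L ~ H ~ S )". */
    for ( int i = L; i < H && n < cap; i += S ) {
        out[n] = i;
        n += 1;
    }
    return n;   /* a range in descending order, e.g. 5 ~ -3, yields 0 values */
}
```

For example, `ascending_range( 1, 10, 2, v, 16 )` yields 1, 3, 5, 7, 9. By contrast, `ascending_range( 5, -3, 1, v, 16 )` yields no values, which matches the warning that `5 ~ -3` is the empty set.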
… … 1042 1039 -8 ®§\Sp§®~ -2 §\C{// ascending, no prefix}§ 1043 1040 0 ®+®~ 5 §\C{// ascending, prefix}§ 1044 -3 ®-®~ 3 §\C{// descending }§1041 -3 ®-®~ 3 §\C{// descending, prefix}§ 1045 1042 \end{cfa} 1046 1043 For descending iteration, the ©L© and ©H© values are \emph{implicitly} switched, and the increment/decrement for ©S© is toggled. 1044 Hence, the order of values in a set may not be the order the values are presented during looping. 1047 1045 When changing the iteration direction, this form is faster and safer, \ie the direction prefix can be added/removed without changing existing (correct) program text. 1048 1046 \R{Warning}: reversing the range endpoints for descending order results in an empty set. … … 1058 1056 \index{-\~}\index{descending exclusive range} 1059 1057 \index{-\~=}\index{descending inclusive range} 1058 1059 \begin{comment} 1060 To simplify loop iteration a range is provided, from low to high, and a traversal direction, ascending (©+©) or descending (©-©). 1061 The following is the syntax for the loop range, where ©[©\,©]© means optional. 1062 \begin{cfa}[deletekeywords=default] 1063 [ ®index ;® ] [ [ ®min® (default 0) ] [ direction ®+®/®-® (default +) ] ®~® [ ®=® (include endpoint) ] ] ®max® [ ®~ increment® ] 1064 \end{cfa} 1065 For ©=©, the range includes the endpoint (©max©/©min©) depending on the direction (©+©/©-©). 1066 \end{comment} 1060 1067 1061 1068 ©for© control is formalized by the following regular expression: … … 2422 2429 \label{s:stringType} 2423 2430 2424 The \CFA \Indexc{string} type is for manipulation of dynamically-size character-strings versus C \Indexc{char *} type for manipulation of statically-size null-terminated character-strings. 2425 That is, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time. 
2426 Hence, a ©string© declaration does not specify a maximum length; 2427 as a string dynamically grows and shrinks in size, so does its underlying storage. 2428 In contrast, a C string also dynamically grows and shrinks is size, but its underlying storage is fixed. 2431 A string is a sequence of symbols, where the form of a symbol can vary significantly: regular 7/8-bit ASCII/Latin-1, or wide 2/4/8-byte UNICODE or variable length UTF-8/16/32. 2432 A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes ©"xyz\n"©. 2433 Currently, \CFA strings only support regular characters. 2434 2435 A string type is designed to operate on groups of characters for assigning, copying, scanning, and updating, rather than working with individual characters. 2436 The \CFA \Indexc{string} type is for manipulation of dynamically-sized strings versus C \Indexc{char *} type for manipulation of statically-sized null-terminated strings. 2437 Therefore, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time. 2438 As a result, a ©string© declaration does not specify a maximum length, where a C string array does. 2439 For \CFA, as a ©string© dynamically grows and shrinks in size, so does its underlying storage. 2440 For C, as a string dynamically grows and shrinks in size, but its underlying storage does not. 2429 2441 The maximum storage for a \CFA ©string© value is ©size_t© characters, which is $2^{32}$ or $2^{64}$ respectively. 2430 2442 A \CFA string manages its length separately from the string, so there is no null (©'\0'©) terminating value at the end of a string value. 2431 2443 Hence, a \CFA string cannot be passed to a C string manipulation routine, such as ©strcat©. 2432 Like C strings, the characters in a ©string© are numbered starting from 0. 
2433 2434 The following operations have been defined to manipulate an instance of type ©string©. 2435 The discussion assumes the following declarations and assignment statements are executed. 2436 \begin{cfa} 2437 #include ®<string.hfa>® 2438 ®string® s, peter, digit, alpha, punctuation, ifstmt; 2439 int i; 2440 peter = "PETER"; 2441 digit = "0123456789"; 2442 punctuation = "().,"; 2443 ifstmt = "IF (A > B) {"; 2444 \end{cfa} 2445 Note, the include file \Indexc{string.hfa} to access type ©string©. 2446 2447 2448 \subsection{Implicit String Conversions} 2449 2450 The types ©char©, ©char *©, ©int©, ©double©, ©_Complex©, including different signness and sizes, implicitly convert to type ©string©. 2451 \VRef[Figure]{f:ImplicitConversionsString} shows examples of implicit conversions between C strings, integral, floating-point and complex types to ©string©. 2452 A conversions can be explicitly specified: 2453 \begin{cfa} 2454 s = string( "abc" ); §\C{// converts char * to string}§ 2455 s = string( 5 ); §\C{// converts int to string}§ 2456 s = string( 5.5 ); §\C{// converts double to string}§ 2457 \end{cfa} 2458 All conversions from ©string© to ©char *©, attempt to be safe: 2459 either by requiring the maximum length of the ©char *© storage (©strncpy©) or allocating the ©char *© storage for the string characters (ownership), meaning the programmer must free the storage. 2460 As well, a string is always null terminates, implying a minimum size of 1 character. 2444 Like C strings, characters in a ©string© are numbered from the left starting at 0 (because subscripting is zero-origin), and in \CFA numbered from the right starting at -1. 
2461 2445 \begin{cquote} 2462 \begin{tabular}{@{}l@{\hspace{1.75in}}|@{\hspace{15pt}}l@{}} 2463 \begin{cfa} 2464 string s = "abcde"; 2465 char cs[3]; 2466 strncpy( cs, s, sizeof(cs) ); §\C{sout | cs;}§ 2467 char * cp = s; §\C{sout | cp;}§ 2468 delete( cp ); 2469 cp = s + ' ' + s; §\C{sout | cp;}§ 2470 delete( cp ); 2471 \end{cfa} 2472 & 2473 \begin{cfa} 2474 2475 2476 ab 2477 abcde 2478 2479 abcde abcde 2480 2481 \end{cfa} 2446 \rm 2447 \begin{tabular}{@{}rrrrll@{}} 2448 \small\tt "a & \small\tt b & \small\tt c & \small\tt d & \small\tt e" \\ 2449 0 & 1 & 2 & 3 & 4 & left to right index \\ 2450 -5 & -4 & -3 & -2 & -1 & right to left index 2482 2451 \end{tabular} 2483 2452 \end{cquote} 2484 2485 \begin{figure} 2486 \begin{tabular}{@{}l@{\hspace{15pt}}|@{\hspace{15pt}}l@{}} 2487 \begin{cfa} 2488 // string s = 5; sout | s; 2489 string s; 2490 // conversion of char and char * to string 2491 s = 'x'; §\C{sout | s;}§ 2492 s = "abc"; §\C{sout | s;}§ 2493 char cs[5] = "abc"; 2494 s = cs; §\C{sout | s;}§ 2495 // conversion of integral, floating-point, and complex to string 2496 s = 45hh; §\C{sout | s;}§ 2497 s = 45h; §\C{sout | s;}§ 2498 s = -(ssize_t)MAX - 1; §\C{sout | s;}§ 2499 s = (size_t)MAX; §\C{sout | s;}§ 2500 s = 5.5; §\C{sout | s;}§ 2501 s = 5.5L; §\C{sout | s;}§ 2502 s = 5.5+3.4i; §\C{sout | s;}§ 2503 s = 5.5L+3.4Li; §\C{sout | s;}§ 2504 \end{cfa} 2505 & 2506 \begin{cfa} 2507 2508 2509 2510 x 2511 abc 2512 2513 abc 2514 2515 45 2516 45 2517 -9223372036854775808 2518 18446744073709551615 2519 5.5 2520 5.5 2521 5.5+3.4i 2522 5.5+3.4i 2453 The include file \Indexc{string.hfa} is necessary to access type ©string©. 2454 2455 2456 \subsection{Implicit String Conversions} 2457 2458 The ability to convert from internal (machine) to external (human) format is useful in situations other than I/O. 2459 Hence, the basic types ©char©, ©char *©, ©int©, ©double©, ©_Complex©, including any signness and size variations, implicitly convert to type ©string© (as in Java). 
2460 \begin{cquote} 2461 \begin{tabular}{@{}l|ll|l@{}} 2462 \begin{cfa} 2463 string s = 5; 2464 s = 'x'; 2465 s = "abc"; 2466 s = 42hh; /* signed char */ 2467 s = 42h; /* short int */ 2468 s = 0xff; 2469 \end{cfa} 2470 & 2471 \begin{cfa} 2472 "5" 2473 "x" 2474 "abc" 2475 "42" 2476 "42" 2477 "255" 2478 \end{cfa} 2479 & 2480 \begin{cfa} 2481 s = (ssize_t)MIN; 2482 s = (size_t)MAX; 2483 s = 5.5; 2484 s = 5.5L; 2485 s = 5.5+3.4i; 2486 s = 5.5L+3.4Li; 2487 \end{cfa} 2488 & 2489 \begin{cfa} 2490 "-9223372036854775808" 2491 "18446744073709551615" 2492 "5.5" 2493 "5.5" 2494 "5.5+3.4i" 2495 "5.5+3.4i" 2523 2496 \end{cfa} 2524 2497 \end{tabular} 2525 \caption{Implicit Conversions to String} 2526 \label{f:ImplicitConversionsString} 2527 \end{figure} 2528 2529 2530 \subsection{Size (length)} 2531 2532 The ©size© operation returns the length of a string. 2533 \begin{cfa} 2534 i = size( "" ); §\C{// i is assigned 0}§ 2535 i = size( "abc" ); §\C{// i is assigned 3}§ 2536 i = size( peter ); §\C{// i is assigned 5}§ 2537 \end{cfa} 2498 \end{cquote} 2499 Conversions can be explicitly specified using a compound literal. 2500 \begin{cfa} 2501 s = (string){ 5 }; s = (string){ "abc" }; s = (string){ 5.5 }; 2502 \end{cfa} 2503 2504 Conversions from ©string© to ©char *© attempt to be safe. 2505 The ©strncpy© conversion requires the maximum length for the pointer's target buffer. 2506 The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it. 2507 Note, a C string is always null terminated, implying storage is always necessary for the null. 
2508 \begin{cquote} 2509 \begin{tabular}{@{}l|l@{}} 2510 \begin{cfa} 2511 string s = "abcde"; 2512 char cs[4]; 2513 strncpy( cs, s, sizeof(cs) ); 2514 char * cp = s; // ownership 2515 delete( cp ); 2516 cp = s + ' ' + s; // ownership 2517 delete( cp ); 2518 \end{cfa} 2519 & 2520 \begin{cfa} 2521 2522 2523 "abc\0", in place 2524 "abcde\0", malloc 2525 2526 "abcde abcde\0", malloc 2527 2528 \end{cfa} 2529 \end{tabular} 2530 \end{cquote} 2531 2532 2533 \subsection{Length} 2534 2535 The ©len© operation (short for ©strlen©) returns the length of a C or \CFA string. 2536 For compatibility, ©strlen© works with \CFA strings. 2537 \begin{cquote} 2538 \begin{tabular}{@{}l|l@{}} 2539 \begin{cfa} 2540 i = len( "" ); 2541 i = len( "abc" ); 2542 i = len( cs ); 2543 i = strlen( cs ); 2544 i = len( name ); 2545 i = strlen( name ); 2546 \end{cfa} 2547 & 2548 \begin{cfa} 2549 0 2550 3 2551 3 2552 3 2553 4 2554 4 2555 \end{cfa} 2556 \end{tabular} 2557 \end{cquote} 2538 2558 2539 2559 2540 2560 \subsection{Comparison Operators} 2541 2561 2542 The binary \Index{relational operator}s, ©<©, ©<=©, ©>©, ©>=©, and \Index{equality operator}s, ©==©, ©!=©, compare strings using lexicographical ordering, where longer strings are greater than shorter strings. 2562 The binary relational\index{string!relational opertors}, \Indexc{<}, \Indexc{<=}, \Indexc{>}, \Indexc{>=}, and equality\index{string!equality operators}, \Indexc{==}, \Indexc{!=}, operators compare \CFA strings using lexicographical ordering, where longer strings are greater than shorter strings. 2563 In C, these operators compare the C string pointer not its value, which does not match programmer expectation. 2564 C strings use function ©strcmp© to lexicographically compare the string value. 2565 Java has the same issue with ©==© and ©.equals©. 2543 2566 2544 2567 2545 2568 \subsection{Concatenation} 2546 2569 2547 The binary operators \Indexc{+} and \Indexc{+=} concatenate two strings, creating the sum of the strings. 
2548 \begin{cfa} 2549 s = peter + ' ' + digit; §\C{// s is assigned "PETER 0123456789"}§ 2550 s += peter; §\C{// s is assigned "PETER 0123456789PETER"}§ 2551 \end{cfa} 2570 The binary operators \Indexc{+} and \Indexc{+=} concatenate C ©char©, ©char *© and \CFA strings, creating the sum of the characters. 2571 \begin{cquote} 2572 \begin{tabular}{@{}l|l@{\hspace{15pt}}l|l@{\hspace{15pt}}l|l@{}} 2573 \begin{cfa} 2574 s = ""; 2575 s = 'a' + 'b'; 2576 s = 'a' + "b"; 2577 s = "a" + 'b'; 2578 s = "a" + "b"; 2579 \end{cfa} 2580 & 2581 \begin{cfa} 2582 2583 "ab" 2584 "ab" 2585 "ab" 2586 "ab" 2587 \end{cfa} 2588 & 2589 \begin{cfa} 2590 s = ""; 2591 s = 'a' + 'b' + s; 2592 s = 'a' + 'b' + s; 2593 s = 'a' + "b" + s; 2594 s = "a" + 'b' + s; 2595 \end{cfa} 2596 & 2597 \begin{cfa} 2598 2599 "ab" 2600 "abab" 2601 "ababab" 2602 "abababab" 2603 \end{cfa} 2604 & 2605 \begin{cfa} 2606 s = ""; 2607 s = s + 'a' + 'b'; 2608 s = s + 'a' + "b"; 2609 s = s + "a" + 'b'; 2610 s = s + "a" + "b"; 2611 \end{cfa} 2612 & 2613 \begin{cfa} 2614 2615 "ab" 2616 "abab" 2617 "ababab" 2618 "abababab" 2619 \end{cfa} 2620 \end{tabular} 2621 \end{cquote} 2622 However, including ©<string.hfa>© can result in ambiguous uses of the overloaded ©+© operator.\footnote{Combining multiple packages in any programming language can result in name clashes or ambiguities.} 2623 For example, subtracting characters or pointers has valid use-cases: 2624 \begin{cfa} 2625 ch - '0' §\C[2in]{// find character offset}§ 2626 cs - cs2; §\C{// find pointer offset}\CRT§ 2627 \end{cfa} 2628 addition is less obvious: 2629 \begin{cfa} 2630 ch + 'b' §\C[2in]{// add character values}§ 2631 cs + 'a'; §\C{// move pointer cs['a']}\CRT§ 2632 \end{cfa} 2633 There are legitimate use cases for arithmetic with ©signed©/©unsigned© characters (bytes), and these types are treated differently from ©char© in \CC and \CFA. 2634 However, backwards compatibility makes it impossible to restrict or remove addition on type ©char©. 
2635 Similarly, it is impossible to restrict or remove addition on type ©char *© because (unfortunately) it is subscripting: ©cs + 'a'© implies ©cs['a']© or ©'a'[cs]©. 2636 2637 The prior \CFA concatenation examples show complex mixed-mode interactions among ©char©, ©char *©, and ©string© constants work correctly (variables are the same). 2638 The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs. 2639 Hence, the type system correctly handles all uses of addition (explicit or implicit) for ©char *©. 2640 \begin{cfa} 2641 printf( "%s %s %s %c %c\n", "abc", cs, cs + 3, cs['a'], 'a'[cs] ); 2642 \end{cfa} 2643 Only ©char© addition can result in ambiguities, and only when there is no left-hand information. 2644 \begin{cfa} 2645 ch = ch + 'b'; §\C[2in]{// LHS disambiguate, add character values}§ 2646 s = 'a' + 'b'; §\C{// LHS disambiguate, concatenate characters}§ 2647 printf( "%c\n", ®'a' + 'b'® ); §\C{// no LHS information, ambiguous}§ 2648 printf( "%c\n", ®(return char)®('a' + 'b') ); §\C{// disambiguate with ascription cast}\CRT§ 2649 \end{cfa} 2650 The ascription cast, ©(return T)©, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion). 2651 Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator ©+© for ©string© types is not a problem. 2652 Note, other programming languages that repurpose ©+© for concatenation, can have similar ambiguity issues. 2653 2654 Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution. 
2655 While it can special case some combinations: 2656 \begin{C++} 2657 s = 'a' + s; §\C[2in]{// compiles in C++}§ 2658 s = "a" + s; 2659 \end{C++} 2660 it cannot generalize to any number of steps: 2661 \begin{C++} 2662 s = 'a' + 'b' + s; §\C{// does not compile in C++}\CRT§ 2663 s = "a" + "b" + s; 2664 \end{C++} 2552 2665 2553 2666 … … 2555 2668 2556 2669 The binary operators \Indexc{*} and \Indexc{*=} repeat a string $N$ times. 2557 If $N = 0$, a zero length string, ©""© is returned. 2558 \begin{cfa} 2559 s = 'x' * 3; §\C{// s is assigned "PETER PETER PETER "}§ 2560 s = (peter + ' ') * 3; §\C{// s is assigned "PETER PETER PETER "}§ 2561 \end{cfa} 2670 If $N = 0$, a zero length string, ©""©, is returned. 2671 \begin{cquote} 2672 \begin{tabular}{@{}l|l@{}} 2673 \begin{cfa} 2674 s = 'x' * 0; 2675 s = 'x' * 3; 2676 s = "abc" * 3; 2677 s = ("Peter" + ' ') * 3; 2678 \end{cfa} 2679 & 2680 \begin{cfa} 2681 "" 2682 "xxx" 2683 "abcabcabc" 2684 "Peter Peter Peter " 2685 \end{cfa} 2686 \end{tabular} 2687 \end{cquote} 2688 Like concatenation, there is a potential ambiguity with multiplication of characters; 2689 multiplication of pointers does not exist in C. 2690 \begin{cfa} 2691 ch = ch * 3; §\C[2in]{// LHS disambiguate, multiply character values}§ 2692 s = 'a' * 3; §\C{// LHS disambiguate, concatenate characters}§ 2693 printf( "%c\n", ®'a' * 3® ); §\C{// no LHS information, ambiguous}§ 2694 printf( "%c\n", ®(return char)®('a' * 3) ); §\C{// disambiguate with ascription cast}\CRT§ 2695 \end{cfa} 2696 Fortunately, character multiplication without LHS information is even rarer than addition, so repurposing the operator ©*© for ©string© types is not a problem. 2562 2697 2563 2698 2564 2699 \subsection{Substring} 2565 The substring operation returns a subset of the string starting at a position in the string and traversing a length. 
2566 \begin{cfa} 2567 s = peter( 2, 3 ); §\C{// s is assigned "ETE"}§ 2568 s = peter( 4, -3 ); §\C{// s is assigned "ETE", length is opposite direction}§ 2569 s = peter( 2, 8 ); §\C{// s is assigned "ETER", length is clipped to 4}§ 2570 s = peter( 0, -1 ); §\C{// s is assigned "", beyond string so clipped to null}§ 2571 s = peter(-1, -1 ); §\C{// s is assigned "R", start and length are negative}§ 2572 \end{cfa} 2573 A negative starting position is a specification from the right end of the string. 2700 2701 The substring operation returns a subset of a string starting at a position in the string and traversing a length, or matching a pattern string. 2702 \begin{cquote} 2703 \setlength{\tabcolsep}{10pt} 2704 \begin{tabular}{@{}l|ll|l@{}} 2705 \multicolumn{2}{@{}c}{\textbf{length}} & \multicolumn{2}{c@{}}{\textbf{pattern}} \\ 2706 \multicolumn{4}{@{}l}{\lstinline{string name = "PETER"}} \\ 2707 \begin{cfa} 2708 s = name( 0, 4 ); 2709 s = name( 1, 4 ); 2710 s = name( 2, 4 ); 2711 s = name( 4, -2 ); 2712 s = name( 8, 2 ); 2713 s = name( 0, -2 ); 2714 s = name( -1, -2 ); 2715 s = name( -3 ); 2716 \end{cfa} 2717 & 2718 \begin{cfa} 2719 "PETE" 2720 "ETER" 2721 "TER" // clip length to 3 2722 "ER" 2723 "" // beyond string to right, clip to null 2724 "" // beyond string to left, clip to null 2725 "ER" 2726 "TER" // to end of string 2727 \end{cfa} 2728 & 2729 \begin{cfa} 2730 s = name( "ET" ); 2731 s = name( "WW" ); 2732 2733 2734 2735 2736 2737 2738 \end{cfa} 2739 & 2740 \begin{cfa} 2741 "ET" 2742 "" // does not occur 2743 2744 2745 2746 2747 2748 2749 \end{cfa} 2750 \end{tabular} 2751 \end{cquote} 2752 For the length form, a negative starting position is a specification from the right end of the string. 2574 2753 A negative length means that characters are selected in the opposite (right to left) direction from the starting position. 2575 2754 If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string. 
2576 If the substring request is completely outside of the original string, a null string located at the end of the original string is returned. 2577 The substring operation can also appear on the left hand side of the assignment operator. 2578 The substring is replaced by the value on the right hand side of the assignment. 2579 The length of the right-hand-side value may be shorter, the same length, or longer than the length of the substring that is selected on the left hand side of the assignment. 2580 \begin{cfa} 2581 digit( 3, 3 ) = ""; §\C{// digit is assigned "0156789"}§ 2582 digit( 4, 3 ) = "xyz"; §\C{// digit is assigned "015xyz9"}§ 2583 digit( 7, 0 ) = "***"; §\C{// digit is assigned "015xyz***9"}§ 2584 digit(-4, 3 ) = "$$$"; §\C{// digit is assigned "015xyz\$\$\$9"}§ 2585 \end{cfa} 2755 If the substring request is completely outside of the original string, a null string is returned. 2756 For the pattern-form, it returns the pattern string if the pattern matches or a null string if the pattern does not match. 2757 The usefulness of this mechanism is discussed next. 2758 2759 The substring operation can appear on the left side of assignment, where it defines a replacement substring. 2760 The length of the right string may be shorter, the same, or longer than the length of left string. 2761 Hence, the left string may decrease, stay the same, or increase in length. 2762 \begin{cquote} 2763 \begin{tabular}{@{}l|l@{}} 2764 \multicolumn{2}{@{}l}{\lstinline{string digit = "0123456789"}} \\ 2765 \begin{cfa}[escapechar={}] 2766 digit( 3, 3 ) = ""; 2767 digit( 4, 3 ) = "xyz"; 2768 digit( 7, 0 ) = "***"; 2769 digit(-4, 3 ) = "$$$"; 2770 digit( 5 ) = "LLL"; 2771 \end{cfa} 2772 & 2773 \begin{cfa}[escapechar={}] 2774 "0126789" 2775 "0126xyz" 2776 "0126xyz" 2777 "012$$$z" 2778 "012$$LLL" 2779 \end{cfa} 2780 \end{tabular} 2781 \end{cquote} 2782 Now substring pattern matching is useful on the left-hand side of assignment. 
2783 \begin{cquote} 2784 \begin{tabular}{@{}l|l@{}} 2785 \begin{cfa}[escapechar={}] 2786 digit( "$$" ) = "345"; 2787 digit( "LLL") = "6789"; 2788 \end{cfa} 2789 & 2790 \begin{cfa} 2791 "012345LLL" 2792 "0123456789" 2793 \end{cfa} 2794 \end{tabular} 2795 \end{cquote} 2796 The ©replace© operation extends substring to substitute all occurrences. 2797 \begin{cquote} 2798 \begin{tabular}{@{}l|l@{}} 2799 \begin{cfa} 2800 s = replace( "PETER", "E", "XX" ); 2801 s = replace( "PETER", "ET", "XX" ); 2802 s = replace( "PETER", "W", "XX" ); 2803 \end{cfa} 2804 & 2805 \begin{cfa} 2806 "PXXTXXR" 2807 "PXXER" 2808 "PETER" 2809 \end{cfa} 2810 \end{tabular} 2811 \end{cquote} 2812 The replacement is done left-to-right and substituted text is not examined for replacement. 2813 2814 2815 \subsection{Searching} 2816 2817 The ©find© operation returns the position of the first occurrence of a key in a string. 2818 If the key does not appear in the string, the length of the string is returned. 2819 \begin{cquote} 2820 \begin{tabular}{@{}l|l@{}} 2821 \multicolumn{2}{@{}l}{\lstinline{string digit = "0123456789"}} \\ 2822 \begin{cfa} 2823 i = find( digit, '3' ); 2824 i = find( digit, "45" ); 2825 i = find( digit, "abc" ); 2826 \end{cfa} 2827 & 2828 \begin{cfa} 2829 3 2830 4 2831 10 2832 \end{cfa} 2833 \end{tabular} 2834 \end{cquote} 2835 2836 A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc. 2837 \begin{cquote} 2838 \begin{tabular}{@{}l|l@{}} 2839 \begin{cfa} 2840 charclass vowels{ "aeiouy" }; 2841 i = include( "aaeiuyoo", vowels ); 2842 i = include( "aabiuyoo", vowels ); 2843 \end{cfa} 2844 & 2845 \begin{cfa} 2846 2847 8 // compliant 2848 2 // b non-compliant 2849 \end{cfa} 2850 \end{tabular} 2851 \end{cquote} 2852 ©vowels© defines a character class and function ©include© checks if all characters in the string appear in the class (compliance). 
2853 The position of the last character is returned if the string is compliant or the position of the first non-compliant character. 2854 There is no relationship between the order of characters in the two strings. 2855 Function ©exclude© is the reverse of ©include©, checking if all characters in the string are excluded from the class (compliance). 2856 \begin{cquote} 2857 \begin{tabular}{@{}l|l@{}} 2858 \begin{cfa} 2859 i = exclude( "cdbfghmk", vowels ); 2860 i = exclude( "cdyfghmk", vowels ); 2861 \end{cfa} 2862 & 2863 \begin{cfa} 2864 8 // compliant 2865 2 // y non-compliant 2866 \end{cfa} 2867 \end{tabular} 2868 \end{cquote} 2869 Both forms can return the longest substring of compliant characters. 2870 \begin{cquote} 2871 \begin{tabular}{@{}l|l@{}} 2872 \begin{cfa} 2873 s = include( "aaeiuyoo", vowels ); 2874 s = include( "aabiuyoo", vowels ); 2875 s = exclude( "cdbfghmk", vowels ); 2876 s = exclude( "cdyfghmk", vowels ); 2877 \end{cfa} 2878 & 2879 \begin{cfa} 2880 "aaeiuyoo" 2881 "aa" 2882 "cdbfghmk" 2883 "cd" 2884 \end{cfa} 2885 \end{tabular} 2886 \end{cquote} 2887 2888 There are also versions of ©include© and ©exclude©, returning a position or string, taking a validation function, like one of the C character-class functions.\footnote{It is part of the hereditary of C that these function take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.} 2889 \begin{cquote} 2890 \begin{tabular}{@{}l|l@{}} 2891 \begin{cfa} 2892 i = include( "1FeC34aB", ®isxdigit® ); 2893 i = include( ".,;'!\"", ®ispunct® ); 2894 i = include( "XXXx", ®isupper® ); 2895 \end{cfa} 2896 & 2897 \begin{cfa} 2898 8 // compliant 2899 6 // compliant 2900 3 // non-compliant 2901 \end{cfa} 2902 \end{tabular} 2903 \end{cquote} 2904 These operations perform an \emph{apply} of the validation function to each character, where the function returns a boolean indicating a stopping condition for the search. 
2905 The position of the last character is returned if the string is compliant or the position of the first non-compliant character. 2906 2907 The translate operation returns a string with each character transformed by one of the C character transformation functions. 2908 \begin{cquote} 2909 \begin{tabular}{@{}l|l@{}} 2910 \begin{cfa} 2911 s = translate( "abc", ®toupper® ); 2912 s = translate( "ABC", ®tolower® ); 2913 int tospace( int c ) { return isspace( c ) ? ' ' : c; } 2914 s = translate( "X X\tX\nX", ®tospace® ); 2915 \end{cfa} 2916 & 2917 \begin{cfa} 2918 "ABC" 2919 "abc" 2920 2921 "X X X X" 2922 \end{cfa} 2923 \end{tabular} 2924 \end{cquote} 2925 2926 2927 \subsection{Returning N on Search Failure} 2928 2929 Some of the prior string operations are composite, \eg string operations returning the longest substring of compliant characters (©include©) are built using a search and then substring the appropriate text. 2930 However, string search can fail, which is reported as an alternate search outcome, possibly an exception. 2931 Many string libraries use a return code to indicate search failure, with a failure value of ©0© or ©-1© (PL/I~\cite{PLI} returns ©0©). 2932 This semantics leads to the awkward pattern, which can appear many times in a string library or user code. 2933 \begin{cfa} 2934 i = exclude( s, alpha ); 2935 if ( i != -1 ) return s( 0, i ); 2936 else return ""; 2937 \end{cfa} 2938 2939 \CFA adopts a return code but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1 origin). 
2940 This semantics allows many search and substring functions to be written without conditions, \eg: 2941 \begin{cfa} 2942 string include( const string & s, int (*f)( int ) ) { return ®s( 0, include( s, f ) )®; } 2943 string exclude( const string & s, int (*f)( int ) ) { return ®s( 0, exclude( s, f ) )®; } 2944 \end{cfa} 2945 In string systems with an $O(1)$ length operator, checking for failure is low cost. 2946 \begin{cfa} 2947 if ( include( line, alpha ) == len( line ) ) ... // not found, 0 origin 2948 \end{cfa} 2949 \VRef[Figure]{f:ExtractingWordsText} compares \CC and \CFA string code for extracting words from a line of text, repeatedly removing non-word text and then a word until the line is empty. 2950 The \CFA code is simpler solely because of the choice for indicating search failure. 2951 (A simplification of the \CC version is to concatenate a sentinel character at the end of the line so the call to ©find_first_not_of© does not fail.) 2952 2953 \begin{figure} 2954 \begin{cquote} 2955 \setlength{\tabcolsep}{15pt} 2956 \begin{tabular}{@{}l|l@{}} 2957 \multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\ 2958 \begin{cfa} 2959 for ( ;; ) { 2960 string::size_type posn = line.find_first_of( alpha ); 2961 if ( posn == string::npos ) break; 2962 line = line.substr( posn ); 2963 posn = line.find_first_not_of( alpha ); 2964 if ( posn != string::npos ) { 2965 cout << line.substr( 0, posn ) << endl; 2966 line = line.substr( posn ); 2967 } else { 2968 cout << line << endl; 2969 line = ""; 2970 } 2971 } 2972 \end{cfa} 2973 & 2974 \begin{cfa} 2975 for () { 2976 size_t posn = exclude( line, alpha ); 2977 if ( posn == len( line ) ) break; 2978 line = line( posn ); 2979 posn = include( line, alpha ); 2980 2981 sout | line( 0, posn ); 2982 line = line( posn ); 2983 2984 2985 2986 2987 } 2988 \end{cfa} 2989 \end{tabular} 2990 \end{cquote} 2991 \caption{Extracting Words from Line of Text} 2992 \label{f:ExtractingWordsText} 2993 \end{figure} 2994 2995 2996 
\subsection{C Compatibility} 2997 2998 To ease conversion from C to \CFA, \CFA provides companion C @string@ functions. 2999 Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type @char *@ to @string@. 3000 \begin{cquote} 3001 \setlength{\tabcolsep}{15pt} 3002 \begin{tabular}{@{}ll@{}} 3003 \begin{cfa} 3004 char s[32]; // string s; 3005 strlen( s ); 3006 strnlen( s, 3 ); 3007 strcmp( s, "abc" ); 3008 strncmp( s, "abc", 3 ); 3009 \end{cfa} 3010 & 3011 \begin{cfa} 3012 3013 strcpy( s, "abc" ); 3014 strncpy( s, "abcdef", 3 ); 3015 strcat( s, "xyz" ); 3016 strncat( s, "uvwxyz", 3 ); 3017 \end{cfa} 3018 \end{tabular} 3019 \end{cquote} 3020 However, the conversion fails with I/O because @printf@ cannot print a @string@ using format code @%s@ because \CFA strings are not null terminated. 3021 Nevertheless, this capability does provide a useful starting point for conversion to safer \CFA strings. 3022 3023 3024 \subsection{I/O Operators} 3025 3026 The ability to input and output strings is as essential as for any other type. 3027 The goal for character I/O is to also work with groups rather than individual characters. 3028 A comparison with \CC string I/O is presented as a counterpoint to \CFA string I/O. 3029 3030 The \CC ooutput ©<<© and input ©>>© operators are defined on type ©string©. 3031 \CC output for ©char©, ©char *©, and ©string© are similar. 3032 The \CC manipulators are ©setw©, and its associated width controls ©left©, ©right© and ©setfill©. 3033 \begin{cquote} 3034 \setlength{\tabcolsep}{15pt} 3035 \begin{tabular}{@{}l|l@{}} 3036 \begin{C++} 3037 string s = "abc"; 3038 cout << setw(10) << left << setfill( 'x' ) << s << endl; 3039 \end{C++} 3040 & 3041 \begin{C++} 3042 3043 "abcxxxxxxx" 3044 \end{C++} 3045 \end{tabular} 3046 \end{cquote} 3047 3048 The \CFA input/output operator ©|© is defined on type ©string©. 3049 \CFA output for ©char©, ©char *©, and ©string© are similar. 
3050 The \CFA manipulators are ©bin©, ©oct©, ©hex©, the width control ©wd©, and its associated ©left©. 3051 \begin{cquote} 3052 \setlength{\tabcolsep}{15pt} 3053 \begin{tabular}{@{}l|l@{}} 3054 \begin{cfa} 3055 string s = "abc"; 3056 sout | bin( s ) | nl 3057 | oct( s ) | nl 3058 | hex( s ) | nl 3059 | wd( 10, s ) | nl 3060 | wd( 10, 2, s ) | nl 3061 | left( wd( 10, s ) ); 3062 \end{cfa} 3063 & 3064 \begin{cfa} 3065 3066 "0b1100001 0b1100010 0b1100011" 3067 "0141 0142 0143" 3068 "0x61 0x62 0x63" 3069 " abc" 3070 " ab" 3071 "abc " 3072 \end{cfa} 3073 \end{tabular} 3074 \end{cquote} 3075 The \CFA string manipulators omit \CC ©setfill©, which is not considered important for strings. 3076 3077 \CC input matching for ©char©, ©char *©, and ©string© is similar: after skipping leading whitespace, characters are read from the current point in the input stream until the type size or format width is reached, or whitespace, end of line (©'\n'©), or end of file is encountered. 3078 The \CC manipulator ©setw© restricts the number of characters read. 3079 Reading into a ©char© is safe as its size is 1; reading into a ©char *© is unsafe without using ©setw© to constrain the length (which includes the terminating ©'\0'©); reading into a ©string© is safe as it grows dynamically as characters are read. 3080 \begin{cquote} 3081 \setlength{\tabcolsep}{15pt} 3082 \begin{tabular}{@{}l|l@{}} 3083 \begin{C++} 3084 char ch, c[10]; 3085 string s; 3086 cin >> ch >> setw( 5 ) >> c >> s; 3087 ®abcde fg® 3088 \end{C++} 3089 & 3090 \begin{C++} 3091 3092 3093 'a' "bcde" "fg" 3094 3095 \end{C++} 3096 \end{tabular} 3097 \end{cquote} 3098 Input text can be \emph{gulped}, including whitespace, from the current point to an arbitrary delimiter character using ©getline©. 3099 3100 The \CFA philosophy for input is that every kind of C constant should be usable as input. 3101 For example, the complex constant ©3.5+4.1i© can appear as input to a complex variable. 3102 \CFA input matching for ©char©, ©char *©, and ©string© is similar.
3103 C-strings may only be read with a width field, which should match the string size. 3104 Certain input manipulators support a scanset, which is a simple regular expression from ©scanf©. 3105 The \CFA manipulators for these types are the width control ©wdi©\footnote{Due to an overloading issue in the type-resolver, the input width name must temporarily differ from the output width name, \lstinline{wdi} versus \lstinline{wd}.} and its associated ©left©, plus ©quote©, ©incl©, ©excl©, and ©getline©. 3106 \begin{cquote} 3107 \setlength{\tabcolsep}{10pt} 3108 \begin{tabular}{@{}l|l@{}} 3109 \begin{C++} 3110 char ch, c[10]; 3111 string s; 3112 sin | ch | wdi( 5, c ) | s; 3113 ®abcde fg® 3114 sin | quote( ch ) | quote( wdi( sizeof(c), c ) ) | quote( s, '[', ']' ) | nl; 3115 ®'a' "bcde" [fg]® 3116 sin | incl( "a-zA-Z0-9 ?!&\n", s ) | nl; 3117 ®x?&000xyz TOM !.® 3118 sin | excl( "a-zA-Z0-9 ?!&\n", s ); 3119 ®<>{}{}STOP® 3120 \end{C++} 3121 & 3122 \begin{C++} 3123 3124 3125 'a' "bcde" "fg" 3126 3127 'a' "bcde" "fg" 3128 3129 "x?&000xyz TOM !" 3130 3131 "<>{}{}" 3132 3133 \end{C++} 3134 \end{tabular} 3135 \end{cquote} 3136 Note the ability to read quoted strings containing whitespace, matching the form of program string constants. 3137 An ©nl© at the end of an input statement ignores the rest of the input line. 3138 3139 3140 \begin{comment} 2586 3141 A substring is treated as a pointer into the base (substringed) string rather than creating a copy of the subtext. 2587 3142 As with all pointers, if the item they are pointing at is changed, then the pointer is referring to the changed item. … … 2611 3166 } 2612 3167 \end{cfa} 2613 2614 There is an assignment form of substring in which only the starting position is specified and the length is assumed to be the remainder of the string.
2615 \begin{cfa} 2616 string operator () (int start); 2617 \end{cfa} 2618 For example: 2619 \begin{cfa} 2620 s = peter( 2 ); §\C{// s is assigned "ETER"}§ 2621 peter( 2 ) = "IPER"; §\C{// peter is assigned "PIPER"}§ 2622 \end{cfa} 2623 It is also possible to substring using a string as the index for selecting the substring portion of the string. 2624 \begin{cfa} 2625 string operator () (const string &index); 2626 \end{cfa} 2627 For example: 2628 \begin{cfa}[mathescape=false] 2629 digit( "xyz$$$" ) = "678"; §\C{// digit is assigned "0156789"}§ 2630 digit( "234") = "***"; §\C{// digit is assigned "0156789***"}§ 2631 \end{cfa} 2632 2633 2634 \subsection{Searching} 2635 2636 The ©index© operation 2637 \begin{cfa} 2638 int index( const string &key, int start = 1, occurrence occ = first ); 2639 \end{cfa} 2640 returns the position of the first or last occurrence of the ©key© (depending on the occurrence indicator ©occ© that is either ©first© or ©last©) in the current string starting the search at position ©start©. 2641 If the ©key© does not appear in the current string, the length of the current string plus one is returned. 2642 %If the ©key© has zero length, the value 1 is returned regardless of what the current string contains. 2643 A negative starting position is a specification from the right end of the string. 2644 \begin{cfa} 2645 i = digit.index( "567" ); §\C{// i is assigned 3}§ 2646 i = digit.index( "567", 7 ); §\C{// i is assigned 11}§ 2647 i = digit.index( "567", -1, last ); §\C{// i is assigned 3}§ 2648 i = peter.index( "E", 5, last ); §\C{// i is assigned 4}§ 2649 \end{cfa} 2650 2651 The next two string operations test a string to see if it is or is not composed completely of a particular class of characters. 2652 For example, are the characters of a string all alphabetic or all numeric? 2653 Using these operations is a two-step process.
2654 First, it is necessary to create an instance of type ©strmask© and initialize it to a string containing the characters of the particular character class, as in: 2655 \begin{cfa} 2656 strmask digitmask = digit; 2657 strmask alphamask = string( "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" ); 2658 \end{cfa} 2659 Second, the character mask is used in the functions ©include© and ©exclude© to check whether the characters of a string comply with the characters indicated by the mask. 2660 2661 The ©include© operation 2662 \begin{cfa} 2663 int include( const strmask &mask, int start = 1, occurrence occ = first ); 2664 \end{cfa} 2665 returns the position of the first or last character (depending on the occurrence indicator, which is either ©first© or ©last©) in the current string that does not appear in the ©mask©, starting the search at position ©start©; 2666 hence it skips over characters in the current string that are included in the ©mask©. 2667 The characters in the current string do not have to be in the same order as the ©mask©. 2668 If all the characters in the current string appear in the ©mask©, the length of the current string plus one is returned, regardless of which occurrence is being searched for. 2669 A negative starting position is a specification from the right end of the string. 2670 \begin{cfa} 2671 i = peter.include( digitmask ); §\C{// i is assigned 1}§ 2672 i = peter.include( alphamask ); §\C{// i is assigned 6}§ 2673 \end{cfa} 2674 2675 The ©exclude© operation 2676 \begin{cfa} 2677 int exclude( string &mask, int start = 1, occurrence occ = first ) 2678 \end{cfa} 2679 returns the position of the first or last character (depending on the occurrence indicator, which is either ©first© or ©last©) in the current string that does appear in the ©mask© string, starting the search at position ©start©; 2680 hence it skips over characters in the current string that are excluded from (not in) the ©mask© string.
2681 The characters in the current string do not have to be in the same order as the ©mask© string. 2682 If all the characters in the current string do NOT appear in the ©mask© string, the length of the current string plus one is returned, regardless of which occurrence is being searched for. 2683 A negative starting position is a specification from the right end of the string. 2684 \begin{cfa} 2685 i = peter.exclude( digitmask ); §\C{// i is assigned 6}§ 2686 i = ifstmt.exclude( strmask( punctuation ) ); §\C{// i is assigned 4}§ 2687 \end{cfa} 2688 2689 The ©includeStr© operation: 2690 \begin{cfa} 2691 string includeStr( strmask &mask, int start = 1, occurrence occ = first ) 2692 \end{cfa} 2693 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) of the current string that ARE included in the ©mask© string, starting the search at position ©start©. 2694 A negative starting position is a specification from the right end of the string. 2695 \begin{cfa} 2696 s = peter.includeStr( alphamask ); §\C{// s is assigned "PETER"}§ 2697 s = ifstmt.includeStr( alphamask ); §\C{// s is assigned "IF"}§ 2698 s = peter.includeStr( digitmask ); §\C{// s is assigned ""}§ 2699 \end{cfa} 2700 2701 The ©excludeStr© operation: 2702 \begin{cfa} 2703 string excludeStr( strmask &mask, int start = 1, occurrence occ = first ) 2704 \end{cfa} 2705 returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) of the current string that are excluded from (NOT in) the ©mask© string, starting the search at position ©start©. 2706 A negative starting position is a specification from the right end of the string.
2707 \begin{cfa} 2708 s = peter.excludeStr( digitmask); §\C{// s is assigned "PETER"}§ 2709 s = ifstmt.excludeStr( strmask( punctuation ) ); §\C{// s is assigned "IF "}§ 2710 s = peter.excludeStr( alphamask); §\C{// s is assigned ""}§ 2711 \end{cfa} 2712 2713 2714 \subsection{Miscellaneous} 2715 2716 The ©trim© operation 2717 \begin{cfa} 2718 string trim( string &mask, occurrence occ = first ) 2719 \end{cfa} 2720 returns a copy of the current string in which the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) that ARE included in the ©mask© is removed. 2721 \begin{cfa} 2722 // remove leading blanks 2723 s = string( " ABC" ).trim( " " ); §\C{// s is assigned "ABC"}§ 2724 // remove trailing blanks 2725 s = string( "ABC " ).trim( " ", last ); §\C{// s is assigned "ABC"}§ 2726 \end{cfa} 2727 2728 The ©translate© operation 2729 \begin{cfa} 2730 string translate( string &from, string &to ) 2731 \end{cfa} 2732 returns a string that is the same length as the original string in which all occurrences of the characters that appear in the ©from© string have been translated into their corresponding character in the ©to© string. 2733 Translation is done on a character-by-character basis between the ©from© and ©to© strings; hence these two strings must be the same length. 2734 If a character in the original string does not appear in the ©from© string, then it simply appears as is in the resulting string.
2735 \begin{cfa} 2736 // upper to lower case 2737 peter = peter.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" ); 2738 // peter is assigned "peter" 2739 s = ifstmt.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" ); 2740 // s is assigned "if (a > b) {" 2741 // lower to upper case 2742 peter = peter.translate( "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ); 2743 // peter is assigned "PETER" 2744 \end{cfa} 2745 2746 The ©replace© operation 2747 \begin{cfa} 2748 string replace( string &from, string &to ) 2749 \end{cfa} 2750 returns a string in which all occurrences of the ©from© string in the current string have been replaced by the ©to© string. 2751 \begin{cfa} 2752 s = peter.replace( "E", "XX" ); §\C{// s is assigned "PXXTXXR"}§ 2753 \end{cfa} 2754 The replacement is done left-to-right. 2755 When an instance of the ©from© string is found and changed to the ©to© string, it is NOT examined again for further replacement. 2756 2757 \subsection{Returning N+1 on Failure} 2758 2759 Any of the string search routines can fail at some point during the search. 2760 When this happens, the routine must return a value indicating the failure. 2761 Many string types in other languages use some special value to indicate the failure. 2762 This value is often 0 or -1 (PL/I returns 0). 2763 This section argues that a value of N+1, where N is the length of the base string in the search, is a more useful value to return. 2764 The index-of function in APL returns N+1. 2765 These boundary situations are often overlooked when designing a string type. 2766 2767 The situation that can be optimized by returning N+1 is when a search is performed to find the starting location for a substring operation. 2768 For example, in a program that is extracting words from a text file, it is necessary to scan from left to right over whitespace until the first alphabetic character is found.
2769 \begin{cfa} 2770 line = line( line.exclude( alpha ) ); 2771 \end{cfa} 2772 If a text line contains only whitespace, the exclude operation fails to find an alphabetic character. 2773 If ©exclude© returns 0 or -1, the result of the substring operation is unclear. 2774 Most string types generate an error, or clip the starting value to 1, resulting in the entire whitespace string being selected. 2775 If ©exclude© returns N+1, the starting position for the substring operation is beyond the end of the string, leaving a null string. 2776 2777 The same situation occurs when scanning off a word. 2778 \begin{cfa} 2779 start = line.include(alpha); 2780 word = line(1, start - 1); 2781 \end{cfa} 2782 If the entire line is composed of a word, the include operation fails to find a non-alphabetic character. 2783 In general, returning 0 or -1 does not give an appropriate position for the substring operation, which must substring off the word, leaving a null string. 2784 However, returning N+1 does substring off the word, leaving a null string. 2785 2786 2787 \subsection{C Compatibility} 2788 2789 To ease conversion from C to \CFA, there are companion ©string© routines for C strings. 2790 \VRef[Table]{t:CompanionStringRoutines} shows the C routines on the left that also work with ©string© and the rough equivalent ©string© operation on the right.
2791 Hence, it is possible to directly convert a block of C string operations into ©string© just by changing the 2792 2793 \begin{table} 2794 \begin{cquote} 2795 \begin{tabular}{@{}l|l@{}} 2796 \multicolumn{1}{c|}{©char []©} & \multicolumn{1}{c}{©string©} \\ 2797 \hline 2798 ©strcpy©, ©strncpy© & ©=© \\ 2799 ©strcat©, ©strncat© & ©+© \\ 2800 ©strcmp©, ©strncmp© & ©==©, ©!=©, ©<©, ©<=©, ©>©, ©>=© \\ 2801 ©strlen© & ©size© \\ 2802 ©[]© & ©[]© \\ 2803 ©strstr© & ©find© \\ 2804 ©strcspn© & ©find_first_of©, ©find_last_of© \\ 2805 ©strspn© & ©find_first_not_of©, ©find_last_not_of© 2806 \end{tabular} 2807 \end{cquote} 2808 \caption{Companion Routines for \CFA \lstinline{string} to C Strings} 2809 \label{t:CompanionStringRoutines} 2810 \end{table} 2811 2812 For example, this block of C code can be converted to \CFA by simply changing the type of variable ©s© from ©char []© to ©string©. 2813 \begin{cfa} 2814 char s[32]; 2815 //string s; 2816 strcpy( s, "abc" ); PRINT( %s, s ); 2817 strncpy( s, "abcdef", 3 ); PRINT( %s, s ); 2818 strcat( s, "xyz" ); PRINT( %s, s ); 2819 strncat( s, "uvwxyz", 3 ); PRINT( %s, s ); 2820 PRINT( %zd, strlen( s ) ); 2821 PRINT( %c, s[3] ); 2822 PRINT( %s, strstr( s, "yzu" ) ) ; 2823 PRINT( %s, strchr( s, 'y' ) ) ; 2824 \end{cfa} 2825 However, the conversion fails for I/O: ©printf© cannot print a ©string© using format code ©%s© because \CFA strings are not null terminated. 2826 2827 2828 \subsection{Input/Output Operators} 2829 2830 Both the \CC operators ©<<© and ©>>© are defined on type ©string©. 2831 However, input of a string value is different from input of a ©char *© value. 2832 When a string value is read, \emph{all} input characters from the current point in the input stream to either the end of line (©'\n'©) or the end of file are read.
3168 \end{comment} 2833 3169 2834 3170 … … 3340 3676 allowable calls are: 3341 3677 \begin{cquote} 3342 \setlength{\tabcolsep}{0.75in} 3343 3678 \begin{tabular}{@{}ll@{}} 3344 3679 \textbf{positional arguments} & \textbf{empty arguments} \\