-              rfae93a40
+              r829a955
 %% Created On       : Wed Apr  6 14:53:29 2016
 %% Last Modified By : Peter A. Buhr
 %% Last Modified On : Mon Apr 14 20:53:55 2025
 %% Update Count     : 7065
+%% Last Modified On : Mon Sep 15 17:06:25 2025
+%% Update Count     : 7216
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 …
 \setlength{\topmargin}{-0.45in}                                                 % move running title into header
 \setlength{\headsep}{0.25in}
+\setlength{\tabcolsep}{15pt}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 …
 In addition, inclusive ranges are allowed using symbol ©~© to specify a contiguous set of case values, both positive and negative.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}llll@{}}
 \multicolumn{1}{c}{\textbf{C}}  & \multicolumn{1}{c}{\textbf{\CFA}}     & \multicolumn{1}{c}{\textbf{©gcc©}}    \\
 …
 \end{tabular}
 \end{cquote}
+The target label must be below the \Indexc{fallthrough} and may not be nested in a control structure, and
+the target label must be at the same or higher level as the containing \Indexc{case} clause and located at
+the same level as a ©case© clause; the target label may be case \Indexc{default}, but only associated
+with the current \Indexc{switch}/\Indexc{choose} statement.
+The target label must be below the \Indexc{fallthrough} and may not be nested in a control structure, and the target label must be at the same or higher level as the containing \Indexc{case} clause and located at the same level as a ©case© clause;
+the target label may be case \Indexc{default}, but only associated with the current \Indexc{switch}/\Indexc{choose} statement.
 \subsection{Loop Control}
+\CFA condenses writing loops to facilitate coding speed and safety.
+To simplify creating an infinite loop, the \Indexc{for}, \Indexc{while}, and \Indexc{do} loop-predicate\index{loop predicate} is extended with an empty conditional, meaning a comparison value of ©1© (true).
+\begin{cfa}
+while ( )                               §\C{// while ( true )}§
+for ( )                                 §\C{// for ( ; true; )}§
+do ... while ( )                §\C{// do ... while ( true )}§
+\end{cfa}
 Looping a predefined number of times, possibly with a loop index, occurs frequently.
-\CFA condenses writing loops to facilitate coding speed and safety.
-\Indexc{for}, \Indexc{while}, and \Indexc{do} loop-control\index{loop control} are extended with an empty conditional, meaning a comparison value of ©1© (true).
-\begin{cfa}
-while ( ®/* empty */®  )                                §\C{// while ( true )}§
-for ( ®/* empty */®  )                                  §\C{// for ( ; true; )}§
-do ... while ( ®/* empty */®  )                 §\C{// do ... while ( true )}§
-\end{cfa}
 The ©for© control\index{for control}, \ie ©for ( /* control */ )©, is extended with a range and step.
 A range is a set of values defined by an optional low value (default to 0), tilde, and high value, ©L ~ H©, with an optional step ©~ S© (default to 1), which means an ascending set of values from ©L© to ©H© in positive steps of ©S©.
 …
 \end{cfa}
 \R{Warning}: A range in descending order, \eg ©5 ~ -3© is the null (empty) set, \ie no values in the set.
+\R{Warning}: A ©0© or negative step is undefined.
+Note, the order of values in a set may not be the order the values are presented during looping.
+As well, a ©0© or negative step is undefined.
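As a rough model of the range semantics above, the following \CC sketch enumerates the values of an ascending range ©L ~ H ~ S©.
The function name ©range_values© is hypothetical, and the sketch assumes ©~© excludes ©H© (the inclusive form being ©~=©); it only illustrates which values are in the set, not how \CFA implements loop control.

```cpp
#include <vector>

// Hypothetical model of the ascending range L ~ H ~ S.
// Assumptions: H is excluded and the caller passes step > 0
// (a 0 or negative step is undefined, here it would not terminate or be empty).
std::vector<int> range_values(int lo, int hi, int step = 1) {
    std::vector<int> values;
    for (int i = lo; i < hi; i += step) values.push_back(i);
    return values;
}
```

Under these assumptions, ©range_values( 5, -3 )© is empty, which matches the warning that a range written in descending order is the null set.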
 The range character, ©'~'©, is decorated on the left and right to control how the set values are presented in the loop body.
 …
 -8 ®§\Sp§®~ -2                                                  §\C{// ascending, no prefix}§
 ®+®~ 5                                                                §\C{// ascending, prefix}§
 -3 ®-®~ 3                                                               §\C{// descending}§
+-3 ®-®~ 3                                                               §\C{// descending, prefix}§
 \end{cfa}
 For descending iteration, the ©L© and ©H© values are \emph{implicitly} switched, and the increment/decrement for ©S© is toggled.
+Hence, the order of values in a set may not be the order the values are presented during looping.
 When changing the iteration direction, this form is faster and safer, \ie the direction prefix can be added/removed without changing existing (correct) program text.
 \R{Warning}: reversing the range endpoints for descending order results in an empty set.
 …
 \index{-\~}\index{descending exclusive range}
 \index{-\~=}\index{descending inclusive range}
+\begin{comment}
+To simplify loop iteration a range is provided, from low to high, and a traversal direction, ascending (©+©) or descending (©-©).
+The following is the syntax for the loop range, where ©[©\,©]© means optional.
+\begin{cfa}[deletekeywords=default]
+[ ®index ;® ] [ [ ®min® (default 0) ] [ direction ®+®/®-® (default +) ] ®~® [ ®=® (include endpoint) ] ] ®max® [ ®~ increment® ]
+\end{cfa}
+For ©=©, the range includes the endpoint (©max©/©min©) depending on the direction (©+©/©-©).
+\end{comment}
 ©for© control is formalized by the following regular expression:
 …
 \label{s:stringType}
+The \CFA \Indexc{string} type is for manipulation of dynamically-sized character-strings versus C \Indexc{char *} type for manipulation of statically-sized null-terminated character-strings.
+That is, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
+Hence, a ©string© declaration does not specify a maximum length;
+as a string dynamically grows and shrinks in size, so does its underlying storage.
+In contrast, a C string also dynamically grows and shrinks in size, but its underlying storage is fixed.
+A string is a sequence of symbols, where the form of a symbol can vary significantly: regular 7/8-bit ASCII/Latin-1, or wide 2/4/8-byte UNICODE or variable length UTF-8/16/32.
+A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes ©"xyz\n"©.
+Currently, \CFA strings only support regular characters.
+A string type is designed to operate on groups of characters for assigning, copying, scanning, and updating, rather than working with individual characters.
+The \CFA \Indexc{string} type is for manipulation of dynamically-sized strings versus C \Indexc{char *} type for manipulation of statically-sized null-terminated strings.
+Therefore, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
+As a result, a ©string© declaration does not specify a maximum length, where a C string array does.
+For \CFA, as a ©string© dynamically grows and shrinks in size, so does its underlying storage.
+For C, a string dynamically grows and shrinks in size, but its underlying storage does not.
 The maximum number of characters in a \CFA ©string© value is bounded by type ©size_t©, which is $2^{32}$ or $2^{64}$ on 32-bit or 64-bit architectures, respectively.
 A \CFA string manages its length separately from the string, so there is no null (©'\0'©) terminating value at the end of a string value.
 Hence, a \CFA string cannot be passed to a C string manipulation routine, such as ©strcat©.
+Like C strings, the characters in a ©string© are numbered starting from 0.
+The following operations have been defined to manipulate an instance of type ©string©.
+The discussion assumes the following declarations and assignment statements are executed.
+\begin{cfa}
+#include ®<string.hfa>®
+®string® s, peter, digit, alpha, punctuation, ifstmt;
+int i;
+peter  = "PETER";
+digit  = "0123456789";
+punctuation = "().,";
+ifstmt = "IF (A > B) {";
+\end{cfa}
+Note, the include file \Indexc{string.hfa} is necessary to access type ©string©.
+\subsection{Implicit String Conversions}
+The types ©char©, ©char *©, ©int©, ©double©, ©_Complex©, including different signedness and sizes, implicitly convert to type ©string©.
+\VRef[Figure]{f:ImplicitConversionsString} shows examples of implicit conversions from C strings, integral, floating-point, and complex types to ©string©.
+A conversion can be explicitly specified:
+\begin{cfa}
+s = string( "abc" );                            §\C{// converts char * to string}§
+s = string( 5 );                                        §\C{// converts int to string}§
+s = string( 5.5 );                                      §\C{// converts double to string}§
+\end{cfa}
+All conversions from ©string© to ©char *© attempt to be safe:
+either by requiring the maximum length of the ©char *© storage (©strncpy©) or allocating the ©char *© storage for the string characters (ownership), meaning the programmer must free the storage.
+As well, the resulting C string is always null terminated, implying a minimum size of 1 character.
+Like C strings, characters in a ©string© are numbered from the left starting at 0 (because subscripting is zero-origin), and in \CFA numbered from the right starting at -1.
 \begin{cquote}
+\begin{tabular}{@{}l@{\hspace{1.75in}}|@{\hspace{15pt}}l@{}}
+\begin{cfa}
+string s = "abcde";
+char cs[3];
+strncpy( cs, s, sizeof(cs) );           §\C{sout | cs;}§
+char * cp = s;                                          §\C{sout | cp;}§
+delete( cp );
+cp = s + ' ' + s;                                       §\C{sout | cp;}§
+delete( cp );
+\end{cfa}
+&
+\begin{cfa}
+ab
+abcde
+abcde abcde
+\end{cfa}
+\rm
+\begin{tabular}{@{}rrrrll@{}}
+\small\tt "a & \small\tt b & \small\tt c & \small\tt d & \small\tt e" \\
+0 & 1 & 2 & 3 & 4 & left to right index \\
+-5 & -4 & -3 & -2 & -1 & right to left index
 \end{tabular}
 \end{cquote}
+\begin{figure}
+\begin{tabular}{@{}l@{\hspace{15pt}}|@{\hspace{15pt}}l@{}}
+\begin{cfa}
+//      string s = 5;                                   sout | s;
+        string s;
+        // conversion of char and char * to string
+        s = 'x';                                                §\C{sout | s;}§
+        s = "abc";                                              §\C{sout | s;}§
+        char cs[5] = "abc";
+        s = cs;                                                 §\C{sout | s;}§
+        // conversion of integral, floating-point, and complex to string
+        s = 45hh;                                               §\C{sout | s;}§
+        s = 45h;                                                §\C{sout | s;}§
+        s = -(ssize_t)MAX - 1;                  §\C{sout | s;}§
+        s = (size_t)MAX;                                §\C{sout | s;}§
+        s = 5.5;                                                §\C{sout | s;}§
+        s = 5.5L;                                               §\C{sout | s;}§
+        s = 5.5+3.4i;                                   §\C{sout | s;}§
+        s = 5.5L+3.4Li;                                 §\C{sout | s;}§
+\end{cfa}
+&
+\begin{cfa}
+x
+abc
+abc
+45
+45
+-9223372036854775808
+18446744073709551615
+5.5
+5.5
+5.5+3.4i
+5.5+3.4i
+The include file \Indexc{string.hfa} is necessary to access type ©string©.
+\subsection{Implicit String Conversions}
+The ability to convert from internal (machine) to external (human) format is useful in situations other than I/O.
+Hence, the basic types ©char©, ©char *©, ©int©, ©double©, ©_Complex©, including any signedness and size variations, implicitly convert to type ©string© (as in Java).
+\begin{cquote}
+\begin{tabular}{@{}l|ll|l@{}}
+\begin{cfa}
+string s = 5;
+s = 'x';
+s = "abc";
+s = 42hh;               /* signed char */
+s = 42h;                /* short int */
+s = 0xff;
+\end{cfa}
+&
+\begin{cfa}
+"5"
+"x"
+"abc"
+"42"
+"42"
+"255"
+\end{cfa}
+&
+\begin{cfa}
+s = (ssize_t)MIN;
+s = (size_t)MAX;
+s = 5.5;
+s = 5.5L;
+s = 5.5+3.4i;
+s = 5.5L+3.4Li;
+\end{cfa}
+&
+\begin{cfa}
+"-9223372036854775808"
+"18446744073709551615"
+"5.5"
+"5.5"
+"5.5+3.4i"
+"5.5+3.4i"
 \end{cfa}
 \end{tabular}
+\caption{Implicit Conversions to String}
+\label{f:ImplicitConversionsString}
+\end{figure}
+\subsection{Size (length)}
+The ©size© operation returns the length of a string.
+\begin{cfa}
+i = size( "" );                                         §\C{// i is assigned 0}§
+i = size( "abc" );                                      §\C{// i is assigned 3}§
+i = size( peter );                                      §\C{// i is assigned 5}§
+\end{cfa}
+\end{cquote}
+Conversions can be explicitly specified using a compound literal.
+\begin{cfa}
+s = (string){ 5 };    s = (string){ "abc" };   s = (string){ 5.5 };
+\end{cfa}
+Conversions from ©string© to ©char *© attempt to be safe.
+The ©strncpy© conversion requires the maximum length for the pointer's target buffer.
+The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it.
+Note, a C string is always null terminated, implying storage is always necessary for the null.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+string s = "abcde";
+char cs[4];
+strncpy( cs, s, sizeof(cs) );
+char * cp = s;          // ownership
+delete( cp );
+cp = s + ' ' + s;       // ownership
+delete( cp );
+\end{cfa}
+&
+\begin{cfa}
+"abc\0", in place
+"abcde\0", malloc
+"abcde abcde\0", malloc
+\end{cfa}
+\end{tabular}
+\end{cquote}
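The two conversion strategies above (bounded copy into caller storage versus allocation with ownership) can be sketched in \CC as follows.
The helper names ©copy_bounded© and ©copy_owned© are hypothetical and ©std::string© stands in for the \CFA ©string©; this is an illustration of the idea, not the \CFA library implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Bounded copy: at most n-1 characters of s are copied into buf, and buf is
// always null terminated. Assumes n > 0.
void copy_bounded(char* buf, std::size_t n, const std::string& s) {
    std::size_t len = s.size() < n - 1 ? s.size() : n - 1;
    std::memcpy(buf, s.data(), len);
    buf[len] = '\0';
}

// Owning copy: allocates storage for the characters plus the null.
// The caller owns the result and must release it with delete[].
char* copy_owned(const std::string& s) {
    char* p = new char[s.size() + 1];
    std::memcpy(p, s.c_str(), s.size() + 1);
    return p;
}
```

With a 4-character buffer, the bounded copy of ©"abcde"© yields ©"abc"© plus the null, so truncation never overruns the target storage.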
+\subsection{Length}
+The ©len© operation (short for ©strlen©) returns the length of a C or \CFA string.
+For compatibility, ©strlen© works with \CFA strings.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+i = len( "" );
+i = len( "abc" );
+i = len( cs );
+i = strlen( cs );
+i = len( name );
+i = strlen( name );
+\end{cfa}
+&
+\begin{cfa}
+\end{cfa}
+\end{tabular}
+\end{cquote}
 \subsection{Comparison Operators}
+The binary \Index{relational operator}s, ©<©, ©<=©, ©>©, ©>=©, and \Index{equality operator}s, ©==©, ©!=©, compare strings using lexicographical ordering, where a longer string is greater than a shorter string that is its prefix.
+The binary relational\index{string!relational operators}, \Indexc{<}, \Indexc{<=}, \Indexc{>}, \Indexc{>=}, and equality\index{string!equality operators}, \Indexc{==}, \Indexc{!=}, operators compare \CFA strings using lexicographical ordering, where a longer string is greater than a shorter string that is its prefix.
+In C, these operators compare the C string pointer not its value, which does not match programmer expectation.
+C strings use function ©strcmp© to lexicographically compare the string value.
+Java has the same issue with ©==© and ©.equals©.
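The difference between comparing pointers and comparing values can be sketched in \CC, where ©std::string© compares by value like a \CFA ©string©.
The three helper names are hypothetical and exist only to make the cases explicit.

```cpp
#include <cstring>
#include <string>

// Compares addresses, which is what == does on two char pointers.
bool same_pointer(const char* a, const char* b) { return a == b; }

// Compares the characters lexicographically, as strcmp does.
bool same_value(const char* a, const char* b) { return std::strcmp(a, b) == 0; }

// std::string relational operators compare values lexicographically.
bool less_value(const std::string& a, const std::string& b) { return a < b; }
```

Two distinct arrays holding ©"abc"© are unequal as pointers but equal as values, and ©"abcd"© is less than ©"b"© because the first differing character decides the ordering.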
 \subsection{Concatenation}
+The binary operators \Indexc{+} and \Indexc{+=} concatenate two strings, creating the sum of the strings.
+\begin{cfa}
+s = peter + ' ' + digit;                        §\C{// s is assigned "PETER 0123456789"}§
+s += peter;                                                     §\C{// s is assigned "PETER 0123456789PETER"}§
+\end{cfa}
+The binary operators \Indexc{+} and \Indexc{+=} concatenate C ©char©, ©char *© and \CFA strings, creating the sum of the characters.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{\hspace{15pt}}l|l@{\hspace{15pt}}l|l@{}}
+\begin{cfa}
+s = "";
+s = 'a' + 'b';
+s = 'a' + "b";
+s = "a" + 'b';
+s = "a" + "b";
+\end{cfa}
+&
+\begin{cfa}
+"ab"
+"ab"
+"ab"
+"ab"
+\end{cfa}
+&
+\begin{cfa}
+s = "";
+s = 'a' + 'b' + s;
+s = 'a' + 'b' + s;
+s = 'a' + "b" + s;
+s = "a" + 'b' + s;
+\end{cfa}
+&
+\begin{cfa}
+"ab"
+"abab"
+"ababab"
+"abababab"
+\end{cfa}
+&
+\begin{cfa}
+s = "";
+s = s + 'a' + 'b';
+s = s + 'a' + "b";
+s = s + "a" + 'b';
+s = s + "a" + "b";
+\end{cfa}
+&
+\begin{cfa}
+"ab"
+"abab"
+"ababab"
+"abababab"
+\end{cfa}
+\end{tabular}
+\end{cquote}
+However, including ©<string.hfa>© can result in ambiguous uses of the overloaded ©+© operator.\footnote{Combining multiple packages in any programming language can result in name clashes or ambiguities.}
+For example, subtracting characters or pointers has valid use-cases:
+\begin{cfa}
+ch - '0'        §\C[2in]{// find character offset}§
+cs - cs2;       §\C{// find pointer offset}\CRT§
+\end{cfa}
+whereas addition is less obvious:
+\begin{cfa}
+ch + 'b'        §\C[2in]{// add character values}§
+cs + 'a';       §\C{// move pointer cs['a']}\CRT§
+\end{cfa}
+There are legitimate use cases for arithmetic with ©signed©/©unsigned© characters (bytes), and these types are treated differently from ©char© in \CC and \CFA.
+However, backwards compatibility makes it impossible to restrict or remove addition on type ©char©.
+Similarly, it is impossible to restrict or remove addition on type ©char *© because (unfortunately) it is subscripting: ©cs + 'a'© implies ©cs['a']© or ©'a'[cs]©.
+The prior \CFA concatenation examples show that complex mixed-mode interactions among ©char©, ©char *©, and ©string© constants work correctly (variables are the same).
+The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs.
+Hence, the type system correctly handles all uses of addition (explicit or implicit) for ©char *©.
+\begin{cfa}
+printf( "%s %s %s %c %c\n", "abc", cs, cs + 3, cs['a'], 'a'[cs] );
+\end{cfa}
+Only ©char© addition can result in ambiguities, and only when there is no left-hand information.
+\begin{cfa}
+ch = ch + 'b';          §\C[2in]{// LHS disambiguate, add character values}§
+s = 'a' + 'b';          §\C{// LHS disambiguate, concatenate characters}§
+printf( "%c\n", ®'a' + 'b'® ); §\C{// no LHS information, ambiguous}§
+printf( "%c\n", ®(return char)®('a' + 'b') ); §\C{// disambiguate with ascription cast}\CRT§
+\end{cfa}
+The ascription cast, ©(return T)©, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion).
+Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator ©+© for ©string© types is not a problem.
+Note, other programming languages that repurpose ©+© for concatenation can have similar ambiguity issues.
+Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution.
+While it can special case some combinations:
+\begin{C++}
+s = 'a' + s; §\C[2in]{// compiles in C++}§
+s = "a" + s;
+\end{C++}
+it cannot generalize to any number of steps:
+\begin{C++}
+s = 'a' + 'b' + s; §\C{// does not compile in C++}\CRT§
+s = "a" + "b" + s;
+\end{C++}
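A common \CC workaround, sketched below, is to anchor the chain with a ©std::string© so every subsequent ©+© is a concatenation.
The function names are hypothetical, and the numeric result assumes an ASCII character set.

```cpp
#include <string>

// In C++, 'a' + 'b' promotes both operands to int and adds them;
// no string is involved.
int char_sum() { return 'a' + 'b'; }

// Starting the chain with a std::string makes each + a concatenation,
// because operator+ is evaluated left to right.
std::string prefix_ab(const std::string& s) {
    return std::string(1, 'a') + 'b' + s;
}
```

The cost of the workaround is that the programmer, rather than the type system, must supply the string type at the start of the expression.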
 …
 The binary operators \Indexc{*} and \Indexc{*=} repeat a string $N$ times.
+If $N = 0$, a zero length string, ©""© is returned.
+\begin{cfa}
+s = 'x' * 3;                            §\C{// s is assigned "xxx"}§
+s = (peter + ' ') * 3;                          §\C{// s is assigned "PETER PETER PETER "}§
+\end{cfa}
+If $N = 0$, a zero length string, ©""©, is returned.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = 'x' * 0;
+s = 'x' * 3;
+s = "abc" * 3;
+s = ("Peter" + ' ') * 3;
+\end{cfa}
+&
+\begin{cfa}
+""
+"xxx"
+"abcabcabc"
+"Peter Peter Peter "
+\end{cfa}
+\end{tabular}
+\end{cquote}
+Like concatenation, there is a potential ambiguity with multiplication of characters;
+multiplication of pointers does not exist in C.
+\begin{cfa}
+ch = ch * 3;            §\C[2in]{// LHS disambiguate, multiply character values}§
+s = 'a' * 3;            §\C{// LHS disambiguate, repeat character}§
+printf( "%c\n", ®'a' * 3® ); §\C{// no LHS information, ambiguous}§
+printf( "%c\n", ®(return char)®('a' * 3) ); §\C{// disambiguate with ascription cast}\CRT§
+\end{cfa}
+Fortunately, character multiplication without LHS information is even rarer than addition, so repurposing the operator ©*© for ©string© types is not a problem.
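For comparison, \CC has no repetition operator on ©std::string©, so the operation is usually written as a small helper.
The following is a minimal sketch with a hypothetical function name ©repeat©; it mirrors the semantics described above, including the zero-length result for $N = 0$.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns s concatenated n times; "" when n is 0.
std::string repeat(const std::string& s, std::size_t n) {
    std::string result;
    result.reserve(s.size() * n);           // one allocation for the whole result
    for (std::size_t i = 0; i < n; i += 1) result += s;
    return result;
}
```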
 \subsection{Substring}
+The substring operation returns a subset of the string starting at a position in the string and traversing a length.
+\begin{cfa}
+s = peter( 2, 3 );                                      §\C{// s is assigned "ETE"}§
+s = peter( 4, -3 );                                     §\C{// s is assigned "ETE", length is opposite direction}§
+s = peter( 2, 8 );                                      §\C{// s is assigned "ETER", length is clipped to 4}§
+s = peter( 0, -1 );                                     §\C{// s is assigned "", beyond string so clipped to null}§
+s = peter(-1, -1 );                                     §\C{// s is assigned "R", start and length are negative}§
+\end{cfa}
+A negative starting position is a specification from the right end of the string.
+The substring operation returns a subset of a string starting at a position in the string and traversing a length, or matching a pattern string.
+\begin{cquote}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{@{}l|ll|l@{}}
+\multicolumn{2}{@{}c}{\textbf{length}} & \multicolumn{2}{c@{}}{\textbf{pattern}} \\
+\multicolumn{4}{@{}l}{\lstinline{string name = "PETER"}} \\
+\begin{cfa}
+s = name( 0, 4 );
+s = name( 1, 4 );
+s = name( 2, 4 );
+s = name( 4, -2 );
+s = name( 8, 2 );
+s = name( 0, -2 );
+s = name( -1, -2 );
+s = name( -3 );
+\end{cfa}
+&
+\begin{cfa}
+"PETE"
+"ETER"
+"TER"   // clip length to 3
+"ER"
+""                 // beyond string to right, clip to null
+""                 // beyond string to left, clip to null
+"ER"
+"TER"   // to end of string
+\end{cfa}
+&
+\begin{cfa}
+s = name( "ET" );
+s = name( "WW" );
+\end{cfa}
+&
+\begin{cfa}
+"ET"
+""  // does not occur
+\end{cfa}
+\end{tabular}
+\end{cquote}
+For the length form, a negative starting position is a specification from the right end of the string.
 A negative length means that characters are selected in the opposite (right to left) direction from the starting position.
 If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string.
+If the substring request is completely outside of the original string, a null string located at the end of the original string is returned.
+The substring operation can also appear on the left hand side of the assignment operator.
+The substring is replaced by the value on the right hand side of the assignment.
+The length of the right-hand-side value may be shorter, the same length, or longer than the length of the substring that is selected on the left hand side of the assignment.
+\begin{cfa}
+digit( 3, 3 ) = "";                             §\C{// digit is assigned "0156789"}§
+digit( 4, 3 ) = "xyz";                          §\C{// digit is assigned "015xyz9"}§
+digit( 7, 0 ) = "***";                          §\C{// digit is assigned "015xyz***9"}§
+digit(-4, 3 ) = "$$$";                          §\C{// digit is assigned "015xyz\$\$\$9"}§
+\end{cfa}
+If the substring request is completely outside of the original string, a null string is returned.
+For the pattern-form, it returns the pattern string if the pattern matches or a null string if the pattern does not match.
+The usefulness of this mechanism is discussed next.
+The substring operation can appear on the left side of assignment, where it defines a replacement substring.
+The length of the right string may be shorter, the same, or longer than the length of the left string.
+Hence, the left string may decrease, stay the same, or increase in length.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\multicolumn{2}{@{}l}{\lstinline{string digit = "0123456789"}} \\
+\begin{cfa}[escapechar={}]
+digit( 3, 3 ) = "";
+digit( 4, 3 ) = "xyz";
+digit( 7, 0 ) = "***";
+digit(-4, 3 ) = "$$$";
+digit( 5 ) = "LLL";
+\end{cfa}
+&
+\begin{cfa}[escapechar={}]
+"0126789"
+"0126xyz"
+"0126xyz"
+"012$$$z"
+"012$$LLL"
+\end{cfa}
+\end{tabular}
+\end{cquote}
+Now substring pattern matching is useful on the left-hand side of assignment.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}[escapechar={}]
+digit( "$$" ) = "345";
+digit( "LLL") = "6789";
+\end{cfa}
+&
+\begin{cfa}
+"012345LLL"
+"0123456789"
+\end{cfa}
+\end{tabular}
+\end{cquote}
+The ©replace© operation extends substring to substitute all occurrences.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = replace( "PETER", "E", "XX" );
+s = replace( "PETER", "ET", "XX" );
+s = replace( "PETER", "W", "XX" );
+\end{cfa}
+&
+\begin{cfa}
+"PXXTXXR"
+"PXXER"
+"PETER"
+\end{cfa}
+\end{tabular}
+\end{cquote}
+The replacement is done left-to-right and substituted text is not examined for replacement.
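The left-to-right rule, where substituted text is not re-examined, can be sketched in \CC with ©std::string©.
The function name ©replace_all© is hypothetical and the code is an illustration of the stated semantics, not the \CFA implementation.

```cpp
#include <string>

// Replace every occurrence of from with to, scanning left to right.
// After each substitution the scan resumes past the inserted text, so the
// replacement is never searched again.
std::string replace_all(std::string s, const std::string& from, const std::string& to) {
    if (from.empty()) return s;             // nothing to search for
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();                   // skip the substituted text
    }
    return s;
}
```

Skipping the substituted text is what guarantees termination when the replacement contains the key, \eg replacing ©"a"© with ©"aa"©.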
+\subsection{Searching}
+The ©find© operation returns the position of the first occurrence of a key in a string.
+If the key does not appear in the string, the length of the string is returned.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\multicolumn{2}{@{}l}{\lstinline{string digit = "0123456789"}} \\
+\begin{cfa}
+i = find( digit, '3' );
+i = find( digit, "45" );
+i = find( digit, "abc" );
+\end{cfa}
+&
+\begin{cfa}
+3
+4
+10
+\end{cfa}
+\end{tabular}
+\end{cquote}
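The same failure convention can be layered on \CC ©std::string::find©, which reports failure with ©npos©.
The wrapper name ©find_or_len© is hypothetical; it is a sketch of the convention, not part of either library.

```cpp
#include <string>

// Returns the position of the first occurrence of key in s,
// or the length of s when key does not occur.
std::size_t find_or_len(const std::string& s, const std::string& key) {
    std::size_t p = s.find(key);
    return p == std::string::npos ? s.size() : p;
}
```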
+A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+charclass vowels{ "aeiouy" };
+i = include( "aaeiuyoo", vowels );
+i = include( "aabiuyoo", vowels );
+\end{cfa}
+&
+\begin{cfa}
+8  // compliant
+2  // b non-compliant
+\end{cfa}
+\end{tabular}
+\end{cquote}
+©vowels© defines a character class and function ©include© checks if all characters in the string appear in the class (compliance).
+If the string is compliant, the length of the string is returned; otherwise, the position of the first non-compliant character is returned.
+There is no relationship between the order of characters in the two strings.
+Function ©exclude© is the reverse of ©include©, checking if all characters in the string are excluded from the class (compliance).
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+i = exclude( "cdbfghmk", vowels );
+i = exclude( "cdyfghmk", vowels );
+\end{cfa}
+&
+\begin{cfa}
+8  // compliant
+2  // y non-compliant
+\end{cfa}
+\end{tabular}
+\end{cquote}
+Both forms can return the longest substring of compliant characters.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = include( "aaeiuyoo", vowels );
+s = include( "aabiuyoo", vowels );
+s = exclude( "cdbfghmk", vowels );
+s = exclude( "cdyfghmk", vowels );
+\end{cfa}
+&
+\begin{cfa}
+"aaeiuyoo"
+"aa"
+"cdbfghmk"
+"cd"
+\end{cfa}
+\end{tabular}
+\end{cquote}
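A rough \CC equivalent of the character-class forms can be built from ©find_first_not_of© and ©find_first_of©, mapping ©npos© to the string length.
Here a plain ©std::string© stands in for ©charclass©, and the names ©include_prefix© and ©exclude_prefix© are hypothetical stand-ins for the string-returning overloads.

```cpp
#include <string>

// Position of the first character of s that is NOT in cls, or s.size().
std::size_t include(const std::string& s, const std::string& cls) {
    std::size_t p = s.find_first_not_of(cls);
    return p == std::string::npos ? s.size() : p;
}

// Position of the first character of s that IS in cls, or s.size().
std::size_t exclude(const std::string& s, const std::string& cls) {
    std::size_t p = s.find_first_of(cls);
    return p == std::string::npos ? s.size() : p;
}

// Leading run of compliant characters; no failure check is needed because
// the position functions return s.size() when the whole string complies.
std::string include_prefix(const std::string& s, const std::string& cls) {
    return s.substr(0, include(s, cls));
}
std::string exclude_prefix(const std::string& s, const std::string& cls) {
    return s.substr(0, exclude(s, cls));
}
```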
+There are also versions of ©include© and ©exclude©, returning a position or string, taking a validation function, like one of the C character-class functions.\footnote{It is part of the heritage of C that these functions take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.}
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+i = include( "1FeC34aB", ®isxdigit® );
+i = include( ".,;'!\"", ®ispunct® );
+i = include( "XXXx", ®isupper® );
+\end{cfa}
+&
+\begin{cfa}
+8   // compliant
+6   // compliant
+3   // non-compliant
+\end{cfa}
+\end{tabular}
+\end{cquote}
+These operations perform an \emph{apply} of the validation function to each character, where the function returns a boolean indicating a stopping condition for the search.
+If the string is compliant, the length of the string is returned; otherwise, the position of the first non-compliant character is returned.
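The validation-function form amounts to applying the predicate to each character until it fails.
A minimal \CC sketch follows, assuming the predicate has the C character-class signature ©int (*)( int )©; it returns the string length when every character passes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Apply f to each character and stop at the first one it rejects.
// Returns that position, or s.size() when every character is accepted.
std::size_t include(const std::string& s, int (*f)(int)) {
    std::size_t i = 0;
    // The cast avoids undefined behaviour for negative char values.
    while (i < s.size() && f(static_cast<unsigned char>(s[i]))) i += 1;
    return i;
}
```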
+The translate operation returns a string with each character transformed by one of the C character transformation functions.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = translate( "abc", ®toupper® );
+s = translate( "ABC", ®tolower® );
+int tospace( int c ) { return isspace( c ) ? ' ' : c; }
+s = translate( "X X\tX\nX", ®tospace® );
+\end{cfa}
+&
+\begin{cfa}
+"ABC"
+"abc"
+"X X X X"
+\end{cfa}
+\end{tabular}
+\end{cquote}
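In \CC, the same effect is usually obtained with ©std::transform©.
The sketch below uses a hypothetical function ©translate© taking a C-style ©int (*)( int )© transformation, and reuses the ©tospace© example from above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a copy of s with f applied to every character.
std::string translate(std::string s, int (*f)(int)) {
    std::transform(s.begin(), s.end(), s.begin(),
        [f](unsigned char c) { return static_cast<char>(f(c)); });
    return s;
}

// Maps any whitespace character to a blank and leaves others unchanged.
int tospace(int c) { return std::isspace(c) ? ' ' : c; }
```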
+\subsection{Returning N on Search Failure}
+Some of the prior string operations are composite, \eg string operations returning the longest substring of compliant characters (©include©) are built by performing a search and then taking a substring of the appropriate text.
+However, string search can fail, which is reported as an alternate search outcome, possibly an exception.
+Many string libraries use a return code to indicate search failure, with a failure value of ©0© or ©-1© (PL/I~\cite{PLI} returns ©0©).
+This semantics leads to the following awkward pattern, which can appear many times in a string library or user code.
+\begin{cfa}
+i = exclude( s, alpha );
+if ( i != -1 ) return s( 0, i );
+else return "";
+\end{cfa}
+\CFA adopts a return code but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1 origin).
+This semantics allows many search and substring functions to be written without conditions, \eg:
+\begin{cfa}
+string include( const string & s, int (*f)( int ) ) { return ®s( 0, include( s, f ) )®; }
+string exclude( const string & s, int (*f)( int ) ) { return ®s( 0, exclude( s, f ) )®; }
+\end{cfa}
+In string systems with an $O(1)$ length operator, checking for failure is low cost.
+\begin{cfa}
+if ( include( line, alpha ) == len( line ) ) ... // not found, 0 origin
+\end{cfa}
+\VRef[Figure]{f:ExtractingWordsText} compares \CC and \CFA string code for extracting words from a line of text, repeatedly removing non-word text and then a word until the line is empty.
+The \CFA code is simpler solely because of the choice for indicating search failure.
+(A simplification of the \CC version is to concatenate a sentinel character at the end of the line so the call to ©find_first_not_of© does not fail.)
+\begin{figure}
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\
+\begin{cfa}
+for ( ;; ) {
+        string::size_type posn = line.find_first_of( alpha );
+  if ( posn == string::npos ) break;
+        line = line.substr( posn );
+        posn = line.find_first_not_of( alpha );
+        if ( posn != string::npos ) {
+                cout << line.substr( 0, posn ) << endl;
+                line = line.substr( posn );
+        } else {
+                cout << line << endl;
+                line = "";
+        }
+}
+\end{cfa}
+&
+\begin{cfa}
+for () {
+        size_t posn = exclude( line, alpha );
+  if ( posn == len( line ) ) break;
+        line = line( posn );
+        posn = include( line, alpha );
+        sout | line( 0, posn );
+        line = line( posn );
+}
+\end{cfa}
+\end{tabular}
+\end{cquote}
+\caption{Extracting Words from Line of Text}
+\label{f:ExtractingWordsText}
+\end{figure}
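To check the claim that the simplification comes only from the failure convention, the \CFA loop in the figure can be transcribed into \CC using two small wrappers that return the string length instead of ©npos©.
The wrapper and function names are hypothetical; the words are collected in a vector rather than printed so the result can be inspected.

```cpp
#include <string>
#include <vector>

// Position of the first character in cls, or s.size() if there is none.
std::size_t exclude(const std::string& s, const std::string& cls) {
    std::size_t p = s.find_first_of(cls);
    return p == std::string::npos ? s.size() : p;
}

// Position of the first character not in cls, or s.size() if there is none.
std::size_t include(const std::string& s, const std::string& cls) {
    std::size_t p = s.find_first_not_of(cls);
    return p == std::string::npos ? s.size() : p;
}

std::vector<std::string> extract_words(std::string line, const std::string& alpha) {
    std::vector<std::string> words;
    for (;;) {
        std::size_t posn = exclude(line, alpha);   // length of leading non-word text
        if (posn == line.size()) break;            // no word remains
        line = line.substr(posn);
        posn = include(line, alpha);               // length of the word
        words.push_back(line.substr(0, posn));
        line = line.substr(posn);                  // substr( size() ) yields ""
    }
    return words;
}
```

The inner ©if©/©else© of the \CC column disappears because a word that runs to the end of the line needs no special case.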
+\subsection{C Compatibility}
+To ease conversion from C to \CFA, \CFA provides companion C ©string© functions.
+Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type ©char *© to ©string©.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}ll@{}}
+\begin{cfa}
+char s[32];   // string s;
+strlen( s );
+strnlen( s, 3 );
+strcmp( s, "abc" );
+strncmp( s, "abc", 3 );
+\end{cfa}
+&
+\begin{cfa}
+strcpy( s, "abc" );
+strncpy( s, "abcdef", 3 );
+strcat( s, "xyz" );
+strncat( s, "uvwxyz", 3 );
+\end{cfa}
+\end{tabular}
+\end{cquote}
+However, the conversion fails for I/O: ©printf© cannot print a ©string© using format code ©%s© because \CFA strings are not null terminated.
+Nevertheless, this capability does provide a useful starting point for conversion to safer \CFA strings.
+\subsection{I/O Operators}
+The ability to input and output strings is as essential as for any other type.
+The goal for character I/O is to also work with groups rather than individual characters.
+A comparison with \CC string I/O is presented as a counterpoint to \CFA string I/O.
+The \CC output ©<<© and input ©>>© operators are defined on type ©string©.
+\CC output for ©char©, ©char *©, and ©string© are similar.
+The \CC manipulators are ©setw©, and its associated width controls ©left©, ©right© and ©setfill©.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{C++}
+string s = "abc";
+cout << setw(10) << left << setfill( 'x' ) << s << endl;
+\end{C++}
+&
+\begin{C++}
+"abcxxxxxxx"
+\end{C++}
+\end{tabular}
+\end{cquote}
+The \CFA input/output operator ©|© is defined on type ©string©.
+\CFA output for ©char©, ©char *©, and ©string© is similar.
+The \CFA manipulators are ©bin©, ©oct©, ©hex©, and ©wd© with its associated width control and ©left©.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+string s = "abc";
+sout | bin( s ) | nl
+           | oct( s ) | nl
+           | hex( s ) | nl
+           | wd( 10, s ) | nl
+           | wd( 10, 2, s ) | nl
+           | left( wd( 10, s ) );
+\end{cfa}
+&
+\begin{cfa}
+"0b1100001 0b1100010 0b1100011"
+"0141 0142 0143"
+"0x61 0x62 0x63"
+"       abc"
+"        ab"
+"abc       "
+\end{cfa}
+\end{tabular}
+\end{cquote}
+\CC ©setfill© is not considered an important string manipulator.
+\CC input matching for ©char©, ©char *©, and ©string© is similar, where \emph{all} input characters are read from the current point in the input stream to the end of the type size, format width, whitespace, end of line (©'\n'©), or end of file.
+The \CC manipulator is ©setw© to restrict the size.
+Reading into a ©char© is safe as the size is 1; reading into a ©char *© is unsafe without using ©setw© to constrain the length (which includes ©'\0'©); reading into a ©string© is safe as it grows dynamically as characters are read.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{C++}
+char ch, c[10];
+string s;
+cin >> ch >> setw( 5 ) >> c  >> s;
+®abcde   fg®
+\end{C++}
+&
+\begin{C++}
+'a' "bcde" "fg"
+\end{C++}
+\end{tabular}
+\end{cquote}
+Input text can be \emph{gulped}, including whitespace, from the current point to an arbitrary delimiter character using ©getline©.
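As a minimal sketch of gulping, assuming the input is held in a ©std::string© and using a hypothetical helper named ©gulp_fields©, the following \CC code reads successive fields up to an arbitrary delimiter with the three-argument form of ©std::getline©. Whitespace inside a field is preserved and the delimiter itself is discarded.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into fields by gulping all characters,
// including whitespace, from the current point up to each delimiter.
std::vector<std::string> gulp_fields( const std::string & text, char delim ) {
    std::istringstream in( text );
    std::vector<std::string> fields;
    std::string field;
    // getline stores the characters before delim and consumes delim itself;
    // the loop ends when no further characters can be extracted.
    while ( std::getline( in, field, delim ) ) fields.push_back( field );
    return fields;
}
```

For instance, calling ©gulp_fields( "abc  de;fg h;", ';' )© yields the two fields ©"abc  de"© and ©"fg h"©, with the embedded blanks intact.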
+The \CFA philosophy for input is that, for every constant type in C, these constants should be usable as input.
+For example, the complex constant ©3.5+4.1i© can appear as input to a complex variable.
+\CFA input matching for ©char©, ©char *©, and ©string© is similar.
+C-strings may only be read with a width field, which should match the string size.
+Certain input manipulators support a scanset, which is a simple regular expression from ©scanf©.
+The \CFA manipulators for these types are ©wdi©,\footnote{Due to an overloading issue in the type-resolver, the input width name must be temporarily different from the output, \lstinline{wdi} versus \lstinline{wd}.} and its associated width control and ©left©, ©quote©, ©incl©, ©excl©, and ©getline©.
+\begin{cquote}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{C++}
+char ch, c[10];
+string s;
+sin | ch | wdi( 5, c ) | s;
+®abcde fg®
+sin | quote( ch ) | quote( wdi( sizeof(c), c ) ) | quote( s, '[', ']' ) | nl;
+®'a' "bcde" [fg]®
+sin | incl( "a-zA-Z0-9 ?!&\n", s ) | nl;
+®x?&000xyz TOM !.®
+sin | excl( "a-zA-Z0-9 ?!&\n", s );
+®<>{}{}STOP®
+\end{C++}
+&
+\begin{C++}
+'a' "bcde" "fg"
+'a' "bcde" "fg"
+"x?&000xyz TOM !"
+"<>{}{}"
+\end{C++}
+\end{tabular}
+\end{cquote}
+Note the ability to read quoted strings containing whitespace, which matches the form of program string constants.
+The ©nl© at the end of an input ignores the rest of the line.
+\begin{comment}
 A substring is treated as a pointer into the base (substringed) string rather than creating a copy of the subtext.
 As with all pointers, if the item they are pointing at is changed, then the pointer is referring to the changed item.
 …
+}
 \end{cfa}
+There is an assignment form of substring in which only the starting position is specified and the length is assumed to be the remainder of the string.
+\begin{cfa}
+string operator () (int start);
+\end{cfa}
+For example:
+\begin{cfa}
+s = peter( 2 );                                         §\C{// s is assigned "ETER"}§
+peter( 2 ) = "IPER";                            §\C{// peter is assigned "PIPER"}§
+\end{cfa}
+It is also possible to substring using a string as the index for selecting the substring portion of the string.
+\begin{cfa}
+string operator () (const string &index);
+\end{cfa}
+For example:
+\begin{cfa}[mathescape=false]
+digit( "xyz$$$" ) = "678";                      §\C{// digit is assigned "0156789"}§
+digit( "234") = "***";                          §\C{// digit is assigned "0156789***"}§
+\end{cfa}
+\subsection{Searching}
+The ©index© operation
+\begin{cfa}
+int index( const string &key, int start = 1, occurrence occ = first );
+\end{cfa}
+returns the position of the first or last occurrence of the ©key© (depending on the occurrence indicator ©occ© that is either ©first© or ©last©) in the current string starting the search at position ©start©.
+If the ©key© does not appear in the current string, the length of the current string plus one is returned.
+%If the ©key© has zero length, the value 1 is returned regardless of what the current string contains.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+i = digit.index( "567" );                       §\C{// i is assigned 3}§
+i = digit.index( "567", 7 );            §\C{// i is assigned 11}§
+i = digit.index( "567", -1, last );     §\C{// i is assigned 3}§
+i = peter.index( "E", 5, last );        §\C{// i is assigned 4}§
+\end{cfa}
+The next two string operations test a string to see if it is or is not composed completely of a particular class of characters.
+For example, are the characters of a string all alphabetic or all numeric?
+Use of these operations involves a two step operation.
+First, it is necessary to create an instance of type ©strmask© and initialize it to a string containing the characters of the particular character class, as in:
+\begin{cfa}
+strmask digitmask = digit;
+strmask alphamask = string( "abcdefghijklmnopqrstuvwxyz" );
+\end{cfa}
+Second, the character mask is used in the functions ©include© and ©exclude© to check a string for compliance of its characters with the characters indicated by the mask.
+The ©include© operation
+\begin{cfa}
+int include( const strmask &, int = 1, occurrence occ = first );
+\end{cfa}
+returns the position of the first or last character (depending on the occurrence indicator, which is either ©first© or ©last©) in the current string that does not appear in the ©mask© starting the search at position ©start©;
+hence it skips over characters in the current string that are included (in) the ©mask©.
+The characters in the current string do not have to be in the same order as the ©mask©.
+If all the characters in the current string appear in the ©mask©, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+i = peter.include( digitmask );         §\C{// i is assigned 1}§
+i = peter.include( alphamask );         §\C{// i is assigned 6}§
+\end{cfa}
+The ©exclude© operation
+\begin{cfa}
+int exclude( string &mask, int start = 1, occurrence occ = first )
+\end{cfa}
+returns the position of the first or last character (depending on the occurrence indicator, which is either ©first© or ©last©) in the current string that does appear in the ©mask© string starting the search at position ©start©;
+hence it skips over characters in the current string that are excluded from (not in) the ©mask© string.
+The characters in the current string do not have to be in the same order as the ©mask© string.
+If all the characters in the current string do NOT appear in the ©mask© string, the length of the current string plus one is returned, regardless of which occurrence is being searched for.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+i = peter.exclude( digitmask );         §\C{// i is assigned 6}§
+i = ifstmt.exclude( strmask( punctuation ) ); §\C{// i is assigned 4}§
+\end{cfa}
+The ©includeStr© operation:
+\begin{cfa}
+string includeStr( strmask &mask, int start = 1, occurrence occ = first )
+\end{cfa}
+returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) of the current string that ARE included in the ©mask© string starting the search at position ©start©.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+s = peter.includeStr( alphamask );      §\C{// s is assigned "PETER"}§
+s = ifstmt.includeStr( alphamask );     §\C{// s is assigned "IF"}§
+s = peter.includeStr( digitmask );      §\C{// s is assigned ""}§
+\end{cfa}
+The ©excludeStr© operation:
+\begin{cfa}
+string excludeStr( strmask &mask, int start = 1, occurrence = first )
+\end{cfa}
+returns the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) of the current string that are excluded (NOT) in the ©mask© string starting the search at position ©start©.
+A negative starting position is a specification from the right end of the string.
+\begin{cfa}
+s = peter.excludeStr( digitmask);       §\C{// s is assigned "PETER"}§
+s = ifstmt.excludeStr( strmask( punctuation ) ); §\C{// s is assigned "IF "}§
+s = peter.excludeStr( alphamask);       §\C{// s is assigned ""}§
+\end{cfa}
+\subsection{Miscellaneous}
+The ©trim© operation
+\begin{cfa}
+string trim( string &mask, occurrence occ = first )
+\end{cfa}
+returns a string in which the longest substring of leading or trailing characters (depending on the occurrence indicator, which is either ©first© or ©last©) that ARE included in the ©mask© is removed.
+\begin{cfa}
+// remove leading blanks
+s = string( "   ABC" ).trim( " " );     §\C{// s is assigned "ABC",}§
+// remove trailing blanks
+s = string( "ABC   " ).trim( " ", last ); §\C{// s is assigned "ABC",}§
+\end{cfa}
+The ©translate© operation
+\begin{cfa}
+string translate( string &from, string &to )
+\end{cfa}
+returns a string that is the same length as the original string in which all occurrences of the characters that appear in the ©from© string have been translated into their corresponding character in the ©to© string.
+Translation is done on a character by character basis between the ©from© and ©to© strings; hence these two strings must be the same length.
+If a character in the original string does not appear in the ©from© string, then it simply appears as is in the resulting string.
+\begin{cfa}
+// upper to lower case
+peter = peter.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
+                        // peter is assigned "peter"
+s = ifstmt.translate( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz" );
+                        // ifstmt is assigned "if (a > b) {"
+// lower to upper case
+peter = peter.translate( "abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" );
+                        // peter is assigned "PETER"
+\end{cfa}
+The ©replace© operation
+\begin{cfa}
+string replace( string &from, string &to )
+\end{cfa}
+returns a string in which all occurrences of the ©from© string in the current string have been replaced by the ©to© string.
+\begin{cfa}
+s = peter.replace( "E", "XX" );         §\C{// s is assigned "PXXTXXR"}§
+\end{cfa}
+The replacement is done left-to-right.
+When an instance of the ©from© string is found and changed to the ©to© string, it is NOT examined again for further replacement.
+\subsection{Returning N+1 on Failure}
+Any of the string search routines can fail at some point during the search.
+When this happens it is necessary to return indicating the failure.
+Many string types in other languages use some special value to indicate the failure.
+This value is often 0 or -1 (PL/I returns 0).
+This section argues that a value of N+1, where N is the length of the base string in the search, is a more useful value to return.
+The index-of function in APL returns N+1.
+These are the boundary situations and are often overlooked when designing a string type.
+The situation that can be optimized by returning N+1 is when a search is performed to find the starting location for a substring operation.
+For example, in a program that is extracting words from a text file, it is necessary to scan from left to right over whitespace until the first alphabetic character is found.
+\begin{cfa}
+line = line( line.exclude( alpha ) );
+\end{cfa}
+If a text line contains all whitespaces, the exclude operation fails to find an alphabetic character.
+If ©exclude© returns 0 or -1, the result of the substring operation is unclear.
+Most string types generate an error, or clip the starting value to 1, resulting in the entire whitespace string being selected.
+If ©exclude© returns N+1, the starting position for the substring operation is beyond the end of the string leaving a null string.
+The same situation occurs when scanning off a word.
+\begin{cfa}
+start = line.include(alpha);
+word = line(1, start - 1);
+\end{cfa}
+If the entire line is composed of a word, the include operation fails to find a non-alphabetic character.
+In general, returning 0 or -1 is not an appropriate starting position for the substring, which must substring off the word leaving a null string.
+However, returning N+1 will substring off the word leaving a null string.
+\subsection{C Compatibility}
+To ease conversion from C to \CFA, there are companion ©string© routines for C strings.
+\VRef[Table]{t:CompanionStringRoutines} shows the C routines on the left that also work with ©string© and the rough equivalent ©string© operation on the right.
+Hence, it is possible to directly convert a block of C string operations into @string@ just by changing the
+\begin{table}
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\multicolumn{1}{c|}{©char []©}  & \multicolumn{1}{c}{©string©}  \\
+\hline
+©strcpy©, ©strncpy©             & ©=©                                                                   \\
+©strcat©, ©strncat©             & ©+©                                                                   \\
+©strcmp©, ©strncmp©             & ©==©, ©!=©, ©<©, ©<=©, ©>©, ©>=©              \\
+©strlen©                                & ©size©                                                                \\
+©[]©                                    & ©[]©                                                                  \\
+©strstr©                                & ©find©                                                                \\
+©strcspn©                               & ©find_first_of©, ©find_last_of©               \\
+©strspn©                                & ©find_first_not_of©, ©find_last_not_of©
+\end{tabular}
+\end{cquote}
+\caption{Companion Routines for \CFA \lstinline{string} to C Strings}
+\label{t:CompanionStringRoutines}
+\end{table}
+For example, this block of C code can be converted to \CFA by simply changing the type of variable ©s© from ©char []© to ©string©.
+\begin{cfa}
+        char s[32];
+        //string s;
+        strcpy( s, "abc" );                             PRINT( %s, s );
+        strncpy( s, "abcdef", 3 );              PRINT( %s, s );
+        strcat( s, "xyz" );                             PRINT( %s, s );
+        strncat( s, "uvwxyz", 3 );              PRINT( %s, s );
+        PRINT( %zd, strlen( s ) );
+        PRINT( %c, s[3] );
+        PRINT( %s, strstr( s, "yzu" ) ) ;
+        PRINT( %s, strstr( s, 'y' ) ) ;
+\end{cfa}
+However, the conversion fails with I/O because ©printf© cannot print a ©string© using format code ©%s© because \CFA strings are not null terminated.
+\subsection{Input/Output Operators}
+Both the \CC operators ©<<© and ©>>© are defined on type ©string©.
+However, input of a string value is different from input of a ©char *© value.
+When a string value is read, \emph{all} input characters from the current point in the input stream to either the end of line (©'\n'©) or the end of file are read.
+\end{comment}
 …
 allowable calls are:
 \begin{cquote}
-\setlength{\tabcolsep}{0.75in}
 \begin{tabular}{@{}ll@{}}
 \textbf{positional arguments} & \textbf{empty arguments} \\
