-              r8317671
+              r780727f
 The \CFA string type manipulates dynamically-sized character-strings, whereas the C @char *@ type manipulates statically-sized null-terminated character-strings.
+Hence, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
+As a result, a @string@ declaration does not specify a maximum length, whereas a C string must.
+Therefore, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time.
+As a result, a @string@ declaration does not specify a maximum length, whereas a C string array does.
+For \CFA, as a @string@ dynamically grows and shrinks in size, so does its underlying storage.
+For C, a string dynamically grows and shrinks in size, but its underlying storage does not.
 The maximum storage for a \CFA @string@ value is @size_t@ characters, which is $2^{32}$ or $2^{64}$ on a 32-bit or 64-bit platform, respectively.
 A \CFA string manages its length separately from the string, so there is no null (@'\0'@) terminating value at the end of a string value.
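The effect of managing the length separately can be sketched in plain C.
The type @lenstr@ and its functions are hypothetical names for illustration only, not the \CFA implementation; the sketch only shows that a stored length removes the need for a terminator and allows @'\0'@ to appear as data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-managed string view (illustration only, not the CFA
   implementation): the length is stored beside the characters, so no '\0'
   terminator is required and '\0' may appear as ordinary data. */
typedef struct {
    const char *data;   /* characters, not necessarily null terminated */
    size_t len;         /* number of characters, managed separately */
} lenstr;

/* Build a view over exactly n bytes, embedded nulls included. */
static lenstr lenstr_from(const char *bytes, size_t n) {
    lenstr s = { bytes, n };
    return s;
}

/* The length is a field read, whereas strlen scans for the first '\0'. */
static size_t lenstr_length(lenstr s) {
    return s.len;
}
```

For a five-byte buffer containing an embedded null, @lenstr_length@ reports all five characters, while @strlen@ stops at the null.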
 …
 Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signedness and size variations, implicitly convert to type @string@ (as in Java).
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|ll|l@{}}
 \begin{cfa}
 string s;
+string s = 5;
 s = 'x';
 s = "abc";
 s = cs;
 s = 45hh;
 s = 45h;
 \end{cfa}
+&
 \begin{cfa}
+s = 42hh;               /* signed char */
+s = 42h;                /* short int */
+s = 0xff;
+\end{cfa}
+&
+\begin{cfa}
+"5"
 "x"
 "abc"
 "abc"
 "45"
 "45"
 \end{cfa}
+&
 \begin{cfa}
         s = (ssize_t)MIN;
         s = (size_t)MAX;
         s = 5.5;
         s = 5.5L;
         s = 5.5+3.4i;
         s = 5.5L+3.4Li;
+"42"
+"42"
+"255"
+\end{cfa}
+&
+\begin{cfa}
+s = (ssize_t)MIN;
+s = (size_t)MAX;
+s = 5.5;
+s = 5.5L;
+s = 5.5+3.4i;
+s = 5.5L+3.4Li;
 \end{cfa}
+&
 …
 Conversions can be explicitly specified using a compound literal.
 \begin{cfa}
 s = (string){ "abc" };                          $\C{// converts char * to string}$
+s = (string){ 5 };                                      $\C{// converts int to string}$
+s = (string){ 5.5 };                            $\C{// converts double to string}$
+\end{cfa}
+Conversions from @string@ to @char *@ attempt to be safe:
 either by requiring the maximum length of the @char *@ storage (@strncpy@) or allocating the @char *@ storage for the string characters (ownership), meaning the programmer must free the storage.
+Note, a C string is always null terminated, implying a minimum size of 1 character.
 \begin{cquote}
 \setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+s = (string){ 5 };    s = (string){ "abc" };   s = (string){ 5.5 };
+\end{cfa}
+Conversions from @string@ to @char *@ attempt to be safe.
+The @strncpy@ conversion requires the maximum length for the pointer's target buffer.
+The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it.
+Note, a C string is always null terminated, implying storage is always necessary for the null.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\begin{cfa}
+string s = "abcde";
+char cs[4];
 strncpy( cs, s, sizeof(cs) );
 char * cp = s;
+char * cp = s;          // ownership
 delete( cp );
 cp = s + ' ' + s;
+cp = s + ' ' + s;       // ownership
 delete( cp );
 \end{cfa}
+&
 \begin{cfa}
 "abc\0", in place
 "abcde\0", malloc
+ownership
 "abcde abcde\0", malloc
+ownership
 \end{cfa}
 \end{tabular}
 …
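For comparison, the bounded copy into a fixed buffer can be sketched in plain C.
The helper @bounded_copy@ is a hypothetical name, not part of \CFA or the C library; it is needed because the standard @strncpy@ does not write a terminating null when the source has as many or more characters than the limit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most size-1 characters of src into dst and
   always null terminate.  Plain strncpy writes no '\0' when src holds size
   or more characters, so the terminator is stored explicitly. */
static char *bounded_copy(char *dst, const char *src, size_t size) {
    if (size == 0) return dst;          /* no room, even for the null */
    strncpy(dst, src, size - 1);        /* copy at most size-1 characters */
    dst[size - 1] = '\0';               /* guarantee termination */
    return dst;
}
```

With a 4-character buffer and the source @"abcde"@, the buffer holds @"abc"@ followed by the null, matching the in-place result shown in the table.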
 For compatibility, @strlen@ also works with \CFA strings.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 \subsection{Comparison Operators}
 The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare \CFA string values using lexicographical ordering, where longer strings are greater than shorter strings.
+The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare \CFA strings using lexicographical ordering, where longer strings are greater than shorter strings.
 In C, these operators compare the C string pointers, not the string values, which does not match programmer expectation.
 C strings use function @strcmp@ to lexicographically compare the string value.
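The difference between pointer and value comparison in C can be seen in a short sketch.
The function names @same_pointer@ and @same_value@ are illustrative only.

```c
#include <stdbool.h>
#include <string.h>

/* Compares addresses: true only when both pointers refer to the same storage. */
static bool same_pointer(const char *a, const char *b) {
    return a == b;
}

/* Compares values: true when the two character sequences are identical. */
static bool same_value(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

Two distinct arrays initialized with @"abc"@ are equal by value but not by pointer, which is the mismatch with programmer expectation described above.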
 …
 The binary operators @+@ and @+=@ concatenate C @char@, @char *@ and \CFA strings, creating the sum of the characters.
 \par\noindent
+\begin{cquote}
 \begin{tabular}{@{}l|l@{\hspace{15pt}}l|l@{\hspace{15pt}}l|l@{}}
 \begin{cfa}
 …
 \end{cfa}
 \end{tabular}
 \par\noindent
+\end{cquote}
 However, including @<string.hfa>@ can result in ambiguous uses of the overloaded @+@ operator.\footnote{Combining multiple packages in any programming language can result in name clashes or ambiguities.}
+While subtracting characters or pointers has a low-level use-case
 \begin{cfa}
 ch - '0'    $\C[2in]{// find character offset}$
 cs - cs2;  $\C{// find pointer offset}\CRT$
+For example, subtracting characters or pointers has valid use-cases:
+\begin{cfa}
+ch - '0'        $\C[2in]{// find character offset}$
+cs - cs2;       $\C{// find pointer offset}\CRT$
 \end{cfa}
 addition is less obvious
 \begin{cfa}
 ch + 'b'    $\C[2in]{// add character values}$
 cs + 'a';  $\C{// move pointer cs['a']}\CRT$
+ch + 'b'        $\C[2in]{// add character values}$
+cs + 'a';       $\C{// move pointer cs['a']}\CRT$
 \end{cfa}
 There are legitimate use cases for arithmetic with @signed@/@unsigned@ characters (bytes), and these types are treated differently from @char@ in \CC and \CFA.
 …
 Similarly, it is impossible to restrict or remove addition on type @char *@ because (unfortunately) it is subscripting: @cs + 'a'@ implies @cs['a']@ or @'a'[cs]@.
 The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ (variables are the same as constants) work correctly.
+The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ constants work correctly (variables are the same).
 The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs.
 Hence, the type system correctly handles all uses of addition (explicit or implicit) for @char *@.
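The equivalence between pointer addition and subscripting noted above, where @cs + 'a'@ implies @cs['a']@ or @'a'[cs]@, can be checked in plain C.
The three function names are illustrative; each one reads the same element.

```c
#include <stddef.h>

/* In C, cs[i] is defined as *(cs + i).  Because the addition commutes,
   i[cs] denotes the same element. */
static char via_addition(const char *cs, size_t i)  { return *(cs + i); }
static char via_subscript(const char *cs, size_t i) { return cs[i]; }
static char via_reversed(const char *cs, size_t i)  { return i[cs]; }
```

This is why addition on @char *@ cannot be removed without also removing subscripting.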
 …
 Only @char@ addition can result in ambiguities, and only when there is no left-hand information.
 \begin{cfa}
 ch = ch + 'b'; $\C[2in]{// LHS disambiguate, add character values}$
 s = 'a' + 'b'; $\C{// LHS disambiguate, concatenate characters}$
+ch = ch + 'b';          $\C[2in]{// LHS disambiguate, add character values}$
+s = 'a' + 'b';          $\C{// LHS disambiguate, concatenate characters}$
 printf( "%c\n", @'a' + 'b'@ ); $\C{// no LHS information, ambiguous}$
 printf( "%c\n", @(return char)@('a' + 'b') ); $\C{// disambiguate with ascription cast}\CRT$
 …
 The ascription cast, @(return T)@, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion).
 Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator @+@ for @string@ types is not a problem.
 Note, other programming languages that repurpose @+@ for concatenation could have similar ambiguity issues.
+Note, other programming languages that repurpose @+@ for concatenation can have similar ambiguity issues.
 Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution.
 …
 If $N = 0$, a zero length string, @""@, is returned.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 s = 'x' * 3;
 s = "abc" * 3;
 s = (name + ' ') * 3;
 \end{cfa}
+&
 \begin{cfa}
+"
+s = ("MIKE" + ' ') * 3;
+\end{cfa}
+&
+\begin{cfa}
+""
 "xxx"
 "abcabcabc"
 …
 \end{cquote}
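For contrast, repetition of a C string requires explicit allocation and copying.
The following is a minimal sketch with a hypothetical function @repeat@; it assumes the total length does not overflow @size_t@ and that the caller frees the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding n copies of
   s.  n == 0 yields the zero-length string "".  Returns NULL if allocation
   fails.  Assumes strlen(s) * n + 1 does not overflow size_t. */
static char *repeat(const char *s, size_t n) {
    size_t len = strlen(s);
    char *out = malloc(len * n + 1);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < n; i += 1)
        memcpy(out + i * len, s, len);   /* append the i-th copy */
    out[len * n] = '\0';
    return out;
}
```

Calling @repeat( "abc", 3 )@ produces @"abcabcabc"@, the same value as the \CFA expression @"abc" * 3@.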
 Like concatenation, there is a potential ambiguity with multiplication of characters;
 multiplication for pointers does not exist in C.
 \begin{cfa}
 ch = ch * 3; $\C[2in]{// LHS disambiguate, multiply character values}$
 s = 'a' * 3; $\C{// LHS disambiguate, concatenate characters}$
+multiplication of pointers does not exist in C.
+\begin{cfa}
+ch = ch * 3;            $\C[2in]{// LHS disambiguate, multiply character values}$
+s = 'a' * 3;            $\C{// LHS disambiguate, concatenate characters}$
 printf( "%c\n", @'a' * 3@ ); $\C{// no LHS information, ambiguous}$
 printf( "%c\n", @(return char)@('a' * 3) ); $\C{// disambiguate with ascription cast}\CRT$
 …
 \subsection{Substring}
+The substring operation returns a subset of a string starting at a position in the string and traversing a length or matching a pattern string.
+The substring operation returns a subset of a string starting at a position in the string and traversing a length, or matching a pattern string.
 \begin{cquote}
 \setlength{\tabcolsep}{10pt}
 \begin{tabular}{@{}l|ll|l@{}}
+\multicolumn{2}{c}{\textbf{length}} & \multicolumn{2}{c}{\textbf{pattern}} \\
+\begin{cfa}
+s = name( 2, 2 );
+s = name( 3, -2 );
+s = name( 2, 8 );
+s = name( 0, -1 );
+s = name( -1, -1 );
+\multicolumn{2}{@{}c}{\textbf{length}} & \multicolumn{2}{c@{}}{\textbf{pattern}} \\
+\multicolumn{4}{@{}l}{\lstinline{string name = "PETER"}} \\
+\begin{cfa}
+s = name( 0, 4 );
+s = name( 1, 4 );
+s = name( 2, 4 );
+s = name( 4, -2 );
+s = name( 8, 2 );
+s = name( 0, -2 );
+s = name( -1, -2 );
 s = name( -3 );
 \end{cfa}
+&
 \begin{cfa}
+"KE"
+"IK"
+"KE", clip length to 2
+"", beyond string clip to null
+"K"
+"IKE", to end of string
+\end{cfa}
+&
+\begin{cfa}
+s = name( "IK" );
+"PETE"
+"ETER"
+"TER"   // clip length to 3
+"ER"
+""                 // beyond string to right, clip to null
+""                 // beyond string to left, clip to null
+"ER"
+"TER"   // to end of string
+\end{cfa}
+&
+\begin{cfa}
+s = name( "ET" );
 s = name( "WW" );
 …
+\end{cfa}
+&
+\begin{cfa}
+"IK"
+""
+\end{cfa}
+\end{tabular}
+\end{cquote}
+A negative starting position is a specification from the right end of the string.
+\end{cfa}
+&
+\begin{cfa}
+"ET"
+""  // does not occur
+\end{cfa}
+\end{tabular}
+\end{cquote}
+For the length form, a negative starting position is a specification from the right end of the string.
 A negative length means that characters are selected in the opposite (right to left) direction from the starting position.
 If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string.
 If the substring request is completely outside of the original string, a null string is returned.
 The pattern-form either returns the pattern string if the pattern matches or a null string if the pattern does not match.
+For the pattern-form, it returns the pattern string if the pattern matches or a null string if the pattern does not match.
 The usefulness of this mechanism is discussed next.
 …
 Hence, the left string may decrease, stay the same, or increase in length.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}[escapechar={}]
 …
 \end{tabular}
 \end{cquote}
+Now pattern matching is useful on the left-hand side of assignment.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+Now substring pattern matching is useful on the left-hand side of assignment.
+\begin{cquote}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}[escapechar={}]
 …
 Extending the pattern to a regular expression is a possible extension.
+The replace operation extends substring to substitute all occurrences.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+The replace operation extends substring to substitute all occurrences.
+\begin{cquote}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 \subsection{Searching}
 The find operation returns the position of the first occurrence of a key in a string.
+The @find@ operation returns the position of the first occurrence of a key in a string.
 If the key does not appear in the string, the length of the string is returned.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
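The stated convention, returning the string length when the key is absent, can be approximated for C strings with @strstr@.
The function @find_pos@ is a hypothetical name for this sketch and is not the \CFA @find@ implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: position (zero origin) of the first occurrence of
   key in s, or strlen(s) when key does not appear. */
static size_t find_pos(const char *s, const char *key) {
    const char *hit = strstr(s, key);
    return hit != NULL ? (size_t)(hit - s) : strlen(s);
}
```

Returning the length rather than a sentinel such as $-1$ means the result is always a valid bound for a subsequent substring operation.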
 A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 Function @exclude@ is the reverse of @include@, checking if all characters in the string are excluded from the class (compliance).
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 Both forms can return the longest substring of compliant characters.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 There are also versions of @include@ and @exclude@, returning a position or string, taking a validation function, like one of the C character-class functions.\footnote{It is part of the heritage of C that these functions take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.}
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
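The idea of passing a C character-class function as the validator can be sketched for C strings.
The function @include_pos@ is hypothetical, and the exact position returned by the \CFA version is not specified here; this sketch assumes it is the index of the first non-compliant character.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: index of the first character for which pred returns
   zero, or the string length when every character satisfies pred.  The
   int (*)(int) type matches the C character-class functions. */
static size_t include_pos(const char *s, int (*pred)(int)) {
    size_t i = 0;
    /* Cast through unsigned char, as required by the <ctype.h> functions. */
    while (s[i] != '\0' && pred((unsigned char)s[i]))
        i += 1;
    return i;
}
```

The parameter type @int (*)(int)@ is what allows @isalpha@, @isdigit@, \etc\ to be passed directly.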
 The translate operation returns a string with each character transformed by one of the C character transformation functions.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
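A C-string analogue of the translate operation is sketched below.
The function @translate@ here is a hypothetical C helper, not the \CFA function; it allocates the result, which the caller must free.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of s with every
   character passed through f, e.g., toupper or tolower.  Returns NULL if
   allocation fails. */
static char *translate(const char *s, int (*f)(int)) {
    size_t len = strlen(s);
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < len; i += 1)
        out[i] = (char)f((unsigned char)s[i]);   /* transform one character */
    out[len] = '\0';
    return out;
}
```

Calling @translate( "abc", toupper )@ yields a new string @"ABC"@ and leaves the argument unchanged.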
 \begin{figure}
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\
 …
 Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type @char *@ to @string@.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}ll@{}}
 \begin{cfa}
 …
 The \CC manipulators are @setw@, and its associated width controls @left@, @right@ and @setfill@.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{c++}
 …
 The \CFA manipulators are @bin@, @oct@, @hex@, and @wd@ with its associated width control @left@.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 Reading into a @char@ is safe as the size is 1, @char *@ is unsafe without using @setw@ to constrain the length (which includes @'\0'@), and @string@ is safe as it grows dynamically as characters are read.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{c++}
 …
 \CC modifies the mutable receiver object, replacing by position (zero origin) and length.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{c++}
 …
 \label{p:JavaReplace}
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{java}
 …
 Java also provides a mutable @StringBuffer@, replacing by position (zero origin) and length.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{java}
 …
 The common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the present \CFA lowering.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}l|l@{}}
 \begin{cfa}
 …
 Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases need revisions.
 \begin{cquote}
-\setlength{\tabcolsep}{15pt}
 \begin{tabular}{ll}
 HL & LL \\

Changeset 780727f

doc/theses/mike_brooks_MMath/string.tex