Changeset d9aee90 for doc/theses/mike_brooks_MMath
- Timestamp: Apr 14, 2025, 7:49:46 AM
- Branches: master
- Children: 68c7062
- Parents: b0296dba
- File: 1 edited
--- doc/theses/mike_brooks_MMath/string.tex (rb0296dba)
+++ doc/theses/mike_brooks_MMath/string.tex (rd9aee90)
@@ -3,5 +3,5 @@
 \vspace*{-20pt}
 This chapter presents my work on designing and building a modern string type in \CFA.
-The discussion starts with an overview of string API, then a number of interesting string problems, followed by how these issues are resolved in this work.
+The discussion starts with an overview of the string API, then a number of interesting string problems, followed by how these issues are resolved in this work.
 
 
@@ -83,5 +83,5 @@
 
 The ability to convert from internal (machine) to external (human) format is useful in situations other than I/O.
-Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signness and size variations, implicitly convert to type @string@.
+Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signness and size variations, implicitly convert to type @string@ (as in Java).
 \begin{cquote}
 \setlength{\tabcolsep}{15pt}
@@ -189,4 +189,5 @@
 In C, these operators compare the C string pointer not its value, which does not match programmer expectation.
 C strings use function @strcmp@ to lexicographically compare the string value.
+Java has the same issue with @==@ and @.equals@.
 
 
@@ -256,9 +257,9 @@
 cs + 'a'; $\C{// move pointer cs['a']}\CRT$
 \end{cfa}
-There is a legitimate use case for arithmetic with @signed@/@unsigned@ characters (bytes), but these types are treated differently from @char@ in \CC and \CFA.
-However, backwards compatibility makes is impossible to restrict or remove addition on type @char@.
+There are legitimate use cases for arithmetic with @signed@/@unsigned@ characters (bytes), and these types are treated differently from @char@ in \CC and \CFA.
+However, backwards compatibility makes it impossible to restrict or remove addition on type @char@.
 Similarly, it is impossible to restrict or remove addition on type @char *@ because (unfortunately) it is subscripting: @cs + 'a'@ implies @cs['a']@ or @'a'[cs]@.
 
-Fortunately, the prior concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ (variables are the same as constants) work correctly.
+The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ (variables are the same as constants) work correctly.
 The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs.
 Hence, the type system correctly handles all uses of addition (explicit or implicit) for @char *@.
@@ -275,5 +276,5 @@
 The ascription cast, @(return T)@, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion).
 Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator @+@ for @string@ types is not a problem.
-Note, other programming languages that repurpose @+@ for concatenation, could have a similar ambiguity issue.
+Note, other programming languages that repurpose @+@ for concatenation, could have similar ambiguity issues.
 
 Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution.
@@ -454,5 +455,5 @@
 \end{cquote}
 
-A character-class operation indicate if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
+A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
 \begin{cquote}
 \setlength{\tabcolsep}{15pt}
@@ -509,5 +510,5 @@
 \end{cquote}
 
-There are versions of @include@ and @exclude@, returning a position or string, taking a validation function, like one of the C character-class routines.\footnote{It is part of the hereditary of C that these function take and return an \lstinline{int} rather than a \lstinline{char}, which affects the function type.}
+There are also versions of @include@ and @exclude@, returning a position or string, taking a validation function, like one of the C character-class routines.\footnote{It is part of the hereditary of C that these function take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.}
 \begin{cquote}
 \setlength{\tabcolsep}{15pt}
@@ -526,5 +527,5 @@
 \end{tabular}
 \end{cquote}
-These operations perform an apply of the validation function to each character, and it returns a boolean indicating a stopping condition.
+These operations perform an \emph{apply} of the validation function to each character, where the function returns a boolean indicating a stopping condition for the search.
 The position of the last character is returned if the string is compliant or the position of the first non-compliant character.
 
@@ -562,5 +563,5 @@
 \end{cfa}
 
-\CFA also adopts a return code but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1 origin).
+\CFA adopts a return code but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1 origin).
 This semantics allows many search and substring functions to be written without conditions, \eg:
 \begin{cfa}
@@ -574,5 +575,5 @@
 \VRef[Figure]{f:ExtractingWordsText} compares \CC and \CFA string code for extracting words from a line of text, repeatedly removing non-word text and then a word until the line is empty.
 The \CFA code is simpler solely because of the choice for indicating search failure.
-(It is possible to simplify the \CC version by concatenating a sentinel character at the end of the line so the call to @find_first_not_of@ does not fail.)
+(A simplification of the \CC version is to concatenate a sentinel character at the end of the line so the call to @find_first_not_of@ does not fail.)
 
 \begin{figure}
@@ -623,4 +624,7 @@
 To ease conversion from C to \CFA, \CFA provides companion C @string@ functions.
 Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type @char *@ to @string@.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 char s[32]; // string s;
@@ -629,4 +633,8 @@
 strcmp( s, "abc" );
 strncmp( s, "abc", 3 );
+\end{cfa}
+&
+\begin{cfa}
+
 strcpy( s, "abc" );
 strncpy( s, "abcdef", 3 );
@@ -634,4 +642,6 @@
 strncat( s, "uvwxyz", 3 );
 \end{cfa}
+\end{tabular}
+\end{cquote}
 However, the conversion fails with I/O because @printf@ cannot print a @string@ using format code @%s@ because \CFA strings are not null terminated.
 Nevertheless, this capability does provide a useful starting point for conversion to safer \CFA strings.
@@ -728,5 +738,5 @@
 @abcde fg@
 sin | quote( ch ) | quote( wdi( sizeof(c), c ) ) | quote( s, '[', ']' ) | nl;
-@$'a' "bcde" [fg]$@
+@'a' "bcde" [fg]@
 sin | incl( "a-zA-Z0-9 ?!&\n", s ) | nl;
 @x?&000xyz TOM !.@
@@ -749,5 +759,5 @@
 \end{tabular}
 \end{cquote}
-Note, the ability to read in quoted strings to match with program strings.
+Note, the ability to read in quoted strings to match with program string constants.
 The @nl@ at the end of an input ignores the rest of the line.
 
@@ -807,5 +817,5 @@
 As well, the operations are asymmetric, \eg @String@ has @replace@ by text but not replace by position and vice versa for @StringBuffer@.
 
-More significant operational differences relate to storage management, often appearing through assignment (@target = source@), and are summarized in \VRef[Figure]{f:StrSemanticCompare}, defining properties: type abstraction, state, symmetry, and referent.
+More significant operational differences relate to storage management, often appearing through assignment (@target = source@), and are summarized in \VRef[Figure]{f:StrSemanticCompare}, which defines properties type abstraction, state, symmetry, and referent.
 The following discussion justifies the figure's yes/no entries per language.
 
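The thesis text in this changeset says that C's == compares string pointers rather than string values, that strcmp is needed for a value comparison, and that Java's == versus .equals has the same issue. As an illustration only, here is a minimal C++ sketch (not CFA code) contrasting the three behaviours; the helper names same_address, same_text and same_string are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Pointer comparison: true only when both pointers refer to the same storage.
// This is what == means for C strings, because char arrays decay to pointers.
bool same_address(const char *x, const char *y) {
    return x == y;
}

// Value comparison for C strings: strcmp compares characters up to the
// null terminator and returns 0 when the strings are equal.
bool same_text(const char *x, const char *y) {
    return std::strcmp(x, y) == 0;
}

// std::string overloads == to compare values, matching programmer expectation.
bool same_string(const std::string &x, const std::string &y) {
    return x == y;
}

// Usage: char a[] = "abc", b[] = "abc";   // two distinct arrays, same text
//        assert(!same_address(a, b) && same_text(a, b) && same_string(a, b));
```

The two arrays hold identical text but occupy different storage, so only the value comparisons report them as equal.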
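The changed lines on search failure say CFA takes its failure value from APL's index-of, which returns the target length N, and that the C++ word-extraction code could instead append a sentinel so find_first_not_of does not fail. The following C++ sketch illustrates that convention under stated assumptions and does not reproduce the thesis figure. It wraps std::string::find_first_of and find_first_not_of in hypothetical helpers, scan and skip, that return s.size() instead of std::string::npos, so their results can be passed directly to substr.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a standard or CFA function): index of the first
// character NOT in `set`, or s.size() on failure instead of std::string::npos.
std::size_t skip(const std::string &s, const std::string &set) {
    std::size_t pos = s.find_first_not_of(set);
    return pos == std::string::npos ? s.size() : pos;
}

// Hypothetical helper: index of the first character in `set`, or s.size().
std::size_t scan(const std::string &s, const std::string &set) {
    std::size_t pos = s.find_first_of(set);
    return pos == std::string::npos ? s.size() : pos;
}

// s.substr(s.find_first_not_of(set)) throws std::out_of_range when s consists
// only of characters from `set`; substr(s.size()) is valid and yields "".
std::string drop_leading(const std::string &s, const std::string &set) {
    return s.substr(skip(s, set));
}

const std::string letters =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Repeatedly remove non-word text and then a word until the line is empty.
std::vector<std::string> words(std::string line) {
    std::vector<std::string> result;
    while (!line.empty()) {
        line = line.substr(scan(line, letters)); // drop non-word prefix (may empty the line)
        std::size_t len = skip(line, letters);   // word length, 0 if the line is now empty
        if (len > 0) result.push_back(line.substr(0, len));
        line = line.substr(len);                 // drop the word
    }
    return result;
}

// Usage: assert((words("  hi, there!") == std::vector<std::string>{"hi", "there"}));
```

Each search result is used directly as a substr position, so the loop needs no npos tests. Each iteration also removes at least one character, which guarantees termination.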
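The footnote change notes that the C character-class routines take and return int rather than bool, and that this affects the type of a validation function passed to include and exclude. The following C++ sketch is a loose analogue only, not the CFA include/exclude API, and the names first_rejected and all_of_class are hypothetical. It applies an int (*)(int) predicate to each character and returns the position of the first non-compliant character. When every character complies, it returns s.size(), following the length-on-failure convention.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Validation functions have the C character-class signature: int (*)(int).
using Validator = int (*)(int);

// Hypothetical helper: apply `valid` to each character and stop at the first
// one it rejects. Returns that position, or s.size() if every character complies.
std::size_t first_rejected(const std::string &s, Validator valid) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Convert to unsigned char first: the <cctype> functions have undefined
        // behavior for negative arguments other than EOF.
        if (!valid(static_cast<unsigned char>(s[i]))) return i;
    }
    return s.size();
}

// A string is composed entirely of one character class when nothing is rejected.
bool all_of_class(const std::string &s, Validator valid) {
    return first_rejected(s, valid) == s.size();
}

// Usage: auto digit = [](int c) { return std::isdigit(c); };  // converts to Validator
//        assert(first_rejected("123abc", digit) == 3 && !all_of_class("20x5", digit));
```

Wrapping std::isdigit in a captureless lambda yields a plain int (*)(int). It also avoids taking the address of a standard-library function, which C++20 leaves unspecified.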