Timestamp: Apr 14, 2025, 7:49:46 AM
Author: Peter A. Buhr <pabuhr@…>
Branches: master
Children: 68c7062
Parents: b0296dba
Message: small proofreading changes for string chapter
File: 1 edited

  • doc/theses/mike_brooks_MMath/string.tex

    rb0296dba rd9aee90  
    33\vspace*{-20pt}
    44This chapter presents my work on designing and building a modern string type in \CFA.
    5 The discussion starts with an overview of string API, then a number of interesting string problems, followed by how these issues are resolved in this work.
     5The discussion starts with an overview of the string API, then a number of interesting string problems, followed by how these issues are resolved in this work.
    66
    77
     
    8383
    8484The ability to convert from internal (machine) to external (human) format is useful in situations other than I/O.
    85 Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signness and size variations, implicitly convert to type @string@.
     85Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signness and size variations, implicitly convert to type @string@ (as in Java).
    8686\begin{cquote}
    8787\setlength{\tabcolsep}{15pt}
     
    189189In C, these operators compare the C string pointer not its value, which does not match programmer expectation.
    190190C strings use function @strcmp@ to lexicographically compare the string value.
     191Java has the same issue with @==@ and @.equals@.
    191192
    192193
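As an illustration of the pointer-versus-value distinction discussed in this hunk, the following is a minimal C sketch. The helper names @same_pointer@ and @same_value@ are hypothetical and do not appear in the thesis; they only isolate what @==@ and @strcmp@ do on C strings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: what == does on char *, it compares addresses only. */
bool same_pointer(const char *a, const char *b) { return a == b; }

/* Hypothetical helper: compares the characters, as strcmp does. */
bool same_value(const char *a, const char *b) { return strcmp(a, b) == 0; }
```

For two distinct arrays holding the same text, \eg @char x[] = "abc", y[] = "abc";@, @same_pointer( x, y )@ is false while @same_value( x, y )@ is true.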
     
    256257cs + 'a';  $\C{// move pointer cs['a']}\CRT$
    257258\end{cfa}
    258 There is a legitimate use case for arithmetic with @signed@/@unsigned@ characters (bytes), but these types are treated differently from @char@ in \CC and \CFA.
    259 However, backwards compatibility makes is impossible to restrict or remove addition on type @char@.
     259There are legitimate use cases for arithmetic with @signed@/@unsigned@ characters (bytes), and these types are treated differently from @char@ in \CC and \CFA.
     260However, backwards compatibility makes it impossible to restrict or remove addition on type @char@.
    260261Similarly, it is impossible to restrict or remove addition on type @char *@ because (unfortunately) it is subscripting: @cs + 'a'@ implies @cs['a']@ or @'a'[cs]@.
    261262
    262 Fortunately, the prior concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ (variables are the same as constants) work correctly.
     263The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ (variables are the same as constants) work correctly.
    263264The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs.
    264265Hence, the type system correctly handles all uses of addition (explicit or implicit) for @char *@.
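The equivalence between pointer addition and subscripting mentioned in this hunk can be checked with a small plain-C sketch. The helper names @advance@ and @add_chars@ are hypothetical, and the buffer must be large enough that the character code is a valid offset (an ASCII-like encoding is assumed).

```c
/* Hypothetical helper: cs + c is pointer arithmetic, i.e., the address of cs[c]. */
const char *advance(const char *cs, char c) { return cs + c; }

/* Character arithmetic: both operands are promoted to int before the addition. */
int add_chars(char a, char b) { return a + b; }
```

Given @char buf[128];@, the expressions @advance( buf, 'a' )@, @&buf['a']@, and @&'a'[buf]@ all denote the same address, which is why C cannot restrict @+@ on @char *@ without also affecting subscripting.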
     
    275276The ascription cast, @(return T)@, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion).
    276277Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator @+@ for @string@ types is not a problem.
    277 Note, other programming languages that repurpose @+@ for concatenation, could have a similar ambiguity issue.
     278Note, other programming languages that repurpose @+@ for concatenation, could have similar ambiguity issues.
    278279
    279280Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution.
     
    454455\end{cquote}
    455456
    456 A character-class operation indicate if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
     457A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
    457458\begin{cquote}
    458459\setlength{\tabcolsep}{15pt}
     
    509510\end{cquote}
    510511
    511 There are versions of @include@ and @exclude@, returning a position or string, taking a validation function, like one of the C character-class routines.\footnote{It is part of the hereditary of C that these function take and return an \lstinline{int} rather than a \lstinline{char}, which affects the function type.}
     512There are also versions of @include@ and @exclude@, returning a position or string, taking a validation function, like one of the C character-class routines.\footnote{It is part of the hereditary of C that these function take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.}
    512513\begin{cquote}
    513514\setlength{\tabcolsep}{15pt}
     
    526527\end{tabular}
    527528\end{cquote}
    528 These operations perform an apply of the validation function to each character, and it returns a boolean indicating a stopping condition.
     529These operations perform an \emph{apply} of the validation function to each character, where the function returns a boolean indicating a stopping condition for the search.
    529530The position of the last character is returned if the string is compliant or the position of the first non-compliant character.
    530531
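The apply-style scan with a C character-class routine described in this hunk can be sketched in plain C as follows. The function name @first_noncompliant@ is hypothetical, and returning the string length for a fully compliant string is an assumption of this sketch, not necessarily the convention used by the \CFA @include@/@exclude@ functions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scan: applies valid to each character and stops at the first
   character for which it returns 0. Returns that position, or the length of s
   when every character is compliant (assumed convention for this sketch). */
size_t first_noncompliant(const char *s, int (*valid)(int)) {
    size_t i = 0;
    /* The cast avoids passing a negative value to the <ctype.h> routines. */
    while (s[i] != '\0' && valid((unsigned char)s[i])) i += 1;
    return i;
}
```

The parameter type @int (*)(int)@ matches the C character-class routines directly, so @first_noncompliant( "abc123", isalpha )@ can be written without a wrapper function.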
     
    562563\end{cfa}
    563564
    564 \CFA also adopts a return code but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1 origin).
     565\CFA adopts a return code but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1 origin).
    565566This semantics allows many search and substring functions to be written without conditions, \eg:
    566567\begin{cfa}
     
    574575\VRef[Figure]{f:ExtractingWordsText} compares \CC and \CFA string code for extracting words from a line of text, repeatedly removing non-word text and then a word until the line is empty.
    575576The \CFA code is simpler solely because of the choice for indicating search failure.
    576 (It is possible to simplify the \CC version by concatenating a sentinel character at the end of the line so the call to @find_first_not_of@ does not fail.)
     577(A simplification of the \CC version is to concatenate a sentinel character at the end of the line so the call to @find_first_not_of@ does not fail.)
    577578
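The benefit of returning the string length on search failure can be sketched in plain C. The functions @index_of@ and @prefix_before@ are hypothetical stand-ins written for this illustration and are not the \CFA or \CC string API.

```c
#include <string.h>

/* Hypothetical search: returns the index of the first occurrence of c in s,
   or strlen(s) when c is absent (APL-style failure value, 0 origin). */
size_t index_of(const char *s, char c) {
    size_t i = 0;
    while (s[i] != '\0' && s[i] != c) i += 1;
    return i;
}

/* Copies the text before the first c into dst (capacity cap, cap > 0), or all
   of s when c is absent; no test for search failure is needed.
   Returns the number of characters copied. */
size_t prefix_before(const char *s, char c, char *dst, size_t cap) {
    size_t n = index_of(s, c);
    if (n >= cap) n = cap - 1;   /* guards the buffer, not the search result */
    memcpy(dst, s, n);
    dst[n] = '\0';
    return n;
}
```

Because a failed search yields the length of the string, the result can be used directly as a substring bound, whereas a failure value such as @npos@ requires a conditional before it is used.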
    578579\begin{figure}
     
    623624To ease conversion from C to \CFA, \CFA provides companion C @string@ functions.
    624625Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type @char *@ to @string@.
     626\begin{cquote}
     627\setlength{\tabcolsep}{15pt}
     628\begin{tabular}{@{}ll@{}}
    625629\begin{cfa}
    626630char s[32];   // string s;
     
    629633strcmp( s, "abc" );
    630634strncmp( s, "abc", 3 );
     635\end{cfa}
     636&
     637\begin{cfa}
     638
    631639strcpy( s, "abc" );
    632640strncpy( s, "abcdef", 3 );
     
    634642strncat( s, "uvwxyz", 3 );
    635643\end{cfa}
     644\end{tabular}
     645\end{cquote}
    636646However, the conversion fails with I/O because @printf@ cannot print a @string@ using format code @%s@ because \CFA strings are not null terminated.
    637647Nevertheless, this capability does provide a useful starting point for conversion to safer \CFA strings.
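To show why format code @%s@ depends on null termination, the following plain-C sketch formats a length-delimited buffer using a precision, @%.*s@, which bounds the number of characters read. The @struct lenstr@ type and @format_lenstr@ function are hypothetical; they are not the \CFA @string@ representation or a \CFA I/O facility.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical length-delimited string; NOT the CFA string layout. */
struct lenstr { const char *text; size_t len; };

/* Formats s into buf with %.*s, which reads at most len characters and so
   does not require a terminating '\0' in s.text. */
int format_lenstr(char *buf, size_t cap, struct lenstr s) {
    return snprintf(buf, cap, "%.*s", (int)s.len, s.text);
}
```

A plain @%s@ would keep reading past the three characters of an unterminated buffer, which is undefined behaviour in C.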
     
    728738@abcde fg@
    729739sin | quote( ch ) | quote( wdi( sizeof(c), c ) ) | quote( s, '[', ']' ) | nl;
    730 @$'a' "bcde" [fg]$@
     740@'a' "bcde" [fg]@
    731741sin | incl( "a-zA-Z0-9 ?!&\n", s ) | nl;
    732742@x?&000xyz TOM !.@
     
    749759\end{tabular}
    750760\end{cquote}
    751 Note, the ability to read in quoted strings to match with program strings.
     761Note, the ability to read in quoted strings to match with program string constants.
    752762The @nl@ at the end of an input ignores the rest of the line.
    753763
     
    807817As well, the operations are asymmetric, \eg @String@ has @replace@ by text but not replace by position and vice versa for @StringBuffer@.
    808818
    809 More significant operational differences relate to storage management, often appearing through assignment (@target = source@), and are summarized in \VRef[Figure]{f:StrSemanticCompare}, defining properties: type abstraction, state, symmetry, and referent.
     819More significant operational differences relate to storage management, often appearing through assignment (@target = source@), and are summarized in \VRef[Figure]{f:StrSemanticCompare}, which defines properties type abstraction, state, symmetry, and referent.
    810820The following discussion justifies the figure's yes/no entries per language.
    811821