Changeset 780727f
- Timestamp: Sep 15, 2025, 9:20:53 PM (7 weeks ago)
- Branches: master
- Children: 4b465445
- Parents: 8317671
- File: 1 edited
doc/theses/mike_brooks_MMath/string.tex (modified) (34 diffs)
doc/theses/mike_brooks_MMath/string.tex
r8317671 r780727f 57 57 58 58 The \CFA string type is for manipulation of dynamically-sized character-strings versus C @char *@ type for manipulation of statically-sized null-terminated character-strings. 59 Hence, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time. 60 As a result, a @string@ declaration does not specify a maximum length, where a C string must. 59 Therefore, the amount of storage for a \CFA string changes dynamically at runtime to fit the string size, whereas the amount of storage for a C string is fixed at compile time. 60 As a result, a @string@ declaration does not specify a maximum length, where a C string array does. 61 For \CFA, as a @string@ dynamically grows and shrinks in size, so does its underlying storage. 62 For C, as a string dynamically grows and shrinks in size, but its underlying storage does not. 61 63 The maximum storage for a \CFA @string@ value is @size_t@ characters, which is $2^{32}$ or $2^{64}$ respectively. 62 64 A \CFA string manages its length separately from the string, so there is no null (@'\0'@) terminating value at the end of a string value. … … 86 88 Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signness and size variations, implicitly convert to type @string@ (as in Java). 
87 89 \begin{cquote} 88 \setlength{\tabcolsep}{15pt}89 90 \begin{tabular}{@{}l|ll|l@{}} 90 91 \begin{cfa} 91 string s ;92 string s = 5; 92 93 s = 'x'; 93 94 s = "abc"; 94 s = cs;95 s = 4 5hh;96 s = 45h;97 \end{cfa} 98 & 99 \begin{cfa} 100 95 s = 42hh; /* signed char */ 96 s = 42h; /* short int */ 97 s = 0xff; 98 \end{cfa} 99 & 100 \begin{cfa} 101 "5" 101 102 "x" 102 103 "abc" 103 " abc"104 "4 5"105 " 45"106 \end{cfa} 107 & 108 \begin{cfa} 109 s = (ssize_t)MIN;110 s = (size_t)MAX;111 s = 5.5;112 s = 5.5L;113 s = 5.5+3.4i;114 s = 5.5L+3.4Li;104 "42" 105 "42" 106 "255" 107 \end{cfa} 108 & 109 \begin{cfa} 110 s = (ssize_t)MIN; 111 s = (size_t)MAX; 112 s = 5.5; 113 s = 5.5L; 114 s = 5.5+3.4i; 115 s = 5.5L+3.4Li; 115 116 \end{cfa} 116 117 & … … 127 128 Conversions can be explicitly specified using a compound literal. 128 129 \begin{cfa} 129 s = (string){ "abc" }; $\C{// converts char * to string}$130 s = (string){ 5 }; $\C{// converts int to string}$ 131 s = (string){ 5.5 }; $\C{// converts double to string}$ 132 \end{cfa} 133 134 Conversions from @string@ to @char *@ attempt to be safe: 135 either by requiring the maximum length of the @char *@ storage (@strncpy@) or allocating the @char *@ storage for the string characters (ownership), meaning the programmer must free the storage.136 Note, a C string is always null terminated, implying a minimum size of 1 character. 137 \begin{ cquote}138 \ setlength{\tabcolsep}{15pt}139 \begin{tabular}{@{}l|l@{}} 140 \begin{cfa} 130 s = (string){ 5 }; s = (string){ "abc" }; s = (string){ 5.5 }; 131 \end{cfa} 132 133 Conversions from @string@ to @char *@ attempt to be safe. 134 The @strncpy@ conversion requires the maximum length for the pointer's target buffer. 135 The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it. 136 Note, a C string is always null terminated, implying storage is always necessary for the null. 
137 \begin{cquote} 138 \begin{tabular}{@{}l|l@{}} 139 \begin{cfa} 140 string s = "abcde"; 141 char cs[4]; 141 142 strncpy( cs, s, sizeof(cs) ); 142 char * cp = s; 143 char * cp = s; // ownership 143 144 delete( cp ); 144 cp = s + ' ' + s; 145 cp = s + ' ' + s; // ownership 145 146 delete( cp ); 146 147 \end{cfa} 147 148 & 148 149 \begin{cfa} 150 151 149 152 "abc\0", in place 150 153 "abcde\0", malloc 151 ownership 154 152 155 "abcde abcde\0", malloc 153 ownership 156 154 157 \end{cfa} 155 158 \end{tabular} … … 162 165 For compatibility, @strlen@ also works with \CFA strings. 163 166 \begin{cquote} 164 \setlength{\tabcolsep}{15pt}165 167 \begin{tabular}{@{}l|l@{}} 166 168 \begin{cfa} … … 187 189 \subsection{Comparison Operators} 188 190 189 The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare \CFA string values using lexicographical ordering, where longer strings are greater than shorter strings.191 The binary relational, @<@, @<=@, @>@, @>=@, and equality, @==@, @!=@, operators compare \CFA strings using lexicographical ordering, where longer strings are greater than shorter strings. 190 192 In C, these operators compare the C string pointer not its value, which does not match programmer expectation. 191 193 C strings use function @strcmp@ to lexicographically compare the string value. … … 196 198 197 199 The binary operators @+@ and @+=@ concatenate C @char@, @char *@ and \CFA strings, creating the sum of the characters. 
198 \ par\noindent200 \begin{cquote} 199 201 \begin{tabular}{@{}l|l@{\hspace{15pt}}l|l@{\hspace{15pt}}l|l@{}} 200 202 \begin{cfa} … … 246 248 \end{cfa} 247 249 \end{tabular} 248 \ par\noindent250 \end{cquote} 249 251 However, including @<string.hfa>@ can result in ambiguous uses of the overloaded @+@ operator.\footnote{Combining multiple packages in any programming language can result in name clashes or ambiguities.} 250 While subtracting characters or pointers has a low-level use-case 251 \begin{cfa} 252 ch - '0' $\C[2in]{// find character offset}$253 cs - cs2; $\C{// find pointer offset}\CRT$252 For example, subtracting characters or pointers has valid use-cases: 253 \begin{cfa} 254 ch - '0' $\C[2in]{// find character offset}$ 255 cs - cs2; $\C{// find pointer offset}\CRT$ 254 256 \end{cfa} 255 257 addition is less obvious 256 258 \begin{cfa} 257 ch + 'b' $\C[2in]{// add character values}$258 cs + 'a'; $\C{// move pointer cs['a']}\CRT$259 ch + 'b' $\C[2in]{// add character values}$ 260 cs + 'a'; $\C{// move pointer cs['a']}\CRT$ 259 261 \end{cfa} 260 262 There are legitimate use cases for arithmetic with @signed@/@unsigned@ characters (bytes), and these types are treated differently from @char@ in \CC and \CFA. … … 262 264 Similarly, it is impossible to restrict or remove addition on type @char *@ because (unfortunately) it is subscripting: @cs + 'a'@ implies @cs['a']@ or @'a'[cs]@. 263 265 264 The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ (variables are the same as constants) work correctly.266 The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ constants work correctly (variables are the same). 265 267 The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs. 
266 268 Hence, the type system correctly handles all uses of addition (explicit or implicit) for @char *@. … … 270 272 Only @char@ addition can result in ambiguities, and only when there is no left-hand information. 271 273 \begin{cfa} 272 ch = ch + 'b'; $\C[2in]{// LHS disambiguate, add character values}$273 s = 'a' + 'b'; $\C{// LHS disambiguate, concatenate characters}$274 ch = ch + 'b'; $\C[2in]{// LHS disambiguate, add character values}$ 275 s = 'a' + 'b'; $\C{// LHS disambiguate, concatenate characters}$ 274 276 printf( "%c\n", @'a' + 'b'@ ); $\C{// no LHS information, ambiguous}$ 275 277 printf( "%c\n", @(return char)@('a' + 'b') ); $\C{// disambiguate with ascription cast}\CRT$ … … 277 279 The ascription cast, @(return T)@, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion). 278 280 Fortunately, character addition without LHS information is rare in C/\CFA programs, so repurposing the operator @+@ for @string@ types is not a problem. 279 Note, other programming languages that repurpose @+@ for concatenation, c ouldhave similar ambiguity issues.281 Note, other programming languages that repurpose @+@ for concatenation, can have similar ambiguity issues. 280 282 281 283 Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution. … … 297 299 If $N = 0$, a zero length string, @""@, is returned. 
298 300 \begin{cquote} 299 \setlength{\tabcolsep}{15pt}300 301 \begin{tabular}{@{}l|l@{}} 301 302 \begin{cfa} … … 303 304 s = 'x' * 3; 304 305 s = "abc" * 3; 305 s = ( name+ ' ') * 3;306 \end{cfa} 307 & 308 \begin{cfa} 309 " 306 s = ("MIKE" + ' ') * 3; 307 \end{cfa} 308 & 309 \begin{cfa} 310 "" 310 311 "xxx" 311 312 "abcabcabc" … … 315 316 \end{cquote} 316 317 Like concatenation, there is a potential ambiguity with multiplication of characters; 317 multiplication forpointers does not exist in C.318 \begin{cfa} 319 ch = ch * 3; $\C[2in]{// LHS disambiguate, multiply character values}$320 s = 'a' * 3; $\C{// LHS disambiguate, concatenate characters}$318 multiplication of pointers does not exist in C. 319 \begin{cfa} 320 ch = ch * 3; $\C[2in]{// LHS disambiguate, multiply character values}$ 321 s = 'a' * 3; $\C{// LHS disambiguate, concatenate characters}$ 321 322 printf( "%c\n", @'a' * 3@ ); $\C{// no LHS information, ambiguous}$ 322 323 printf( "%c\n", @(return char)@('a' * 3) ); $\C{// disambiguate with ascription cast}\CRT$ … … 326 327 327 328 \subsection{Substring} 328 The substring operation returns a subset of a string starting at a position in the string and traversing a length or matching a pattern string. 329 330 The substring operation returns a subset of a string starting at a position in the string and traversing a length, or matching a pattern string. 
329 331 \begin{cquote} 330 332 \setlength{\tabcolsep}{10pt} 331 333 \begin{tabular}{@{}l|ll|l@{}} 332 \multicolumn{2}{c}{\textbf{length}} & \multicolumn{2}{c}{\textbf{pattern}} \\ 333 \begin{cfa} 334 s = name( 2, 2 ); 335 s = name( 3, -2 ); 336 s = name( 2, 8 ); 337 s = name( 0, -1 ); 338 s = name( -1, -1 ); 334 \multicolumn{2}{@{}c}{\textbf{length}} & \multicolumn{2}{c@{}}{\textbf{pattern}} \\ 335 \multicolumn{4}{@{}l}{\lstinline{string name = "PETER"}} \\ 336 \begin{cfa} 337 s = name( 0, 4 ); 338 s = name( 1, 4 ); 339 s = name( 2, 4 ); 340 s = name( 4, -2 ); 341 s = name( 8, 2 ); 342 s = name( 0, -2 ); 343 s = name( -1, -2 ); 339 344 s = name( -3 ); 340 345 \end{cfa} 341 346 & 342 347 \begin{cfa} 343 "KE" 344 "IK" 345 "KE", clip length to 2 346 "", beyond string clip to null 347 "K" 348 "IKE", to end of string 349 \end{cfa} 350 & 351 \begin{cfa} 352 s = name( "IK" ); 348 "PETE" 349 "ETER" 350 "TER" // clip length to 3 351 "ER" 352 "" // beyond string to right, clip to null 353 "" // beyond string to left, clip to null 354 "ER" 355 "TER" // to end of string 356 \end{cfa} 357 & 358 \begin{cfa} 359 s = name( "ET" ); 353 360 s = name( "WW" ); 354 361 … … 356 363 357 364 358 \end{cfa} 359 & 360 \begin{cfa} 361 "IK" 362 "" 363 364 365 366 367 \end{cfa} 368 \end{tabular} 369 \end{cquote} 370 A negative starting position is a specification from the right end of the string. 365 366 367 \end{cfa} 368 & 369 \begin{cfa} 370 "ET" 371 "" // does not occur 372 373 374 375 376 377 378 \end{cfa} 379 \end{tabular} 380 \end{cquote} 381 For the length form, a negative starting position is a specification from the right end of the string. 371 382 A negative length means that characters are selected in the opposite (right to left) direction from the starting position. 372 383 If the substring request extends beyond the beginning or end of the string, it is clipped (shortened) to the bounds of the string. 
373 384 If the substring request is completely outside of the original string, a null string is returned. 374 The pattern-form either returns the pattern string isthe pattern matches or a null string if the pattern does not match.385 For the pattern-form, it returns the pattern string if the pattern matches or a null string if the pattern does not match. 375 386 The usefulness of this mechanism is discussed next. 376 387 … … 379 390 Hence, the left string may decrease, stay the same, or increase in length. 380 391 \begin{cquote} 381 \setlength{\tabcolsep}{15pt}382 392 \begin{tabular}{@{}l|l@{}} 383 393 \begin{cfa}[escapechar={}] … … 398 408 \end{tabular} 399 409 \end{cquote} 400 Now pattern matching is useful on the left-hand side of assignment. 401 \begin{cquote} 402 \setlength{\tabcolsep}{15pt} 410 Now substring pattern matching is useful on the left-hand side of assignment. 411 \begin{cquote} 403 412 \begin{tabular}{@{}l|l@{}} 404 413 \begin{cfa}[escapechar={}] … … 415 424 Extending the pattern to a regular expression is a possible extension. 416 425 417 The replace operation extensions substring to substitute all occurrences. 418 \begin{cquote} 419 \setlength{\tabcolsep}{15pt} 426 The replace operation extends substring to substitute all occurrences. 427 \begin{cquote} 420 428 \begin{tabular}{@{}l|l@{}} 421 429 \begin{cfa} … … 437 445 \subsection{Searching} 438 446 439 The findoperation returns the position of the first occurrence of a key in a string.447 The @find@ operation returns the position of the first occurrence of a key in a string. 440 448 If the key does not appear in the string, the length of the string is returned. 441 449 \begin{cquote} 442 \setlength{\tabcolsep}{15pt}443 450 \begin{tabular}{@{}l|l@{}} 444 451 \begin{cfa} … … 458 465 A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc. 
459 466 \begin{cquote} 460 \setlength{\tabcolsep}{15pt}461 467 \begin{tabular}{@{}l|l@{}} 462 468 \begin{cfa} … … 478 484 Function @exclude@ is the reverse of @include@, checking if all characters in the string are excluded from the class (compliance). 479 485 \begin{cquote} 480 \setlength{\tabcolsep}{15pt}481 486 \begin{tabular}{@{}l|l@{}} 482 487 \begin{cfa} … … 493 498 Both forms can return the longest substring of compliant characters. 494 499 \begin{cquote} 495 \setlength{\tabcolsep}{15pt}496 500 \begin{tabular}{@{}l|l@{}} 497 501 \begin{cfa} … … 513 517 There are also versions of @include@ and @exclude@, returning a position or string, taking a validation function, like one of the C character-class functions.\footnote{It is part of the hereditary of C that these function take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.} 514 518 \begin{cquote} 515 \setlength{\tabcolsep}{15pt}516 519 \begin{tabular}{@{}l|l@{}} 517 520 \begin{cfa} … … 533 536 The translate operation returns a string with each character transformed by one of the C character transformation functions. 534 537 \begin{cquote} 535 \setlength{\tabcolsep}{15pt}536 538 \begin{tabular}{@{}l|l@{}} 537 539 \begin{cfa} … … 580 582 \begin{figure} 581 583 \begin{cquote} 582 \setlength{\tabcolsep}{15pt}583 584 \begin{tabular}{@{}l|l@{}} 584 585 \multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\ … … 626 627 Hence, it is possible to convert a block of C string operations to \CFA strings just by changing the type @char *@ to @string@. 627 628 \begin{cquote} 628 \setlength{\tabcolsep}{15pt}629 629 \begin{tabular}{@{}ll@{}} 630 630 \begin{cfa} … … 659 659 The \CC manipulators are @setw@, and its associated width controls @left@, @right@ and @setfill@. 
660 660 \begin{cquote} 661 \setlength{\tabcolsep}{15pt}662 661 \begin{tabular}{@{}l|l@{}} 663 662 \begin{c++} … … 677 676 The \CFA manipulators are @bin@, @oct@, @hex@, @wd@, and its associated width control and @left@. 678 677 \begin{cquote} 679 \setlength{\tabcolsep}{15pt}680 678 \begin{tabular}{@{}l|l@{}} 681 679 \begin{cfa} … … 706 704 Reading into a @char@ is safe as the size is 1, @char *@ is unsafe without using @setw@ to constraint the length (which includes @'\0'@), @string@ is safe as its grows dynamically as characters are read. 707 705 \begin{cquote} 708 \setlength{\tabcolsep}{15pt}709 706 \begin{tabular}{@{}l|l@{}} 710 707 \begin{c++} … … 771 768 \CC modifies the mutable receiver object, replacing by position (zero origin) and length. 772 769 \begin{cquote} 773 \setlength{\tabcolsep}{15pt}774 770 \begin{tabular}{@{}l|l@{}} 775 771 \begin{c++} … … 787 783 \label{p:JavaReplace} 788 784 \begin{cquote} 789 \setlength{\tabcolsep}{15pt}790 785 \begin{tabular}{@{}l|l@{}} 791 786 \begin{java} … … 802 797 Java also provides a mutable @StringBuffer@, replacing by position (zero origin) and length. 803 798 \begin{cquote} 804 \setlength{\tabcolsep}{15pt}805 799 \begin{tabular}{@{}l|l@{}} 806 800 \begin{java} … … 1265 1259 The common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the present \CFA lowering. 1266 1260 \begin{cquote} 1267 \setlength{\tabcolsep}{15pt}1268 1261 \begin{tabular}{@{}l|l@{}} 1269 1262 \begin{cfa} … … 1366 1359 Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases need revisions. 1367 1360 \begin{cquote} 1368 \setlength{\tabcolsep}{15pt}1369 1361 \begin{tabular}{ll} 1370 1362 HL & LL \\