Changeset 06ffa95 for doc/theses/mike_brooks_MMath/string.tex
- Timestamp:
- Mar 14, 2025, 7:15:01 PM (7 months ago)
- Branches:
- master
- Children:
- 8617ee90
- Parents:
- 7611631
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
TabularUnified doc/theses/mike_brooks_MMath/string.tex ¶
r7611631 r06ffa95 9 9 10 10 To prepare for the following discussion, comparisons among C, \CC, Java and \CFA strings are presented, beginning in \VRef[Figure]{f:StrApiCompare}. 11 It provides a classic ``cheat sheet'' presentation, summarizing the names of the most 11 It provides a classic ``cheat sheet'' presentation, summarizing the names of the most-common closely-equivalent operations. 12 12 The over-arching commonality is that operations work on groups of characters for assigning, copying, scanning, and updating. 13 13 … … 39 39 hence, most of its group operations are error prone and expensive. 40 40 Most high-level string libraries use a separate length field and specialized storage management to implement group operations. 41 Interestingly, \CC strings retain null termination justin case it is needed to interface with C library functions.41 Interestingly, \CC strings retain null termination in case it is needed to interface with C library functions. 42 42 \begin{cfa} 43 43 int open( @const char * pathname@, int flags ); … … 52 52 While \VRef[Figure]{f:StrApiCompare} emphasizes cross-language similarities, it elides many specific operational differences. 53 53 For example, the @replace@ function selects a substring in the target and substitutes it with the source string, which can be smaller or larger than the substring. 54 \CC performs the modification on the mutable receiver object .54 \CC performs the modification on the mutable receiver object 55 55 \begin{cfa} 56 56 string s1 = "abcde"; … … 60 60 \end{cfa} 61 61 while Java allocates and returns a new string with the result, leaving the receiver unmodified. 62 \label{p:JavaReplace} 62 63 \begin{java} 63 64 String s = "abcde"; 64 65 String r = s.replace( "cde", "xy" ); $\C[2.25in]{// replace by text, immutable}$ 66 System.out.println( s + ' ' + r ); 67 $\texttt{abcde abxy}$ 68 \end{java} 69 % Generally, Java's @String@ type is immutable. 70 Java provides a @StringBuffer@ near-analog that is mutable. 71 \begin{java} 65 72 StringBuffer sb = new StringBuffer( "abcde" ); 66 sb.replace( 2, 5, "xy" ); $\C {// replace by position, mutable}\CRT$67 System.out.println( s + ' ' + r + ' ' + sb );68 $\texttt{ab cde abxy abxy}$73 sb.replace( 2, 5, "xy" ); $\C[2.25in]{// replace by position, mutable}\CRT$ 74 System.out.println( sb ); 75 $\texttt{abxy}$ 69 76 \end{java} 70 Generally, Java's @String@ type is immutable. 71 Java provides a @StringBuffer@ near-analog that is mutable, but the differences are significant; for example, @StringBuffer@'s @substring@ functions still return copies rather than mutable selections.72 Finally, the operations between these type are asymmetric, \eg @ string@ has @replace@ by text but not replace by position and vice versa for @StringBuffer@.77 However, there are significant differences; 78 \eg, @StringBuffer@'s @substring@ function returns a @String@ copy that is immutable. 79 Finally, the operations between these type are asymmetric, \eg @String@ has @replace@ by text but not replace by position and vice versa for @StringBuffer@. 73 80 74 81 More significant operational differences relate to storage management, often appearing through assignment (@target = source@), and are summarized in \VRef[Figure]{f:StrSemanticCompare}. … … 140 147 $\texttt{\small abcde abcde abcde bcde cde}$ 141 148 \end{cfa} 142 Note, all of these strings rely on the single null termination character at the end of @s@.149 Note, all of these aliased strings rely on the single null termination character at the end of @s@. 143 150 The issue of symmetry does not apply to C strings because the value and pointer strings are essentially different types, and so this feature is scored as not applicable for C. 144 With the type not managing the text buffer, there is no ownership question, \ie operations on @s1@ or @s2@ never leads to their memory becoming reusable.151 With the type not managing the text storage, there is no ownership question, \ie operations on @s1@ or @s2@ never leads to their memory becoming reusable. 145 152 While @s3@ is a valid C-string that contains a proper substring of @s1@, the @s3@ technique does not constitute having a fragment referent because null termination implies the substring cannot be chosen arbitrarily; the technique works only for suffixes. 146 153 … … 157 164 \end{cfa} 158 165 The lax symmetry reflects how the validity of @s1@ depends on the content and lifetime of @s@. 159 It is common practice in \CC to use the @s1@-style pass by reference, with the understanding that the callee only use the referenced string for the duration of the call, \ie no side-effect using the parameter.160 So, when the called function is a constructor, it is typical to use an @s2@-style copy 166 It is common practice in \CC to use the @s1@-style pass by reference, with the understanding that the callee only uses the referenced string for the duration of the call, \ie no side-effect using the parameter. 167 So, when the called function is a constructor, it is typical to use an @s2@-style copy-initialization to string-object-typed member. 161 168 Exceptions to this pattern are possible, but require the programmer to assure safety where the type system does not. 162 The @s3@ initialization is constrained to copy the substring because @c_str@ always provides a null-terminated character, which is different fromsource string.169 The @s3@ initialization is constrained to copy the substring because @c_str@ always provides a null-terminated character, which may be different from the source string. 163 170 @s3@ assignment could be fast by reference counting the text area and using copy-on-write, but would require an implementation upgrade. 164 171 165 172 In Java, @String@ also offers a high-level abstraction: 166 \begin{ cfa}173 \begin{java} 167 174 String s = "abcde"; 168 175 String s1 = s; $\C[2.25in]{// snapshot state, strict symmetry, variable-constrained referent}$ … … 173 180 $\texttt{\small abcde abcde bc c}$ 174 181 $\texttt{\small true false false}$ 175 \end{ cfa}182 \end{java} 176 183 Note, @substring@ takes a start and end position, rather than a start position and length. 177 184 Here, facts about Java's implicit pointers and pointer equality can over complicate the picture, and so are ignored. … … 180 187 With @s2@, the case for fast-copy is more subtle. 181 188 Certainly, its value is not pointer-equal to @s@, implying at least a further allocation. 182 TODO: finish the fast-copy case. 183 Java's immutable strings mean aliasing is impossible with the @String@ type. 184 Java's @StringBuffer@ provides aliasing, though without supporting symmetric treatment of a fragment referent; as a result, @StringBuffer@ scores as \CC. 185 The easy symmetry that the Java string enjoys is aided by Java's garbage collection; Java's @s2@ is doing effectively the operation of \CC's @s2@, though without the consequence to of complicating memory management. 186 187 Finally, In \CFA, @string@ also offers a high-level abstraction: 189 \PAB{TODO: finish the fast-copy case.} 190 Java does not meet the aliasing requirement because immutability make it impossible to modify. 191 Java's @StringBuffer@ provides aliasing (see @replace@ example on \VPageref{p:JavaReplace}), though without supporting symmetric treatment of a fragment referent, \eg @substring@ of a @StringBuffer@ is a @String@; 192 as a result, @StringBuffer@ scores as \CC. 193 The easy symmetry that the Java string enjoys is aided by Java's garbage collection; Java's @s2@ is doing effectively the operation of \CC's @s3@, though without the consequence of complicating memory management. 194 \PAB{What complex storage management is going on here?} 195 196 Finally, in \CFA, @string@ also offers a high-level abstraction: 188 197 \begin{cfa} 189 198 string s = "abcde"; 190 199 string & s1 = s; $\C[2.25in]{// alias state, strict symmetry, variable-constrained referent}$ 191 200 string s2 = s; $\C{// snapshot state, strict symmetry, variable-constrained referent}$ 192 string s3 = s`share Edits; $\C{// alias state, strict symmetry, variable-constrained referent}\CRT$201 string s3 = s`share; $\C{// alias state, strict symmetry, variable-constrained referent}\CRT$ 193 202 string s4 = s( 1, 2 ); 194 203 string s5 = s4( 1, 1 ); … … 198 207 % all helpful criteria of \VRef[Figure]{f:StrSemanticCompare} are satisfied. 199 208 The \CFA string manages storage, handles all assignments, including those of fragment referents with fast initialization, provides the choice between snapshot and alias semantics, and does so symmetrically with one type (which assures text validity according to the lifecycles of the string variables). 200 With aliasing, the intuition is that each string is an editor on an open shared document. 201 With fragment aliasing, the intuition is that these editor views have been scrolled or zoomed to overlapping, but potentially different, ranges. 209 The intended metaphor for \CFA stings is similar to a GUI text-editor or web browser. 210 Select a consecutive block of text using the mouse generates an aliased substring in the file/dialog-box. 211 Typing into the selected area is like assigning to an aliased substring, where the highlighted text is replaced with more or less text; 212 depending on the text entered, the file/dialog-box content grows or shrinks. 213 \PAB{Need to discuss the example, as for the other languages.} 202 214 203 215 The remainder of this chapter explains how the \CFA string achieves this usage style. 204 216 205 217 206 207 218 \section{Storage Management} 208 219 209 220 This section discusses issues related to storage management of strings. 210 Specifically, it is common for strings to logically overlap completely or partially.221 Specifically, it is common for strings to logically overlap partially or completely. 211 222 \begin{cfa} 212 223 string s1 = "abcdef"; … … 218 229 s3[1] = 'w'; $\C{// what happens to s1 and s2?}$ 219 230 \end{cfa} 220 This question is the notion of mutable or immutable strings.221 For example, Java has immutable strings that are copiedwhen any overlapping string changes.222 Note, the notion of underlying string mutability is not specified by @const@ , \eg:231 which is restricted by a string's mutable or immutable property. 232 For example, Java's immutable strings require copy-on-write when any overlapping string changes. 233 Note, the notion of underlying string mutability is not specified by @const@; \eg in \CC: 223 234 \begin{cfa} 224 235 const string s1 = "abc"; 225 236 \end{cfa} 226 Here,@const@ applies to the @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage.227 Hence, @s1@ is not pointing at an immutable constant, meaning its underlying string is always mutable, unless some other designation is specified, such as Java's globalrule.237 the @const@ applies to the @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage. 238 Hence, @s1@ is not pointing at an immutable constant, meaning its underlying string can be mutable, unless some other designation is specified, such as Java's global immutable rule. 228 239 229 240 230 241 \subsection{Logical overlap} 231 242 232 \CFA provides a dynamic mechanism to indicate mutable or immutable as an assignment attribute: @`shareEdits@. 233 234 Consider two strings @s1@ and @s1a@ that are in an aliasing relationship, and a third, @s2@, made by a simple copy from @s1@. 235 Aliasing (@`shareEdits@) means that changes flow in both directions; with a simple copy, they do not. 243 \CFA provides a dynamic mechanism to indicate mutable or immutable using the attribute @`share@. 244 This aliasing relationship is a sticky-property established at initialization. 245 For example, here strings @s1@ and @s1a@ are in an aliasing relationship, while @s2@ is in a copy relationship. 236 246 \input{sharing1.tex} 237 238 Aliasing (@`shareEdits@) means that changes flow in both directions; with a simple copy, they do not. 247 Here, the aliasing (@`share@) causes partial changes (subscripting) to flow in both directions. 239 248 \input{sharing2.tex} 240 241 Assignment of a value is just a modification. 242 The aliasing relationship is established at construction and is unaffected by assignment of a value. 249 Similarly for complete changes. 243 250 \input{sharing3.tex} 244 251 245 Assignment from a string is just assignment of a value.246 Whether of not the RHS participates in aliasing is irrelevant. Anyaliasing of the LHS is unaffected.252 Because string assignment copies the value, RHS aliasing is irrelevant. 253 Hence, aliasing of the LHS is unaffected. 247 254 \input{sharing4.tex} 248 255 249 Consider new strings @s1_mid@ being an alias for a runin the middle of @s1@, along with @s2@, made by a simple copy from the middle of @s1@.256 Now, consider string @s1_mid@ being an alias in the middle of @s1@, along with @s2@, made by a simple copy from the middle of @s1@. 250 257 \input{sharing5.tex} 251 252 Again, @`shareEdits@ passes changes in both directions; copy does not. 253 Note the difference in index values, with the \emph{b} position being 1 in the longer string and 0 in the shorter strings. 254 In the case of s1 aliasing with @s1_mid@, the very same character is being accessed by different positions. 258 Again, @`share@ passes changes in both directions; copy does not. 259 As a result, the index values for the position of @b@ are 1 in the longer string @"abcd"@ and 0 in the shorter aliased string @"bc"@. 260 This alternate positioning also applies to subscripting. 255 261 \input{sharing6.tex} 256 262 257 Once again, assignment of a value is a modification that flows through the aliasing relationship,without affecting its structure.263 Finally, assignment flows through the aliasing relationship without affecting its structure. 258 264 \input{sharing7.tex} 259 260 In the \emph{ff} step, which is a positive example of flow across an aliasing relationship, the result is straightforward to accept because the flow direction is from contained (small) to containing (large). 261 The following rules for edits through aliasing substrings will guide how to flow in the opposite direction. 262 263 Growth and shrinkage are natural extensions. 264 An empty substring is a real thing, at a well-defined location, whose meaning is extrapolated from the examples so far. 265 The intended metaphor is to operating a GUI text editor. 266 Having an aliasing substring is like using the mouse to select a few words. 267 Assigning onto an aliasing substring is like typing from having a few words selected: depending how much you type, the file being edited can get shorter or longer. 265 In the @"ff"@ assignment, the result is straightforward to accept because the flow direction is from contained (small) to containing (large). 266 The following rules explain aliasing substrings that flow in the opposite direction, large to small. 267 268 Growth and shrinkage are natural extensions, as for the text-editor example mentioned earlier, where an empty substring is as real real as an empty string. 268 269 \input{sharing8.tex} 269 270 270 Multiple portions can be aliased.271 When there are several aliasing substrings at once, the text editor analogy becomes an online multi-user editor.272 I should be able to edit a paragraph in one place (changing the document's length), without my edits affecting which letters are within a mouse-selection that you had made previously, somewhere else.271 Multiple portions of a string can be aliased. 272 % When there are several aliasing substrings at once, the text editor analogy becomes an online multi-user editor. 273 %I should be able to edit a paragraph in one place (changing the document's length), without my edits affecting which letters are within a mouse-selection that you had made previously, somewhere else. 273 274 \input{sharing9.tex} 274 275 When an edit happens on an aliasing substring that overlaps another, an effect is unavoidable. 276 Here, the passive party sees its selection shortened, to exclude the characters that were not part of the original selection.275 When @s1_bgn@'s size increases by 3, @s1_mid@'s starting location moves from 1 to 4 and @s1_end@'s from 3 to 6, 276 277 When changes happens on an aliasing substring that overlap. 277 278 \input{sharing10.tex} 278 279 TODO: finish typesetting the demo 279 Strings @s1_crs@ and @s1_mid@ overlap at character 4, @j@ because the substrings are 3,2 and 4,2. 280 When @s1_crs@'s size increases by 1, @s1_mid@'s starting location moves from 4 to 5, but the overlaping character remains, changing to @'+'@. 281 282 \PAB{TODO: finish typesetting the demo} 280 283 281 284 %\input{sharing-demo.tex}
Note:
See TracChangeset
for help on using the changeset viewer.