Changeset 5d300ba for doc/theses

Timestamp:

Dec 16, 2025, 7:41:51 AM

Author:

Peter A. Buhr <pabuhr@…>

Branches:

master, stuck-waitfor-destruct

Message:

proofread string chapter

Location:

doc/theses/mike_brooks_MMath

Files:

7 edited

plots/list-zoomout-noshuf.gp (modified) (1 diff)
plots/list-zoomout-shuf.gp (modified) (1 diff)
plots/string-peq-cppemu.gp (modified) (3 diffs)
plots/string-peq-sharing.gp (modified) (3 diffs)
plots/string-pta-sharing.gp (modified) (3 diffs)
programs/sharing-demo.cfa (modified) (1 diff)
string.tex (modified) (63 diffs)

doc/theses/mike_brooks_MMath/plots/list-zoomout-noshuf.gp

-              r35fc819
+              r5d300ba
 set key top left
 set logscale x
-set logscale y
-set yrange [1:1000];
+#set logscale y
+#set yrange [1:1000];
 set xlabel "List length (item count)" offset 2,0
 set ylabel "Duration (ns)"

doc/theses/mike_brooks_MMath/plots/list-zoomout-shuf.gp

-              r35fc819
+              r5d300ba
 set logscale x
 set logscale y
-set yrange [1:1000];
+#set yrange [1:1000];
 set xlabel "List length (item count)" offset 2,0
-set ylabel "Duration (ns)"
+set ylabel "Duration (ns), log scale"
 set linetype 3 dashtype 2
 set linetype 4 dashtype 2

doc/theses/mike_brooks_MMath/plots/string-peq-cppemu.gp

-              r35fc819
+              r5d300ba
-set terminal pdf color enhanced size 6.0in,3.0in font "Times,17"
+set terminal pdf color enhanced size 6.5in,3.5in font "Times,17"
 #set terminal postscript portrait enhanced size 7.5, 10. color solid 9.5;
 #set terminal wxt size 950,1250
 …
 set grid
 set key top left
+set xrange [1:500]
 set xtics (1,2,5,10,20,50,100,200,500)
 set logscale x
 …
 set linetype 4 dashtype 2
 plot INDIR."/plot-string-peq-cppemu.dat" \
-           i 0 using 1:2 title columnheader(1)  with points lt rgb "red"  pt  2  ps 1, \
-        '' i 1 using 1:2 title columnheader(1)  with points lt rgb "red"  pt  1  ps 1, \
-        '' i 2 using 1:2 title columnheader(1)  with points lt rgb "blue" pt  6  ps 1, \
-        '' i 3  using 1:2 title columnheader(1) with points lt rgb "blue" pt  8  ps 1
+           i 0 using 1:2 title columnheader(1)  with points lt rgb "red" pt 2  ps 1, \
+        '' i 0 using 1:2 notitle smooth sbezier lt rgb "red" dashtype 1, \
+        '' i 1 using 1:2 title columnheader(1)  with points lt rgb "red" pt 1  ps 1, \
+        '' i 1 using 1:2 notitle smooth sbezier lt rgb "red" dashtype 4, \
+        '' i 2 using 1:2 title columnheader(1)  with points lt rgb "blue" pt 6  ps 1, \
+        '' i 2 using 1:2 notitle smooth sbezier lt rgb "blue" dashtype 1, \
+        '' i 3 using 1:2 title columnheader(1) with points lt rgb "blue" pt 8  ps 1, \
+        '' i 3 using 1:2 notitle smooth sbezier lt rgb "blue" dashtype 4 \

doc/theses/mike_brooks_MMath/plots/string-peq-sharing.gp

-              r35fc819
+              r5d300ba
-set terminal pdf color enhanced size 6.0in,3.0in font "Times,17"
+set terminal pdf color enhanced size 6.5in,3.5in font "Times,17"
 #set terminal postscript portrait enhanced size 7.5, 10. color solid 9.5;
 #set terminal wxt size 950,1250
 …
 set grid
 set key top left
+set xrange [1:500]
 set xtics (1,2,5,10,20,50,100,200,500)
 set logscale x
 …
 set linetype 4 dashtype 2
 plot INDIR."/plot-string-peq-sharing.dat" \
-           i 0 using 1:2 title columnheader(1) with points lt rgb "red"  pt  2  ps 1, \
-        '' i 1 using 1:2 title columnheader(1) with points lt rgb "red"  pt  1  ps 1, \
-        '' i 2 using 1:2 title columnheader(1) with points lt rgb "blue" pt  6  ps 1, \
-        '' i 3 using 1:2 title columnheader(1) with points lt rgb "blue" pt  8  ps 1
+           i 0 using 1:2 title columnheader(1)  with points lt rgb "red" pt 2  ps 1, \
+        '' i 0 using 1:2 notitle smooth sbezier lt rgb "red" dashtype 1, \
+        '' i 1 using 1:2 title columnheader(1)  with points lt rgb "red" pt 1  ps 1, \
+        '' i 1 using 1:2 notitle smooth sbezier lt rgb "red" dashtype 4, \
+        '' i 2 using 1:2 title columnheader(1)  with points lt rgb "blue" pt 6  ps 1, \
+        '' i 2 using 1:2 notitle smooth sbezier lt rgb "blue" dashtype 1, \
+        '' i 3 using 1:2 title columnheader(1) with points lt rgb "blue" pt 8  ps 1, \
+        '' i 3 using 1:2 notitle smooth sbezier lt rgb "blue" dashtype 4 \

doc/theses/mike_brooks_MMath/plots/string-pta-sharing.gp

-              r35fc819
+              r5d300ba
-set terminal pdf color enhanced size 6.0in,3.0in font "Times,17"
+set terminal pdf color enhanced size 6.5in,3.5in font "Times,17"
 #set terminal postscript portrait enhanced size 7.5, 10. color solid 9.5;
 #set terminal wxt size 950,1250
 …
 set grid
 set key top left
+set xrange [1:500]
 set xtics (1,2,5,10,20,50,100,200,500)
 set logscale x
 …
 set xlabel "String Length being appended (mean, geo. dist.), log scale" offset 2,0
 set ylabel "Time per append (ns, mean), log_{2} scale"
-#show colornames
 plot INDIR."/plot-string-pta-sharing.dat" \
-           i 0 using 1:2 title columnheader(1) with points lt rgb "red"        pt  2   ps 1, \
-        '' i 1 using 1:2 title columnheader(1) with points lt rgb "dark-green" pt  4   ps 1, \
-        '' i 2 using 1:2 title columnheader(1) with points lt rgb "blue"       pt  6   ps 1, \
-        '' i 3 using 1:2 title columnheader(1) with points lt rgb "dark-green" pt  12  ps 1
+           i 0 using 1:2 title columnheader(1)  with points lt rgb "red" pt 2  ps 1, \
+        '' i 0 using 1:2 notitle smooth sbezier lt rgb "red" dashtype 1, \
+        '' i 1 using 1:2 title columnheader(1)  with points lt rgb "dark-green" pt 4  ps 1, \
+        '' i 1 using 1:2 notitle smooth sbezier lt rgb "dark-green" dashtype 4, \
+        '' i 2 using 1:2 title columnheader(1)  with points lt rgb "blue" pt 6  ps 1, \
+        '' i 2 using 1:2 notitle smooth sbezier lt rgb "blue" dashtype 1, \
+        '' i 3 using 1:2 title columnheader(1) with points lt rgb "dark-green" pt 12  ps 1, \
+        '' i 3 using 1:2 notitle smooth sbezier lt rgb "dark-green" dashtype 4 \

doc/theses/mike_brooks_MMath/programs/sharing-demo.cfa

-              r35fc819
+              r5d300ba
 open( outfile, "build/sharing10.tex" );
 outfile | "\\begin{cquote}";
+outfile | "\\setlength{\\tabcolsep}{10pt}";
 outfile | "\\begin{tabular}{@{}rlllll@{}}";
 outfile | "\t\t\t\t& @s1@\t& @s1_bgn@\t& @s1_crs@\t& @s1_mid@\t& @s1_end@\t\\\\";

doc/theses/mike_brooks_MMath/string.tex

-              r35fc819
+              r5d300ba
 \begin{cquote}
 \begin{tabular}{@{}l|l|l|l@{}}
-C @char [ ]@                    &  \CC @string@                 & Java @String@ & \CFA @string@ \\
+C @char [ ]@                    & \CC @string@                  & Java @String@ & \CFA @string@ \\
 \hline
 @strcpy@, @strncpy@             & @=@                                   & @=@                   & @=@   \\
 …
 As a result, a @string@ declaration does not specify a maximum length, where a C string array does.
 For \CFA, as a @string@ dynamically grows and shrinks in size, so does its underlying storage.
-For C, as a string dynamically grows and shrinks in size, but its underlying storage does not.
+For C, as a string dynamically grows and shrinks in size, its underlying storage does not.
 The maximum storage for a \CFA @string@ value is @size_t@ characters, which is $2^{32}$ or $2^{64}$ respectively.
 A \CFA string manages its length separately from the string, so there is no null (@'\0'@) terminating value at the end of a string value.
 …
 Hence, the basic types @char@, @char *@, @int@, @double@, @_Complex@, including any signness and size variations, implicitly convert to type @string@ (as in Java).
 \begin{cquote}
-\begin{tabular}{@{}l|ll|l@{}}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{@{}llll@{}}
 \begin{cfa}
 string s = 5;
 …
 Conversions can be explicitly specified using a compound literal.
 \begin{cfa}
-s = (string){ 5 };    s = (string){ "abc" };   s = (string){ 5.5 };
+s = (string){ 5 };    s = (string){ "abc" };    s = (string){ 5.5 };
 \end{cfa}
 Conversions from @string@ to @char *@ attempt to be safe.
 The @strncpy@ conversion requires the maximum length for the pointer's target buffer.
+The overloaded @strncpy@ function is safe, if the length of the C string is correct.
 The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it.
 Note, a C string is always null terminated, implying storage is always necessary for the null.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 string s = "abcde";
 char cs[4];
 strncpy( cs, s, sizeof(cs) );
-char * cp = s;          // ownership
+char * cp = s;            // ownership
 delete( cp );
 cp = s + ' ' + s;       // ownership
 …
 \subsection{Length}
-The @len@ operation (short for @strlen@) returns the length of a C or \CFA string.
-For compatibility, @strlen@ also works with \CFA strings.
-\begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+The @len@ operation (short for @strlen@) returns the length of a C (not including the terminating null) or \CFA string.
+For compatibility, an overloaded @strlen@ works with \CFA strings.
+\begin{cquote}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 i = len( "" );
 …
 In C, these operators compare the C string pointer not its value, which does not match programmer expectation.
 C strings use function @strcmp@ to lexicographically compare the string value.
-Java has the same issue with @==@ and @.equals@.
+Java has the same issue with @==@ (reference) and @.equals@ (value) comparison.
 …
 The binary operators @+@ and @+=@ concatenate C @char@, @char *@ and \CFA strings, creating the sum of the characters.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{\hspace{15pt}}l|l@{\hspace{15pt}}l|l@{}}
+\setlength{\tabcolsep}{5pt}
+\begin{tabular}{@{}ll|ll|ll@{}}
 \begin{cfa}
 s = "";
 …
 Similarly, it is impossible to restrict or remove addition on type @char *@ because (unfortunately) it is subscripting: @cs + 'a'@ implies @cs['a']@ or @'a'[cs]@.
-The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ constants work correctly (variables are the same).
+The prior \CFA concatenation examples show complex mixed-mode interactions among @char@, @char *@, and @string@ constants work correctly (@string@ variables are the same).
 The reason is that the \CFA type-system handles this kind of overloading well using the left-hand assignment-type and complex conversion costs.
 Hence, the type system correctly handles all uses of addition (explicit or implicit) for @char *@.
 …
 If $N = 0$, a zero length string, @""@, is returned.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 s = 'x' * 0;
 …
 multiplication of pointers does not exist in C.
 \begin{cfa}
-ch = ch * 3;            $\C[2in]{// LHS disambiguate, multiply character values}$
+ch = ch * 3;            $\C[2in]{// LHS disambiguate, multiply character value}$
 s = 'a' * 3;            $\C{// LHS disambiguate, concatenate characters}$
 printf( "%c\n", @'a' * 3@ ); $\C{// no LHS information, ambiguous}$
 …
 The substring operation returns a subset of a string starting at a position in the string and traversing a length, or matching a pattern string.
 \begin{cquote}
-\setlength{\tabcolsep}{10pt}
-\begin{tabular}{@{}l|ll|l@{}}
+\setlength{\tabcolsep}{8pt}
+\begin{tabular}{@{}ll|ll@{}}
 \multicolumn{2}{@{}c}{\textbf{length}} & \multicolumn{2}{c@{}}{\textbf{pattern}} \\
 \multicolumn{4}{@{}l}{\lstinline{string name = "PETER"}} \\
 …
 "TER"   // clip length to 3
 "ER"
-""                 // beyond string to right, clip to null
-""                 // beyond string to left, clip to null
+""                 // clip, beyond right
+""                 // clip, beyond left
 "ER"
 "TER"   // to end of string
 …
 Hence, the left string may decrease, stay the same, or increase in length.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}[escapechar={}]
 digit( 3, 3 ) = "";
 …
 \end{tabular}
 \end{cquote}
-Now substring pattern matching is useful on the left-hand side of assignment.
-\begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+Here, substring pattern matching is useful on the left-hand side of assignment.
+\begin{cquote}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}[escapechar={}]
 digit( "$$" ) = "345";
 …
 \end{tabular}
 \end{cquote}
-Extending the pattern to a regular expression is a possible extension.
+Supporting a regular-expression pattern is a possible extension.
 The replace operation extends substring to substitute all occurrences.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 s = replace( "PETER", "E", "XX" );
 …
 If the key does not appear in the string, the length of the string is returned.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 i = find( digit, '3' );
 …
 A character-class operation indicates if a string is composed completely of a particular class of characters, \eg, alphabetic, numeric, vowels, \etc.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 charclass vowels{ "aeiouy" };
 …
 Function @exclude@ is the reverse of @include@, checking if all characters in the string are excluded from the class (compliance).
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 i = exclude( "cdbfghmk", vowels );
 …
 Both forms can return the longest substring of compliant characters.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 s = include( "aaeiuyoo", vowels );
 …
 There are also versions of @include@ and @exclude@, returning a position or string, taking a validation function, like one of the C character-class functions.\footnote{It is part of the hereditary of C that these function take and return an \lstinline{int} rather than a \lstinline{bool}, which affects the function type.}
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 i = include( "1FeC34aB", @isxdigit@ );
 …
 The translate operation returns a string with each character transformed by one of the C character transformation functions.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 s = translate( "abc", @toupper@ );
 …
 However, string search can fail, which is reported as an alternate search outcome, possibly an exception.
 Many string libraries use a return code to indicate search failure, with a failure value of @0@ or @-1@ (PL/I~\cite{PLI} returns @0@).
-This semantics leads to the awkward pattern, which can appear many times in a string library or user code.
+This semantics leads to an awkward pattern, which can appear many times in a string library or user code.
 \begin{cfa}
 i = exclude( s, alpha );
 …
 else return "";
 \end{cfa}
+The problem is that substring does the wrong thing or fails with the failure return-code in most string libraries, so it has to be special cased.
 \CFA adopts a return code but the failure value is taken from the index-of function in APL~\cite{apl}, which returns the length of the target string $N$ (or $N+1$ for 1 origin).
 …
 \begin{figure}
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\
 \begin{cfa}
 …
 \begin{cquote}
 \begin{tabular}{@{}ll@{}}
 \begin{cfa}
+char s[32];   // string s;
+\multicolumn{2}{@{}l@{}}{\lstinline{char s[32];   // changing s's type to string works}} \\
+\begin{cfa}
 strlen( s );
 strnlen( s, 3 );
 …
+&
 \begin{cfa}
 strcpy( s, "abc" );
 strncpy( s, "abcdef", 3 );
 …
 \subsection{I/O Operators}
-The ability to input and output strings is as essential as for any other type.
-The goal for character I/O is to also work with groups rather than individual characters.
+The ability to input and output a string is as essential as for any other type.
+The goal for character I/O is to work with groups rather than individual characters.
 A comparison with \CC string I/O is presented as a counterpoint to \CFA string I/O.
 …
 The \CC manipulators are @setw@, and its associated width controls @left@, @right@ and @setfill@.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{c++}
 string s = "abc";
 …
 The \CFA manipulators are @bin@, @oct@, @hex@, @wd@, and its associated width control and @left@.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{cfa}
 string s = "abc";
 …
 Reading into a @char@ is safe as the size is 1, @char *@ is unsafe without using @setw@ to constraint the length (which includes @'\0'@), @string@ is safe as its grows dynamically as characters are read.
 \begin{cquote}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{c++}
 char ch, c[10];
 …
 \end{cquote}
 Input text can be \emph{gulped}, including whitespace, from the current point to an arbitrary delimiter character using @getline@.
+\begin{cquote}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{@{}ll@{}}
+\multicolumn{1}{c}{\textbf{\CC}} & \multicolumn{1}{c}{\textbf{\CFA}} \\
+\begin{cfa}
+string s1, s2, s3;
+getline( cin, s1, 'a' );
+getline( cin, s2, 'w' );
+getline( cin, s3 );
+cout << s1 << ' ' << s2 << ' ' << s3 << endl;
+@bbbad ddwxyz@
+\end{cfa}
+&
+\begin{cfa}
+sin | getline( s1, 'a' )
+        | getline( s2, 'w' )
+        | getline( s3 );
+sout | s1 | s2 | s3;       "bbb" "d dd" "xyz"
+\end{cfa}
+\end{tabular}
+\end{cquote}
 The \CFA philosophy for input is that, for every constant type in C, these constants should be usable as input.
 …
 \begin{cquote}
 \setlength{\tabcolsep}{10pt}
-\begin{tabular}{@{}l|l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{c++}
 char ch, c[10];
 …
 sin | ch | wdi( 5, c ) | s;
 @abcde fg@
-sin | quote( ch ) | quote( wdi( sizeof(c), c ) ) | quote( s, '[', ']' ) | nl;
-@'a' "bcde" [fg]@
+sin | @quote@( ch ) | @quote@( wdi( sizeof(c), c ) ) | @quote@( s, '[', ']' ) | nl;
+@'a'   "bcde"      [fg]@
 sin | incl( "a-zA-Z0-9 ?!&\n", s ) | nl;
 @x?&000xyz TOM !.@
 …
 For example, the \CC @replace@ function selects a substring in the target and substitutes it with the source string, which can be smaller or larger than the substring.
 \CC modifies the mutable receiver object, replacing by position (zero origin) and length.
-\begin{cquote}
-\begin{tabular}{@{}l|l@{}}
 \begin{c++}
 string s1 = "abcde";
-s1.replace( 2, 3, "xy" );
+s1.replace( 2, 3, "xy" );                        "abxy"
 \end{c++}
-&
-\begin{c++}
-"abxy"
-\end{c++}
-\end{tabular}
-\end{cquote}
 Java cannot modify the receiver (immutable strings) so it returns a new string, replacing by text.
 \label{p:JavaReplace}
-\begin{cquote}
-\begin{tabular}{@{}l|l@{}}
 \begin{java}
 String s = "abcde";
-String r = s.replace( "cde", "xy" );
+String r = s.replace( "cde", "xy" );      "abxy"
 \end{java}
-&
-\begin{java}
-"abxy"
-\end{java}
-\end{tabular}
-\end{cquote}
 Java also provides a mutable @StringBuffer@, replacing by position (zero origin) and length.
-\begin{cquote}
-\begin{tabular}{@{}l|l@{}}
 \begin{java}
 StringBuffer sb = new StringBuffer( "abcde" );
-sb.replace( 2, 5, "xy" );
+sb.replace( 2, 5, "xy" );                         "abxy"
 \end{java}
-&
-\begin{java}
-"abxy"
-\end{java}
-\end{tabular}
-\end{cquote}
 However, there are anomalies.
 @StringBuffer@'s @substring@ returns a @String@ copy that is immutable rather than modifying the receiver.
 …
 \begin{figure}
 \setlength{\extrarowheight}{2pt}
-\begin{tabularx}{\textwidth}{@{}p{0.6in}XXcccc@{}}
+\setlength{\tabcolsep}{5pt}
+\begin{tabularx}{\textwidth}{@{}p{0.8in}XXcccc@{}}
                                         &                       &                       & \multicolumn{4}{@{}c@{}}{\underline{Supports Helpful?}} \\
                                         & Required      & Helpful       & C                     & \CC           & Java          & \CFA \\
 \hline
-Type abst'n
+Type Abstraction
                                         & Low-level: The string type is a varying amount of text communicated via a parameter or return.
                                                                 & High-level: The string-typed relieves the user of managing memory for the text.
 …
 String s3 = s2.substring( 1, 2 );  $\C{// snapshot state (possible), strict symmetry, fragment referent}\CRT$
 System.out.println( s + ' ' + s1 + ' ' + s2 + ' ' + s3 );
+$\texttt{\small abcde abcde bc c}$
 System.out.println( (s == s1) + " " + (s == s2) + " " + (s2 == s3) );
-$\texttt{\small abcde abcde bc c}$
 $\texttt{\small true false false}$
 \end{java}
 …
 string s3 = s`share; $\C{// alias state, strict symmetry, variable-constrained referent}$
 string s4 = s( 1, 2 ); $\C{// snapshot state, strict symmetry, fragment referent}$
-string s5 = s4( 1, 1 )`share'; $\C{// alias state, strict symmetry, fragment referent}\CRT$
+string s5 = s4( 1, 1 )`share; $\C{// alias state, strict symmetry, fragment referent}\CRT$
 sout | s | s1 | s2 | s3 | s4 | s5;
 $\texttt{\small abcde abcde abcde abcde bc c}$
 …
 String sharing is expressed using the @`share@ marker to indicate aliasing (mutations shared) \vs snapshot (not quite an immutable result, but one with subsequent mutations isolated).
-This aliasing relationship is a sticky property established at initialization.
+This aliasing relationship is a \newterm{sticky property} established at initialization.
 For example, here strings @s1@ and @s1a@ are in an aliasing relationship, while @s2@ is in a copy relationship.
 \input{sharing1.tex}
 …
 When changes happen on an aliasing substring that overlap.
 \input{sharing10.tex}
-Strings @s1_crs@ and @s1_mid@ overlap at character 4, @j@, because the substrings are 3,2 and 4,2.
+Strings @s1_crs@ and @s1_mid@ overlap at character 4, @'j'@, because the substrings are 3,2 and 4,2.
 When @s1_crs@'s size increases by 1, @s1_mid@'s starting location moves from 4 to 5, but the overlapping character remains, changing to @'+'@.
 …
 Normally, one global context is appropriate for an entire program;
 concurrency is discussed in \VRef{s:ControllingImplicitSharing}.
-A string is a handle to a node in a linked list containing a information about a string text in the buffer.
+A string is a handle to a node in a linked list containing information about a string text in the buffer.
 The list is doubly linked for $O(1)$ insertion and removal at any location.
 Strings are ordered in the list by text start address.
 …
 The linked handles define all live strings in the buffer, which indirectly defines the allocated and free space in the buffer.
 The string handles are maintained in sorted order, so the handle list can be traversed, copying the first live text to the start of the buffer, and subsequent strings after each other.
-After compaction, if free storage is still be less than the new string allocation, a larger text buffer is heap-allocated, the current buffer is copied into the new buffer, and the original buffer is freed.
+After compaction, if free storage is still less than the new string allocation, a larger text buffer is heap-allocated, the current buffer is copied into the new buffer, and the original buffer is freed.
 Note, the list of string handles is structurally unaffected during a compaction;
 only the text pointers in the handles are modified to new buffer locations.
 …
 There are two fundamental string-creation functions: importing external text like a C-string or reading a string, and initialization from an existing \CFA string.
 When importing, storage comes from the end of the buffer, into which the text is copied.
-The new string handle is inserted at the end of the handle list because the new text is at the end of the buffer.
+To maintain sorted order, the new string handle is inserted at the end of the handle list because the new text is at the end of the buffer.
 When initializing from text already in the buffer, the new handle is a second reference into the original run of characters.
-In this case, the new handle's linked-list position is after the original handle.
+To maintain sorted order, the new handle's linked-list position is after the original handle.
 Both string initialization styles preserve the string module's internal invariant that the linked-list order matches the buffer order.
 For string destruction, handles are removed from the list.
 …
 Favourable conditions allow for in-place editing: where there is room for the resulting value in the original buffer location, and where all handles referring to the original buffer location see the new value.
 One notable example of favourable conditions occurs because the most recently written string is often the last in the buffer, after which a large amount of free space occurs.
-So, repeated appends often occur without copying previously accumulated characters.
+Now, repeated appends (reading) can occur without copying previously accumulated characters.
 However, the general case requires a new buffer allocation: where the new value does not fit in the old place, or if other handles are still using the old value.
 …
 Both \CC and \CFA RAII systems are powerful enough to achieve reference counting.
-In general, a lifecycle function has access to an object by location, \ie constructors and destructors receive a @this@ parameter providing an object's memory address.
+In general, a lifecycle function has access to an object by location, \ie constructors and destructors receive a parameter providing an object's memory address.\footnote{
+Historically in \CC, the type of \lstinline[language=C++]{this} is a pointer versus a reference;
+in \CFA, the first parameter of a constructor or destructor must be a reference.}
 \begin{cfa}
 struct S { int * ip; };
 void ?{}( S & @this@ ) { this.ip = new(); } $\C[3in]{// default constructor}$
-void ?{}( S & @this@, int i ) { (this){}; *this.ip = i; } $\C{// initializing constructor}$
-void ?{}( S & @this@, S s ) { (this){*s.ip}; } $\C{// copy constructor}$
+void ?{}( S & @this@, int i ) { this{}; *this.ip = i; } $\C{// initializing constructor}$
+void ?{}( S & @this@, S s ) { this{ *s.ip }; } $\C{// copy constructor}$
 void ^?{}( S & @this@ ) { delete( this.ip ); } $\C{// destructor}\CRT$
 \end{cfa}
 Such basic examples use the @this@ address only to gain access to the values being managed.
-But the lifecycle logic can use the pointer generally, too.
-For example, they can add @this@ object to a collection at creation and remove it at destruction.
-\begin{cfa}
-// header
-struct T {
+But lifecycle logic can use the address, too, \eg add the @this@ object to a collection at creation and remove it at destruction.
+\begin{cfa}
+// header (.hfa)
+struct N { $\C[3in]{// list node}$
         // private
-        inline dlink(T);
+        inline dlink( N );
 };
-void ?{}( T & ); $\C[3in]{// default constructor}$
-void ^?{}( T & ); $\C{// destructor}\CRT$
-// implementation
-static dlist(T) @all_T@;
-void ?{}( T & this ) { insert_last(all_T, @this@) }
-void ^?{}( T & this ) { remove(this); }
-\end{cfa}
-A module providing the @T@ type can traverse @all_T@ at relevant times, to keep the objects ``good.''
-Hence, declaring a @T@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service that keeps the value ``good'' during its lifetime.
+void ?{}( N & ); $\C{// default constructor}$
+void ^?{}( N & ); $\C{// destructor}\CRT$
+// implementation (.cfa)
+static dlist( N ) @list_N@;
+void ?{}( N & this ) { insert_last( list_N, @this@ ) }
+void ^?{}( N & this ) { remove( this ); }
+\end{cfa}
+A module providing the @N@ (node) type can traverse @list_N@ to manipulate the objects.
+Hence, declaring a @N@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service that keeps the value ``good'' during its lifetime.
 Again, both \CFA and \CC support this usage style.
 …
 In the parameter direction, the language's function-call handling must arrange for a copy-constructor call to happen, at a time near the control transfer into the callee. %, with the source as the caller's (sender's) version and the target as the callee's (receiver's) version.
 In the return direction, the roles are reversed and the copy-constructor call happens near the return of control.
-\CC supports this capability.% without qualification.
+\CC supports this capability. % without qualification.
 \CFA offers limited support;
-simple examples work, but implicit copying does not combine successfully with the other RAII capabilities discussed.
+simple examples work, but implicit copying does not combine successfully with other RAII capabilities.
 \CC also offers move constructors and return-value optimization~\cite{RVO20}.
 …
 \begin{enumerate}
 \item
+\label{p:feature1}
         Object provider implements lifecycle functions to manage a resource outside of the object.
 \item
-        Object provider implements lifecycle functions to store references back to the object, often originating from outside of it.
+\label{p:feature2}
+        Object provider implements lifecycle functions to store references to the object, often originating from outside of it.
 \item
+\label{p:feature3}
         Object user expects to pass (in either direction) an object by value for function calls.
 \end{enumerate}
-\CC supports all three simultaneously.  \CFA does not currently support \#2 and \#3 on the same object, though \#1 works along with either one of \#2 or \#3.  \CFA needs to be fixed to support all three simultaneously.
-The reason that \CFA does not support \#2 with \#3 is a holdover from how \CFA function calls lowered to C, before \CFA got references and RAII.
-At that time, adhering to a principal of minimal intervention, this code could always be treated as passthrough:
+\CC supports all three simultaneously.
+\CFA does not currently support \ref{p:feature2} and \ref{p:feature3} on the same object, though \ref{p:feature1} works along with either one of \ref{p:feature2} or \ref{p:feature3}.
+\CFA needs to be fixed to support all three simultaneously.
+The reason \CFA does not support \ref{p:feature2} with \ref{p:feature3} is a holdover from how \CFA lowered function calls to C, before \CFA got references and RAII.
+At that time, adhering to a principal of minimal intervention, this code was treated as a passthrough:
 \begin{cfa}
 struct U { ... };
 // RAII to go here
-void f( U u ) { F_BODY(u) }
+void f( U u ) { F_BODY( u ) }
 U x;
 f( x );
 \end{cfa}
-But adding custom RAII (at ``...go here'') changes things.
-The common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the present \CFA lowering.
-\begin{cquote}
+However, adding custom RAII (at ``...go here'') changes things.
+\VRef[Figure]{f:CodeLoweringRAII} shows the common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} (right) proceeds differently than the present \CFA lowering (left).
+The current \CFA scheme is still using a by-value C call.
+C does a @memcpy@ on structures passed by value.
+And so, @F_BODY@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions.
+If @U@ is trying to have a style- \ref{p:feature2} invariant, it shows up broken in @F_BODY@: references supposedly to @u@ are actually to @__u_for_f@.
+The \CC scheme does not have this problem because it constructs the @u@ copy in the correct location within @f@.
+Yet, the current \CFA scheme is sufficient to deliver style-\ref{p:feature1} invariants (in this style-\ref{p:feature3} use case) because this scheme still does the correct number of lifecycle calls, using correct values, at correct times.
+So, reference-counting or simple ownership applications get their invariants respected under call/return-by-value.
+\begin{figure}
+\centering
 \begin{tabular}{@{}l|l@{}}
-\begin{cfa}
-$\C[0.0in]{// \CC, \CFA future}\CRT$
+\multicolumn{1}{@{}c|}{\CFA today} & \multicolumn{1}{c@{}}{\CC, \CFA future} \\
+\begin{cfa}
+struct U {...};
+// RAII elided
+void f( U u ) {
+        F_BODY( u );
+}
+U x; // call default ctor
+{
+        @U __u_for_f = x;@  // call copy ctor
+        f( __u_for_f );
+        // call dtor, __u_for_f
+}
+// call dtor, x
+\end{cfa}
+&
+\begin{cfa}
 struct U {...};
 // RAII elided
 void f( U * __u_orig ) {
         U u = * __u_orig;  // call copy ctor
+        @U u = * __u_orig;@  // call copy ctor
         F_BODY( u );
         // call dtor, u
 …
 // call dtor, x
 \end{cfa}
+&
+\begin{cfa}
+$\C[0.0in]{// \CFA today}\CRT$
+struct U {...};
+// RAII elided
+void f( U u ) {
+        F_BODY( u );
+}
+U x; // call default ctor
+{
+        U __u_for_f = x;  // call copy ctor
+        f( __u_for_f );
+        // call dtor, __u_for_f
+}
+// call dtor, x
+\end{cfa}
+\end{tabular}
+\end{cquote}
+The current \CFA scheme is still using a by-value C call.
+C does a @memcpy@ on structures passed by value.
+And so, @F_BODY@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions.
+If @U@ is trying to have a style-\#2 invariant, it shows up broken in @F_BODY@: references supposedly to @u@ are actually to @__u_for_f@.
+The \CC scheme does not have this problem because it constructs the copy for @f@ in the correct location within @f@.
+Yet, the current \CFA scheme is sufficient to deliver style-\#1 invariants (in this style-\#3 use case) because this scheme still does the correct number of lifecycle calls, using correct values, at correct times.
+So, reference-counting or simple ownership applications get their invariants respected under call/return-by-value.
+\end{tabular}
+\caption{Code Lowering for RAII}
+\label{f:CodeLoweringRAII}
+\end{figure}
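The address mismatch between the two lowerings can be reproduced outside \CFA. The following is a minimal \CC sketch under stated assumptions, not the \CFA translator's actual output: the type @U@ here is a hypothetical plain struct whose style-2 invariant is "each live object records its own address", and the names @ctor@, @copy_ctor@, @invariant_holds@, @f_by_value@, and @f_by_pointer@ are illustrative only. The lifecycle functions are written out by hand so that the by-value call is an ordinary bitwise copy, as in C.

```cpp
#include <cassert>

// Hypothetical type with a style-2 invariant: a live object stores its own address.
// It is a plain struct, so passing it by value is a bitwise copy, as in C.
struct U { const U * self; };

// Hand-written lifecycle functions (stand-ins for user-defined RAII).
void ctor( U & u ) { u.self = &u; }
void copy_ctor( U & dst, const U & ) { dst.self = &dst; }

bool invariant_holds( const U & u ) { return u.self == &u; }

// By-value lowering: the callee's u is a bitwise copy of the caller's temporary,
// so u.self still holds the temporary's address.
bool f_by_value( U u ) { return invariant_holds( u ); }

// Pointer-passing lowering: the copy is constructed at its final address inside f.
bool f_by_pointer( const U * u_orig ) {
    U u;
    copy_ctor( u, *u_orig );
    return invariant_holds( u );
}

// Caller side of the by-value lowering: copy-construct a temporary, then pass it by value.
bool call_by_value_lowering() {
    U x; ctor( x );
    U u_for_f; copy_ctor( u_for_f, x );
    return f_by_value( u_for_f );
}

// Caller side of the pointer-passing lowering.
bool call_by_pointer_lowering() {
    U x; ctor( x );
    return f_by_pointer( &x );
}
```

Calling @call_by_value_lowering()@ reports the invariant as broken inside the callee, while @call_by_pointer_lowering()@ reports it as intact. The number of lifecycle calls is the same in both cases, which is why counting-style invariants are unaffected.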
 % [Mike is not currently seeing how distinguishing initialization from assignment is relevant]
 …
 % The following discusses the consequences of this semantics with respect to lifetime management of \CFA strings.
 The string API offers style \#3's pass-by-value in, \eg in the return of @"a" + "b"@.
 Its implementation uses the style-\#2 invariant of the string handles being linked to each other, helping to achieve high performance.
+The string API offers style \ref{p:feature3}'s pass-by-value, \eg in the return of @"a" + "b"@.
+Its implementation uses the style-\ref{p:feature2} invariant of the string handles being linked to each other, helping to achieve high performance.
 Since these two RAII styles cannot coexist, a workaround splits the API into two layers: one that provides pass-by-value, built upon the other with inter-linked handles.
 The layer with pass-by-value incurs a performance penalty, while the layer without delivers the desired runtime performance.
 …
 Both APIs present the same features, up to return-by-value operations being unavailable in LL and implemented via the workaround in HL.
 The intention is for most future code to target HL.
+When the RAII issue is fixed, the full HL feature set will be achievable using the LL-style lifetime management.
+Then, HL will be removed;
+LL's type will be renamed @string@ and programs written for current HL will run faster.
+When the RAII issue is fixed, the full HL feature set is achievable using the LL-style lifetime management.
+Then, HL can be removed, LL's type renamed to @string@, and programs written for the current HL will run faster.
 In the meantime, performance-critical sections of applications must use LL.
 Subsequent performance experiments \see{\VRef{s:PerformanceAssessment}} use the LL API when comparing \CFA to other languages.
 This measurement gives a fair estimate of the goal state for \CFA.
 A separate measure of the HL overhead is also included.
 Hence, \VRef[Section]{string-general-impl} is describing the goal state for \CFA.
 In present state, the type @string_res@ replaces its mention of @string@ as inter-linked handle.
+Hence, \VRef[Section]{string-general-impl} describes the goal state for \CFA.
+In the present state, the internal type @string_res@ replaces @string@ as the inter-linked handle.
 To use LL, a programmer rewrites invocations using pass-by-value APIs into invocations where resourcing is more explicit.
 Many invocations are unaffected, notably assignment and comparison.
+Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases need revisions.
+\begin{cquote}
+\begin{tabular}{ll}
+\VRef[Figure]{f:HL_LL_Lowering} shows that, of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only three cases need revision.
+The actual HL workaround wraps @string@ as a pointer to a uniquely owned, heap-allocated @string_res@.
+This arrangement has @string@ using style-\ref{p:feature1} RAII, which is compatible with pass-by-value.
+\begin{figure}
+\centering
+\begin{tabular}{@{}ll@{}}
 HL & LL \\
 \hline
 …
 \begin{cfa}
 string s = "abcde";
 string s2 = s(2, 3); // s2 == "cde"
 s(2,3) = "x"; // s == "abx" && s2 == "cde"
+string s2 = s(2, 3);   // s2 == "cde"
+s(2,3) = "x";   // s == "abx" && s2 == "cde"
 \end{cfa}
+&
 \begin{cfa}
 string_res sr = "abcde";
 string_res sr2 = {sr, 2, 3}; // sr2 == "cde"
+string_res sr2 = {sr, 2, 3};   // sr2 == "cde"
 string_res sr_mid = { sr, 2, 3, SHARE };
 sr_mid = "x"; // sr == "abx" && sr2 == "cde"
+sr_mid = "x";   // sr == "abx" && sr2 == "cde"
 \end{cfa}
 \\
 …
 string s = "abcde";
 s[2] = "xxx";  // s == "abxxxde"
+s[2] = "xxx";    // s == "abxxxde"
 \end{cfa}
+&
 …
 string_res sr = "abcde";
 string_res sr_mid = { sr, 2, 1, SHARE };
+sr_mid = "xxx"; // sr == "abxxxde"
+\end{cfa}
+\end{tabular}
+\end{cquote}
+The actual HL workaround is having @string@ wrap a pointer to a uniquely owned, heap-allocated @string_res@.  This arrangement has @string@ being style-\#1 RAII, which is compatible with pass-by-value.
+sr_mid = "xxx";   // sr == "abxxxde"
+\end{cfa}
+\end{tabular}
+\caption{HL to LL Lowering}
+\label{f:HL_LL_Lowering}
+\end{figure}
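The HL workaround, a value type that owns a pointer to a heap-allocated resource handle, can be sketched in \CC as follows. This is an illustration under assumptions, not the \CFA library code: @StringRes@ is a hypothetical stand-in that stores a @std::string@ rather than an inter-linked handle, and it is made non-copyable only to mimic an object whose address must stay fixed. The wrapper @String@ uses simple-ownership RAII, so it can be passed and returned by value.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the LL type. Copying is deleted to mimic a handle
// whose address must not change once other objects may refer to it.
struct StringRes {
    std::string text;
    explicit StringRes( std::string t ) : text( std::move( t ) ) {}
    StringRes( const StringRes & ) = delete;
    StringRes & operator=( const StringRes & ) = delete;
    void append( const StringRes & other ) { text += other.text; }
};

// HL-style wrapper: uniquely owns one heap-allocated StringRes.
class String {
    StringRes * inner;                          // allocated in constructors, freed in destructor
  public:
    String( const char * s ) : inner( new StringRes( s ) ) {}
    String( const String & o ) : inner( new StringRes( o.inner->text ) ) {}
    String & operator=( const String & o ) {
        if ( this != &o ) inner->text = o.inner->text;
        return *this;
    }
    ~String() { delete inner; }

    String & operator+=( const String & o ) { inner->append( *o.inner ); return *this; }
    const std::string & str() const { return inner->text; }
    const StringRes * res() const { return inner; }
};

// Return-by-value concatenation built on the in-place append.
String operator+( String lhs, const String & rhs ) {
    lhs += rhs;
    return lhs;
}
```

Each copy of a @String@ performs its own allocation, which corresponds to the extra dynamic allocation that the text attributes to the HL layer.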
 …
 It might be possible to pack 16- or 32-bit Unicode characters within the same string buffer as 8-bit characters.
 Again, locations for identification flags must be found and checked along the fast path to select the correct actions.
 Handling utf8 (variable length), is more problematic because simple pointer arithmetic cannot be used to stride through the variable-length characters.
+Handling utf8 (variable length) is more problematic because simple pointer arithmetic cannot be used to stride through the variable-length characters.
 Trying to use a secondary array of fixed-sized pointers/offsets to the characters is possible, but raises the question of storage management for the utf8 characters themselves.
 …
 I assessed the \CFA string library's speed and memory usage against strings in \CC STL.
+Overall, this analysis shows that adding support for the features shown earlier in the chapter comes at no substantial cost in the performance of features common to both APIs.
+Moreover, the results support the \CFA string's position as a high-level enabler of simplified text processing.
+STL makes its user think about memory management.
+Overall, this analysis shows that the additional \CFA features come at no substantial cost in the performance of features common to both APIs.
+Moreover, the comparison shows that \CFA's high-level string features simplify text processing because the STL requires users to think more about memory management.
 When the user does, and is successful, STL's performance can be very good.
 But when the user fails to think through the consequences of the STL representation, performance becomes poor.
+But if a user does not understand the consequences of the STL representation, performance becomes poor.
 The \CFA string lets the user work at the level of just putting the right text into the right variables, with corresponding performance degradations reduced or eliminated.
 …
 These tests use a \emph{corpus} of strings.
 Their lengths are important; the specific characters occurring in them are immaterial.
+In a result graph, a corpus's mean string length is often the independent variable shown on the X axis.
+In a result graph, a corpus's mean string-length is often the independent variable on the x-axis.
 When a corpus contains strings of different lengths, the lengths are drawn from a lognormal distribution.
 Therefore, strings much longer than the mean occur less often and strings slightly shorter than the mean occur most often.
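The shape of such a corpus can be illustrated with the \CC standard library. The sketch below is an assumption-laden illustration, not the thesis's corpus generator: the function name @draw_lengths@, the spread parameter @sigma@, and the seed are all hypothetical. It picks the underlying normal mean @mu@ so that the lognormal distribution has the requested mean length, using the identity that a lognormal with parameters $\mu$ and $\sigma$ has mean $e^{\mu + \sigma^2/2}$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Draw `count` string lengths from a lognormal distribution whose mean is `mean_len`.
// `sigma` (spread of the underlying normal) and `seed` are illustrative parameters.
std::vector<std::size_t> draw_lengths( std::size_t count, double mean_len,
                                       double sigma, unsigned seed ) {
    // Lognormal mean is exp(mu + sigma^2 / 2); solve for mu.
    const double mu = std::log( mean_len ) - 0.5 * sigma * sigma;
    std::mt19937 gen( seed );
    std::lognormal_distribution<double> dist( mu, sigma );

    std::vector<std::size_t> lens;
    lens.reserve( count );
    for ( std::size_t i = 0; i < count; i += 1 ) {
        long len = std::lround( dist( gen ) );  // round to a whole character count
        lens.push_back( static_cast<std::size_t>( std::max( 1L, len ) ) );  // at least 1
    }
    return lens;
}
```

Because the distribution is right-skewed, more than half of the drawn lengths fall below the mean, while a small number of much longer strings pull the mean up.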
 …
 To ensure comparable results, a common memory allocator is used for \CFA and \CC.
 \CFA runs the llheap allocator~\cite{Zulfiqar22}, which is also plugged into \CC.
+The llheap allocator is significantly better than the standard @glibc@ allocator.
 The operations being measured take dozens of nanoseconds, so a succession of many invocations is run and timed as a group.
 …
 \VRef[Figure]{fig:string-graph-peq-cppemu} shows the resulting performance.
 The two fresh (solid) lines and the two reuse (dash) lines are identical, except for lengths $\le$10, where the \CC SSO has a 40\% average and minimally 24\% advantage.
+The two fresh lines (solid splines) and the two reuse lines (dashed splines) are identical, except for lengths $\le$10, where the \CC SSO has an advantage that averages 40\% and is at least 24\%.
 The gap between the fresh and reuse lines is the removal of dynamic memory allocations and the reuse of prior storage, \eg 100M allocations for fresh \vs 100 allocations for reuse across all experiments.
 While allocation reduction is huge, data copying dominates the cost, so the lines are still reasonably close together.
 …
 In earlier experiments, the choice of \CFA API among HL and LL had no impact on the functionality being tested.
 Here, however, the @+@ operation, which returns its result by value, is only available in HL.
+The \CFA @+@ number was obtained by inlining the HL implementation of @+@, which is done using LL's @+=@, into the test harness, while omitting the HL-inherent extra dynamic allocation.  The HL-upon-LL @+@ implementation, is:
+\begin{cfa}
+struct string {
+        string_res * inner;  // RAII manages malloc/free, simple ownership
+The \CFA @+@ measurement is obtained by inlining the HL implementation of @+@, which is done using LL's @+=@, into the test harness, while omitting the HL-inherent extra dynamic allocation.  The HL-upon-LL @+@ implementation is:
+\begin{cquote}
+\setlength{\tabcolsep}{20pt}
+\begin{tabular}{@{}ll@{}}
+\begin{cfa}
+struct string {   // simple ownership
+        string_res * inner;  // RAII manages malloc/free
 };
 void ?+=?( string & s, string s2 ) {
         (*s.inner) += (*s2.inner);
+}
+\end{cfa}
+&
+\begin{cfa}
 string @?+?@( string lhs, string rhs ) {
         string ret = lhs;
 …
         return ret;
+}
+\end{cfa}
+\end{cfa}
+\end{tabular}
+\end{cquote}
 This @+@ implementation is also the goal implementation of @+@ once the HL/LL workaround is no longer needed.  Inlining the induced LL steps into the test harness gives:
 \begin{cquote}
 …
 So again, \CFA helps users who just want to treat strings as values, and not think about the resource management under the covers.
 While not a design goal, and not graphed, \CFA in STL-emulation mode outperformed STL in this case.
 User-managed allocation reuse did not affect either implementation in this case; only ``fresh'' results are shown.
+\PAB{Something is wrong with these sentences: While not a design goal, and not graphed, \CFA in STL-emulation mode outperformed STL in this case.
+User-managed allocation reuse did not affect either implementation in this case; only ``fresh'' results are shown.}
 …
 \end{cfa}
 With implicit sharing active, \CFA treats this operation as normal and supported.
+Again, an HL-LL difference requires an LL mockup.  This time, the fact to integrate into the test harness is that LL does not directly support pass-by-value.
+Again, an HL-LL difference requires a mockup as LL does not directly support pass-by-value.
 \begin{cquote}
 \setlength{\tabcolsep}{20pt}
 …
 The goal (HL) version gives the modified test harness, with a single loop.
 Each iteration uses a corpus item as the argument to the function call.
 These corpus items were imported to the string heap before beginning the timed run.
+These corpus items are imported to the string heap before beginning the timed run.
 \begin{figure}
 …
 \VRef[Figure]{fig:string-graph-pbv} shows the costs for calling a function that receives a string argument by value.
+STL's performance worsens uniformly as string length increases, while \CFA has the same performance at all sizes.
+However, the STL is better than \CFA up to string length 10 because of the SSO.
+STL's performance worsens uniformly as string length increases, except for short strings due to SSO, while \CFA has the same performance at all sizes.
 While improved, the \CFA cost to pass a string is still nontrivial.
 The contributor is adding and removing the callee's string handle from the global list.
