Changeset e78d969 for doc


Ignore:
Timestamp:
Apr 5, 2026, 9:24:11 AM (24 hours ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
bb9897c
Parents:
c1f17aa
Message:

additional updates in string chapter

Location:
doc/theses/mike_brooks_MMath
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/plots/string-pbv.gp

    rc1f17aa re78d969  
    1 set terminal pdf color enhanced size 6.0in,3.0in font "Times,17"
     1set terminal pdf color enhanced size 6.5in,3.0in font "Times,17"
    22#set terminal postscript portrait enhanced size 7.5, 10. color solid 9.5;
    33#set terminal wxt size 950,1250
  • doc/theses/mike_brooks_MMath/string.tex

    rc1f17aa re78d969  
    4747open( fname.@c_str()@, O_RDONLY );              // null terminated value of string
    4848\end{cfa}
    49 Here, the \CC @c_str@ function does not create a new null-terminated C string from the \CC string, as that requires passing ownership of the C string to the caller for eventual deletion.\footnote{
     49Here, the \CC @c_str@ member does not create a new null-terminated C string from the \CC string, as that requires passing ownership of the C string to the caller for eventual deletion.\footnote{
    5050C functions like \lstinline{strdup} do return allocated storage that must be freed by the caller.}
    5151% Instead, each \CC string is null terminated just in case it might be needed for this purpose.
     
    9494s = 'x';
    9595s = "abc";
    96 s = 42hh;               /* signed char */
    97 s = 42h;                /* short int */
     96s = 42hh;               // signed char
     97s = 42h;                // short int
    9898s = 0xff;
    9999\end{cfa}
     
    134134Conversions from @string@ to @char *@ attempt to be safe.
    135135The overloaded @strncpy@ function is safe, if the length of the C string is correct.
    136 The assignment operator and constructor both allocate the buffer and return its address, meaning the programmer must free it.
     136The assignment operator and constructor both allocate storage and return its address, meaning the programmer must free it.
    137137Note, a C string is always null terminated, implying storage is always necessary for the null.
    138138\begin{cquote}
     
    284284
    285285Interestingly, \CC cannot support this generality because it does not use the left-hand side of assignment in expression resolution.
    286 While it can special case some combinations:
     286While it can special-case some combinations:
    287287\begin{c++}
    288288s = 'a' + s; $\C[2in]{// compiles in C++}$
     
    299299
    300300The binary operators @*@ and @*=@ repeat a string $N$ times.
    301 If $N = 0$, a zero length string, @""@, is returned.
     301If $N = 0$, a zero-length string results, @""@.
    302302\begin{cquote}
    303303\begin{tabular}{@{}ll@{}}
     
    4604603
    4614614
    462 10
     46210   // not found, string length
    463463\end{cfa}
    464464\end{tabular}
     
    470470\begin{cfa}
    471471charclass vowels{ "aeiouy" };
     472i = include( "aabiuyoo", vowels );
    472473i = include( "aaeiuyoo", vowels );
    473 i = include( "aabiuyoo", vowels );
    474 \end{cfa}
    475 &
    476 \begin{cfa}
    477 
    478 8  // compliant
     474\end{cfa}
     475&
     476\begin{cfa}
     477
    4794782  // b non-compliant
     4798  // compliant, string length
    480480\end{cfa}
    481481\end{tabular}
    482482\end{cquote}
    483483@vowels@ defines a character class and function @include@ checks if all characters in the string appear in the class (compliance).
    484 The position of the last character is returned if the string is compliant or the position of the first non-compliant character.
     484The position of the first non-compliant character is returned or the position of the last character if the string is compliant.
    485485There is no relationship between the order of characters in the two strings.
    486486Function @exclude@ is the reverse of @include@, checking if all characters in the string are excluded from the class (compliance).
     
    493493&
    494494\begin{cfa}
    495 8  // compliant
     4958  // compliant, string length
    4964962  // y non-compliant
    497497\end{cfa}
     
    502502\begin{tabular}{@{}ll@{}}
    503503\begin{cfa}
     504s = include( "aabiuyoo", vowels );
    504505s = include( "aaeiuyoo", vowels );
    505 s = include( "aabiuyoo", vowels );
    506506s = exclude( "cdbfghmk", vowels );
    507507s = exclude( "cdyfghmk", vowels );
     
    534534\end{cquote}
    535535These operations perform an \emph{apply} of the validation function to each character, where the function returns a boolean indicating a stopping condition for the search.
    536 The position of the last character is returned if the string is compliant or the position of the first non-compliant character.
     536The position of the first non-compliant character is returned or the position of the last character if the string is compliant (and vice versa for @exclude@).
    537537
    538538The translate operation returns a string with each character transformed by one of the C character transformation functions.
     
    10281028When @s1_bgn@'s size increases by 3, @s1_mid@'s starting location moves from 1 to 4 and @s1_end@'s from 3 to 6,
    10291029
    1030 When changes happen on an aliasing substring that overlap.
     1030Changes can happen on an aliasing substring that overlaps.
    10311031\input{sharing10.tex}
    10321032Strings @s1_crs@ and @s1_mid@ overlap at character 4, @'j'@, because the substrings are 3,2 and 4,2.
    10331033When @s1_crs@'s size increases by 1, @s1_mid@'s starting location moves from 4 to 5, but the overlapping character remains, changing to @'+'@.
    10341034
    1035 \PAB{TODO: finish typesetting the demo}
    1036 
    1037 %\input{sharing-demo.tex}
    1038 
    1039 \VRef[Figure]{f:ParameterPassing} shows similar relationships when passing the results of substring operations by reference and by value to a subprogram.
    1040 Again, notice the side-effects to other reference parameters as one is modified.
    1041 
    10421035\begin{figure}
    10431036\begin{cfa}
     
    10531046        a( 0, 2 ) = "aaa";
    10541047        b( 1, 12 ) = "bbb";
    1055         c( 4, 5 ) = "ccc";
     1048        c( 3, 5 ) = "ccc";
    10561049        c = "yyy";
    10571050        d( 0, 3 ) = "ddd";
     
    10641057\setlength{\extrarowheight}{-0.5pt}
    10651058\begin{tabular}{@{}llllll@{}}
    1066 x                                       & a             & b             & c             & d             & e             \\
    1067 @"aaaxxxxxxxxx"@        & @"aaax"@      & @"xxx"@       & @"xxxxx"@     & @"xxx"@       & @"xxx"@       \\
    1068 @"aaaxbbbxxxxxx"@       & @"aaax"@      & @"xbbb"@      & @"xxxx"@      & @"xxx"@       & @"xxx"@       \\
    1069 @"aaaxbbbxxxcccxx"@     & @"aaax"@      & @"xbbb"@      & @"xxxccc"@& @"cccxx"@ & @"xxx"@       \\
    1070 @"aaaxbbbyyyxx"@        & @"aaax"@      & @"aaab"@      & @"yyy"@       & @"xx"@        & @"xxx"@       \\
    1071 @"aaaxbbbyyyddd"@       & @"aaax"@      & @"xbbb"@      & @"yyy"@       & @"ddd"@       & @"xxx"@       \\
    1072 @"aaaxbbbyyyddd"@       & @"aaax"@      & @"xbbb"@      & @"yyy"@       & @"ddd"@       & @"eee"@       \\
    1073 @"eee"@                         & @""@  & @""@  & @""@          & @"eee"@ \\
    1074  & \\
     1059x                                       & a                     & b                     & c                     & d                     & e                     \\
     1060@"ssttttuuuww"@         & @"sst"@       & @"ttt"@       & @"ttuuu"@     & @"uww"@       & @"uww"@       \\
     1061@"aaattttuuuww"@        & @"aaat"@      & @"ttt"@       & @"ttuuu"@     & @"uww"@       & @"uww"@       \\
     1062@"aaatbbbtuuuww"@       & @"aaat"@      & @"tbbb"@      & @"tuuu"@      & @"uww"@       & @"uww"@       \\
     1063@"aaatbbbtuucccww"@     & @"aaat"@      & @"tbbb"@      & @"tuuccc"@& @"cccww"@ & @"uww"@       \\
     1064@"aaatbbbyyyww"@        & @"aaat"@      & @"tbbb"@      & @"yyy"@       & @"ww"@        & @"uww"@       \\
     1065@"aaatbbbyyyddd"@       & @"aaat"@      & @"tbbb"@      & @"yyy"@       & @"ddd"@       & @"uww"@       \\
     1066@"aaatbbbyyyddd"@       & @"aaat"@      & @"tbbb"@      & @"yyy"@       & @"ddd"@       & @"uww"@       \\
     1067@"eee"@                         & @""@          & @""@          & @""@          & @""@          & @"eee"@       \\
     1068 &
    10751069\end{tabular}
    10761070\end{tabular}
     
    10861080\end{figure}
    10871081
     1082\begin{figure}
     1083\vspace*{-5pt}
     1084\setlength{\tabcolsep}{1.5pt}
     1085\setlength{\columnsep}{60pt}
     1086\setlength{\columnseprule}{0.2pt}
     1087\setlength{\extrarowheight}{-1pt}
     1088
     1089\begin{multicols}{2}
     1090\sf
     1091\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1092\begin{tabular}{@{}*{12}{c}@{}}
     1093\multicolumn{12}{@{}l@{}}{function entry} \\
     1094x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
     1095  & {\tt s}  & {\tt s} & {\tt t} & {\tt t} & {\tt t} & {\tt t} & {\tt u} & {\tt u} & {\tt u} & {\tt w} & {\tt w} \\
     1096a & 0 & 1 & 2 \\
     1097  & {\tt s} & {\tt s} & {\tt t} \\
     1098b &   &   & 0 & 1 & 2 \\
     1099  &   &   & {\tt t} & {\tt t} & {\tt t} \\
     1100c &   &   &   &   & 0 & 1 & 2 & 3 & 4 \\
     1101  &   &   &   &   & {\tt t} & {\tt t} & {\tt u} & {\tt u} & {\tt u} \\
     1102d &   &   &   &   &   &   &   &   & 0 & 1 & 2 \\
     1103  &   &   &   &   &   &   &   &   & {\tt u} & {\tt w} & {\tt w}
     1104\end{tabular}
     1105&
     1106\begin{tabular}{@{}*{5}{c}@{}}
     1107 & \\
     1108 & e & 0 & 1 & 2 \\
     1109 &   & {\tt u} & {\tt w} & {\tt w} \\
     1110 &   &   &   &   \\
     1111 &   &   &   &   \\
     1112 &   &   &   &   \\
     1113 &   &   &   &   \\
     1114 &   &   &   &   \\
     1115 &   &   &   &   \\
     1116 &   &   &   &   \\
     1117 &   &   &   &
     1118\end{tabular}
     1119\end{tabular}
     1120
     1121\bigskip
     1122@a( 0, 2 ) = "aaa";@ \\
     1123\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1124\begin{tabular}{@{}*{13}{c}@{}}
     1125x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\
     1126  & {\tt a} & {\tt a} & {\tt a} & {\tt t} & {\tt t} & {\tt t} & {\tt t} & {\tt u} & {\tt u} & {\tt u} & {\tt w}  & {\tt w} \\
     1127a & 0 & 1 & 2 & 3 \\
     1128  & {\tt a} & {\tt a} & {\tt a} & {\tt t} \\
     1129b &   &   &   & 0 & 1 & 2 \\
     1130  &   &   &   & {\tt t} & {\tt t} & {\tt t} \\
     1131c &   &   &   &   &   & 0 & 1 & 2 & 3 & 4 \\
     1132  &   &   &   &   &   & {\tt t} & {\tt t} & {\tt u} & {\tt u} & {\tt u} \\
     1133d &   &   &   &   &   &   &   &   &   & 0 & 1 & 2 \\
     1134  &   &   &   &   &   &   &   &   &   & {\tt u} & {\tt w} & {\tt w}
     1135\end{tabular}
     1136&
     1137\begin{tabular}{@{}*{5}{c}@{}}
     1138 & e & 0 & 1 & 2 \\
     1139 &   & {\tt u} & {\tt w} & {\tt w} \\
     1140 &   &   &   &   \\
     1141 &   &   &   &   \\
     1142 &   &   &   &   \\
     1143 &   &   &   &   \\
     1144 &   &   &   &   \\
     1145 &   &   &   &   \\
     1146 &   &   &   &   \\
     1147 &   &   &   &
     1148\end{tabular}
     1149\end{tabular}
     1150
     1151\bigskip
     1152
     1153@b( 1, 12 ) = "bbb";@ \\
     1154\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1155\begin{tabular}{@{}*{14}{c}@{}}
     1156x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\
     1157  & {\tt a} & {\tt a} & {\tt a} & {\tt t} & {\tt b} & {\tt b} & {\tt b} & {\tt t} & {\tt u} & {\tt u} & {\tt u}  & {\tt w}  & {\tt w} \\
     1158a & 0 & 1 & 2 & 3 \\
     1159  & {\tt a} & {\tt a} & {\tt a} & {\tt t} \\
     1160b &   &   &   & 0 & 1 & 2 & 3 \\
     1161  &   &   &   & {\tt t} & {\tt b} & {\tt b} & {\tt b} \\
     1162c &   &   &   &   &   &   &   & 0 & 1 & 2 & 3 \\
     1163  &   &   &   &   &   &   &   & {\tt t} & {\tt u} & {\tt u} & {\tt u} \\
     1164d &   &   &   &   &   &   &   &   &   &   & 0 & 1 & 2 \\
     1165  &   &   &   &   &   &   &   &   &   &   & {\tt u} & {\tt w} & {\tt w}
     1166\end{tabular}
     1167&
     1168\begin{tabular}{@{}*{5}{c}@{}}
     1169 & e & 0 & 1 & 2 \\
     1170 &   & {\tt u} & {\tt w} & {\tt w} \\
     1171 &   &   &   &   \\
     1172 &   &   &   &   \\
     1173 &   &   &   &   \\
     1174 &   &   &   &   \\
     1175 &   &   &   &   \\
     1176 &   &   &   &   \\
     1177 &   &   &   &   \\
     1178 &   &   &   &
     1179\end{tabular}
     1180\end{tabular}
     1181
     1182\bigskip
     1183
     1184@c( 3, 5 ) = "ccc";@ \\
     1185\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1186\begin{tabular}{@{}*{16}{c}@{}}
     1187x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 \\
     1188  & {\tt a} & {\tt a} & {\tt a} & {\tt t} & {\tt b} & {\tt b} & {\tt b} & {\tt t} & {\tt u} & {\tt u} & {\tt c} & {\tt c} & {\tt c}  & {\tt w}  & {\tt w} \\
     1189a & 0 & 1 & 2 & 3 \\
     1190  & {\tt a} & {\tt a} & {\tt a} & {\tt t} \\
     1191b &   &   &   & 0 & 1 & 2 & 3 \\
     1192  &   &   &   & {\tt t} & {\tt b} & {\tt b} & {\tt b} \\
     1193c &   &   &   &   &   &   &   & 0 & 1 & 2 & 3  & 4  & 5 \\
     1194  &   &   &   &   &   &   &   & {\tt t} & {\tt u} & {\tt u} & {\tt c} & {\tt c} & {\tt c} \\
     1195d &   &   &   &   &   &   &   &   &   &   & 0  & 1  & 2  & 3  & 4 \\
     1196  &   &   &   &   &   &   &   &   &   &   & {\tt c}  & {\tt c} & {\tt c} & {\tt w} & {\tt w}
     1197\end{tabular}
     1198&
     1199\begin{tabular}{@{}*{5}{c}@{}}
     1200 & e & 0 & 1 & 2 \\
     1201 &   & {\tt u} & {\tt w} & {\tt w} \\
     1202 &   &   &   &   \\
     1203 &   &   &   &   \\
     1204 &   &   &   &   \\
     1205 &   &   &   &   \\
     1206 &   &   &   &   \\
     1207 &   &   &   &   \\
     1208 &   &   &   &   \\
     1209 &   &   &   &
     1210\end{tabular}
     1211\end{tabular}
     1212
     1213@c = "yyy";@ \\
     1214\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1215\begin{tabular}{@{}*{13}{c}@{}}
     1216x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\
     1217  & {\tt a} & {\tt a} & {\tt a} & {\tt t} & {\tt b} & {\tt b} & {\tt b} & {\tt y} & {\tt y} & {\tt y} & {\tt w}  & {\tt w} \\
     1218a & 0 & 1 & 2 & 3 \\
     1219  & {\tt a} & {\tt a} & {\tt a} & {\tt t} \\
     1220b &   &   &   & 0 & 1 & 2 & 3 \\
     1221  &   &   &   & {\tt t} & {\tt b} & {\tt b} & {\tt b} \\
     1222c &   &   &   &   &   &   &   & 0 & 1 & 2 \\
     1223  &   &   &   &   &   &   &   & {\tt y} & {\tt y} & {\tt y} \\
     1224d &   &   &   &   &   &   &   &   &   &   & 0  & 1 \\
     1225  &   &   &   &   &   &   &   &   &   &   & {\tt w}  & {\tt w}
     1226\end{tabular}
     1227&
     1228\begin{tabular}{@{}*{5}{c}@{}}
     1229 & e & 0 & 1 & 2 \\
     1230 &   & {\tt u} & {\tt w} & {\tt w} \\
     1231 &   &   &   &   \\
     1232 &   &   &   &   \\
     1233 &   &   &   &   \\
     1234 &   &   &   &   \\
     1235 &   &   &   &   \\
     1236 &   &   &   &   \\
     1237 &   &   &   &   \\
     1238 &   &   &   &
     1239\end{tabular}
     1240\end{tabular}
     1241
     1242\bigskip
     1243@d( 0, 3 ) = "ddd";@ \\
     1244\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1245\begin{tabular}{@{}*{14}{c}@{}}
     1246x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\
     1247  & {\tt a} & {\tt a} & {\tt a} & {\tt t} & {\tt b} & {\tt b} & {\tt b} & {\tt y} & {\tt y} & {\tt y} & {\tt d}  & {\tt d}  & {\tt d} \\
     1248a & 0 & 1 & 2 & 3 \\
     1249  & {\tt a} & {\tt a} & {\tt a} & {\tt t} \\
     1250b &   &   &   & 0 & 1 & 2 & 3 \\
     1251  &   &   &   & {\tt t} & {\tt b} & {\tt b} & {\tt b} \\
     1252c &   &   &   &   &   &   &   & 0 & 1 & 2 \\
     1253  &   &   &   &   &   &   &   & {\tt y} & {\tt y} & {\tt y} \\
     1254d &   &   &   &   &   &   &   &   &   &   & 0  & 1 & 2 \\
     1255  &   &   &   &   &   &   &   &   &   &   & {\tt d}  & {\tt d} & {\tt d}
     1256\end{tabular}
     1257&
     1258\begin{tabular}{@{}*{5}{c}@{}}
     1259 & e & 0 & 1 & 2 \\
     1260 &   & {\tt u} & {\tt w} & {\tt w} \\
     1261 &   &   &   &   \\
     1262 &   &   &   &   \\
     1263 &   &   &   &   \\
     1264 &   &   &   &   \\
     1265 &   &   &   &   \\
     1266 &   &   &   &   \\
     1267 &   &   &   &   \\
     1268 &   &   &   &
     1269\end{tabular}
     1270\end{tabular}
     1271
     1272\bigskip
     1273
     1274@e( 0, 3 ) = "eee";@ \\
     1275\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1276\begin{tabular}{@{}*{14}{c}@{}}
     1277x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\
     1278  & {\tt a} & {\tt a} & {\tt a} & {\tt t} & {\tt b} & {\tt b} & {\tt b} & {\tt y} & {\tt y} & {\tt y} & {\tt d}  & {\tt d}  & {\tt d} \\
     1279a & 0 & 1 & 2 & 3 \\
     1280  & {\tt a} & {\tt a} & {\tt a} & {\tt t} \\
     1281b &   &   &   & 0 & 1 & 2 & 3 \\
     1282  &   &   &   & {\tt t} & {\tt b} & {\tt b} & {\tt b} \\
     1283c &   &   &   &   &   &   &   & 0 & 1 & 2 \\
     1284  &   &   &   &   &   &   &   & {\tt y} & {\tt y} & {\tt y} \\
     1285d &   &   &   &   &   &   &   &   &   &   & 0  & 1 & 2 \\
     1286  &   &   &   &   &   &   &   &   &   &   & {\tt d}  & {\tt d} & {\tt d}
     1287\end{tabular}
     1288&
     1289\begin{tabular}{@{}*{5}{c}@{}}
     1290 & e & 0 & 1 & 2 \\
     1291 &   & {\tt e} & {\tt e} & {\tt e} \\
     1292 &   &   &   &   \\
     1293 &   &   &   &   \\
     1294 &   &   &   &   \\
     1295 &   &   &   &   \\
     1296 &   &   &   &   \\
     1297 &   &   &   &   \\
     1298 &   &   &   &   \\
     1299 &   &   &   &
     1300\end{tabular}
     1301\end{tabular}
     1302
     1303\bigskip
     1304
     1305@x = e;@ \\
     1306\begin{tabular}{@{}l@{\hspace{10pt}}l@{}}
     1307\begin{tabular}{@{}*{16}{c}@{}}
     1308x & 0 & 1 & 2 \\
     1309  & {\tt e} & {\tt e} & {\tt e} \\
     1310a & \\
     1311  & "" \\
     1312b & \\
     1313  & "" \\
     1314c & \\
     1315  & "" \\
     1316d & \\
     1317  & ""
     1318\end{tabular}
     1319&
     1320\begin{tabular}{@{}*{5}{c}@{}}
     1321 & e & 0 & 1 & 2 \\
     1322 &   & {\tt e} & {\tt e} & {\tt e} \\
     1323 &   &   &   &   \\
     1324 &   &   &   &   \\
     1325 &   &   &   &   \\
     1326 &   &   &   &   \\
     1327 &   &   &   &   \\
     1328 &   &   &   &   \\
     1329 &   &   &   &   \\
     1330 &   &   &   &
     1331\end{tabular}
     1332\end{tabular}
     1333\end{multicols}
     1334
     1335\caption{Execution of Function \lstinline{test}}
     1336\label{f:ParametersPassingResults}
     1337\end{figure}
     1338
     1339\VRef[Figure]{f:ParameterPassing} shows similar relationships when passing the results of substring operations by reference and by value to a subprogram.
     1340Because this example is complex, \VRef[Figure]{f:ParametersPassingResults} shows line-by-line changes to the reference parameters @x@, @a@, @b@, @c@, @d@, and value parameter @e@ as function @test@ executes (read top to bottom, left to right).
     1341The assignment to variable @a@ changes its substring @"ss"@ to @"aaa"@ increasing @a@ to @"aaat"@ (3 to 4 characters), increasing @x@ from 11 to 12 characters, and pushing @b@, @c@ and @d@ right one character in @x@ because there is no overlap with the changed characters.
     1342Similarly, the assignment to variable @b@ changes its substring @"tt"@ to @"bbb"@ increasing @b@ to @"tbbb"@ (3 to 4 characters), increasing @x@ from 12 to 13 characters, and pushing @c@ and @d@ right one character in @x@ because there is no overlap with the changed characters.
     1343The assignment to variable @c@ changes its substring @"u"@ (at the end) to @"ccc"@ increasing @c@ to @"tuuccc"@ (4 to 6 characters), increasing @x@ from 13 to 15 characters, and @d@ remains in the same position because it overlaps with the changed characters and its @"u"@ grows to @"ccc"@ giving @"cccww"@.
     1344The second assignment to variable @c@ changes its substring @"tuuccc"@ to @"yyy"@ decreasing @c@ to @"yyy"@ (6 to 3 characters), decreasing @x@ from 15 to 12 characters, and the 3 character overlapping with @d@ are removed leaving @"ww"@.
     1345The assignment to variable @d@ changes its substring @"ww"@ to @"ddd"@ increasing @d@ from 2 to 3 characters, increasing @x@ from 11 to 12 characters.
     1346The assignment to variable @e@ changes its substring @"uww"@ to @"eee"@ with no change in length.
     1347Finally, the assignment to @x@ sets it pointing at @e@, and all prior overlapping strings with @x@ become empty (null) strings.
     1348
    10881349
    10891350\section{Storage Management}
    10901351
    1091 This section discusses issues related to storage management of strings.
    1092 Specifically, it is common for strings to logically overlap partially or completely.
    1093 \begin{cfa}
    1094 string s1 = "abcdef";
    1095 string s2 = s1; $\C{// complete overlap, s2 == "abcdef"}$
    1096 string s3 = s1.substr( 0, 3 ); $\C{// partial overlap, s3 == "abc"}$
    1097 \end{cfa}
    1098 This raises the question of how strings behave when an overlapping component is changed,
    1099 \begin{cfa}
    1100 s3[1] = 'w'; $\C{// what happens to s1 and s2?}$
    1101 \end{cfa}
    1102 which is restricted by a string's mutable or immutable property.
    1103 For example, Java's immutable strings require copy-on-write when any overlapping string changes.
    1104 Note, the notion of underlying string mutability is not specified by @const@; \eg in \CC:
    1105 \begin{cfa}
    1106 const string s1 = "abc";
    1107 \end{cfa}
    1108 @const@ applies to the @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage.
    1109 Hence, @s1@ is not pointing at an immutable constant and its underlying string is mutable, unless some other designation is specified, such as Java's global immutable rule.
     1352This section discusses the implementation of the prior string semantics, specifically, when strings overlap partially or completely.
     1353% First, the issue of a string's mutable or immutable property is clarified.
     1354% Java's immutable strings require copy-on-write when any overlapping string changes.
     1355% However, \CFA/\CC strings are always mutability, even if specified with @const@.
     1356% \begin{cfa}
     1357% const string s1 = "abc";
     1358% \end{cfa}
     1359% Here, @const@ applies to the implicit @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage.
     1360% Hence, @s1@ is never pointing at an immutable constant and its underlying string is always mutable, without some addition mechanism as for explicit pointers/references.
     1361% \begin{cfa}
     1362% const int * const cip;  // pointer and its referent are const
     1363% const string s1 @const@ = "abc"; // second const applies to implicit referent
     1364% \end{cfa}
    11101365
    11111366
     
    11131368\label{string-general-impl}
    11141369
    1115 A centrepiece of the string module is its memory manager.
     1370The centrepiece of the string module is its memory manager.
    11161371The management scheme defines a shared buffer for string text.
    11171372Allocation in this buffer is always via a bump-pointer;
     
    11361391Normally, one global context is appropriate for an entire program;
    11371392concurrency is discussed in \VRef{s:ControllingImplicitSharing}.
    1138 A string is a handle to a node in a linked list containing information about a string text in the buffer.
     1393A string is a handle to a node in a linked list containing information about string text in the buffer.
    11391394The list is doubly linked for $O(1)$ insertion and removal at any location.
    11401395Strings are ordered in the list by text start address.
     
    11681423The resulting handle is then placed in the correct sorted position in the list, possible requiring a short linear search to locate the position.
    11691424For string operations resulting in a new string, that string is allocated at the end of the buffer.
    1170 For shared-edit strings, handles that originally referenced containing locations need to see the new value at the new buffer location.
     1425For shared-edit strings, handles that originally reference containing locations need to see the new value at the new buffer location.
    11711426These strings are moved to appropriate locations at the end of the list \see{details in \VRef{sharing-impl}}.
    11721427For nonshared-edit strings, a containing string can be moved and the other strings can remain in the same position.
     
    12051460\end{cfa}
    12061461Such basic examples use the @this@ address only to gain access to the values being managed.
    1207 But lifecycle logic can use the address, too, \eg add the @this@ object to a collection at creation and remove it at destruction.
    1208 \begin{cfa}
    1209 // header (.hfa)
    1210 struct N { $\C[3in]{// list node}$
     1462But lifecycle logic can use the \emph{address}, too, \eg add the @this@ object to a collection at creation and remove it at destruction.
     1463\begin{cfa}
     1464struct Thread { $\C[3in]{// list node}$
    12111465        // private
    1212         inline dlink( N );
     1466        inline dlink( Thread );
    12131467};
    1214 void ?{}( N & ); $\C{// default constructor}$
    1215 void ^?{}( N & ); $\C{// destructor}\CRT$
    1216 // implementation (.cfa)
    1217 static dlist( N ) @list_N@;
    1218 void ?{}( N & this ) { insert_last( list_N, @this@ ) }
    1219 void ^?{}( N & this ) { remove( this ); }
    1220 \end{cfa}
    1221 A module providing the @N@ (node) type can traverse @list_N@ to manipulate the objects.
    1222 Hence, declaring a @N@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service that keeps the value ``good'' during its lifetime.
     1468static dlist( Thread ) @ready_queue@;
     1469void ?{}( Thread & this ) { /* initialize thread */  insert_last( ready_queue, @this@ ); }
     1470void ^?{}( Thread & this ) { remove( @this@ );  /* de-initialize thread */ } $\C{// implicitly use ready\_queue}\CRT$
     1471\end{cfa}
     1472A module providing the @Thread@ (node) type can traverse @ready_queue@ to manipulate the objects.
     1473Hence, declaring a @Thread@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service (scheduler) that keeps the value ``good'' during its lifetime.
    12231474Again, both \CFA and \CC support this usage style.
    12241475
    1225 A third capability concerns \emph{implicitly} requested copies.
    1226 When stack-allocated objects are used as parameter and return values, a sender's version exists in one stack frame and a receiver's version exists in another.
    1227 In the parameter direction, the language's function-call handling must arrange for a copy-constructor call to happen, at a time near the control transfer into the callee. %, with the source as the caller's (sender's) version and the target as the callee's (receiver's) version.
    1228 In the return direction, the roles are reversed and the copy-constructor call happens near the return of control.
     1476A third capability concerns \emph{implicit} copying.
     1477For passing and returning by value in function call, the parameters and return results do not require construction because after allocation the argument or return value is copied into the corresponding object.
     1478In the parameter direction, the language's function-call arranges copy-constructor calls at the control transfer into the callee.
     1479In the return direction, the roles are reversed and the copy-constructor calls happen at the return to the caller.
    12291480\CC supports this capability. % without qualification.
    12301481\CFA offers limited support;
     
    12321483
    12331484\CC also offers move constructors and return-value optimization~\cite{RVO20}.
    1234 These features help reduce unhelpful copy-constructor calls, which, for types like the @S@ example, would lead to extra memory allocations.
    1235 \CFA does not currently have these features; adding similarly-intended features to \CFA is desirable.
     1485These features help reduce unhelpful copy-constructor calls, which, for types like the initial @S@ example, lead to extra memory allocations.
     1486\CFA does not currently have these features;
     1487adding similarly-intended features to \CFA is desirable.
    12361488However, this section is about a problem in the realization of features that \CFA already supports.
    1237 Hence, the comparison continues with the classic version of \CC that treated such copy-constructor calls as necessary.
     1489Hence, the comparison continues with the classic version of \CC that treats such copy-constructor calls as necessary.
    12381490
    12391491To summarize the unsupported combinations, the relevant features are:
     
    12571509\begin{cfa}
    12581510struct U { ... };
    1259 // RAII to go here
    1260 void f( U u ) { F_BODY( u ) }
     1511// add RAII ctor/dtor function
     1512void f( U u ) { ... /* access fields of u */ ... }
    12611513U x;
    12621514f( x );
    12631515\end{cfa}
    1264 However, adding custom RAII (at ``...go here'') changes things.
     1516However, adding RAII constructors/destructor changes things.
    12651517
    12661518\VRef[Figure]{f:CodeLoweringRAII} shows the common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} (right) proceeds differently than the present \CFA lowering (left).
    12671519The current \CFA scheme is still using a by-value C call.
    12681520C does a @memcpy@ on structures passed by value.
    1269 And so, @F_BODY@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions.
    1270 If @U@ is trying to have a style- \ref{p:feature2} invariant, it shows up broken in @F_BODY@: references supposedly to @u@ are actually to @__u_for_f@.
     1521And so, the body of @f@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions.
     1522If @U@ is trying to have a style-\ref{p:feature2} invariant, it shows up broken in @f@: references supposedly to @u@ are actually to @__u_for_f@.
    12711523The \CC scheme does not have this problem because it constructs the @u@ copy in the correct location within @f@.
    12721524Yet, the current \CFA scheme is sufficient to deliver style-\ref{p:feature1} invariants (in this style-\ref{p:feature3} use case) because this scheme still does the correct number of lifecycle calls, using correct values, at correct times.
     
    12821534void f( U u ) {
    12831535
    1284         F_BODY( u );
     1536        ... /* access fields of u */ ...
    12851537
    12861538}
     
    12991551void f( U * __u_orig ) {
    13001552        @U u = * __u_orig;@  // call copy ctor
    1301         F_BODY( u );
     1553        ... /* access fields of u */ ...
    13021554        // call dtor, u
    13031555}
     
    14471699
    14481700A further abstraction helps distinguish the two senses of sharing.
    1449 A share-edit set (SES) is an equivalence class over string handles, being the reflexive, symmetric and transitive closure of the relationship of one string being constructed from another, with the ``share'' option given.
     1701A share-edit set (SES) is an equivalence class over string handles, being the reflexive, symmetric and transitive closure of the relationship of one string being constructed from another, when the ``share'' option is given.
    14501702The SES is represented by a second linked list among the handles.
    1451 A string that shares edits with no other is in a SES by itself.
    1452 Inside a SES, a logical modification of one substring portion may change the logical value in another substring portion, depending on whether the two actually overlap.
    1453 Conversely, no logical value change can flow outside of a SES.
    1454 Even if a modification on one string handle does not reveal itself \emph{logically} to anther handle in the same SES (because they do not overlap), if the modification is length-changing, completing the modification requires visiting the second handle to adjust its location in the sliding text.
     1703A string with no sharing is in a SES by itself.
     1704Inside a SES, a modification of one substring portion may change the value in another substring portion, depending on whether the two overlap.
     1705Conversely, no value change can flow outside of a SES.
     1706Even if a modification on one string handle does not reveal itself \emph{logically} to another handle in the same SES (because they do not overlap), if the modification is length-changing, completing the modification requires visiting the second handle to adjust its location in the sliding text.
    14551707
    14561708
     
    14731725The functions @c@, @d@ and @f@ never share anything, because they are in a sharing-disabled context.
    14741726Executing the example does not produce an interesting outcome, but the comments in the picture indicate when the logical copy operation runs with
    1475 \begin{description}
     1727\begin{description}[itemsep=0pt,parsep=0pt]
    14761728        \item[share:] the copy being deferred, as described through the rest of this section (fast), or
    14771729        \item[copy:] the copy performed eagerly (slow).
     
    15181770However, there is now a conditional check required on the fast-path to switch between short and long string operations.
    15191771
    1520 It might be possible to pack 16- or 32-bit Unicode characters within the same string buffer as 8-bit characters.
     1772It might be possible to pack 16- or 32-bit Unicode characters within the same string buffer as for 8-bit characters.
    15211773Again, locations for identification flags must be found and checked along the fast path to select the correct actions.
    15221774Handling utf8 (variable length) is more problematic because simple pointer arithmetic cannot be used to stride through the variable-length characters.
     
    15271779\label{s:PerformanceAssessment}
    15281780
    1529 I assessed the \CFA string library's speed and memory usage against strings in \CC STL.
    1530 Overall, this analysis shows that features common to both APIs comes at no substantial cost in the performance.
     1781I assessed the \CFA string library's speed and memory usage against strings in the \CC STL.
     1782Overall, this analysis shows common features in both APIs comes at no substantial cost in performance.
    15311783Moreover, the comparison shows that \CFA's high-level string features simplify text processing because the STL requires users to think more about memory management.
    15321784When the user does, and is successful, STL's performance can be very good.
     
    15411793
    15421794These tests use a \emph{corpus} of strings.
    1543 Their lengths are important; the specific characters occurring in them are immaterial.
     1795The lengths are important not the contents.
    15441796In a result graph, a corpus's mean string-length is often the independent variable on the x-axis.
    15451797When a corpus contains strings of different lengths, the lengths are drawn from a lognormal distribution.
     
    15531805The special treatment of length 16 deals with the SSO in STL @string@, currently not implemented in \CFA.
    15541806A fixed-size or from-16 distribution ensures that \CC's extra-optimized cases are isolated within, or removed from, the comparison.
    1555 In all experiments that use a corpus, its text is generated and loaded into the system under test before the timed phase begins.
     1807In all experiments that use a corpus, text is generated and loaded before the timing phase begins.
    15561808To ensure comparable results, a common memory allocator is used for \CFA and \CC.
    15571809\CFA runs the llheap allocator~\cite{Zulfiqar22}, which is also plugged into \CC.
    15581810The llheap allocator is significantly better than the standard @glibc@ allocator.
    15591811
    1560 The operations being measured take dozens of nanoseconds, so a succession of many invocations is run and timed as a group.
     1812The operations being measured take only dozens of nanoseconds, so a succession of many invocations is run and timed as a group.
    15611813The experiments run for a fixed duration (5 seconds), as determined by re-checking @clock()@ every 10,000 invocations, which is never more often than once per 80 ms.
    15621814Timing outcomes report mean nanoseconds per invocation, which includes harness overhead and the targeted string API execution.
     
    15661818In this mode, the \CFA string operates similarly to \CC's, by using a heap allocation for string text.
    15671819Some experiments include measurements in this mode for baselining purposes, called ``\CC emulation mode'' or ``nosharing''.
    1568 
    15691820See~\VRef{s:ExperimentalEnvironment} for a description of the hardware environment.
    15701821
     
    15781829// set alarm duration
    15791830for ( ... ) { $\C[1.5in]{// loop for duration}$
     1831        // reset accum to empty
    15801832        for ( i; N ) { $\C{// perform multiple appends (concatenations)}$
    15811833                accum += corpus[ f( i ) ];
     
    15851837\end{cfa}
    15861838The harness's outer loop executes for the experiment duration.
    1587 The string is reset to empty before appending (not shown).
     1839The string is reset to empty before appending (see below).
    15881840The inner loop builds up a growing-length string with successive appends.
    15891841Each run targets a specific (mean) corpus string length and produces one data point on the result graph.
    15901842Three specific comparisons are made with this harness.
    15911843Each picks its own independent-variable basis of comparison.
    1592 All three comparisons use the varying-from-1 corpus construction, \ie they allow the STL to show its advantage for SSO.
     1844All three comparisons use the varying-from-1 corpus construction, \ie they allow the STL to show its SSO advantage.
    15931845
    15941846
    15951847\subsubsection{Fresh vs Reuse in \CC, Emulation Baseline}
    15961848
    1597 The first experiment compares \CFA with \CC, with \CFA operating in nosharing mode and \CC having no other mode, hence both string package are using @malloc@/@free@.
     1849The first experiment compares \CFA with \CC in nosharing mode using @malloc@/@free@.
    15981850% This experiment establishes a baseline for other experiments.
    15991851This experiment also introduces the first \CC coding pitfall, which the next experiment shows is helped by turning on \CFA sharing.
    16001852% This pitfall shows, a \CC programmer must pay attention to string variable reuse.
    1601 In the following, both programs are doing the same thing: start with @accum@ empty and build it up by appending @N@ strings (type @string@ in \CC and the faster @string_res@ in \CFA).
     1853In the following, both programs start with @accum@ empty and build it up by appending @N@ strings (type @string@ in \CC and the faster @string_res@ in \CFA).
    16021854\begin{cquote}
    16031855\setlength{\tabcolsep}{40pt}
     
    16491901
    16501902\VRef[Figure]{fig:string-graph-peq-cppemu} shows the resulting performance.
    1651 The two fresh (solid spline lines) and the two reuse (dash spline lines) are identical, except for lengths $\le$10, where the \CC SSO has a 40\% average and minimally 24\% advantage.
     1903The two fresh (solid lines) and the two reuse (dash lines) are identical, except for lengths $\le$10, where the \CC SSO has a 40\% average and minimally 24\% advantage.
    16521904The gap between the fresh and reuse lines is the removal of the dynamic memory allocates and reuse of prior storage, \eg 100M allocations for fresh \vs 100 allocations for reuse across all experiments.
    16531905While allocation reduction is huge, data copying dominates the cost, so the lines are still reasonably close together.
     
    19192171The point chosen as \CFA's default liveness threshold (20\%) is marked with a circle.
    19202172For most string lengths, this point occurs as the doubling benefit becomes subjectively negligible.
    1921 At len-500, the amount of space needed to achieve 20\% liveness is so much that last-level cache misses begin occurring generating a further slowdown.
     2173At len-500 (top line in plot (a)), the amount of space needed to achieve 20\% liveness is so much ($>$~4M) that last-level cache misses begin occurring generating a further slowdown.
    19222174This effect is an anomaly of the experimental setup trying to compare such varied sizes in one view;
    19232175the len-500 default point is included only to provide a holistic picture of the STL comparisons (discussed next).
     
    19422194Beside \CFA's principal bars, a bar for separate-compilation overhead (\emph{sep.\ comp.}) shows the benefit that STL enjoys by using monolithic compilation.
    19432195\CFA currently forgoes this benefit.
    1944 This overhead figure was obtained by hacking a version of the \CFA string library to have a header-only implementation and measuring the resulting speed.
     2196This overhead figure is obtained by hacking a version of the \CFA string library to have a header-only implementation and measuring the resulting speed.
    19452197The difference between separately compiled (normal) and header-only (hacked) versions is the reported overhead.
    19462198It represents how much \CFA could speed up if it switched to a header-only implementation to match STL.
     
    19822234}
    19832235\end{lstlisting}
     2236
     2237
     2238\section{Future Work}
     2239
     2240\CFA strings provide an interesting set of operations for manipulating sets of characters.
     2241However, experience is needed to determine which operations are superfluous and which are missing.
     2242It is hard to find a strong string API in other general-purpose programming-languages to compare with, as most have simple string APIs, especially I/O.
     2243Supporting regular-expressions, as in text-editors and specialize string languages, \eg AWK~\cite{AWK}, Ruby~\cite{Ruby}, SNOBOL~\cite{SNOBOL}, is an important extension.
     2244String sharing also needs experience to determine if the rules for overlapping behaviour provide the right features for complex editing applications.
     2245
     2246Extending \CFA strings to non-ASCII characters is also a significant upgrade required for non-Latin languages.
     2247Working with fixed-size 16 and 32-bit characters should be a straightforward extension from 8-bit characters.
     2248Variable-sized UTF-8/16/32 characters require a significant rethink for implementation.
     2249For example, high-performance manipulation may require a companion array of pointers to the variable-sized characters.
     2250
     2251Finally, adding the \CC SSO trick and merging the two-part string-implementation, once the RAII problem in \CFA is fixed, will provide even better performance.
Note: See TracChangeset for help on using the changeset viewer.