Changeset bd72f517 for doc/theses/mike_brooks_MMath/string.tex
- Timestamp:
- May 13, 2025, 1:17:50 PM (4 months ago)
- Branches:
- master
- Children:
- 0528d79
- Parents:
- 7d02d35 (diff), 2410424 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/string.tex
r7d02d35 rbd72f517 17 17 \begin{cquote} 18 18 \begin{tabular}{@{}l|l|l|l@{}} 19 C @char [ ]@ & \CC @string@ & Java @String@ 19 C @char [ ]@ & \CC @string@ & Java @String@ & \CFA @string@ \\ 20 20 \hline 21 @strcpy@, @strncpy@ & @=@ & @=@ 22 @strcat@, @strncat@ & @+@, @+=@ & @+@, @+=@ 21 @strcpy@, @strncpy@ & @=@ & @=@ & @=@ \\ 22 @strcat@, @strncat@ & @+@, @+=@ & @+@, @+=@ & @+@, @+=@ \\ 23 23 @strcmp@, @strncmp@ & @==@, @!=@, @<@, @<=@, @>@, @>=@ 24 25 26 @strlen@ & @length@, @size@ & @length@ 27 @[ ]@ & @[ ]@ & @charAt@ 28 @strncpy@ & @substr@ & @substring@ 29 @strncpy@ & @replace@ & @replace@ 30 @strstr@ & @find@ & @indexOf@ 31 @strcspn@ & @find_first_of@ & @matches@ 32 @strspn@ & @find_first_not_of@ & @matches@ 33 n/a & @c_str@, @data@ & n/a& @strcpy@, @strncpy@ \\24 & @equals@, @compareTo@ 25 & @==@, @!=@, @<@, @<=@, @>@, @>=@ \\ 26 @strlen@ & @length@, @size@ & @length@ & @size@ \\ 27 @[ ]@ & @[ ]@ & @charAt@ & @[ ]@ \\ 28 @strncpy@ & @substr@ & @substring@ & @( )@, on RHS of @=@ \\ 29 @strncpy@ & @replace@ & @replace@ & @( )@, on LHS of @=@ \\ 30 @strstr@ & @find@ & @indexOf@ & @find@ \\ 31 @strcspn@ & @find_first_of@ & @matches@ & @include@ \\ 32 @strspn@ & @find_first_not_of@ & @matches@ & @exclude@ \\ 33 N/A & @c_str@, @data@ & N/A & @strcpy@, @strncpy@ \\ 34 34 \end{tabular} 35 35 \end{cquote} … … 53 53 54 54 55 \section{\CFA \lstinline{string} type}55 \section{\CFA \lstinline{string} Type} 56 56 \label{s:stringType} 57 57 … … 272 272 ch = ch + 'b'; $\C[2in]{// LHS disambiguate, add character values}$ 273 273 s = 'a' + 'b'; $\C{// LHS disambiguate, concatenate characters}$ 274 printf( "%c\n", @'a' + 'b'@ ); $\C [2in]{// no LHS information, ambiguous}$275 printf( "%c\n", @(return char)@('a' + 'b') ); $\C{// disambiguate with ascription cast} $274 printf( "%c\n", @'a' + 'b'@ ); $\C{// no LHS information, ambiguous}$ 275 printf( "%c\n", @(return char)@('a' + 'b') ); $\C{// disambiguate with ascription cast}\CRT$ 276 276 \end{cfa} 277 277 The ascription cast, @(return T)@, disambiguates by stating a (LHS) type to use during expression resolution (not a conversion). … … 319 319 ch = ch * 3; $\C[2in]{// LHS disambiguate, multiply character values}$ 320 320 s = 'a' * 3; $\C{// LHS disambiguate, concatenate characters}$ 321 printf( "%c\n", @'a' * 3@ ); $\C [2in]{// no LHS information, ambiguous}$322 printf( "%c\n", @(return char)@('a' * 3) ); $\C{// disambiguate with ascription cast} $321 printf( "%c\n", @'a' * 3@ ); $\C{// no LHS information, ambiguous}$ 322 printf( "%c\n", @(return char)@('a' * 3) ); $\C{// disambiguate with ascription cast}\CRT$ 323 323 \end{cfa} 324 324 Fortunately, character multiplication without LHS information is even rarer than addition, so repurposing the operator @*@ for @string@ types is not a problem. … … 600 600 & 601 601 \begin{cfa} 602 for ( ;;) {602 for () { 603 603 size_t posn = exclude( line, alpha ); 604 604 if ( posn == len( line ) ) break; … … 722 722 \end{tabular} 723 723 \end{cquote} 724 Input text can be gulped, including whitespace, from the current point to an arbitrary delimiter character using @getline@.724 Input text can be \emph{gulped}, including whitespace, from the current point to an arbitrary delimiter character using @getline@. 725 725 726 726 The \CFA philosophy for input is that, for every constant type in C, these constants should be usable as input. … … 760 760 \end{tabular} 761 761 \end{cquote} 762 Note, the ability to read in quoted strings to match with program string constants.762 Note, the ability to read in quoted strings with whitespace to match with program string constants. 763 763 The @nl@ at the end of an input ignores the rest of the line. 764 764 … … 845 845 & Laxed: The target's type is anything string-like; it may have a different status concerning ownership. 846 846 & Strict: The target's type is the same as the source; both strings are equivalent peers concerning ownership. 847 & n/a& no & yes & yes \\847 & N/A & no & yes & yes \\ 848 848 \hline 849 849 Referent … … 863 863 The C ``string'' is @char *@, under the conventions of @<string.h>@. Because this type does not manage a text allocation, symmetry does not apply. 864 864 \item 865 The Java @String@ class is analyzed; its @StringBuffer@ class behaves similarly to @C++@.865 The Java @String@ class is analyzed; its @StringBuffer@ class behaves similarly to \CC. 866 866 \end{itemize} 867 867 \caption{Comparison of languages' strings, storage management perspective.} … … 869 869 \end{figure} 870 870 871 In C, these declarations give very different things.871 In C, these declarations are very different. 872 872 \begin{cfa} 873 873 char x[$\,$] = "abcde"; 874 874 char * y = "abcde"; 875 875 \end{cfa} 876 Both associate the declared name with fixed-six contiguous bytes, filled as@{'a', 'b', 'c', 'd', 'e', 0}@.877 But @x@ gets them allocated in the active stack frame (with values filled in as control passes the declaration), while @y@ refers into the executable's read-only datasection.876 Both associate the declared name with the fixed, six contiguous bytes: @{'a', 'b', 'c', 'd', 'e', 0}@. 877 But @x@ is allocated on the stack (with values filled at the declaration), while @y@ refers to the executable's read-only data-section. 878 878 With @x@ representing an allocation, it offers information in @sizeof(x)@ that @y@ does not. 879 But this extra information is second-class, as it can only be used in the immediate lexical context, \ie it cannot be passed onto string operations or user functions.879 But this extra information is second-class, as it can only be used in the immediate lexical context, \ie it cannot be passed to string operations or user functions. 880 880 Only pointers to text buffers are first-class, and discussed further. 881 882 881 \begin{cfa} 883 882 char * s = "abcde"; 884 char * s1 = s; $\C {// alias state, n/asymmetry, variable-constrained referent}$885 char * s2 = &s[1]; $\C{// alias state, n/a symmetry, variable-constrained referent}\CRT$886 char * s3 = &s2[1]; $\C{// alias state, n/a symmetry, variable-constrained referent}883 char * s1 = s; $\C[2.25in]{// alias state, N/A symmetry, variable-constrained referent}$ 884 char * s2 = &s[1]; $\C{// alias state, N/A symmetry, variable-constrained referent}$ 885 char * s3 = &s2[1]; $\C{// alias state, N/A symmetry, variable-constrained referent}\CRT$ 887 886 printf( "%s %s %s %s\n", s, s1, s2, s3 ); 888 887 $\texttt{\small abcde abcde bcde cde}$ … … 904 903 string & s5 = s.substr(2,4); $\C{// error: cannot point to temporary}\CRT$ 905 904 \end{cfa} 906 The @s1@ lax symmetry reflects how its validity ofdepends on the lifetime of @s@.905 The @s1@ lax symmetry reflects how its validity depends on the lifetime of @s@. 907 906 It is common practice in \CC to use the @s1@-style for a by-reference function parameter. 908 907 Doing so assumes that the callee only uses the referenced string for the duration of the call, \ie no storing the parameter (as a reference) for later. 909 908 So, when the called function is a constructor, its definition typically uses an @s2@-style copy-initialization. 910 909 Exceptions to this pattern are possible, but require the programmer to assure safety where the type system does not. 911 The @s3@ initialization must copy the substring because it must support a subsequent @c_str@ call, which provides anull-termination, generally at a different position than the source string's.910 The @s3@ initialization must copy the substring to support a subsequent @c_str@ call, which provides null-termination, generally at a different position than the source string's. 912 911 @s2@ assignment could be made fast, by reference-counting the text area and using copy-on-write, but would require an implementation upgrade. 913 912 … … 929 928 With @s2@, the case for fast-copy is more subtle. 930 929 Certainly, its value is not pointer-equal to @s@, implying at least a further allocation. 931 But because Java is notconstrained to use a null-terminated representation, a standard-library implementation is free to refer to the source characters in-place.930 But because Java is \emph{not} constrained to use a null-terminated representation, a standard-library implementation is free to refer to the source characters in-place. 932 931 Java does not meet the aliasing requirement because immutability makes it impossible to modify. 933 932 Java's @StringBuffer@ provides aliasing (see @replace@ example on \VPageref{p:JavaReplace}), though without supporting symmetric treatment of a fragment referent, \eg @substring@ of a @StringBuffer@ is a @String@; … … 960 959 961 960 962 963 \subsection{Logical overlap} 961 \subsection{Logical Overlap} 964 962 965 963 It may be unfamiliar to combine \VRef[Figure]{f:StrSemanticCompare}'s alias state and fragment referent in one API, or at the same time. 966 964 This section shows the capability in action. 967 965 968 In summary, the metaphor of a GUI text editor is intended. 969 Selecting a consecutive block of text using the mouse defines an aliased substring within the file. 970 Typing in this state overwrites what was there before, replacing the originally selected text with more or less text. 971 But the \emph{whole file} grows or shrinks as a result, not just the selection. 972 This action models assigning to an aliased substring when the two strings overlap by total containment: one string is the selection, the other is the whole file. 973 974 Now extend the metaphor to a multi-user online editor. 975 If Alice selects a range of text at the bottom of the file, wile Bob is rewriting a paragraph at the top, Alice's selection holds onto the logical characters initially selected, unaffected by Bob making the total file grow/shrink, and unaffectd by Bob causing the start index of Alice's selction to vary. 976 This action models assigning to an aliased substring when the two strings do not overlap at all: one string is Alice's selection, the other is Bob's. 977 978 If a third office worker were also watching Alice's and Bob's actions on the whole file (a string with ``all the text'' is kept around), then two further single-user-edit cases give the semantics of the individual edits flowing into the whole. 979 But, departing from the document analogy, it is not necessary to keep a such a third string: 980 no one has to resource-manage ``the document.'' 981 When an original string, from which both the Alice- and Bob-parts came, ceases to exist, Alice and Bob are left with two independent strings. 982 They are independent because Alice and Bob have no API for growing the bounds of a string to subsume text that may once have been around it. 983 984 Edge cases, notably ``Venn-diagram overlap,'' had to have handlings chosen. 985 The intent in fleshing out these details was to achieve the above story, with a single API, while keeping the rest as simple as possible. 986 The remainder of this section shows the resulting decisions, played out at the API level. 987 988 \CFA uses the marker @`share@ as a dynamic mechanism to indicate alias (mutations shared) \vs snapshot (not quite an immutable result, but one with subsequent mutations isolated). 966 \begin{comment} 967 The metaphor of a GUI text-editor is used to illustrate combining these features. 968 Most editors allow selecting a consecutive block of text (highlighted) to define an aliased substring within a document. 969 Typing in this area overwrites the prior text, replacing the selected text with less, same, or more text. 970 Importantly, the document also changes size, not just the selection. 971 %This alias model is assigning to an aliased substring for two strings overlapping by total containment: one is the selected string, the other is the document. 972 Extend the metaphor to two selected areas, where one area can be drag-and-dropped into another, changing the text in the drop area and correspondingly changing the document. 973 When the selected areas are indenpendent, the semantics of the drag-and-drop are straightforward. 974 However, for overlapping selections, either partial or full, there are multiple useful semantics. 975 For example, two areas overlap at the top, or bottom, or a block at a corner, where one areas is dropped into the other. 976 For selecting a smaller area within a larger, and dropping the smaller area into the larger to replace it. 977 In both cases, meaningful semantics must be constructed or the operation precluded. 978 However, without this advanced capability, certain operations become multi-step, possible requiring explicit temporaries. 979 \end{comment} 980 981 A GUI text-editor provides a metaphor. 982 Selecting a block of text using the mouse defines an aliased substring within a document. 983 Typing in this area overwrites what was there, replacing the originally selected text with more or less text. 984 But the \emph{containing document} also grows or shrinks, not just the selection. 985 This action models assigning to an aliased substring when one string is completely contained in the other. 986 987 Extend the metaphor to a multi-user editor. 988 If Alice selects a range of text at the bottom, while Bob is rewriting a paragraph at the top, Alice's selection holds onto the characters initially selected, unaffected by Bob making the document grow/shrink even though Alice's start index in the document is changing. 989 This action models assigning to an aliased substring when the two strings do not overlap. 990 991 Logically, Alice's and Bob's actions on the whole document are like two single-user-edit cases, giving the semantics of the individual edits flowing into a whole. 992 But, there is no need to have two separate document strings. 993 Even if a third selection removes all the text, both Alice's and Bob's strings remain. 994 The independence of their selections assumes that the editor API does not allow the selection to be enlarged, \ie adding text from the containing environment, which may have disappeared. 995 996 This leaves the ``Venn-diagram overlap'' cases, where Alice's and Bob's selections overlap at the top, bottom, or corner. 997 In this case, the selection areas are dependent, and so, changes in content and size in one may have an affect in the other. 998 There are multiple possible semantics for this case. 999 The remainder of this section shows the chosen semantics for all of the cases. 1000 1001 String sharing is expressed using the @`share@ marker to indicate aliasing (mutations shared) \vs snapshot (not quite an immutable result, but one with subsequent mutations isolated). 989 1002 This aliasing relationship is a sticky property established at initialization. 990 1003 For example, here strings @s1@ and @s1a@ are in an aliasing relationship, while @s2@ is in a copy relationship. 991 1004 \input{sharing1.tex} 992 1005 Here, the aliasing (@`share@) causes partial changes (subscripting) to flow in both directions. 993 (In the following examples, watchhow @s1@ and @s1a@ change together, and @s2@ is independent.)1006 (In the following examples, note how @s1@ and @s1a@ change together, and @s2@ is independent.) 994 1007 \input{sharing2.tex} 995 1008 Similarly for complete changes. … … 999 1012 \input{sharing4.tex} 1000 1013 1001 Now, consider string @s1_mid@ being an alias in the middle of @s1@, along with @s2@, made by a simplecopy from the middle of @s1@.1014 Now, consider string @s1_mid@ being an alias in the middle of @s1@, along with @s2@, made by a copy from the middle of @s1@. 1002 1015 \input{sharing5.tex} 1003 1016 Again, @`share@ passes changes in both directions; copy does not. … … 1020 1033 When @s1_bgn@'s size increases by 3, @s1_mid@'s starting location moves from 1 to 4 and @s1_end@'s from 3 to 6, 1021 1034 1022 When changes happen son an aliasing substring that overlap.1035 When changes happen on an aliasing substring that overlap. 1023 1036 \input{sharing10.tex} 1024 Strings @s1_crs@ and @s1_mid@ overlap at character 4, @j@ because the substrings are 3,2 and 4,2.1037 Strings @s1_crs@ and @s1_mid@ overlap at character 4, @j@, because the substrings are 3,2 and 4,2. 1025 1038 When @s1_crs@'s size increases by 1, @s1_mid@'s starting location moves from 4 to 5, but the overlapping character remains, changing to @'+'@. 1026 1039 … … 1079 1092 1080 1093 1081 1082 \section{Storage management} 1094 \section{Storage Management} 1083 1095 1084 1096 This section discusses issues related to storage management of strings. … … 1099 1111 const string s1 = "abc"; 1100 1112 \end{cfa} 1101 the @const@ applies to the @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage. 1102 Hence, @s1@ is not pointing at an immutable constant, meaning its underlying string can be mutable, unless some other designation is specified, such as Java's global immutable rule. 1103 1104 1105 1106 \subsection{General implementation} 1113 @const@ applies to the @s1@ pointer to @"abc"@, and @"abc"@ is an immutable constant that is \emph{copied} into the string's storage. 1114 Hence, @s1@ is not pointing at an immutable constant and its underlying string is mutable, unless some other designation is specified, such as Java's global immutable rule. 1115 1116 1117 \subsection{General Implementation} 1107 1118 \label{string-general-impl} 1108 1119 … … 1113 1124 A string is a smart pointer into this buffer. 1114 1125 1115 This cycle of frequent cheap allocations, interspersed with infrequent expensive compactions, has obvious similarities to a general-purpose memory 1126 This cycle of frequent cheap allocations, interspersed with infrequent expensive compactions, has obvious similarities to a general-purpose memory-manager based on garbage collection (GC). 1116 1127 A few differences are noteworthy. 1117 1128 First, in a general purpose manager, the allocated objects may contain pointers to other objects, making the transitive reachability of these objects a crucial property. 1118 1129 Here, the allocations are text, so one allocation never keeps another alive. 1119 1130 Second, in a general purpose manager, the handle that keeps an allocation alive is a bare pointer. 1120 For strings, a fatter representation is acceptable because this pseudo-pointer is only used for ent y into the string-heap, not for general data-sub-structure linking around the general heap.1131 For strings, a fatter representation is acceptable because this pseudo-pointer is only used for entry into the string-heap, not for general data-substructure linking around the general heap. 1121 1132 1122 1133 \begin{figure} … … 1128 1139 \VRef[Figure]{f:memmgr-basic} shows the representation. 1129 1140 The heap header and text buffer define a sharing context. 1130 Normally, one global sharingcontext is appropriate for an entire program;1131 concurren t exceptions arediscussed in \VRef{s:ControllingImplicitSharing}.1132 A string is a handle into the buffer and node within a linked list.1141 Normally, one global context is appropriate for an entire program; 1142 concurrency is discussed in \VRef{s:ControllingImplicitSharing}. 1143 A string is a handle to a node in a linked list containing a information about a string text in the buffer. 1133 1144 The list is doubly linked for $O(1)$ insertion and removal at any location. 1134 1145 Strings are ordered in the list by text start address. 1135 The hea der maintains a next-allocation pointer, @alloc@, pointing to the last live allocation in the buffer.1146 The heap header maintains a next-allocation pointer, @alloc@, pointing to the last live allocation in the buffer. 1136 1147 No external references point into the buffer and the management procedure relocates the text allocations as needed. 1137 1148 A string handle references a containing string, while its string is contiguous and not null terminated. … … 1139 1150 String handles can be allocated in the stack or heap, and represent the string variables in a program. 1140 1151 Normal C life-time rules apply to guarantee correctness of the string linked-list. 1141 The text buffer is large enough with good management so that often only one dynamic allocation is necessary during program execution .1152 The text buffer is large enough with good management so that often only one dynamic allocation is necessary during program execution, but not so large as to cause program bloat. 1142 1153 % During this period, strings can vary in size dynamically. 1143 1154 1144 1155 When the text buffer fills, \ie the next new string allocation causes @alloc@ to point beyond the end of the buffer, the strings are compacted. 1145 1156 The linked handles define all live strings in the buffer, which indirectly defines the allocated and free space in the buffer. 1146 Since the string handles are in sorted order,the handle list can be traversed, copying the first live text to the start of the buffer, and subsequent strings after each other.1147 If, upon compaction, the amount of free storage wouldstill be less than the new string allocation, a larger text buffer is heap-allocated, the current buffer is copied into the new buffer, and the original buffer is freed.1157 The string handles are maintained in sorted order, so the handle list can be traversed, copying the first live text to the start of the buffer, and subsequent strings after each other. 1158 After compaction, if free storage is still be less than the new string allocation, a larger text buffer is heap-allocated, the current buffer is copied into the new buffer, and the original buffer is freed. 1148 1159 Note, the list of string handles is structurally unaffected during a compaction; 1149 1160 only the text pointers in the handles are modified to new buffer locations. … … 1157 1168 Both string initialization styles preserve the string module's internal invariant that the linked-list order matches the buffer order. 1158 1169 For string destruction, handles are removed from the list. 1159 As a result, once a last handle using a run of buffer characters is destroyed, that buffer space gets excluded from the next compaction, making its character-count available in the compacted buffer.1160 1161 Certain string operations canresult in a substring of another string.1162 The resulting handle is then placed in the correct sorted position in the list, possible witha short linear search to locate the position.1170 Once the last handle using a run of buffer characters is destroyed, that buffer space is excluded from use until the next compaction. 1171 1172 Certain string operations result in a substring of another string. 1173 The resulting handle is then placed in the correct sorted position in the list, possible requiring a short linear search to locate the position. 1163 1174 For string operations resulting in a new string, that string is allocated at the end of the buffer. 1164 1175 For shared-edit strings, handles that originally referenced containing locations need to see the new value at the new buffer location. … … 1175 1186 1176 1187 1177 \subsection{RAII limitations}1188 \subsection{RAII Limitations} 1178 1189 \label{string-raii-limit} 1179 1190 1180 1191 Earlier work on \CFA~\cite[ch.~2]{Schluntz17} implemented object constructors and destructors for all types (basic and user defined). 1181 A constructor is a user-defined function run implicitly \emph{after} an object's storage is allocated, and a destructor is a user-defined function run \emph{before} an object's storage is deall cated.1192 A constructor is a user-defined function run implicitly \emph{after} an object's storage is allocated, and a destructor is a user-defined function run \emph{before} an object's storage is deallocated. 1182 1193 This feature, called Resource Acquisition Is Initialization (RAII)~\cite[p.~389]{Stroustrup94}, helps guarantee invariants for users before accessing an object and for the programming environment after an object terminates. 1183 1194 … … 1213 1224 \end{cfa} 1214 1225 A module providing the @T@ type can traverse @all_T@ at relevant times, to keep the objects ``good.'' 1215 Hence, declaring a @T@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service that keeps the value ``good'' in the future.1226 Hence, declaring a @T@ not only ensures that it begins with an initially ``good'' value, but it also provides an implicit subscription to a service that keeps the value ``good'' during its lifetime. 1216 1227 Again, both \CFA and \CC support this usage style. 1217 1228 1218 1229 A third capability concerns \emph{implicitly} requested copies. 1219 1230 When stack-allocated objects are used as parameter and return values, a sender's version exists in one stack frame and a receiver's version exists in another. 1220 In the parameter direction, the language's function-call handling must arrange for a copy-constructor call to happen\footnote{ 1221 \CC also offers move constructors and return-value optimization~\cite{RVO20}. 1222 These features help reduce unhelpful copy-constructor calls, which, for types like the example \lstinline{S}, would lead to extra memory allocations. 1223 \CFA does not currently have these features; adding similarly-intended features to \CFA is desirable. 1224 However, this section is about a problem in the realization of features that \CFA already supports. 1225 To understand the problem presented, the appropriate comparison is with classic versions of \CC that treated such copy-constructor calls as necessary.} 1226 at a time near the control transfer into the callee, with the source as the caller's (sender's) version and the target as the callee's (receiver's) version. 1227 (In the return direction, the roles are reversed and the copy-constructor call happens near the return of control.) 1228 \CC supports this capability without qualification. 1229 \CFA offers limited support here; simple examples work, but implicit copying does not combine successfully with the other RAII capabilities discussed. 1231 In the parameter direction, the language's function-call handling must arrange for a copy-constructor call to happen, at a time near the control transfer into the callee. %, with the source as the caller's (sender's) version and the target as the callee's (receiver's) version. 1232 In the return direction, the roles are reversed and the copy-constructor call happens near the return of control. 1233 \CC supports this capability.% without qualification. 1234 \CFA offers limited support; 1235 simple examples work, but implicit copying does not combine successfully with the other RAII capabilities discussed. 1236 1237 \CC also offers move constructors and return-value optimization~\cite{RVO20}. 1238 These features help reduce unhelpful copy-constructor calls, which, for types like the @S@ example, would lead to extra memory allocations. 1239 \CFA does not currently have these features; adding similarly-intended features to \CFA is desirable. 1240 However, this section is about a problem in the realization of features that \CFA already supports. 1241 Hence, the comparison continues with the classic version of \CC that treated such copy-constructor calls as necessary. 1230 1242 1231 1243 To summarize the unsupported combinations, the relevant features are: … … 1243 1255 At that time, adhering to a principal of minimal intervention, this code could always be treated as passthrough: 1244 1256 \begin{cfa} 1245 struct U { ...};1257 struct U { ... }; 1246 1258 // RAII to go here 1247 1259 void f( U u ) { F_BODY(u) } … … 1249 1261 f( x ); 1250 1262 \end{cfa} 1251 But adding custom RAII (at ``... here'') changes things.1252 The common C++ lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the presentCFA lowering.1253 1254 \ noindent1255 \begin{tabular}{ l|l}1256 \begin{cfa} 1257 // C++, likely CFA to be 1263 But adding custom RAII (at ``...go here'') changes things. 1264 The common \CC lowering~\cite[Sec. 3.1.2.3]{cxx:raii-abi} proceeds differently than the present \CFA lowering. 1265 \begin{cquote} 1266 \setlength{\tabcolsep}{15pt} 1267 \begin{tabular}{@{}l|l@{}} 1268 \begin{cfa} 1269 $\C[0.0in]{// \CC, \CFA future}\CRT$ 1258 1270 struct U {...}; 1259 1271 // RAII elided 1260 1272 void f( U * __u_orig ) { 1261 1273 U u = * __u_orig; // call copy ctor 1262 F_BODY( u)1274 F_BODY( u ); 1263 1275 // call dtor, u 1264 1276 } 1265 1277 U x; // call default ctor 1266 f( & x ) ; 1278 1279 1280 f( &x ) ; 1281 1282 1267 1283 // call dtor, x 1268 1284 \end{cfa} 1269 1285 & 1270 1286 \begin{cfa} 1271 // CFA today 1287 $\C[0.0in]{// \CFA today}\CRT$ 1272 1288 struct U {...}; 1273 1289 // RAII elided 1274 1290 void f( U u ) { 1275 F_BODY(u) 1291 1292 F_BODY( u ); 1293 1276 1294 } 1277 1295 U x; // call default ctor … … 1284 1302 \end{cfa} 1285 1303 \end{tabular} 1286 1287 In the CFA-today scheme, the lowered form is still using a by-value C call. 1288 C does a @memcpy@ on structs passed by value. 1289 And so, @F_BDY@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions. 1290 If @U@ is trying to have a style-\#2 invariant, it shows up broken in @F_BDY@: references that are supposed to be to @u@ are actually to the different location @__u_for_f@. 1291 The \CC scheme does not have this problem because it constructs the for-@f@ copy in the correct location. 1292 1293 Yet, the \CFA-today scheme is sufficient to deliver style-\#1 invariants (in this style-\#3 use case) because this scheme still does the correct number of lifecycle calls, using correct values, at correct times. So, reference-counting or simple ownership applications get their invariants respected under call/return-by-value. 1304 \end{cquote} 1305 The current \CFA scheme is still using a by-value C call. 1306 C does a @memcpy@ on structures passed by value. 1307 And so, @F_BODY@ sees the bits of @__u_for_f@ occurring at an address that has never been presented to the @U@ lifecycle functions. 1308 If @U@ is trying to have a style-\#2 invariant, it shows up broken in @F_BODY@: references supposedly to @u@ are actually to @__u_for_f@. 1309 The \CC scheme does not have this problem because it constructs the for @f@ copy in the correct location within @f@. 1310 1311 Yet, the current \CFA scheme is sufficient to deliver style-\#1 invariants (in this style-\#3 use case) because this scheme still does the correct number of lifecycle calls, using correct values, at correct times. 1312 So, reference-counting or simple ownership applications get their invariants respected under call/return-by-value. 1294 1313 1295 1314 % [Mike is not currently seeing how distinguishing initialization from assignment is relevant] … … 1325 1344 % The following discusses the consequences of this semantics with respect to lifetime management of \CFA strings. 1326 1345 1327 1328 The string API offers style \#3's pass-by-value in, for example, in the return of @"a" + "b"@. 1346 The string API offers style \#3's pass-by-value in, \eg in the return of @"a" + "b"@. 1329 1347 Its implementation uses the style-\#2 invariant of the string handles being linked to each other, helping to achieve high performance. 1330 Since these two RAII styles canno nt coexist, a workaround splits the API into two layers: one that provides pass-by-value, built upon the other with inter-linked handles.1348 Since these two RAII styles cannot coexist, a workaround splits the API into two layers: one that provides pass-by-value, built upon the other with inter-linked handles. 1331 1349 The layer with pass-by-value incurs a performance penalty, while the layer without delivers the desired runtime performance. 1332 The slower, friendlier High Level API (HL, type @string@) wrap ps the faster, more primitive Low Level API (LL, type @string_res@, abbreviating ``resource'').1350 The slower, friendlier High Level API (HL, type @string@) wraps the faster, more primitive Low Level API (LL, type @string_res@, abbreviating ``resource''). 1333 1351 Both APIs present the same features, up to return-by-value operations being unavailable in LL and implemented via the workaround in HL. 1334 1352 The intention is for most future code to target HL. 1335 When the RAII issue is fixed, the full HL feature set will be acheivable using the LL-style lifetime management. 1336 So then, there will be no need for two API levels; HL will be removed; LL's type will be renamed to @string@; programs written for current HL will run faster. 1353 When the RAII issue is fixed, the full HL feature set will be achievable using the LL-style lifetime management. 1354 Then, HL will be removed; 1355 LL's type will be renamed @string@ and programs written for current HL will run faster. 1337 1356 In the meantime, performance-critical sections of applications must use LL. 1338 1357 Subsequent performance experiments \see{\VRef{s:PerformanceAssessment}} use the LL API when comparing \CFA to other languages. 1339 1358 This measurement gives a fair estimate of the goal state for \CFA. 1340 1359 A separate measure of the HL overhead is also included. 1341 1342 \VRef[Section]{string-general-impl} described the goal state for \CFA.In present state, the type @string_res@ replaces its mention of @string@ as inter-linked handle.1343 1344 To use LL, a programmer rewrites invocations that used pass-by-value APIs into invocations where the resourcing is more explicit.1345 Many invocations are unaffected, notably includingassignment and comparison.1346 Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases haverevisions.1347 1348 \ noindent1360 hence, \VRef[Section]{string-general-impl} us describing the goal state for \CFA. 1361 In present state, the type @string_res@ replaces its mention of @string@ as inter-linked handle. 1362 1363 To use LL, a programmer rewrites invocations using pass-by-value APIs into invocations where resourcing is more explicit. 1364 Many invocations are unaffected, notably assignment and comparison. 1365 Of the capabilities listed in \VRef[Figure]{f:StrApiCompare}, only the following three cases need revisions. 1366 \begin{cquote} 1367 \setlength{\tabcolsep}{15pt} 1349 1368 \begin{tabular}{ll} 1350 1369 HL & LL \\ 1351 1370 \hline 1352 1371 \begin{cfa} 1372 1353 1373 string s = "a" + "b"; 1354 1374 \end{cfa} … … 1363 1383 string s = "abcde"; 1364 1384 string s2 = s(2, 3); // s2 == "cde" 1385 1365 1386 s(2,3) = "x"; // s == "abx" && s2 == "cde" 1366 1387 \end{cfa} … … 1376 1397 \begin{cfa} 1377 1398 string s = "abcde"; 1399 1378 1400 s[2] = "xxx"; // s == "abxxxde" 1379 1401 \end{cfa} … … 1385 1407 \end{cfa} 1386 1408 \end{tabular} 1387 1409 \end{cquote} 1388 1410 The actual HL workaround is having @string@ wrap a pointer to a uniquely owned, heap-allocated @string_res@. This arrangement has @string@ being style-\#1 RAII, which is compatible with pass-by-value. 1389 1411 1390 1412 1391 1392 \subsection{Sharing implementation} 1413 \subsection{Sharing Implementation} 1393 1414 \label{sharing-impl} 1394 1415 1395 The \CFA string module has two mechanisms to handle the case when string handles share a run of text. 1396 1416 The \CFA string module has two mechanisms to deal with string handles sharing text. 1397 1417 In the first type of sharing, the user requests that both string handles be views of the same logical, modifiable string. 1398 1418 This state is typically produced by the substring operation. … … 1404 1424 $\texttt{\small axcde xc}$ 1405 1425 \end{cfa} 1406 In a typical substring call, the source string-handle is referencing an entire string, and the resulting, newly made, string handle is referencing aportion of the original.1407 In this state, a subsequent modification made by either is visible in both.1426 Here, the source string-handle is referencing an entire string, and the resulting, newly made, string handle is referencing a contained portion of the original. 1427 In this state, a modification made in the overlapping area is visible in both strings. 1408 1428 1409 1429 The second type of sharing happens when the system implicitly delays the physical execution of a logical \emph{copy} operation, as part of its copy-on-write optimization. … … 1418 1438 In this state, a subsequent modification done on one handle triggers the deferred copy action, leaving the handles referencing different text within the buffer, holding distinct values. 1419 1439 1420 A further abstraction , in the string module's implementation,helps distinguish the two senses of sharing.1440 A further abstraction helps distinguish the two senses of sharing. 1421 1441 A share-edit set (SES) is an equivalence class over string handles, being the reflexive, symmetric and transitive closure of the relationship of one string being constructed from another, with the ``share'' option given. 1422 1442 The SES is represented by a second linked list among the handles. … … 1427 1447 1428 1448 1429 \subsection{Controlling implicit sharing}1449 \subsection{Controlling Implicit Sharing} 1430 1450 \label{s:ControllingImplicitSharing} 1431 1451 … … 1434 1454 In detail, string sharing has inter-linked string handles, so managing one string is also managing the neighbouring strings, and from there, a data structure of the ``set of all strings.'' 1435 1455 Therefore, it is useful to toggle this capability on or off when it is not providing any application benefit. 1436 1437 \begin{figure}1438 \begin{tabular}{ll}1439 \lstinputlisting[language=CFA, firstline=10, lastline=55]{sharectx.run.cfa}1440 &1441 \raisebox{-0.17\totalheight}{\includegraphics{string-sharectx.pdf}} % lower1442 \end{tabular}1443 \caption{Controlling copying vs sharing of strings using \lstinline{string_sharectx}.}1444 \label{fig:string-sharectx}1445 \end{figure}1446 1456 1447 1457 The \CFA string library provides the type @string_sharectx@ to control an ambient sharing context. … … 1456 1466 Executing the example does not produce an interesting outcome, but the comments in the picture indicate when the logical copy operation runs with 1457 1467 \begin{description} 1458 1459 1468 \item[share:] the copy being deferred, as described through the rest of this section (fast), or 1469 \item[copy:] the copy performed eagerly (slow). 1460 1470 \end{description} 1461 1471 Only eager copies can cross @string_sharectx@ boundaries. 1462 1472 The intended use is with stack-managed lifetimes, in which the established context lasts until the current function returns, and affects all functions called that do not create their own contexts. 1463 1473 1464 [ TODO: true up with ``is thread local'' (implement that and expand this discussion to give a concurrent example, or adjust this wording) ] 1465 1466 1467 \subsection{Sharing and threading} 1474 \begin{figure} 1475 \begin{tabular}{ll} 1476 \lstinputlisting[language=CFA, firstline=10, lastline=55]{sharectx.run.cfa} 1477 & 1478 \raisebox{-0.17\totalheight}{\includegraphics{string-sharectx.pdf}} % lower 1479 \end{tabular} 1480 \caption{Controlling copying vs sharing of strings using \lstinline{string_sharectx}.} 1481 \label{fig:string-sharectx} 1482 \end{figure} 1483 1484 1485 \subsection{Sharing and Threading} 1468 1486 1469 1487 The \CFA string library provides no thread safety, the same as \CC string, providing similar performance goals. … … 1474 1492 1475 1493 1476 \subsection{Future work} 1477 1478 Implementing the small-string optimization is straightforward, as a string header contains a pointer to the string text in the buffer. 1479 This pointer could be marked with a flag and contain a small string. 1480 However, there is now a conditional check required on the fast-path to switch between small and large string operations. 1481 1482 It might be possible to pack 16- or 32-bit Unicode characters within the same string buffer as 8-bit characters. 1483 Again, locations for identification flags must be found and checked along the fast path to select the correct actions. 1484 Handling utf8 (variable length), is more problematic because simple pointer arithmetic cannot be used to stride through the variable-length characters. 1485 Trying to use a secondary array of fixed-sized pointers/offsets to the characters is possible, but raises the question of storage management for the utf8 characters themselves. 1486 1487 1488 \section{Performance assessment} 1489 \label{s:PerformanceAssessment} 1490 1491 I assessed the \CFA string library's speed and memory usage against strings in \CC STL. 1492 1493 Overall, this analysis shows that adding support for the features shown earlier in the chapter comes at no substantial cost in the performance of featrues common to both APIs. 1494 1495 Moreover, the results support the \CFA string's position as a high-level enabler of simplified text processing. 1496 STL makes its user think about memory management. 1497 When the user does, and is successful, STL's performance can be very good. 1498 But when the user fails to think through the consequences of the STL representation, performance becomes poor. 1499 The \CFA string lets the user work at the level of just putting the right text into right variables, with corresponding performance degradations reduced or eliminated. 1500 1501 % The final test shows the overall win of the \CFA text-sharing mechanism. 1502 % It exercises several operations together, showing \CFA enabling clean user code to achieve performance that STL requires less-clean user code to achieve. 1503 1504 1505 \subsection{Methodology} 1506 1507 These tests use a \emph{corpus} of strings. 1508 Their lengths are important; the specific characters occurring in them are immaterial. 1509 In a result graph, a corpus's mean string length is often the independent variable shown on the X axis. 1510 1511 When a corpus contains strings of different lenghths, the lengths are drawn from a geometric distribution. 1512 Therefore, strings much longer than the mean occur nontrivially and strings slightly shorter than the mean occur most often. 1513 A corpus's string sizes are one of: 1514 \begin{description} 1515 \item [Fixed-size] all string lengths are of the stated size. 1516 \item [Varying 1 and up] the string lengths are drawn from the geometric distribution with a stated mean and all lengths occur. 1517 \item [Varying 16 and up] string lengths are drawn from the geometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16. \PAB{Is this one unused? May have just been for ``normalize.''} 1518 \end{description} 1519 The special treatment of length 16 deals with the short-string optimization (SSO) in STL @string@, currently not implemented in \CFA, though a fine future improvement to \CFA. 1520 In the general case, an STL string handle is a pointer (to separately allocated text) and a length. 1521 But when the text is shorter than this representation, the optimization repurposes the handle's storage to eliminate using the heap. 1494 \subsection{Short-String Optimization} 1495 1496 \CC implements a short-string ($\le$16) optimization (SSO). 1497 As a string header contains a pointer to the string text, this pointer can be tagged and used to contain a short string, removing a dynamic memory allocation/deallocation. 1522 1498 \begin{c++} 1523 1499 class string { … … 1529 1505 char sstr[sizeof(lstr)]; $\C{// short string <16 characters, text in situ}$ 1530 1506 }; 1531 $\C{// tagging for kind (short or long) elided}$1507 // some tagging for short or long strings 1532 1508 }; 1533 1509 \end{c++} 1534 1510 However, there is now a conditional check required on the fast-path to switch between short and long string operations. 1511 1512 It might be possible to pack 16- or 32-bit Unicode characters within the same string buffer as 8-bit characters. 1513 Again, locations for identification flags must be found and checked along the fast path to select the correct actions. 1514 Handling utf8 (variable length), is more problematic because simple pointer arithmetic cannot be used to stride through the variable-length characters. 1515 Trying to use a secondary array of fixed-sized pointers/offsets to the characters is possible, but raises the question of storage management for the utf8 characters themselves. 1516 1517 1518 \section{Performance Assessment} 1519 \label{s:PerformanceAssessment} 1520 1521 I assessed the \CFA string library's speed and memory usage against strings in \CC STL. 1522 Overall, this analysis shows that adding support for the features shown earlier in the chapter comes at no substantial cost in the performance of features common to both APIs. 1523 1524 Moreover, the results support the \CFA string's position as a high-level enabler of simplified text processing. 1525 STL makes its user think about memory management. 1526 When the user does, and is successful, STL's performance can be very good. 1527 But when the user fails to think through the consequences of the STL representation, performance becomes poor. 1528 The \CFA string lets the user work at the level of just putting the right text into the right variables, with corresponding performance degradations reduced or eliminated. 1529 1530 % The final test shows the overall win of the \CFA text-sharing mechanism. 1531 % It exercises several operations together, showing \CFA enabling clean user code to achieve performance that STL requires less-clean user code to achieve. 1532 1533 1534 \subsection{Methodology} 1535 1536 These tests use a \emph{corpus} of strings. 1537 Their lengths are important; the specific characters occurring in them are immaterial. 1538 In a result graph, a corpus's mean string length is often the independent variable shown on the X axis. 1539 1540 When a corpus contains strings of different lengths, the lengths are drawn from a geometric distribution. 1541 Therefore, strings much longer than the mean occur less often and strings slightly shorter than the mean occur most often. 1542 A corpus's string sizes are one of: 1543 \begin{description} 1544 \item [Fixed-size] all string lengths are of the stated size. 1545 \item [Varying 1 and up] the string lengths are drawn from the geometric distribution with a stated mean and all lengths occur. 1546 \item [Varying 16 and up] string lengths are drawn from the geometric distribution with the stated mean, but only lengths 16 and above occur; thus, the stated mean is above 16. 1547 \end{description} 1548 The special treatment of length 16 deals with the SSO in STL @string@, currently not implemented in \CFA. 1535 1549 A fixed-size or from-16 distribution ensures that \CC's extra-optimized cases are isolated within, or removed from, the comparison. 1536 1537 1550 In all experiments that use a corpus, its text is generated and loaded into the system under test before the timed phase begins. 1538 1539 1551 To ensure comparable results, a common memory allocator is used for \CFA and \CC. 1540 \CFA runs the llheap allocator~\cite{Zulfiqar22} ; the test rig plugs this same allocatorinto \CC.1552 \CFA runs the llheap allocator~\cite{Zulfiqar22}, which is also plugged into \CC. 1541 1553 1542 1554 The operations being measured take dozens of nanoseconds, so a succession of many invocations is run and timed as a group. 1543 The experiments run with fixed duration (targeting approximately 5 seconds), stopping upon passing a goal time, as determined by re-checking @clock()@ every 10,000 invocations, which is never more often than once per 80 ms.1544 Timing outcomes rep rt mean nanoseconds per invocation, which includes harness overhead and the targeted string API execution.1555 The experiments run for a fixed duration (5 seconds), as determined by re-checking @clock()@ every 10,000 invocations, which is never more often than once per 80 ms. 1556 Timing outcomes report mean nanoseconds per invocation, which includes harness overhead and the targeted string API execution. 1545 1557 1546 1558 \PAB{To discuss: hardware and such} 1547 1559 1548 As discussed in \VRef[Section]{string-raii-limit}, general performance comparisons are made using \CFA's faster, low-level string API, whose string type is named @string_res@. 1549 1550 \VRef{s:ControllingImplicitSharing} presents an operational mode where \CFA string sharing is turned off. In this mode, the \CFA string operates similarly to \CC's, by using a distinct heap allocation for each string's text. 1551 Some experiments include measurements in this mode for baselining purposes. 1552 It is called ``\CC emulation mode'' or ``nosharing'' here. 1553 1560 As discussed in \VRef[Section]{string-raii-limit}, general performance comparisons are made using \CFA's faster, low-level string API, named @string_res@. 1561 \VRef{s:ControllingImplicitSharing} presents an operational mode where \CFA string sharing is turned off. 1562 In this mode, the \CFA string operates similarly to \CC's, by using a heap allocation for string text. 1563 Some experiments include measurements in this mode for baselining purposes, called ``\CC emulation mode'' or ``nosharing''. 1554 1564 1555 1565 1556 1566 \subsection{Test: Append} 1557 1567 1558 These tests measure the speed of appending strings from the corpus onto a larger, growing string. They show \CFA performing comparably to \CC overall, though with reduced penalties for simple API misuses for which \CC programmers may not know to watch out.1559 1568 These tests measure the speed of appending strings from the corpus onto a larger, growing string. 1569 They show \CFA performing comparably to \CC overall, though with penalties for simple API misuses. 1560 1570 The basic harness is: 1561 \begin{cquote} 1562 \setlength{\tabcolsep}{20pt} 1563 \begin{cfa} 1564 START_TIMER 1565 for ( ... ) { 1566 string_res accum; 1567 for ( i; 100 ) { 1568 accum += corpus[ f(i) ]; // importing from char * here 1569 COUNT_ONE_OP_DONE 1571 \begin{cfa} 1572 // set alarm duration 1573 for ( ... ) { $\C[1.5in]{// loop for duration}$ 1574 for ( i; N ) { $\C{// perform multiple appends (concatenations)}$ 1575 accum += corpus[ f( i ) ]; 1570 1576 } 1577 count += N; $\C{// count number of appends}\CRT$ 1571 1578 } 1572 STOP_TIMER 1573 \end{cfa} 1574 \end{cquote} 1575 The harness's outer loop executes until a sample-worthy amount of execution has happened. 1576 The inner loop builds up the desired-length string with successive appends, before the outer makes it start over from a blank accumulator. 1577 Each harness run targets a specific (mean) corpus string length and produces one data point on the result graph. 1578 1579 \end{cfa} 1580 The harness's outer loop executes for the experiment duration. 1581 The string is reset to empty before appending (not shown). 1582 The inner loop builds up a growing-length string with successive appends. 1583 Each run targets a specific (mean) corpus string length and produces one data point on the result graph. 1579 1584 Three specific comparisons are made with this harness. 1580 1585 Each picks its own independent-variable basis of comparison. 1581 1582 All three comparisons use the varying-from-1 corpus construction, \ie they allow the STL to show its advantage from small-string optimization. 1586 All three comparisons use the varying-from-1 corpus construction, \ie they allow the STL to show its advantage for SSO. 1583 1587 1584 1588 1585 1589 \subsubsection{Fresh vs Reuse in \CC, Emulation Baseline} 1586 1590 1587 The first experiment compares \CFA with \CC, with \CFA operating in nosharing mode (and \CC having no other mode).1588 This experiment simply baselines how \CFA modestly lags \CC's optimization/tuning level generally, yet reproduces a coarser phenomenon.1589 1590 This experiment also introduces the first \CC coding pitfall, which the next experiment will show is helped by turning on \CFA sharing. By this pitfall, a \CC programmer must pay attention to string variable reuse.1591 1592 \begin{cquote} 1593 \setlength{\tabcolsep}{ 20pt}1591 The first experiment compares \CFA with \CC, with \CFA operating in nosharing mode and \CC having no other mode, hence both string package are using @malloc@/@free@. 1592 % This experiment establishes a baseline for other experiments. 1593 This experiment also introduces the first \CC coding pitfall, which the next experiment shows is helped by turning on \CFA sharing. 1594 % This pitfall shows, a \CC programmer must pay attention to string variable reuse. 1595 In the following, both programs are doing the same thing: start with @accum@ empty and build it up by appending @N@ strings (type @string@ in \CC and the faster @string_res@ in \CFA). 1596 \begin{cquote} 1597 \setlength{\tabcolsep}{40pt} 1594 1598 \begin{tabular}{@{}ll@{}} 1595 1599 % \multicolumn{1}{c}{\textbf{fresh}} & \multicolumn{1}{c}{\textbf{reuse}} \\ … … 1597 1601 1598 1602 for ( ... ) { 1599 @string_res accum;@ // fresh1600 for ( ...)1601 accum @+=@ ... 1603 @string_res accum;@ $\C[1.5in]{// fresh}$ 1604 for ( N ) 1605 accum @+=@ ... $\C{// append}\CRT$ 1602 1606 } 1603 1607 \end{cfa} … … 1606 1610 string_res accum; 1607 1611 for ( ... ) { 1608 @accum = "";@ $\C[1 in]{// reuse\CRT}$1609 for ( ...)1610 accum @+=@ ... 1612 @accum = "";@ $\C[1.5in]{// reuse}$ 1613 for ( N ) 1614 accum @+=@ ... $\C{// append}\CRT$ 1611 1615 } 1612 1616 \end{cfa} 1613 1617 \end{tabular} 1614 1618 \end{cquote} 1615 1616 Both programs are doing the same thing: start with @x@ empty and build it up by appending the same chunks. 1617 A programmer should not have to consider this difference. 1618 But from under the covers, each string being an individual allocation leaks through. 1619 While the inner loop is appending text to an @x@ that had not yet grown to have a large capacity, the program is, naturally, paying to extend the variable-length allocation, occasionally. 1620 This capacity stretching is a sticky property that survives assigning a (short, empty-string) value into an existing initialization. 1621 So, the ``reuse'' version benefits from not growing the allocation on subsequent runs of the inner loop. 1622 Yet, the ``fresh'' version is constantly restarting from a small buffer. 1619 The difference is creating a new or reusing an existing string variable. 1620 The pitfall is that most programmers do not consider this difference. 1621 However, creating a new variable implies deallocating the previous string storage and allocating new empty storage. 1622 As the string grows, further deallocations/allocations are required to release the previous and extend the current string storage. 1623 So, the fresh version is constantly restarting with zero string storage, while the reuse version benefits from having its prior large storage from the last append sequence. 1623 1624 1624 1625 \begin{figure} … … 1626 1627 \includegraphics{plot-string-peq-cppemu.pdf} 1627 1628 % \includegraphics[width=\textwidth]{string-graph-peq-cppemu.png} 1628 \caption{Fresh vs Reuse in \CC, Emulation Baseline. Average time per iteration with one \lstinline{x += y} invocation (lower is better). Comparing \CFA's STL emulation mode with STL implementations, and comparing the ``fresh'' with ``reused'' reset styles.} 1629 \caption{Fresh vs Reuse in \CC, Emulation Baseline. 1630 Average time per iteration with one \lstinline{x += y} invocation (lower is better). 1631 Comparing \CFA's STL emulation mode with STL implementations, and comparing the fresh with reused reset styles.} 1629 1632 \label{fig:string-graph-peq-cppemu} 1630 \end{figure} 1631 1632 \VRef[Figure]{fig:string-graph-peq-cppemu} shows the resulting performance. 1633 The fresh \vs reuse penalty is the dominant difference. 1634 The cost is 40\% averaged over the cases shown and minimally 24\%. 1635 It shows up consistently on both the \CFA and STL implementations, and this cost is more prominent with larger strings. 1636 1637 The lesser \CFA \vs STL difference shows \CFA reproducing STL's performance, up to a 15\% penalty averaged over the cases shown, diminishing with larger strings, and 50\% in the worst case. 1638 This penalty characterizes implementation fine tuning done with STL and not done yet done with \CFA. 1639 1640 1641 \subsubsection{\CFA's Fresh-Reuse Compromise} 1642 1643 This comparison has the same setup as the last one, except that the \CFA implementation is switched to use its sharing mode. The outcome is that the fresh/reuse difference vanishes in \CFA, with \CFA consistently delivering performance that compromises between the two \CC cases. 1644 1645 \begin{figure} 1646 \centering 1647 \includegraphics{string-graph-peq-sharing.pdf} 1633 \bigskip 1634 \bigskip 1635 \includegraphics{plot-string-peq-sharing.pdf} 1648 1636 % \includegraphics[width=\textwidth]{string-graph-peq-sharing.png} 1649 \caption{\CFA Compromise for Fresh \vs Reuse. Average time per iteration with one \lstinline{x += y} invocation (lower is better). Comparing \CFA's sharing mode with STL, and comparing the ``fresh'' with ``reused'' reset styles. The \CC results are repeated from \ref{fig:string-graph-peq-cppemu}.} 1637 \caption{\CFA Compromise for Fresh \vs Reuse. 1638 Average time per iteration with one \lstinline{x += y} invocation (lower is better). 1639 Comparing \CFA's sharing mode with STL, and comparing the fresh with reused reset styles. 1640 The \CC results are repeated from \VRef[Figure]{fig:string-graph-peq-cppemu}.} 1650 1641 \label{fig:string-graph-peq-sharing} 1651 1642 \end{figure} 1652 1643 1653 \VRef[Figure]{fig:string-graph-peq-sharing} has the result. 1654 At append lengths 5 and above, \CFA not only splits the two STL cases, but its slowdown of 16\% over STL with user-managed reuse is close to the baseline \CFA-v-STL implementation difference seen with \CFA in STL-emulation mode. 1655 1656 1657 \subsubsection{\CFA's low overhead for misusing \lstinline{+}} 1658 1659 A further pitfall occurs when the user writes @x = x + y@, rather than @x += y@. Again, they are logically equivalent. 1644 \VRef[Figure]{fig:string-graph-peq-cppemu} shows the resulting performance. 1645 The two fresh (solid) lines and the two reuse (dash) lines are identical, except for lengths $\le$10, where the \CC SSO has a 40\% average and minimally 24\% advantage. 1646 The gap between the fresh and reuse lines is the removal of the dynamic memory allocates and reuse of prior storage, \eg 100M allocations for fresh \vs 100 allocations for reuse across all experiments. 1647 While allocation reduction is huge, data copying dominates the cost, so the lines are still reasonably close together. 1648 1649 1650 \subsubsection{\CFA's Sharing Mode} 1651 1652 This comparison is the same as the last one, except the \CFA implementation is using sharing mode. 1653 Hence, both \CFA's fresh and reuse versions have no memory allocations, and as before, only for reuse does \CC have no memory allocations. 1654 \VRef[Figure]{fig:string-graph-peq-sharing} shows the resulting performance. 1655 For fresh at append lengths 5 and above, \CFA is now closer to the \CC reuse performance, because of removing the dynamic allocations. 1656 However, for reuse, \CFA has slowed down slightly, to performance matching the new fresh version, as the two versions are now implemented virtually the same. 1657 The reason for the \CFA reuse slow-down is the overhead of managing the sharing scheme (primarily maintaining the list of handles), without gaining any benefit. 1658 1659 \begin{comment} 1660 FIND A HOME!!! 1661 The potential benefits of the sharing scheme do not give \CFA an edge over \CC when appending onto a reused string, though the first one helps \CFA win at going onto a fresh string. These abilities are: 1662 \begin{itemize} 1663 \item 1664 To grow a text allocation repeatedly without copying it elsewhere. 1665 This ability is enabled by \CFA's most-recently modified string being located immediately before the text buffer's \emph{shared} bump-pointer area, \ie often a very large greenfield, relative to the \emph{individual} string being grown. 1666 With \CC-reuse, this benefit is already reaped by the user's reuse of a pre-stretched allocation. 1667 Yet \CC-fresh pays the higher cost because its room to grow for free is at most a constant times the original string's length. 1668 \item 1669 To share an individual text allocation across multiple related strings. 1670 This ability is not applicable to appending with @+=@. 1671 It in play in [xref sub-experiment pta] and [xref experiment pbv]. 1672 \item 1673 To share a text arena across unrelated strings, sourcing disparate allocations from a common place. 1674 That is, always allocating from a bump pointer, and never maintaining free lists. 1675 This ability is not relevant to running any append scenario on \CFA with sharing, because appending modifies an existing allocation and is not driving several allocations. 1676 This ability is assessed in [xref experiment allocn]. 1677 \end{itemize} 1678 This cost, of slowing down append-with-reuse, is \CFA paying the piper for other scenarios done well. 1679 \CFA prioritizes the fresh use case because it is more natural. 1680 The \emph{user-invoked} reuse scheme is an unnatural programming act because it deliberately misuses lexical scope: a variable (@accum@) gets its lifetime extended beyond the scope in which it is used. 1681 1682 A \CFA user needing the best performance on an append scenario can still access the \CC-like speed by invoking noshare. 1683 This (indirect) resource management is memory-safe, as compared to that required in \CC to use @string&@, where knowledge of another string's lifetime comes into play. 1684 This abstraction opt-out is also different from invoking the LL API-level option. 1685 In fact, these considerations are orthogonal. 1686 But the key difference is that invoking the LL API would be a temporary measure, to use a workaround of a known \CFA language issue; choosing to exempt a string from sharing is a permanent act of program tuning. 1687 Beyond these comparisons, opting for noshare actually provides program ``eye candy,'' indicating that under-the-hood thinking is becoming relevant here. 1688 \end{comment} 1689 1690 1691 \subsubsection{Misusing Concatenation} 1692 1693 A further pitfall occurs writing the apparently equivalent @x = x + y@ \vs @x += y@. 1660 1694 For numeric types, the generated code is equivalent, giving identical performance. 1661 1695 However, for string types there can be a significant difference. 1662 This pitfall is a particularly likely hazardfor beginners.1696 This pitfall is a particularly likely for beginners. 1663 1697 1664 1698 In earlier experiments, the choice of \CFA API among HL and LL had no impact on the functionality being tested. … … 1708 1742 \end{tabular} 1709 1743 \end{cquote} 1710 Note that this ``Goal'' code functions today in HL.1744 Note, the goal code functions today in HL but with slower performance. 1711 1745 1712 1746 \begin{figure} 1713 1747 \centering 1714 \includegraphics{ string-graph-pta-sharing.pdf}1748 \includegraphics{plot-string-pta-sharing.pdf} 1715 1749 % \includegraphics[width=\textwidth]{string-graph-pta-sharing.png} 1716 \caption{CFA's low overhead for misusing \lstinline{+}. Average time per iteration with one \lstinline{x += y} invocation (lower is better). Comparing \CFA (having implicit sharing activated) with STL, and comparing the \lstinline{+}-then-\lstinline{=} with the \lstinline{+=} append styles. The \lstinline{+=} results are repeated from \VRef[Figure]{fig:string-graph-peq-sharing}.}1750 \caption{CFA's low overhead for misusing concatenation. Average time per iteration with one \lstinline{x += y} invocation (lower is better). Comparing \CFA (having implicit sharing activated) with STL, and comparing the \lstinline{+}-then-\lstinline{=} with the \lstinline{+=} append styles. The \lstinline{+=} results are repeated from \VRef[Figure]{fig:string-graph-peq-sharing}.} 1717 1751 \label{fig:string-graph-pta-sharing} 1718 1752 \end{figure} 1719 1753 1720 \VRef[Figure]{fig:string-graph-pta-sharing} gives the outcome. The STL's penalty is $8 \times$ while \CFA's is only $2 \times$, averaged across the cases shown here. 1754 \VRef[Figure]{fig:string-graph-pta-sharing} gives the outcome, where the Y-axis is log scale because of the large differences. 1755 The STL's penalty is $8 \times$ while \CFA's is only $2 \times$, averaged across the cases shown here. 1721 1756 Moreover, the STL's gap increases with string size, while \CFA's converges. 1722 1757 So again, \CFA helps users who just want to treat strings as values, and not think about the resource management under the covers. 1723 1758 1724 While not a design goal, and not graphed out, \CFA in STL-emulation mode heppened to outperform STL in this case. User-managed allocation reuse did not affect either implementation in this case; only ``fresh'' results are shown. 1725 1726 1727 \subsection{Test: Pass argument} 1759 While not a design goal, and not graphed, \CFA in STL-emulation mode outperformed STL in this case. 1760 User-managed allocation reuse did not affect either implementation in this case; only ``fresh'' results are shown. 1761 1762 1763 \subsection{Test: Pass Argument} 1728 1764 1729 1765 STL has a penalty for passing a string by value, which forces users to think about memory management when communicating values with a function. 1730 The key \CFA value-add is that a user can think of a string simply as a value; this test shows that \CC charges a stiff penalty for thin ing this way, while \CFA does not.1766 The key \CFA value-add is that a user can think of a string simply as a value; this test shows that \CC charges a stiff penalty for thinking this way, while \CFA does not. 1731 1767 This test illustrates a main advantage of the \CFA sharing algorithm (in one case). 1732 It shows STL's small-string optimizationproviding a successful mitigation (in the other case).1768 It shows STL's SSO providing a successful mitigation (in the other case). 1733 1769 1734 1770 The basic operation considered is: … … 1749 1785 1750 1786 } 1751 START_TIMER 1752 for ( i; ... ) { 1753 helper( corpus[ f(i) ] ); // imported from char * previously 1754 COUNT_ONE_OP_DONE 1787 for ( ... ) { // loop for duration 1788 helper( corpus[ f( i ) ] ); 1789 count += 1; 1755 1790 } 1756 STOP_TIMER1757 1791 \end{cfa} 1758 1792 & … … 1761 1795 string_res q = { qref, COPY_VALUE }; 1762 1796 } 1763 // rest same, elided 1764 1765 1766 1767 1768 1769 \end{cfa} 1770 \end{tabular} 1771 \end{cquote} 1772 The Goal (HL) version gives the simplest sketch of the test harness. 1773 It uses a single level of looping. 1774 Each iteration uses a corpus item as the argument to a function call. 1797 1798 1799 1800 1801 \end{cfa} 1802 \end{tabular} 1803 \end{cquote} 1804 The goal (HL) version gives the modified test harness, with a single loop. 1805 Each iteration uses a corpus item as the argument to the function call. 1775 1806 These corpus items were imported to the string heap before beginning the timed run. 1776 1777 1807 1778 1808 \begin{figure} 1779 1809 \centering 1780 \includegraphics{ string-graph-pbv.pdf}1810 \includegraphics{plot-string-pbv.pdf} 1781 1811 % \includegraphics[width=\textwidth]{string-graph-pbv.png} 1812 \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}X >{\centering\arraybackslash}X} (a) & (b) \end{tabularx} 1782 1813 \caption{Average time per iteration (lower is better) with one call to a function that takes a by-value string argument, comparing \CFA (having implicit sharing activated) with STL. 1783 (a) With \emph{Varying-from-1} corpus construction, in which the STL-only benefit of small-string optimization occurs, in varying degrees, at all string sizes. 1784 (b) With \emph{Fixed-size} corpus construction, in which this benefit applies exactly to strings with length below 16. 1785 [TODO: show version (b)]} 1814 (a) With \emph{Varying-from-1} corpus construction, in which the STL-only benefit of SSO optimization occurs, in varying degrees, at all string sizes. 1815 (b) With \emph{Fixed-size} corpus construction, in which this benefit applies exactly to strings with length below 16.} 1786 1816 \label{fig:string-graph-pbv} 1787 1817 \end{figure} 1788 1818 1789 1790 1819 \VRef[Figure]{fig:string-graph-pbv} shows the costs for calling a function that receives a string argument by value. 1791 STL's performance worsens as string length increases, while \CFA has the same performance at all sizes.1792 1820 STL's performance worsens uniformly as string length increases, while \CFA has the same performance at all sizes. 1821 Although the STL is better than \CFA until string length 10 because of the SSO. 1793 1822 While improved, the \CFA cost to pass a string is still nontrivial. 1794 1823 The contributor is adding and removing the callee's string handle from the global list. 1795 This cost is $1.5 \times$ to $2 \times$ over STL's when small-string optimization applies, though this cost should be avoidable in the same case, upon a \CFA realization ofthis optimization.1796 At the larger sizes, when STL has to manage storage for the string, STL runs more than $3 \times$ slower, mainly due to time spent in the general-purpose memory allocator.1797 \PAB{Need to check that. Expecting copying to dominate.} 1824 This cost is $1.5 \times$ to $2 \times$ over STL's when SSO applies, but is avoidable once \CFA realizes this optimization. 1825 At the larger sizes, the STL runs more than $3 \times$ slower, because it has to allocation/deallocate storage for the parameter and copy the argument string to the parameter. 1826 If the \CC string is passed by reference, the results are better and flat across string lengths like \CFA. 1798 1827 1799 1828 … … 1805 1834 1806 1835 A garbage collector, afforded the freedom of managed memory (where it knows about all the pointers and is allowed to modify them), often runs faster than malloc-free in an amortized analysis, even though it must occasionally stop to collect. 1807 The sppedup happens because GC is able to use its collection time to move objects. 1808 (In the case of the mini-allocator powering the \CFA string library, objects are runs of text.) Moving objects lets fresh allocations consume from a large contiguous store of available memory; the ``bump pointer'' bookkeeping for such a scheme is very light. 1836 The speedup happens because GC is able to use its collection time to move objects. 1837 (In the case of the mini-allocator powering the \CFA string library, objects are runs of text.) 1838 Moving objects lets fresh allocations consume from a large contiguous store of available memory; the ``bump pointer'' bookkeeping for such a scheme is very light. 1809 1839 A malloc-free implementation without the freedom to move objects must, in the general case, allocate in the spaces between existing objects; doing so entails the heavier bookkeeping of maintaining a linked structure of freed allocations and/or coalescing freed allocations. 1810 1840 … … 1862 1892 \begin{figure} 1863 1893 \centering 1864 \includegraphics{ string-graph-allocn.pdf}1894 \includegraphics{plot-string-allocn.pdf} 1865 1895 % \includegraphics[width=\textwidth]{string-graph-allocn.png} 1866 1896 \caption{Space and time performance, under varying fraction-live targets, for the five string lengths shown, at \emph{Fixed-size} corpus construction.
Note:
See TracChangeset
for help on using the changeset viewer.