Changeset ca4f2b2 for doc/theses


Timestamp:
May 13, 2024, 10:07:35 AM (7 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
31f4837, ccfbfd9
Parents:
bf4fe05
Message:

proofread string section of background chapter

File:
1 edited

  • doc/theses/mike_brooks_MMath/background.tex

    rbf4fe05 rca4f2b2  
    437437class seqnode : public uSeqable { ... }
    438438%]
    439 A node inheriting from @uSeqable@ can appear in a sequence/collection but a node inherting from @uColable@ can only appear in a collection.
     439A node inheriting from @uSeqable@ can appear in a sequence/collection but a node inheriting from @uColable@ can only appear in a collection.
    440440Along with providing the appropriate link fields, the types @uColable@ and @uSeqable@ also provide one member routine:
    441441%[
     
    604604supplying the link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@.
    605605An example of an explicit attribute is cache alignment of the link fields in conjunction with other @req@ fields, improving locality and/or avoiding false sharing.
    606 Wrapped reference has no control over the link fields, but the seperate data allows some control;
     606Wrapped reference has no control over the link fields, but the separate data allows some control;
    607607wrapped value has no control over data or links.
    608608
     
    690690Each group of intrusive links becomes the links for each separate STL list.
    691691The upside is the unlimited number of lists a node can be associated with simultaneously; any number of STL lists can be created dynamically.
    692 The downside is the dynamic allocation of the link nodes and manging multiple lists.
     692The downside is the dynamic allocation of the link nodes and managing multiple lists.
    693693Note, it might be possible to wrap the multiple lists in another type to hide this implementation issue.
    694694
     
    776776\section{String}
    777777
    778 A string is a logical sequence of symbols, where the form of the symbols can vary significantly: 7/8-bit characters (ASCII/Latin-1), or 2/4/8-byte (UNICODE) characters/symbols or variable length (UTF-8/16/32) characters.
     778A string is a sequence of symbols, where the form of a symbol can vary significantly: 7/8-bit characters (ASCII/Latin-1), or 2/4/8-byte (UNICODE) characters/symbols or variable length (UTF-8/16/32) characters.
    779779A string can be read left-to-right, right-to-left, top-to-bottom, and have stacked elements (Arabic).
    780780
    781 An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in @'x'@.
    782 A wide character constant is the same, except prefixed by the letter @L@, @u@, or @U@.
    783 Except for escape sequences, the elements of the sequence are any members of the source character set;
    784 they are mapped in an implementation-defined manner to members of the execution character set.
    785 
    786 A C character-string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in @"xyz"@.
    787 A UTF-8 string literal is the same, except prefixed by @u8@.
    788 A wide string literal is the same, except prefixed by the letter @L@, @u@, or @U@.
    789 
    790 For UTF-8 string literals, the array elements have type @char@, and are initialized with the characters of the multibyte character sequence, as encoded in UTF-8.
    791 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the @mbstowcs@ function with an implementation-defined current locale.
    792 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by successive calls to the @mbrtoc16@, or @mbrtoc32@ function as appropriate for its type, with an implementation-defined current locale.
     781A C character constant is an ASCII/Latin-1 character enclosed in single-quotes, \eg @'x'@, @'@\textsterling@'@.
     782A wide C character constant is the same, except prefixed by the letter @L@, @u@, or @U@, \eg @u'\u25A0'@ (black square), where the @\u@ identifies a universal character name.
     783A character can be formed from an escape sequence, which expresses a non-typable character (@'\n'@), a delimiter character @'\''@, or a raw character @'\x2f'@.
     784
     785A character sequence is zero or more regular, wide, or escape characters enclosed in double-quotes, \eg @"xyz\n"@.
     786The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@.
     787
     788For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multibyte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).
     789For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multibyte character sequence, \eg @L"abc@$\mu$@"@, and are read/printed using @wscanf@/@wprintf@.
     790The value of a wide-character is implementation-defined, typically a UTF-16 or UTF-32 character depending on the platform.
     791For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multibyte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.
     792The value of a @"u"@ character is a UTF-16 character;
     793the value of a @"U"@ character is a UTF-32 character.
    793794The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined.
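
The element types of the prefixed literals can be checked with @sizeof@.
The following is a minimal C11 sketch, not part of the changeset;
the helper names (@elem_size_plain@, @utf8_units@, etc.) are hypothetical and exist only for illustration.

```c
#include <stddef.h>   /* size_t, wchar_t */

/* Hypothetical helpers: size in bytes of one array element for each kind
   of string literal. Only the plain case is fixed by the language
   (sizeof(char) is 1); the others are implementation-defined. */
size_t elem_size_plain(void) { return sizeof(""[0]); }   /* char */
size_t elem_size_wide(void)  { return sizeof(L""[0]); }  /* wchar_t */
size_t elem_size_u16(void)   { return sizeof(u""[0]); }  /* char16_t */
size_t elem_size_u32(void)   { return sizeof(U""[0]); }  /* char32_t */

/* Number of array elements in the UTF-8 literal from the text:
   three bytes encoding the one character, plus the null terminator. */
size_t utf8_units(void) { return sizeof(u8"\xe1\x90\x87"); }
```

On an implementation with 8-bit bytes, the @u@ and @U@ elements are at least 2 and 4 bytes, respectively, while the size of a @wchar_t@ element varies across platforms.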
    794795
    795 
    796 Another bad C design decision is to have null-terminated strings rather than maintaining a separate string length.
     796C strings are null-terminated rather than maintaining a separate string length.
    797797\begin{quote}
    798798Technically, a string is an array whose elements are single characters.
     
    800800This representation means that there is no real limit to how long a string can be, but programs have to scan one completely to determine its length.
    801801\end{quote}
     802Unfortunately, this design decision is both unsafe and inefficient.
     803It is a common error in C to forget the space in a character array for the terminator or overwrite the terminator, resulting in array overruns in string operations.
     804The need to repeatedly scan an entire string to determine its length can result in significant cost, as it is not possible to cache the length in many cases.
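
Both problems can be illustrated with a short C sketch, under the assumption of ordinary @char@ strings.
The routines @scan_length@ and @storage_needed@ are hypothetical names introduced here;
@scan_length@ only mimics what @strlen@ must do and is not how any particular library implements it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical re-implementation of strlen, written out to expose the cost:
   every call walks the string until it reaches the terminator. */
size_t scan_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n += 1;
    return n;
}

/* Storage needed to hold a copy of s: the visible characters plus one
   extra element for the terminator, which is easy to forget. */
size_t storage_needed(const char *s) {
    return strlen(s) + 1;
}
```

For example, @"xyz"@ has length 3 but occupies 4 array elements, so a destination declared as @char buf[3]@ is one element too small for a copy made by @strcpy@.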
     805
     806C strings are fixed size because arrays are used for the implementation.
     807However, string manipulation commonly results in dynamically-sized temporary and final string values.
     808As a result, storage management for C strings is a nightmare, quickly resulting in array overruns and incorrect results.
     809
     810Collectively, these design decisions make working with strings in C awkward, time-consuming, and very unsafe.
     811While there are companion string routines that take the maximum lengths of strings to prevent array overruns, that means the semantics of the operation can fail because strings are truncated.
     812Suffice it to say, C is not a go-to language for string applications, which is why \CC introduced the @string@ type.
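
The truncation problem with length-limited routines can be sketched as follows.
The helper @bounded_copy@ is a hypothetical name, and using @snprintf@ is one possible choice among the bounded routines, selected here because it always writes a terminator when the capacity is nonzero.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which has room for cap elements
   (cap > 0). Returns 1 if the copy was truncated, 0 otherwise. snprintf
   never writes more than cap elements and reports the length it wanted
   to write, so truncation is detectable. */
int bounded_copy(char *dst, size_t cap, const char *src) {
    int n = snprintf(dst, cap, "%s", src);
    return n < 0 || (size_t)n >= cap;
}
```

Copying @"abcdef"@ into a 4-element buffer avoids the overrun but leaves @"abc"@, so the caller must check the result and decide how to recover; the operation is safe but no longer does what was asked.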