Changeset c721105 for doc/theses


Ignore:
Timestamp:
May 24, 2024, 2:16:09 PM (7 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
df699e0
Parents:
76425bc
Message:

proofreading changes

Location:
doc/theses/mike_brooks_MMath
Files:
3 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    r76425bc rc721105  
    11\chapter{Array}
     2\label{c:Array}
    23
    34\section{Introduction}
     
    126127Unsigned integers have a special status in this type system.
    127128Unlike how C++ allows
    128 \begin{lstlisting}[language=c++]
     129\begin{c++}
    129130template< size_t N, char * msg, typename T >... // declarations
    130 \end{lstlisting}
     131\end{c++}
    131132\CFA does not accommodate values of any user-provided type.
    132133TODO: discuss connection with dependent types.
  • doc/theses/mike_brooks_MMath/background.tex

    r76425bc rc721105  
    1313Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction.
    1414The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
    15 Finally, while subscripting involves pointer arithmetic (as does field references @x.y.z@), it is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
    16 Many C errors result from performing pointer arithmetic instead of using subscripting.
    17 Some C textbooks erroneously teach pointer arithmetic suggesting it is faster than subscripting.
     15Finally, while subscripting involves pointer arithmetic (as does field references @x.y.z@), the computation is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
     16Many C errors result from performing pointer arithmetic instead of using subscripting;
     17some C textbooks teach pointer arithmetic erroneously suggesting it is faster than subscripting.
     18A sound and efficient C program does not require explicit pointer arithmetic.
    1819
    1920C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
     
    2728Equally, C programmers know the size of a \emph{pointer} to the first array element is 8 (or 4 depending on the addressing architecture).
    2829% Now, set aside for a moment the claim that this first assertion is giving information about a type.
    29 Clearly, an array and a pointer to its first element are different things.
     30Clearly, an array and a pointer to its first element are different.
    3031
    3132In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
     
    9293\CFA provides its own type, variable and routine declarations, using a simpler syntax.
    9394The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type.
    94 The qualifiers have the same meaning in \CFA as in C.
     95The qualifiers have the same syntax and semantics in \CFA as in C.
    9596Then, a \CFA declaration is read left to right, where a function return type is enclosed in brackets @[@\,@]@.
    9697\begin{cquote}
     
    119120\end{tabular}
    120121\end{cquote}
    121 As declaration complexity increases, it becomes corresponding difficult to read and understand the C declaration form.
     122As declaration size increases, it becomes corresponding difficult to read and understand the C declaration form, whereas reading and understanding a \CFA declaration has linear complexity as the declaration size increases.
    122123Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations.
    123124
    124 \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef{XXX}.
     125\VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef[Chapter]{c:Array}.
    125126The \CFA-thesis column shows the new array declaration form, which is my contributed improvements for safety and ergonomics.
    126127The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features.
     
    152153        \hline
    153154        & ar.\ of immutable val. & @const T x[10];@ & @[10] const T x@ & @const array(T, 10) x@ \\
    154     & & @T const x[10];@ & @[10] T const x@ & @array(T, 10) const x@ \\
     155        & & @T const x[10];@ & @[10] T const x@ & @array(T, 10) const x@ \\
    155156        \hline
    156157        & ar.\ of ptr.\ to value & @T * x[10];@ & @[10] * T x@ & @array(T *, 10) x@ \\
     
    203204``pointer to \emph{type}'' that points to the initial element of the array object~\cite[\S~6.3.2.1.3]{C11}
    204205\end{quote}
    205 This phenomenon is the famous ``pointer decay,'' which is a decay of an array-typed expression into a pointer-typed one.
     206This phenomenon is the famous \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one.
    206207It is worthy to note that the list of exception cases does not feature the occurrence of @ar@ in @ar[i]@.
    207208Thus, subscripting happens on pointers not arrays.
     
    212213Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing!
    213214
    214 Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
    215 While the standard affords a C compiler freedom about the meaning of an out-of-bound access,
    216 or of subscripting a pointer that does not refer to an array element at all,
    217 the fact that C is famously both generally high-performance, and specifically not bound-checked,
    218 leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
     215Subscripting a pointer when the target is standard inappropriate is still practically well-defined.
     216While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all,
     217the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
    219218Moreover, consider the common pattern of subscripting on a @malloc@ result:
    220219\begin{cfa}
     
    226225
    227226Under this assumption, a pointer being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around the @p@ address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p)@.
    228 I call this phenomenon ``array diffraction,'' which is a diffraction of a single-element pointer into the assumption that its target is in the middle of an array whose size is unlimited in both directions.
     227I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is in the middle of an array whose size is unlimited in both directions.
    229228No pointer is exempt from array diffraction.
    230229No array shows its elements without pointer decay.
     
    242241The caller of such a function is left with the reality that a pointer parameter is a pointer, no matter how it is spelled:
    243242\lstinput{18-21}{bkgd-carray-decay.c}
    244 This fragment gives no warnings.
     243This fragment gives the warning for the first argument, in the second call.
     244\begin{cfa}
     245warning: 'f' accessing 40 bytes in a region of size 4
     246\end{cfa}
    245247
    246248The shortened parameter syntax @T x[]@ is a further way to spell ``pointer.''
     
    248250This point of confusion is illustrated in:
    249251\lstinput{23-30}{bkgd-carray-decay.c}
     252Note, \CC gives a warning for the initialization of @cp@.
     253\begin{cfa}
     254warning: ISO C++ forbids converting a string constant to 'char*'
     255\end{cfa}
     256and C gives a warning at the call of @edit@, if @const@ is added to the declaration of @cp@.
     257\begin{cfa}
     258warning: passing argument 1 of 'edit' discards 'const' qualifier from pointer target type
     259\end{cfa}
    250260The basic two meanings, with a syntactic difference helping to distinguish,
    251 are illustrated in the declarations of @ca@ vs.\ @cp@,
     261are illustrated in the declarations of @ca@ \vs @cp@,
    252262whose subsequent @edit@ calls behave differently.
    253263The syntax-caused confusion is in the comparison of the first and last lines,
     
    300310        \hline
    301311        & ptr.\ to ptr.\ to imm.\ val. & @const char **@ & @const char * argv[],@ & @[] * const char argv,@ \\
    302     & & & \emph{others elided} & \emph{others elided} \\
     312        & & & \emph{others elided} & \emph{others elided} \\
    303313        \hline
    304314\end{tabular}
    305315\end{table}
     316
     317
     318\subsection{Multi-Dimensional}
     319
     320As in the last section, multi-dimensional array declarations are examined.
     321\lstinput{16-18}{bkgd-carray-mdim.c}
     322The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
     323\lstinput{20-44}{bkgd-carray-mdim.c}
    306324
    307325
     
    323341Note, the C standard supports VLAs~\cite[\S~6.7.6.2.4]{C11} as a conditional feature, but the \CC standard does not;
    324342both @gcc@ and @g++@ support VLAs.
    325 As well, there is misinformation about VLAs, \eg VLAs cause stack failures or are inefficient.
     343As well, there is misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient.
    326344VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type.
    327345
    328346For high-performance applications, the stack size can be fixed and small (coroutines or user-level threads).
    329 Here, VLAs can overflow the stack, so a heap allocation is used.
     347Here, VLAs can overflow the stack without appropriately sizing the stack, so a heap allocation is used.
    330348\begin{cfa}
    331349float * ax1 = malloc( sizeof( float[n] ) );
    332 float * ax2 = malloc( n * sizeof( float ) );
     350float * ax2 = malloc( n * sizeof( float ) );    $\C{// arrays}$
    333351float * bx1 = malloc( sizeof( float[1000000] ) );
    334352float * bx2 = malloc( 1000000 * sizeof( float ) );
     
    384402\subsection{The pointer-to-array type has been noticed before}
    385403
    386 \subsection{Multi-Dimensional}
    387 
    388 As in the last section, we inspect the declaration ...
    389 \lstinput{16-18}{bkgd-carray-mdim.c}
    390 The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
    391 \lstinput{20-44}{bkgd-carray-mdim.c}
    392 
    393404
    394405\section{Linked List}
     
    398409Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous.
    399410
    400 Linking is used to build data structures, which are a group of nodes, containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc.
     411Storage linking is used to build data structures, which are a group of nodes, containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc.
    401412Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value;
    402413hence, all data structure routines take and return pointers to nodes and not the nodes themselves.
    403 
    404 
    405 \begin{comment}
    406 \subsection{Linked-List Packages}
    407 
    408 C only supports type-eraser polymorphism, with no help from the type system.
    409 This approach is used in the @queue@ library providing macros that define and operate on four types of data structures: singly-linked lists, singly-linked tail queues, lists, and tail queues.
    410 These linked structures are \newterm{intrusive list}, where the link fields are defined (intrude) with data fields.
    411 \begin{cfa}
    412 struct DS {
    413         // link fields, intrustive
    414         // data fields
    415 }
    416 \end{cfa}
    417 
    418 \uCpp~\cite{uC++} is a concurrent extension of \CC, and provides a basic set of intrusive lists, where the link fields are defined with the data fields using inheritance.
    419 \begin{cfa}
    420 struct DS : public uColable {
    421         // implicit link fields
    422         // data fields
    423 }
    424 \end{cfa}
    425 
    426 Intrusive nodes eliminate the need to dynamically allocate/deallocate the link fields when a node is added/removed to/from a data-structure.
    427 Reducing dynamic allocation is important in concurrent programming because the heap is a shared resource with the potential for high contention.
    428 The two formats are one link field, which form a \Index{collection}, and two link fields, which form a \Index{sequence}.
    429 \begin{center}
    430 %\input{DSLNodes}
    431 \end{center}
    432 @uStack@ and @uQueue@ are collections and @uSequence@ is a sequence.
    433 To get the appropriate link fields associated with a user node, it must be a public descendant of @uColable@\index{uColable@@uColable@} or @uSeqable@\index{uSeqable@@uSeqable@}, respectively, e.g.:
    434 %[
    435 class stacknode : public uColable { ... }
    436 class queuenode : public uColable { ... }
    437 class seqnode : public uSeqable { ... }
    438 %]
    439 A node inheriting from @uSeqable@ can appear in a sequence/collection but a node inheriting from @uColable@ can only appear in a collection.
    440 Along with providing the appropriate link fields, the types @uColable@ and @uSeqable@ also provide one member routine:
    441 %[
    442 bool listed() const;
    443 %]
    444 which returns @true@ if the node is an element of any collection or sequence and @false@ otherwise.
    445 
    446 Finally, no header files are necessary to access the uC DSL.
    447 
    448 Some uC DSL restrictions are:
    449 \begin{itemize}
    450 \item
    451 None of the member routines are virtual in any of the data structures for efficiency reasons.
    452 Therefore, pointers to data structures must be used with care or incorrect member routines may be invoked.
    453 \end{itemize}
    454 \end{comment}
    455414
    456415
     
    462421Alternatives to the assumptions are discussed under Future Work (Section~\ref{toc:lst:futwork}).
    463422\begin{itemize}
    464     \item A doubly-linked list is being designed.
    465           Generally, the discussed issues apply similarly for singly-linked lists.
    466           Circular \vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}).
    467     \item Link fields are system-managed.
    468           The user works with the system-provided API to query and modify list membership.
    469           The system has freedom over how to represent these links.
     423        \item A doubly-linked list is being designed.
     424                Generally, the discussed issues apply similarly for singly-linked lists.
     425                Circular \vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}).
     426        \item Link fields are system-managed.
     427                The user works with the system-provided API to query and modify list membership.
     428                The system has freedom over how to represent these links.
    470429        \item The user data must provide storage for the list link-fields.
    471           Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}.
     430                Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}.
    472431\end{itemize}
    473432
     
    478437and further libraries are introduced as needed.
    479438\begin{enumerate}
    480     \item Linux Queue library\cite{lst:linuxq} (LQ) of @<sys/queue.h>@.
    481     \item \CC Standard Template Library's (STL)\footnote{The term STL is contentious as some people prefer the term standard library.} @std::list@\cite{lst:stl}
     439        \item Linux Queue library\cite{lst:linuxq} (LQ) of @<sys/queue.h>@.
     440        \item \CC Standard Template Library's (STL)\footnote{The term STL is contentious as some people prefer the term standard library.} @std::list@\cite{lst:stl}
    482441\end{enumerate}
    483 A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}).
    484 
     442%A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}).
    485443For the discussion, assume the fictional type @req@ (request) is the user's payload in examples.
    486444As well, the list library is helping the user manage (organize) requests, \eg a request can be work on the level of handling a network arrival event or scheduling a thread.
     
    492450Link attachment deals with the question:
    493451Where are the libraries' inter-element link fields stored, in relation to the user's payload data fields?
    494 Figure~\ref{fig:lst-issues-attach} shows three basic styles.
    495 The \newterm{intrusive} style places the link fields inside the payload structure.
    496 The two \newterm{wrapped} styles place the payload inside a generic library-provided structure that then defines the link fields.
    497 Library LQ is intrusive; STL is wrapped.
    498 The wrapped style further distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@.
     452\VRef[Figure]{fig:lst-issues-attach} shows three basic styles.
     453\VRef[Figure]{f:Intrusive} shows the \newterm{intrusive} style, placing the link fields inside the payload structure.
     454\VRef[Figures]{f:WrappedRef} and \subref*{f:WrappedValue} show the two \newterm{wrapped} styles, which place the payload inside a generic library-provided structure that then defines the link fields.
     455The wrapped style distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@.
    499456(For this discussion, @list<req &>@ is similar to @list<req *>@.)
    500457This difference is one of user style, not framework capability.
     458Library LQ is intrusive; STL is wrapped with reference and value.
    501459
    502460\begin{comment}
    503461\begin{figure}
    504     \begin{tabularx}{\textwidth}{Y|Y|Y}
     462        \begin{tabularx}{\textwidth}{Y|Y|Y}
    505463                \lstinput[language=C]{20-39}{lst-issues-intrusive.run.c}
    506         &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp}
    507         &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp}
    508       \\ & &
    509       \\
    510         \includegraphics[page=1]{lst-issues-attach.pdf}
    511         &
    512         \includegraphics[page=2]{lst-issues-attach.pdf}
    513         &
    514         \includegraphics[page=3]{lst-issues-attach.pdf}
    515       \\ & &
    516       \\
    517         (a) & (b) & (c)
    518     \end{tabularx}
     464                &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp}
     465                &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp}
     466          \\ & &
     467          \\
     468                \includegraphics[page=1]{lst-issues-attach.pdf}
     469                &
     470                \includegraphics[page=2]{lst-issues-attach.pdf}
     471                &
     472                \includegraphics[page=3]{lst-issues-attach.pdf}
     473          \\ & &
     474          \\
     475                (a) & (b) & (c)
     476        \end{tabularx}
    519477\caption{
    520         Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value.
    521         The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
    522         head objects are discussed in Section~\ref{toc:lst:issue:ident}.
    523         In (a), the field \lstinline{req.x} names a list direction;
    524         these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
    525         In (b) and (c), the type \lstinline{node} represents a system-internal type,
    526         which is \lstinline{std::_List_node} in the GNU implementation.
    527         (TODO: cite? found in  /usr/include/c++/7/bits/stl\_list.h )
    528     }
    529     \label{fig:lst-issues-attach}
     478                Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value.
     479                The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
     480                head objects are discussed in Section~\ref{toc:lst:issue:ident}.
     481                In (a), the field \lstinline{req.x} names a list direction;
     482                these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
     483                In (b) and (c), the type \lstinline{node} represents a system-internal type,
     484                which is \lstinline{std::_List_node} in the GNU implementation.
     485                (TODO: cite? found in  /usr/include/c++/7/bits/stl\_list.h )
     486        }
     487        \label{fig:lst-issues-attach}
    530488\end{figure}
    531489\end{comment}
     
    569527
    570528\caption{
    571         Three styles of link attachment:
    572                 % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value. 
    573         The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
    574         head objects are discussed in Section~\ref{toc:lst:issue:ident}.
    575         In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;
    576         these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
    577         In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
     529                Three styles of link attachment:
     530                % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value.
     531                The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
     532                head objects are discussed in Section~\ref{toc:lst:issue:ident}.
     533                In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;
     534                these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
     535                In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
    578536                library-internal type, which is \lstinline{std::_List_node} in the GNU implementation
    579         \see{\lstinline{/usr/include/c++/X/bits/stl_list.h}, where \lstinline{X} is the \lstinline{g++} version number}.
    580     }
    581     \label{fig:lst-issues-attach}
     537                \see{\lstinline{/usr/include/c++/X/bits/stl_list.h}, where \lstinline{X} is the \lstinline{g++} version number}.
     538        }
     539        \label{fig:lst-issues-attach}
    582540\end{figure}
    583541
    584542Each diagrammed example is using the fewest dynamic allocations for its respective style:
    585 in \subref*{f:Intrusive}, here are no dynamic allocations, in \subref*{f:WrappedRef} only the linked fields are dynamically allocated, and in \subref*{f:WrappedValue} the copied data and linked fields are dynamically allocated.
    586 The advantage of intrusive attachment is the control in memory layout and storage placement.
    587 Both wrapped attachment styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.
     543in intrusive, here is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copied data and linked fields are dynamically allocated.
     544The advantage of intrusive is the control in memory layout and storage placement.
     545Both wrapped styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.
    588546In all three cases, a @req@ object can enter and leave a list many times.
    589 However, in \subref*{f:Intrusive} a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.
    590 In \subref*{f:WrappedRef}, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency.
    591 In \subref*{f:WrappedValue}, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes;
     547However, in intrusive a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.
     548In wrapped reference, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency.
     549In wrapped value, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes;
    592550however, knowing which of the @req@ object is the ``true'' object becomes complex.
    593551\see*{\VRef{toc:lst:issue:simultaneity} for further discussion.}
     
    601559
    602560A further aspect of layout control is allowing the user to explicitly specify link fields controlling attributes and placement within the @req@ object.
    603 LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.};
     561LQ allows this ability through the @LIST_ENTRY@ macro;\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.}
    604562supplying the link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@.
    605563An example of an explicit attribute is cache alignment of the link fields in conjunction with other @req@ fields, improving locality and/or avoiding false sharing.
     
    608566
    609567Another subtle advantage of intrusive arrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
    610 In LQ, \subref*{f:Intrusive}, a @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
     568In LQ, the intrusive @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
    611569there is no distinguishing a @req@ from ``a @req@ in a list.''
    612 The same is not true of STL, \subref*{f:WrappedRef} or \subref*{f:WrappedValue}.
    613 There, the analogous operations work on a parameter of type @list<T>::iterator@;
    614 they are @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@.
    615 There is no mapping from @req &@ to @list<req>::iterator@, except for linear search.
    616 
    617 The advantage of wrapped attachment is the abstraction of a data item from its list membership(s).
     570The same is not true of STL, wrapped reference or value.
     571There, the analogous operations, @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@, work on a parameter of type @list<T>::iterator@;
     572There is no mapping from @req &@ to @list<req>::iterator@. %, for linear search.
     573
     574The advantage of wrapped is the abstraction of a data item from its list membership(s).
    618575In the wrapped style, the @req@ type can come from a library that serves many independent uses,
    619576which generally have no need for listing.
    620 Then, a novel use can put @req@ in list, without requiring any upstream change in the @req@ library.
    621 In intrusive attachment, the ability to be listed must be planned during the definition of @req@.
    622 
    623 Finally, for wrapper reference a single node can appear at multiple places in the same list or different list, which might be useful in certain read-only cases.
    624 For intrusive and wrapper value, a node must be duplicated to appear at multiple locations, presenting additional cost.
    625 This scenario becomes difficult to imagine when the nodes are written because three link styles have issues.
     577Then, a novel use can put a @req@ in a list, without requiring any upstream change in the @req@ library.
     578In intrusive, the ability to be listed must be planned during the definition of @req@.
    626579
    627580\begin{figure}
    628     \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp}
    629     \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp}
    630     \caption{
    631         Reduction of wrapped attachment to intrusive attachment.
    632         Illustrated by pseudocode implementation of an STL-compatible API fragment
    633         using LQ as the underlying implementation.
    634         The gap that makes it pseudocode is that
    635         the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}.
    636         When using a custom-patched version of LQ to work around this issue,
    637         the programs of Figure~\ref{f:WrappedRef} and \protect\subref*{f:WrappedValue} work with this shim in place of real STL.
    638         Their executions lead to the same memory layouts.
    639     }
    640     \label{fig:lst-issues-attach-reduction}
     581        \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp}
     582        \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp}
     583        \caption{
     584                Simulation of wrapped using intrusive.
     585                Illustrated by pseudocode implementation of an STL-compatible API fragment using LQ as the underlying implementation.
     586                The gap that makes it pseudocode is that
     587                the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}.
     588                When using a custom-patched version of LQ to work around this issue,
     589                the programs of Figure~\ref{f:WrappedRef} and wrapped value work with this shim in place of real STL.
     590                Their executions lead to the same memory layouts.
     591        }
     592        \label{fig:lst-issues-attach-reduction}
    641593\end{figure}
    642594
    643 Wrapped attachment has a straightforward reduction to intrusive attachment, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
     595It is possible to simulate wrapped using intrusive, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
    644596This shim layer performs the implicit dynamic allocations that pure intrusion avoids.
    645597But there is no reduction going the other way.
    646598No shimming can cancel the allocations to which wrapped membership commits.
    647599
    648 So intrusion is a lower-level listing primitive.
    649 And so, the system design choice is not between forcing users to use intrusion or wrapping.
     600Because intrusion is a lower-level listing primitive, the system design choice is not between forcing users to use intrusion or wrapping.
    650601The choice is whether or not to provide access to an allocation-free layer of functionality.
     602An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
    651603A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits.
    652 An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
    653604
    654605
     
    657608
    658609\begin{figure}
    659     \parbox[t]{3.5in} {
    660         \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
    661     }\parbox[t]{20in} {
    662         ~\\
    663         \includegraphics[page=1]{lst-issues-direct.pdf} \\
    664         ~\\
    665         \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf}
    666     }
    667     \caption{
    668         Example of simultaneity using LQ lists. 
    669         The zoomed-out diagram (right/top) shows the complete multi-linked data structure.
    670         This structure can navigate all requests in priority order, and navigate among requests with a common request value.
    671         The zoomed-in diagram (right/bottom) shows how the link fields connect the nodes on different lists.
    672     }
    673     \label{fig:lst-issues-multi-static}
     610        \parbox[t]{3.5in} {
     611                \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
     612        }\parbox[t]{20in} {
     613                ~\\
     614                \includegraphics[page=1]{lst-issues-direct.pdf} \\
     615                ~\\
     616                \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf}
     617        }
     618        \caption{
     619                Example of simultaneity using LQ lists.
     620                The zoomed-out diagram (right/top) shows the complete multi-linked data structure.
     621                This structure can navigate all requests in priority order ({\color{blue}blue}), and navigate among requests with a common request value ({\color{orange}orange}).
     622                The zoomed-in diagram (right/bottom) shows how the link fields connect the nodes on different lists.
     623        }
     624        \label{fig:lst-issues-multi-static}
    674625\end{figure}
    675626
     
    681632The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists).
    682633
    683 As stated, the limitation of intrusive attachment is knowing apriori how many groups of links are needed for the maximum number of simultaneous lists.
     634As stated, the limitation of intrusive is knowing apriori how many groups of links are needed for the maximum number of simultaneous lists.
    684635Thus, the intrusive LQ example supports multiple, but statically many, link lists.
    685 Note, it is possible to reuse links for different purposes, \eg if a list in linked one at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
     636Note, it is possible to reuse links for different purposes, \eg if a list in linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
    686637This feature is used in the \CFA runtime where a thread node may be on a blocked or running list, both never on both simultaneously.
    687638
     
    696647Again, it is possible to construct the same simultaneity by creating multiple STL lists, each copying the appropriate nodes, where the intrusive links become the links for each separate STL list.
    697648The upside is the same as for wrapped-reference arrangement with an unlimited number of a list bindings.
    698 The downside is the dynamic allocation and significant storage usage due to copying.
     649The downside is the dynamic allocation and significant storage usage due to node copying.
    699650As well, it is unclear how node updates work in this scenario, without some notation of ultimately merging node information.
    700651
     
    709660% The example uses @x@; @reqs@ would be a more readily ignored choice. \PAB{wording?}
    710661
    711 \uCpp offers an intrusive list that makes the opposite ergonomic choice.  TODO: elaborate on inheritance for first direction and acrobatics for subsequent directions.
    712 
    713 STL may seem to have similar ergonomics to LQ, but in fact, the current ergonomic distinction is not applicable there,
    714 where one static direction is enough to achieve multiple dynamic directions.
    715 Note that all options in Figure~\ref{fig:lst-issues-attach} have a \emph{variable} named @refs@
    716 just as both Figure~\ref{fig:lst-issues-multi-static} and Figure~(TODO~new) have \emph{variables} with names including @pri@ vs @rqr@.
    717 But only the intrusive model has this naming showing up within the definition of a structure.
    718 This lack of named parts of a structure lets Figure~\ref{fig:lst-issues-attach} \subref*{f:WrappedRef} and \subref*{f:WrappedValue}, just like \uCpp,
    719 insert into a list without mentioning a part's name, while only version \subref*{f:Intrusive} has to mention @x@ at this step.
    720 LQ demands this same extraneous part-naming when removing, iterating, and even asking for a neighbour.
    721 At issue in this distinction is whether an API that offers multiple static directions (and so requires these to be named differently)
    722 allows the sole direction (when several are not wanted) to be \emph{implicit}.
    723 \uCpp allows it, LQ does not, and STL does not have this question as applicable.
     662\uCpp is a concurrent extension of \CC, which provides a basic set of intrusive lists~\cite[appx.~F]{uC++}, where the link fields are defined with the data fields using inheritance.
     663\begin{cquote}
     664\setlength{\tabcolsep}{15pt}
     665\begin{tabular}{@{}ll@{}}
     666\multicolumn{1}{c}{singly-linked list} & \multicolumn{1}{c}{doubly-linked list} \\
     667\begin{c++}
     668struct Node : public uColable {
     669        int i;  // data
     670        Node( int i ) : i{ i } {}
     671};
     672\end{c++}
     673&
     674\begin{c++}
     675struct Node : public uSeqable {
     676        int i;  // data
     677        Node( int i ) : i{ i } {}
     678};
     679\end{c++}
     680\end{tabular}
     681\end{cquote}
     682A node can be placed in the following data structures depending on its link fields: @uStack@ and @uQueue@ (singly linked), and @uSequence@ (doubly linked).
     683A node inheriting from @uSeqable@ can appear in a singly or doubly linked structure.
     684Structure operations implicitly know the link-field location through the inheritance.
     685\begin{c++}
     686uStack<Node> stack;
     687Node node;
     688stack.push( node );  // link fields at beginning of node
     689\end{c++}
     690
     691Simultaneity cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance.
     692Instead, a special type is require that contains the link fields and points at the node.
     693\begin{cquote}
     694\setlength{\tabcolsep}{10pt}
     695\begin{tabular}{@{}ll@{}}
     696\begin{c++}
     697struct NodeDL : public uSeqable {
     698        @Node & node;@  // node pointer
     699        NodeDL( Node & node ) : node( node ) {}
     700        Node & get() const { return node; }
     701};
     702\end{c++}
     703&
     704\begin{c++}
     705struct Node : public uColable {
     706        int i;  // data
     707        @NodeDL nodeseq;@  // embedded intrusive links
     708        Node( int i ) : i{ i }, @nodeseq{ this }@ {}
     709};
     710\end{c++}
     711\end{tabular}
     712\end{cquote}
     713This node can now be inserted into a doubly-linked list through the embedded intrusive links.
     714\begin{c++}
     715uSequence<NodeDL> sequence;
     716sequence.add_front( node.nodeseq );             $\C{// link fields in embedded type}$
     717NodeDL nodedl = sequence.remove( node.nodeseq );
     718int i = nodedl.get().i;                                 $\C{// indirection to node}$
     719\end{c++}
     720Hence, the \uCpp approach optimizes one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links.
     721However, \uCpp cannot apply the LQ trick for finding the links and node.
     722
     723The major ergonomic difference among the approaches is naming and name usage.
     724The intrusive model requires naming each set of intrusive links, \eg @by_pri@ and @by_rqr@ in \VRef[Figure]{fig:lst-issues-multi-static}.
     725\uCpp cheats by using inheritance for the first intrusive links, after which explicit naming of intrusive links is required.
     726Furthermore, these link names must be used in all list operations, including iterating, whereas wrapped reference and value hide the list names in the implicit dynamically-allocated node-structure.
     727At issue is whether an API for simultaneity can support one list (when several are not wanted) to be \emph{implicit}.
     728\uCpp allows it, LQ does not, and the STL does not have this question.
    724729
    725730
     
    737742\label{toc:lst:issue:ident}
    738743
    739 All examples so far have used distinct user-facing types: 
     744All examples so far have used distinct user-facing types:
    740745an item found in a list (type @req@, of variables like @r1@), and
    741746a list (type @reql@ or @list<req>@, of variables like @reqs@ or @reqs_rqr_42@).
     
    781786A C character constant is an ASCII/Latin-1 character enclosed in single-quotes, \eg @'x'@, @'@\textsterling@'@.
    782787A wide C character constant is the same, except prefixed by the letter @L@, @u@, or @U@, \eg @u'\u25A0'@ (black square), where the @\u@ identifies a universal character name.
    783 A character can be formed from an escape sequence, which expresses a non-typable character (@'\n'@), a delimiter character @'\''@, or a raw character @'\x2f'@.
    784 
    785 A character sequence is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
     788A character can be formed from an escape sequence, which expresses a non-typable character @'\f'@ form feed, a delimiter character @'\''@ embedded single quote, or a raw character @'\xa3'@ \textsterling.
     789
     790A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
    786791The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@.
    787792
    788793For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multibyte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).
    789 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding of the multibyte character sequence, \eg @L"abc@$\mu$@"@ and read/print using @wsanf@/@wprintf@.
     794For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multibyte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wsanf@/@wprintf@.
    790795The value of a wide-character is implementation-defined, usually a UTF-16 character.
    791796For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multibyte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.
     
    802807Unfortunately, this design decision is both unsafe and inefficient.
    803808It is common error in C to forget the space in a character array for the terminator or overwrite the terminator, resulting in array overruns in string operations.
    804 The need to repeatedly scan an entire string to determine its length can result in significant cost, as it is not possible to cache the length in many cases.
     809The need to repeatedly scan an entire string to determine its length can result in significant cost, as it is impossible to cache the length in many cases.
    805810
    806811C strings are fixed size because arrays are used for the implementation.
     
    808813As a result, storage management for C strings is a nightmare, quickly resulting in array overruns and incorrect results.
    809814
    810 Collectively, these design decisions make working with strings in C, awkward, time consuming, and very unsafe.
     815Collectively, these design decisions make working with strings in C, awkward, time consuming, and unsafe.
    811816While there are companion string routines that take the maximum lengths of strings to prevent array overruns, that means the semantics of the operation can fail because strings are truncated.
    812817Suffice it to say, C is not a go-to language for string applications, which is why \CC introduced the @string@ type.
  • doc/theses/mike_brooks_MMath/intro.tex

    r76425bc rc721105  
    33All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
    44Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
     5For all three types, there is some corresponding mechanism for iterating through the structure, where the iterator flexibility varies with the kind of structure and ingenuity of the iterator implementor.
    56
    6 \cite{Blache19}
    7 \cite{Oorschot23}
    8 \cite{Ruef19}
    97
    108\section{Array}
    119
    12 Array provides a homogeneous container with $O(1)$ access to elements using subscripting.
     10An array provides a homogeneous container with $O(1)$ access to elements using subscripting (some form of pointer arithmetic).
    1311The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
    1412For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
     13Because array layout has contiguous components, subscripting is a computation.
     14However, the computation can exceed the array bounds resulting in programming errors and security violations~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
     15The goal is to provide good performance with safety.
    1516
    1617
    1718\section{Linked List}
    1819
    19 Linked-list provides a homogeneous container with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
     20A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
    2021Subscripting by value is sometimes available, \eg hash table.
    2122Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
     
    2627\section{String}
    2728
    28 String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
    29 What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements.
    30 Nevertheless, subscripting is often available.
    31 The cost of string operations is less important than the power of the block operation to accomplish complex manipulation.
    32 The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
     29A string provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
     30What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements, \eg @index@ and @substr@.
     31Subscripting individual elements is often available.
     32Often the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing.
     33The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
    3334
    3435
    3536\section{Motivation}
    3637
    37 The goal of this work is to introduce safe and complex versions of array, link-lists, and string into the programming language \CFA~\cite{Cforall}, which is based on C.
     38The goal of this work is to introduce safe and complex versions of array, link lists, and strings into the programming language \CFA~\cite{Cforall}, which is based on C.
    3839Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
    3940Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
     
    4546However, most programming languages are only partially explained by standard's manuals.
    4647When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
    47 Often other C compilers must \emph{ape} @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
     48Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
    4849While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
    4950These example programs show
     
    5556This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
    5657Any discovered anomalies among compilers or versions is discussed.
    57 In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
     58In all case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
    5859
    5960
     
    7677\end{cfa}
    7778with a segmentation fault at runtime.
    78 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness.
    79 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings.
     79Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
     80Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unsed variable.
    8081In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
    8182Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
     
    99100
    100101\subsection{String}
     102
     103\subsection{Iterator}
Note: See TracChangeset for help on using the changeset viewer.