Changeset 0775468
- Timestamp:
- May 6, 2024, 1:14:46 PM (7 months ago)
- Branches:
- master
- Children:
- 123e8b9
- Parents:
- 297b796
- File:
-
- 1 edited
doc/theses/mike_brooks_MMath/background.tex
r297b796 r0775468 395 395 396 396 Linked-lists are blocks of storage connected using one or more pointers. 397 The storage block is logically divided into data and links (pointers), where the links are the only component used by the list structure. 398 Since the data is opaque, list structures are often polymorphic over the data, which is normally homogeneous. 397 The storage block is logically divided into data (user payload) and links (list pointers), where the links are the only component used by the list structure. 398 Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous. 399 400 Linking is used to build data structures, which are a group of nodes, containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc. 401 Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value; 402 hence, all data structure routines take and return pointers to nodes and not the nodes themselves. 403 404 405 \begin{comment} 406 \subsection{Linked-List Packages} 407 408 C only supports type-eraser polymorphism, with no help from the type system. 409 This approach is used in the @queue@ library providing macros that define and operate on four types of data structures: singly-linked lists, singly-linked tail queues, lists, and tail queues. 410 These linked structures are \newterm{intrusive list}, where the link fields are defined (intrude) with data fields. 411 \begin{cfa} 412 struct DS { 413 // link fields, intrustive 414 // data fields 415 } 416 \end{cfa} 417 418 \uCpp~\cite{uC++} is a concurrent extension of \CC, and provides a basic set of intrusive lists, where the link fields are defined with the data fields using inheritance. 
419 \begin{cfa} 420 struct DS : public uColable { 421 // implicit link fields 422 // data fields 423 } 424 \end{cfa} 425 426 Intrusive nodes eliminate the need to dynamically allocate/deallocate the link fields when a node is added/removed to/from a data-structure. 427 Reducing dynamic allocation is important in concurrent programming because the heap is a shared resource with the potential for high contention. 428 The two formats are one link field, which form a \Index{collection}, and two link fields, which form a \Index{sequence}. 429 \begin{center} 430 %\input{DSLNodes} 431 \end{center} 432 @uStack@ and @uQueue@ are collections and @uSequence@ is a sequence. 433 To get the appropriate link fields associated with a user node, it must be a public descendant of @uColable@\index{uColable@@uColable@} or @uSeqable@\index{uSeqable@@uSeqable@}, respectively, e.g.: 434 %[ 435 class stacknode : public uColable { ... } 436 class queuenode : public uColable { ... } 437 class seqnode : public uSeqable { ... } 438 %] 439 A node inheriting from @uSeqable@ can appear in a sequence/collection but a node inherting from @uColable@ can only appear in a collection. 440 Along with providing the appropriate link fields, the types @uColable@ and @uSeqable@ also provide one member routine: 441 %[ 442 bool listed() const; 443 %] 444 which returns @true@ if the node is an element of any collection or sequence and @false@ otherwise. 445 446 Finally, no header files are necessary to access the uC DSL. 447 448 Some uC DSL restrictions are: 449 \begin{itemize} 450 \item 451 None of the member routines are virtual in any of the data structures for efficiency reasons. 452 Therefore, pointers to data structures must be used with care or incorrect member routines may be invoked. 453 \end{itemize} 454 \end{comment} 455 456 457 \subsection{Design Issues} 458 \label{toc:lst:issue} 459 460 This section introduces the design space for linked lists that target \emph{system programmers}. 
461 Within this restricted target, all design-issue discussions assume the following invariants. 462 Alternatives to the assumptions are discussed under Future Work (Section~\ref{toc:lst:futwork}). 463 \begin{itemize} 464 \item A doubly-linked list is being designed. 465 Generally, the discussed issues apply similarly for singly-linked lists. 466 Circular \vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}). 467 \item Link fields are system-managed. 468 The user works with the system-provided API to query and modify list membership. 469 The system has freedom over how to represent these links. 470 \item The user data must provide storage for the list link-fields. 471 Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}. 472 \end{itemize} 473 474 475 \subsection{Preexisting Linked-List Libraries} 476 477 Two preexisting linked-list libraries are used throughout, to show examples of the concepts being defined, 478 and further libraries are introduced as needed. 479 \begin{enumerate} 480 \item Linux Queue library\cite{lst:linuxq} (LQ) of @<sys/queue.h>@. 481 \item \CC Standard Template Library's (STL)\footnote{The term STL is contentious as some people prefer the term standard library.} @std::list@\cite{lst:stl} 482 \end{enumerate} 483 A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}). 484 485 For the discussion, assume the fictional type @req@ (request) is the user's payload in examples. 486 As well, the list library is helping the user manage (organize) requests, \eg a request can be work on the level of handling a network arrival event or scheduling a thread. 
487 488 489 \subsection{Link attachment: Intrusive vs.\ Wrapped} 490 \label{toc:lst:issue:attach} 491 492 Link attachment deals with the question: 493 Where are the libraries' inter-element link fields stored, in relation to the user's payload data fields? 494 Figure~\ref{fig:lst-issues-attach} shows three basic styles. 495 The \newterm{intrusive} style places the link fields inside the payload structure. 496 The two \newterm{wrapped} styles place the payload inside a generic library-provided structure that then defines the link fields. 497 Library LQ is intrusive; STL is wrapped. 498 The wrapped style further distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@. 499 (For this discussion, @list<req &>@ is similar to @list<req *>@.) 500 This difference is one of user style, not framework capability. 501 502 \begin{comment} 503 \begin{figure} 504 \begin{tabularx}{\textwidth}{Y|Y|Y} 505 \lstinput[language=C]{20-39}{lst-issues-intrusive.run.c} 506 &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp} 507 &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp} 508 \\ & & 509 \\ 510 \includegraphics[page=1]{lst-issues-attach.pdf} 511 & 512 \includegraphics[page=2]{lst-issues-attach.pdf} 513 & 514 \includegraphics[page=3]{lst-issues-attach.pdf} 515 \\ & & 516 \\ 517 (a) & (b) & (c) 518 \end{tabularx} 519 \caption{ 520 Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value. 521 The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs}; 522 head objects are discussed in Section~\ref{toc:lst:issue:ident}. 523 In (a), the field \lstinline{req.x} names a list direction; 524 these are discussed in Section~\ref{toc:lst:issue:simultaneity}. 525 In (b) and (c), the type \lstinline{node} represents a system-internal type, 526 which is \lstinline{std::_List_node} in the GNU implementation. 527 (TODO: cite? 
found in /usr/include/c++/7/bits/stl\_list.h ) 528 } 529 \label{fig:lst-issues-attach} 530 \end{figure} 531 \end{comment} 532 533 \begin{figure} 534 \centering 535 \newsavebox{\myboxA} % used with subfigure 536 \newsavebox{\myboxB} 537 \newsavebox{\myboxC} 538 539 \begin{lrbox}{\myboxA} 540 \begin{tabular}{@{}l@{}} 541 \lstinput[language=C]{20-35}{lst-issues-intrusive.run.c} \\ 542 \includegraphics[page=1]{lst-issues-attach.pdf} 543 \end{tabular} 544 \end{lrbox} 545 546 \begin{lrbox}{\myboxB} 547 \begin{tabular}{@{}l@{}} 548 \lstinput[language=C++]{20-35}{lst-issues-wrapped-byref.run.cpp} \\ 549 \includegraphics[page=2]{lst-issues-attach.pdf} 550 \end{tabular} 551 \end{lrbox} 552 553 \begin{lrbox}{\myboxC} 554 \begin{tabular}{@{}l@{}} 555 \lstinput[language=C++]{20-35}{lst-issues-wrapped-emplaced.run.cpp} \\ 556 \includegraphics[page=3]{lst-issues-attach.pdf} 557 \end{tabular} 558 \end{lrbox} 559 560 \subfloat[Intrusive]{\label{f:Intrusive}\usebox\myboxA} 561 \hspace{6pt} 562 \vrule 563 \hspace{6pt} 564 \subfloat[Wrapped reference]{\label{f:WrappedRef}\usebox\myboxB} 565 \hspace{6pt} 566 \vrule 567 \hspace{6pt} 568 \subfloat[Wrapped value]{\label{f:WrappedValue}\usebox\myboxC} 569 570 \caption{ 571 Three styles of link attachment: 572 % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value. 573 The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs}; 574 head objects are discussed in Section~\ref{toc:lst:issue:ident}. 575 In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction; 576 these are discussed in Section~\ref{toc:lst:issue:simultaneity}. 
577 In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a 578 library-internal type, which is \lstinline{std::_List_node} in the GNU implementation 579 \see{\lstinline{/usr/include/c++/X/bits/stl_list.h}, where \lstinline{X} is the \lstinline{g++} version number}. 580 } 581 \label{fig:lst-issues-attach} 582 \end{figure} 583 584 Each diagrammed example is using the fewest dynamic allocations for its respective style: 585 in \subref*{f:Intrusive}, there are no dynamic allocations, in \subref*{f:WrappedRef} only the link fields are dynamically allocated, and in \subref*{f:WrappedValue} the copied data and link fields are dynamically allocated. 586 The advantage of intrusive attachment is the control over memory layout and storage placement. 587 Both wrapped attachment styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list. 588 In all three cases, a @req@ object can enter and leave a list many times. 589 However, in \subref*{f:Intrusive} a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list. 590 In \subref*{f:WrappedRef}, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency. 591 In \subref*{f:WrappedValue}, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes; 592 however, knowing which of the @req@ objects is the ``true'' object becomes complex. 593 \see*{\VRef{toc:lst:issue:simultaneity} for further discussion.} 594 595 The implementation of @LIST_ENTRY@ uses a trick to find the links and the node containing the links. 596 The macro @LIST_INSERT_HEAD(&reqs, &r2, d);@ takes the list header, a pointer to the node, and the offset of the link fields in the node.
597 One of the fields generated by @LIST_ENTRY@ is a pointer to the node, which is set to the node address, \eg @r2@. 598 Hence, the offset to the link fields provides access to the entire node, \ie the node points at itself. 599 For list traversal, @LIST_FOREACH(cur, &reqs_pri, by_pri)@, there is the node cursor, the list, and the offset of the link fields within the node. 600 The traversal actually moves from link fields to link fields within a node and sets the node cursor from the pointer within the link fields back to the node. 601 602 A further aspect of layout control is allowing the user to explicitly specify link fields controlling attributes and placement within the @req@ object. 603 LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple named link fields allowing a node to appear on multiple lists simultaneously.}; 604 supplying the link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@. 605 An example of an explicit attribute is cache alignment of the link fields in conjunction with other @req@ fields, improving locality and/or avoiding false sharing. 606 Wrapped reference has no control over the link fields, but the separate data allows some control; 607 wrapped value has no control over data or links. 608 609 Another subtle advantage of intrusive arrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership. 610 In LQ, \subref*{f:Intrusive}, a @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@; 611 there is no distinguishing a @req@ from ``a @req@ in a list.'' 612 The same is not true of STL, \subref*{f:WrappedRef} or \subref*{f:WrappedValue}. 613 There, the analogous operations work on a parameter of type @list<T>::iterator@; 614 they are @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@.
615 There is no mapping from @req &@ to @list<req>::iterator@, except for linear search. 616 617 The advantage of wrapped attachment is the abstraction of a data item from its list membership(s). 618 In the wrapped style, the @req@ type can come from a library that serves many independent uses, 619 which generally have no need for listing. 620 Then, a novel use can put @req@ in a list, without requiring any upstream change in the @req@ library. 621 In intrusive attachment, the ability to be listed must be planned during the definition of @req@. 622 623 Finally, for wrapped reference a single node can appear at multiple places in the same list or in different lists, which might be useful in certain read-only cases. 624 For intrusive and wrapped value, a node must be duplicated to appear at multiple locations, presenting additional cost. 625 This scenario becomes difficult to imagine when the nodes are written because all three link styles have issues. 626 627 \begin{figure} 628 \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp} 629 \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp} 630 \caption{ 631 Reduction of wrapped attachment to intrusive attachment. 632 Illustrated by pseudocode implementation of an STL-compatible API fragment 633 using LQ as the underlying implementation. 634 The gap that makes it pseudocode is that 635 the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}. 636 When using a custom-patched version of LQ to work around this issue, 637 the programs of Figure~\ref{f:WrappedRef} and \protect\subref*{f:WrappedValue} work with this shim in place of real STL. 638 Their executions lead to the same memory layouts. 639 } 640 \label{fig:lst-issues-attach-reduction} 641 \end{figure} 642 643 Wrapped attachment has a straightforward reduction to intrusive attachment, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
644 This shim layer performs the implicit dynamic allocations that pure intrusion avoids. 645 But there is no reduction going the other way. 646 No shimming can cancel the allocations to which wrapped membership commits. 647 648 So intrusion is a lower-level listing primitive. 649 And so, the system design choice is not between forcing users to use intrusion or wrapping. 650 The choice is whether or not to provide access to an allocation-free layer of functionality. 651 A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits. 652 An intrusive-primitive library like LQ lets users choose when to make this tradeoff. 653 654 655 \subsection{Simultaneity: Single vs.\ Multi-Static vs.\ Dynamic} 656 \label{toc:lst:issue:simultaneity} 657 658 \begin{figure} 659 \parbox[t]{3.5in} { 660 \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c} 661 }\parbox[t]{20in} { 662 ~\\ 663 \includegraphics[page=1]{lst-issues-direct.pdf} \\ 664 ~\\ 665 \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf} 666 } 667 \caption{ 668 Example of simultaneity using LQ lists. 669 The zoomed-out diagram (right/top) shows the complete multi-linked data structure. 670 This structure can navigate all requests in priority order, and navigate among requests with a common request value. 671 The zoomed-in diagram (right/bottom) shows how the link fields connect the nodes on different lists. 672 } 673 \label{fig:lst-issues-multi-static} 674 \end{figure} 675 676 \newterm{Simultaneity} deals with the question: 677 In how many different lists can a node be stored, at the same time? 678 Figure~\ref{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same request value (field @rqr@). 679 Each of ``by priority'' and ``by common request value'' is a separate list. 
680 For example, there is a single priority-list linked in order [1, 2, 2, 3, 3, 4], where nodes may have the same priority, and there are three common request-value lists combining requests with the same values: [42, 42], [17, 17, 17], and [99], giving four head nodes, one for each list. 681 The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists). 682 683 As stated, the limitation of intrusive attachment is knowing a priori how many groups of links are needed for the maximum number of simultaneous lists. 684 Thus, the intrusive LQ example supports multiple, but statically many, linked lists. 685 Note, it is possible to reuse links for different purposes, \eg if a list is linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields. 686 This feature is used in the \CFA runtime where a thread node may be on a blocked or running list, but never on both simultaneously. 687 688 Now consider the STL in the wrapped-reference arrangement of Figure~\ref{f:WrappedRef}. 689 Here it is possible to construct the same simultaneity by creating multiple STL lists, each pointing at the appropriate nodes. 690 Each group of intrusive links becomes the links for each separate STL list. 691 The upside is the unlimited number of lists a node can be associated with simultaneously; any number of STL lists can be created dynamically. 692 The downside is the dynamic allocation of the link nodes and managing multiple lists. 693 Note, it might be possible to wrap the multiple lists in another type to hide this implementation issue. 694 695 Now consider the STL in the wrapped-value arrangement of Figure~\ref{f:WrappedValue}. 696 Again, it is possible to construct the same simultaneity by creating multiple STL lists, each copying the appropriate nodes, where the intrusive links become the links for each separate STL list.
697 The upside is the same as for wrapped-reference arrangement with an unlimited number of list bindings. 698 The downside is the dynamic allocation and significant storage usage due to copying. 699 As well, it is unclear how node updates work in this scenario, without some notion of ultimately merging node information. 700 701 % https://www.geeksforgeeks.org/introduction-to-multi-linked-list/ -- example of building a bespoke multi-linked list out of STL primitives (providing indication that STL doesn't offer one); offers dynamic directionality by embedding `vector<struct node*> pointers;` 702 703 % When allowing multiple static directions, frameworks differ in their ergonomics for 704 % the typical case: when the user needs only one direction, vs.\ the atypical case, when the user needs several. 705 % LQ's ergonomics are well-suited to the uncommon case of multiple list directions. 706 % Its intrusion declaration and insertion operation both use a mandatory explicit parameter naming the direction. 707 % This decision works well in Figure~\ref{fig:lst-issues-multi-static}, where the names @by_pri@ and @by_rqr@ work well, 708 % but it clutters Figure~\ref{f:Intrusive}, where a contrived name must be invented and used. 709 % The example uses @x@; @reqs@ would be a more readily ignored choice. \PAB{wording?} 710 711 \uCpp offers an intrusive list that makes the opposite ergonomic choice. TODO: elaborate on inheritance for first direction and acrobatics for subsequent directions. 712 713 STL may seem to have similar ergonomics to LQ, but in fact, the current ergonomic distinction is not applicable there, 714 where one static direction is enough to achieve multiple dynamic directions. 715 Note that all options in Figure~\ref{fig:lst-issues-attach} have a \emph{variable} named @reqs@ 716 just as both Figure~\ref{fig:lst-issues-multi-static} and Figure~(TODO~new) have \emph{variables} with names including @pri@ vs @rqr@.
717 But only the intrusive model has this naming showing up within the definition of a structure. 718 This lack of named parts of a structure lets Figure~\ref{fig:lst-issues-attach} \subref*{f:WrappedRef} and \subref*{f:WrappedValue}, just like \uCpp, 719 insert into a list without mentioning a part's name, while only version \subref*{f:Intrusive} has to mention @x@ at this step. 720 LQ demands this same extraneous part-naming when removing, iterating, and even asking for a neighbour. 721 At issue in this distinction is whether an API that offers multiple static directions (and so requires these to be named differently) 722 allows the sole direction (when several are not wanted) to be \emph{implicit}. 723 \uCpp allows it, LQ does not, and STL does not have this question as applicable. 724 725 726 \subsection{User integration: Preprocessed vs.\ Type-System Mediated} 727 728 % example of poor error message due to LQ's preprocessed integration 729 % programs/lst-issues-multi-static.run.c:46:1: error: expected identifier or '(' before 'do' 730 % 46 | LIST_INSERT_HEAD(&reqs_rtr_42, &r42b, by_rqr); 731 % | ^~~~~~~~~~~~~~~~ 732 % 733 % ... not a wonderful example; it was a missing semicolon on the preceding line; but at least real 734 735 736 \subsection{List identity: Headed vs.\ Ad-hoc} 737 \label{toc:lst:issue:ident} 738 739 All examples so far have used distinct user-facing types: 740 an item found in a list (type @req@, of variables like @r1@), and 741 a list (type @reql@ or @list<req>@, of variables like @reqs@ or @reqs_rqr_42@). 742 \see{Figure~\ref{fig:lst-issues-attach} and Figure~\ref{fig:lst-issues-multi-static}} 743 The latter type is a head, and these examples are of headed lists. 744 745 A bespoke ``pointer to next @req@'' implementation often omits the latter type. 746 Such a list model is ad-hoc. 747 748 In headed thinking, there are length-zero lists (heads with no elements), and an element can be listed or not listed. 
749 In ad-hoc thinking, there are no length-zero lists and every element belongs to a list of length at least one. 750 \PAB{Create a figure for this.} 751 752 By omitting the head, elements can enter into an adjacency relationship, 753 without requiring that someone allocate room for the head of the possibly-resulting list, 754 or being able to find a correct existing head. 755 756 A head defines one or more element roles, among elements that share a transitive adjacency. 757 ``First'' and ``last'' are element roles. 758 One moment's ``first'' need not be the next moment's. 759 760 There is a cost to maintaining these roles. 761 A runtime component of this cost is evident in LQ's offering the choice of type generators @LIST@ vs.~@TAILQ@. 762 Its @LIST@ maintains a ``first,'' but not a ``last;'' its @TAILQ@ maintains both roles. 763 (Both types are doubly linked and an analogous choice is available for singly linked.) 764 765 TODO: finish making this point 766 767 See WIP in lst-issues-adhoc-*.ignore.*. 768 769 The code-complexity component of the cost ... 770 771 Ability to offer heads is good. Point: Does maintaining a head mean that the user has to provide more state when manipulating the list? Requiring the user to do so is bad, because the user may have lots of "list" typed variables in scope, and detecting that the user passed the wrong one requires testing all the listing edge cases. 772 773 \subsection{End treatment: Uniform } 399 774 400 775 … … 406 781 An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in @'x'@. 407 782 A wide character constant is the same, except prefixed by the letter @L@, @u@, or @U@. 
408 With a few exceptions detailed later, the elements of the sequence are any members of the source character set;783 Except for escape sequences, the elements of the sequence are any members of the source character set; 409 784 they are mapped in an implementation-defined manner to members of the execution character set. 410 785 … … 416 791 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the @mbstowcs@ function with an implementation-defined current locale. 417 792 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by successive calls to the @mbrtoc16@, or @mbrtoc32@ function as appropriate for its type, with an implementation-defined current locale. 418 The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined.793 The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined. 419 794 420 795