Changeset c721105
- Timestamp:
- May 24, 2024, 2:16:09 PM (6 months ago)
- Branches:
- master
- Children:
- df699e0
- Parents:
- 76425bc
- Location:
- doc/theses/mike_brooks_MMath
- Files:
-
- 3 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r76425bc rc721105 1 1 \chapter{Array} 2 \label{c:Array} 2 3 3 4 \section{Introduction} … … 126 127 Unsigned integers have a special status in this type system. 127 128 Unlike how C++ allows 128 \begin{ lstlisting}[language=c++]129 \begin{c++} 129 130 template< size_t N, char * msg, typename T >... // declarations 130 \end{ lstlisting}131 \end{c++} 131 132 \CFA does not accommodate values of any user-provided type. 132 133 TODO: discuss connection with dependent types. -
doc/theses/mike_brooks_MMath/background.tex
r76425bc rc721105 13 13 Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction. 14 14 The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element. 15 Finally, while subscripting involves pointer arithmetic (as does field references @x.y.z@), it is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions. 16 Many C errors result from performing pointer arithmetic instead of using subscripting. 17 Some C textbooks erroneously teach pointer arithmetic suggesting it is faster than subscripting. 15 Finally, while subscripting involves pointer arithmetic (as does field references @x.y.z@), the computation is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions. 16 Many C errors result from performing pointer arithmetic instead of using subscripting; 17 some C textbooks teach pointer arithmetic erroneously suggesting it is faster than subscripting. 18 A sound and efficient C program does not require explicit pointer arithmetic. 18 19 19 20 C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.'' … … 27 28 Equally, C programmers know the size of a \emph{pointer} to the first array element is 8 (or 4 depending on the addressing architecture). 28 29 % Now, set aside for a moment the claim that this first assertion is giving information about a type. 29 Clearly, an array and a pointer to its first element are different things.30 Clearly, an array and a pointer to its first element are different. 30 31 31 32 In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element. … … 92 93 \CFA provides its own type, variable and routine declarations, using a simpler syntax. 93 94 The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type. 94 The qualifiers have the same meaningin \CFA as in C.95 The qualifiers have the same syntax and semantics in \CFA as in C. 95 96 Then, a \CFA declaration is read left to right, where a function return type is enclosed in brackets @[@\,@]@. 96 97 \begin{cquote} … … 119 120 \end{tabular} 120 121 \end{cquote} 121 As declaration complexity increases, it becomes corresponding difficult to read and understand the C declaration form.122 As declaration size increases, it becomes corresponding difficult to read and understand the C declaration form, whereas reading and understanding a \CFA declaration has linear complexity as the declaration size increases. 122 123 Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations. 123 124 124 \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef {XXX}.125 \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef[Chapter]{c:Array}. 125 126 The \CFA-thesis column shows the new array declaration form, which is my contributed improvements for safety and ergonomics. 126 127 The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features. … … 152 153 \hline 153 154 & ar.\ of immutable val. & @const T x[10];@ & @[10] const T x@ & @const array(T, 10) x@ \\ 154 155 & & @T const x[10];@ & @[10] T const x@ & @array(T, 10) const x@ \\ 155 156 \hline 156 157 & ar.\ of ptr.\ to value & @T * x[10];@ & @[10] * T x@ & @array(T *, 10) x@ \\ … … 203 204 ``pointer to \emph{type}'' that points to the initial element of the array object~\cite[\S~6.3.2.1.3]{C11} 204 205 \end{quote} 205 This phenomenon is the famous ``pointer decay,''which is a decay of an array-typed expression into a pointer-typed one.206 This phenomenon is the famous \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one. 206 207 It is worthy to note that the list of exception cases does not feature the occurrence of @ar@ in @ar[i]@. 207 208 Thus, subscripting happens on pointers not arrays. … … 212 213 Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing! 213 214 214 Subscripting a pointer when the target is standard-inappropriate is still practically well-defined. 215 While the standard affords a C compiler freedom about the meaning of an out-of-bound access, 216 or of subscripting a pointer that does not refer to an array element at all, 217 the fact that C is famously both generally high-performance, and specifically not bound-checked, 218 leads to an expectation that the runtime handling is uniform across legal and illegal accesses. 215 Subscripting a pointer when the target is standard inappropriate is still practically well-defined. 216 While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all, 217 the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses. 219 218 Moreover, consider the common pattern of subscripting on a @malloc@ result: 220 219 \begin{cfa} … … 226 225 227 226 Under this assumption, a pointer being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around the @p@ address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p)@. 228 I call this phenomenon ``array diffraction,''which is a diffraction of a single-element pointer into the assumption that its target is in the middle of an array whose size is unlimited in both directions.227 I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is in the middle of an array whose size is unlimited in both directions. 229 228 No pointer is exempt from array diffraction. 230 229 No array shows its elements without pointer decay. … … 242 241 The caller of such a function is left with the reality that a pointer parameter is a pointer, no matter how it is spelled: 243 242 \lstinput{18-21}{bkgd-carray-decay.c} 244 This fragment gives no warnings. 243 This fragment gives the warning for the first argument, in the second call. 244 \begin{cfa} 245 warning: 'f' accessing 40 bytes in a region of size 4 246 \end{cfa} 245 247 246 248 The shortened parameter syntax @T x[]@ is a further way to spell ``pointer.'' … … 248 250 This point of confusion is illustrated in: 249 251 \lstinput{23-30}{bkgd-carray-decay.c} 252 Note, \CC gives a warning for the initialization of @cp@. 253 \begin{cfa} 254 warning: ISO C++ forbids converting a string constant to 'char*' 255 \end{cfa} 256 and C gives a warning at the call of @edit@, if @const@ is added to the declaration of @cp@. 257 \begin{cfa} 258 warning: passing argument 1 of 'edit' discards 'const' qualifier from pointer target type 259 \end{cfa} 250 260 The basic two meanings, with a syntactic difference helping to distinguish, 251 are illustrated in the declarations of @ca@ vs.\@cp@,261 are illustrated in the declarations of @ca@ \vs @cp@, 252 262 whose subsequent @edit@ calls behave differently. 253 263 The syntax-caused confusion is in the comparison of the first and last lines, … … 300 310 \hline 301 311 & ptr.\ to ptr.\ to imm.\ val. & @const char **@ & @const char * argv[],@ & @[] * const char argv,@ \\ 302 312 & & & \emph{others elided} & \emph{others elided} \\ 303 313 \hline 304 314 \end{tabular} 305 315 \end{table} 316 317 318 \subsection{Multi-Dimensional} 319 320 As in the last section, multi-dimensional array declarations are examined. 321 \lstinput{16-18}{bkgd-carray-mdim.c} 322 The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).'' 323 \lstinput{20-44}{bkgd-carray-mdim.c} 306 324 307 325 … … 323 341 Note, the C standard supports VLAs~\cite[\S~6.7.6.2.4]{C11} as a conditional feature, but the \CC standard does not; 324 342 both @gcc@ and @g++@ support VLAs. 325 As well, there is misinformation about VLAs, \eg VLAs cause stack failures or are inefficient.343 As well, there is misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient. 326 344 VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type. 327 345 328 346 For high-performance applications, the stack size can be fixed and small (coroutines or user-level threads). 329 Here, VLAs can overflow the stack , so a heap allocation is used.347 Here, VLAs can overflow the stack without appropriately sizing the stack, so a heap allocation is used. 330 348 \begin{cfa} 331 349 float * ax1 = malloc( sizeof( float[n] ) ); 332 float * ax2 = malloc( n * sizeof( float ) ); 350 float * ax2 = malloc( n * sizeof( float ) ); $\C{// arrays}$ 333 351 float * bx1 = malloc( sizeof( float[1000000] ) ); 334 352 float * bx2 = malloc( 1000000 * sizeof( float ) ); … … 384 402 \subsection{The pointer-to-array type has been noticed before} 385 403 386 \subsection{Multi-Dimensional}387 388 As in the last section, we inspect the declaration ...389 \lstinput{16-18}{bkgd-carray-mdim.c}390 The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''391 \lstinput{20-44}{bkgd-carray-mdim.c}392 393 404 394 405 \section{Linked List} … … 398 409 Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous. 399 410 400 Linking is used to build data structures, which are a group of nodes, containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc.411 Storage linking is used to build data structures, which are a group of nodes, containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc. 401 412 Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value; 402 413 hence, all data structure routines take and return pointers to nodes and not the nodes themselves. 403 404 405 \begin{comment}406 \subsection{Linked-List Packages}407 408 C only supports type-eraser polymorphism, with no help from the type system.409 This approach is used in the @queue@ library providing macros that define and operate on four types of data structures: singly-linked lists, singly-linked tail queues, lists, and tail queues.410 These linked structures are \newterm{intrusive list}, where the link fields are defined (intrude) with data fields.411 \begin{cfa}412 struct DS {413 // link fields, intrustive414 // data fields415 }416 \end{cfa}417 418 \uCpp~\cite{uC++} is a concurrent extension of \CC, and provides a basic set of intrusive lists, where the link fields are defined with the data fields using inheritance.419 \begin{cfa}420 struct DS : public uColable {421 // implicit link fields422 // data fields423 }424 \end{cfa}425 426 Intrusive nodes eliminate the need to dynamically allocate/deallocate the link fields when a node is added/removed to/from a data-structure.427 Reducing dynamic allocation is important in concurrent programming because the heap is a shared resource with the potential for high contention.428 The two formats are one link field, which form a \Index{collection}, and two link fields, which form a \Index{sequence}.429 \begin{center}430 %\input{DSLNodes}431 \end{center}432 @uStack@ and @uQueue@ are collections and @uSequence@ is a sequence.433 To get the appropriate link fields associated with a user node, it must be a public descendant of @uColable@\index{uColable@@uColable@} or @uSeqable@\index{uSeqable@@uSeqable@}, respectively, e.g.:434 %[435 class stacknode : public uColable { ... }436 class queuenode : public uColable { ... }437 class seqnode : public uSeqable { ... }438 %]439 A node inheriting from @uSeqable@ can appear in a sequence/collection but a node inheriting from @uColable@ can only appear in a collection.440 Along with providing the appropriate link fields, the types @uColable@ and @uSeqable@ also provide one member routine:441 %[442 bool listed() const;443 %]444 which returns @true@ if the node is an element of any collection or sequence and @false@ otherwise.445 446 Finally, no header files are necessary to access the uC DSL.447 448 Some uC DSL restrictions are:449 \begin{itemize}450 \item451 None of the member routines are virtual in any of the data structures for efficiency reasons.452 Therefore, pointers to data structures must be used with care or incorrect member routines may be invoked.453 \end{itemize}454 \end{comment}455 414 456 415 … … 462 421 Alternatives to the assumptions are discussed under Future Work (Section~\ref{toc:lst:futwork}). 463 422 \begin{itemize} 464 465 466 467 468 469 423 \item A doubly-linked list is being designed. 424 Generally, the discussed issues apply similarly for singly-linked lists. 425 Circular \vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}). 426 \item Link fields are system-managed. 427 The user works with the system-provided API to query and modify list membership. 428 The system has freedom over how to represent these links. 470 429 \item The user data must provide storage for the list link-fields. 471 430 Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}. 472 431 \end{itemize} 473 432 … … 478 437 and further libraries are introduced as needed. 479 438 \begin{enumerate} 480 481 439 \item Linux Queue library\cite{lst:linuxq} (LQ) of @<sys/queue.h>@. 440 \item \CC Standard Template Library's (STL)\footnote{The term STL is contentious as some people prefer the term standard library.} @std::list@\cite{lst:stl} 482 441 \end{enumerate} 483 A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}). 484 442 %A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}). 485 443 For the discussion, assume the fictional type @req@ (request) is the user's payload in examples. 486 444 As well, the list library is helping the user manage (organize) requests, \eg a request can be work on the level of handling a network arrival event or scheduling a thread. … … 492 450 Link attachment deals with the question: 493 451 Where are the libraries' inter-element link fields stored, in relation to the user's payload data fields? 494 Figure~\ref{fig:lst-issues-attach} shows three basic styles. 495 The \newterm{intrusive} style places the link fields inside the payload structure. 496 The two \newterm{wrapped} styles place the payload inside a generic library-provided structure that then defines the link fields. 497 Library LQ is intrusive; STL is wrapped. 498 The wrapped style further distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@. 452 \VRef[Figure]{fig:lst-issues-attach} shows three basic styles. 453 \VRef[Figure]{f:Intrusive} shows the \newterm{intrusive} style, placing the link fields inside the payload structure. 454 \VRef[Figures]{f:WrappedRef} and \subref*{f:WrappedValue} show the two \newterm{wrapped} styles, which place the payload inside a generic library-provided structure that then defines the link fields. 455 The wrapped style distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@. 499 456 (For this discussion, @list<req &>@ is similar to @list<req *>@.) 500 457 This difference is one of user style, not framework capability. 458 Library LQ is intrusive; STL is wrapped with reference and value. 501 459 502 460 \begin{comment} 503 461 \begin{figure} 504 462 \begin{tabularx}{\textwidth}{Y|Y|Y} 505 463 \lstinput[language=C]{20-39}{lst-issues-intrusive.run.c} 506 507 508 509 510 511 512 513 514 515 516 517 518 464 &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp} 465 &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp} 466 \\ & & 467 \\ 468 \includegraphics[page=1]{lst-issues-attach.pdf} 469 & 470 \includegraphics[page=2]{lst-issues-attach.pdf} 471 & 472 \includegraphics[page=3]{lst-issues-attach.pdf} 473 \\ & & 474 \\ 475 (a) & (b) & (c) 476 \end{tabularx} 519 477 \caption{ 520 Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value. 521 522 head objects are discussed in Section~\ref{toc:lst:issue:ident}. 523 524 525 526 527 528 529 478 Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value. 479 The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs}; 480 head objects are discussed in Section~\ref{toc:lst:issue:ident}. 481 In (a), the field \lstinline{req.x} names a list direction; 482 these are discussed in Section~\ref{toc:lst:issue:simultaneity}. 483 In (b) and (c), the type \lstinline{node} represents a system-internal type, 484 which is \lstinline{std::_List_node} in the GNU implementation. 485 (TODO: cite? found in /usr/include/c++/7/bits/stl\_list.h ) 486 } 487 \label{fig:lst-issues-attach} 530 488 \end{figure} 531 489 \end{comment} … … 569 527 570 528 \caption{ 571 572 % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value. 573 574 head objects are discussed in Section~\ref{toc:lst:issue:ident}. 575 576 577 529 Three styles of link attachment: 530 % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value. 531 The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs}; 532 head objects are discussed in Section~\ref{toc:lst:issue:ident}. 533 In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction; 534 these are discussed in Section~\ref{toc:lst:issue:simultaneity}. 535 In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a 578 536 library-internal type, which is \lstinline{std::_List_node} in the GNU implementation 579 580 581 537 \see{\lstinline{/usr/include/c++/X/bits/stl_list.h}, where \lstinline{X} is the \lstinline{g++} version number}. 538 } 539 \label{fig:lst-issues-attach} 582 540 \end{figure} 583 541 584 542 Each diagrammed example is using the fewest dynamic allocations for its respective style: 585 in \subref*{f:Intrusive}, here are no dynamic allocations, in \subref*{f:WrappedRef} only the linked fields are dynamically allocated, and in \subref*{f:WrappedValue}the copied data and linked fields are dynamically allocated.586 The advantage of intrusive attachmentis the control in memory layout and storage placement.587 Both wrapped attachmentstyles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.543 in intrusive, here is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copied data and linked fields are dynamically allocated. 544 The advantage of intrusive is the control in memory layout and storage placement. 545 Both wrapped styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list. 588 546 In all three cases, a @req@ object can enter and leave a list many times. 589 However, in \subref*{f:Intrusive}a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.590 In \subref*{f:WrappedRef}, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency.591 In \subref*{f:WrappedValue}, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes;547 However, in intrusive a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list. 548 In wrapped reference, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency. 549 In wrapped value, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes; 592 550 however, knowing which of the @req@ object is the ``true'' object becomes complex. 593 551 \see*{\VRef{toc:lst:issue:simultaneity} for further discussion.} … … 601 559 602 560 A further aspect of layout control is allowing the user to explicitly specify link fields controlling attributes and placement within the @req@ object. 603 LQ allows this ability through the @LIST_ENTRY@ macro \footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.};561 LQ allows this ability through the @LIST_ENTRY@ macro;\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.} 604 562 supplying the link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@. 605 563 An example of an explicit attribute is cache alignment of the link fields in conjunction with other @req@ fields, improving locality and/or avoiding false sharing. … … 608 566 609 567 Another subtle advantage of intrusive arrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership. 610 In LQ, \subref*{f:Intrusive}, a@req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;568 In LQ, the intrusive @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@; 611 569 there is no distinguishing a @req@ from ``a @req@ in a list.'' 612 The same is not true of STL, \subref*{f:WrappedRef} or \subref*{f:WrappedValue}. 613 There, the analogous operations work on a parameter of type @list<T>::iterator@; 614 they are @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@. 615 There is no mapping from @req &@ to @list<req>::iterator@, except for linear search. 616 617 The advantage of wrapped attachment is the abstraction of a data item from its list membership(s). 570 The same is not true of STL, wrapped reference or value. 571 There, the analogous operations, @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@, work on a parameter of type @list<T>::iterator@; 572 There is no mapping from @req &@ to @list<req>::iterator@. %, for linear search. 573 574 The advantage of wrapped is the abstraction of a data item from its list membership(s). 618 575 In the wrapped style, the @req@ type can come from a library that serves many independent uses, 619 576 which generally have no need for listing. 620 Then, a novel use can put @req@ in list, without requiring any upstream change in the @req@ library. 621 In intrusive attachment, the ability to be listed must be planned during the definition of @req@. 622 623 Finally, for wrapper reference a single node can appear at multiple places in the same list or different list, which might be useful in certain read-only cases. 624 For intrusive and wrapper value, a node must be duplicated to appear at multiple locations, presenting additional cost. 625 This scenario becomes difficult to imagine when the nodes are written because three link styles have issues. 577 Then, a novel use can put a @req@ in a list, without requiring any upstream change in the @req@ library. 578 In intrusive, the ability to be listed must be planned during the definition of @req@. 626 579 627 580 \begin{figure} 628 \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp} 629 \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp} 630 \caption{ 631 Reduction of wrapped attachment to intrusive attachment. 632 Illustrated by pseudocode implementation of an STL-compatible API fragment 633 using LQ as the underlying implementation. 634 The gap that makes it pseudocode is that 635 the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}. 636 When using a custom-patched version of LQ to work around this issue, 637 the programs of Figure~\ref{f:WrappedRef} and \protect\subref*{f:WrappedValue} work with this shim in place of real STL. 638 Their executions lead to the same memory layouts. 639 } 640 \label{fig:lst-issues-attach-reduction} 581 \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp} 582 \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp} 583 \caption{ 584 Simulation of wrapped using intrusive. 585 Illustrated by pseudocode implementation of an STL-compatible API fragment using LQ as the underlying implementation. 586 The gap that makes it pseudocode is that 587 the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}. 588 When using a custom-patched version of LQ to work around this issue, 589 the programs of Figure~\ref{f:WrappedRef} and wrapped value work with this shim in place of real STL. 590 Their executions lead to the same memory layouts. 591 } 592 \label{fig:lst-issues-attach-reduction} 641 593 \end{figure} 642 594 643 Wrapped attachment has a straightforward reduction to intrusive attachment, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.595 It is possible to simulate wrapped using intrusive, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}. 644 596 This shim layer performs the implicit dynamic allocations that pure intrusion avoids. 645 597 But there is no reduction going the other way. 646 598 No shimming can cancel the allocations to which wrapped membership commits. 647 599 648 So intrusion is a lower-level listing primitive. 649 And so, the system design choice is not between forcing users to use intrusion or wrapping. 600 Because intrusion is a lower-level listing primitive, the system design choice is not between forcing users to use intrusion or wrapping. 650 601 The choice is whether or not to provide access to an allocation-free layer of functionality. 602 An intrusive-primitive library like LQ lets users choose when to make this tradeoff. 651 603 A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits. 652 An intrusive-primitive library like LQ lets users choose when to make this tradeoff.653 604 654 605 … … 657 608 658 609 \begin{figure} 659 660 661 662 663 664 665 666 667 668 Example of simultaneity using LQ lists. 669 670 This structure can navigate all requests in priority order, and navigate among requests with a common request value.671 672 673 610 \parbox[t]{3.5in} { 611 \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c} 612 }\parbox[t]{20in} { 613 ~\\ 614 \includegraphics[page=1]{lst-issues-direct.pdf} \\ 615 ~\\ 616 \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf} 617 } 618 \caption{ 619 Example of simultaneity using LQ lists. 620 The zoomed-out diagram (right/top) shows the complete multi-linked data structure. 621 This structure can navigate all requests in priority order ({\color{blue}blue}), and navigate among requests with a common request value ({\color{orange}orange}). 622 The zoomed-in diagram (right/bottom) shows how the link fields connect the nodes on different lists. 623 } 624 \label{fig:lst-issues-multi-static} 674 625 \end{figure} 675 626 … … 681 632 The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists). 682 633 683 As stated, the limitation of intrusive attachmentis knowing apriori how many groups of links are needed for the maximum number of simultaneous lists.634 As stated, the limitation of intrusive is knowing apriori how many groups of links are needed for the maximum number of simultaneous lists. 684 635 Thus, the intrusive LQ example supports multiple, but statically many, link lists. 685 Note, it is possible to reuse links for different purposes, \eg if a list in linked one at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.636 Note, it is possible to reuse links for different purposes, \eg if a list in linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields. 686 637 This feature is used in the \CFA runtime where a thread node may be on a blocked or running list, both never on both simultaneously. 687 638 … … 696 647 Again, it is possible to construct the same simultaneity by creating multiple STL lists, each copying the appropriate nodes, where the intrusive links become the links for each separate STL list. 697 648 The upside is the same as for wrapped-reference arrangement with an unlimited number of a list bindings. 698 The downside is the dynamic allocation and significant storage usage due to copying.649 The downside is the dynamic allocation and significant storage usage due to node copying. 699 650 As well, it is unclear how node updates work in this scenario, without some notation of ultimately merging node information. 700 651 … … 709 660 % The example uses @x@; @reqs@ would be a more readily ignored choice. \PAB{wording?} 710 661 711 \uCpp offers an intrusive list that makes the opposite ergonomic choice. TODO: elaborate on inheritance for first direction and acrobatics for subsequent directions. 712 713 STL may seem to have similar ergonomics to LQ, but in fact, the current ergonomic distinction is not applicable there, 714 where one static direction is enough to achieve multiple dynamic directions. 715 Note that all options in Figure~\ref{fig:lst-issues-attach} have a \emph{variable} named @refs@ 716 just as both Figure~\ref{fig:lst-issues-multi-static} and Figure~(TODO~new) have \emph{variables} with names including @pri@ vs @rqr@. 717 But only the intrusive model has this naming showing up within the definition of a structure. 718 This lack of named parts of a structure lets Figure~\ref{fig:lst-issues-attach} \subref*{f:WrappedRef} and \subref*{f:WrappedValue}, just like \uCpp, 719 insert into a list without mentioning a part's name, while only version \subref*{f:Intrusive} has to mention @x@ at this step. 720 LQ demands this same extraneous part-naming when removing, iterating, and even asking for a neighbour. 721 At issue in this distinction is whether an API that offers multiple static directions (and so requires these to be named differently) 722 allows the sole direction (when several are not wanted) to be \emph{implicit}. 723 \uCpp allows it, LQ does not, and STL does not have this question as applicable. 662 \uCpp is a concurrent extension of \CC, which provides a basic set of intrusive lists~\cite[appx.~F]{uC++}, where the link fields are defined with the data fields using inheritance. 663 \begin{cquote} 664 \setlength{\tabcolsep}{15pt} 665 \begin{tabular}{@{}ll@{}} 666 \multicolumn{1}{c}{singly-linked list} & \multicolumn{1}{c}{doubly-linked list} \\ 667 \begin{c++} 668 struct Node : public uColable { 669 int i; // data 670 Node( int i ) : i{ i } {} 671 }; 672 \end{c++} 673 & 674 \begin{c++} 675 struct Node : public uSeqable { 676 int i; // data 677 Node( int i ) : i{ i } {} 678 }; 679 \end{c++} 680 \end{tabular} 681 \end{cquote} 682 A node can be placed in the following data structures depending on its link fields: @uStack@ and @uQueue@ (singly linked), and @uSequence@ (doubly linked). 683 A node inheriting from @uSeqable@ can appear in a singly or doubly linked structure. 684 Structure operations implicitly know the link-field location through the inheritance. 685 \begin{c++} 686 uStack<Node> stack; 687 Node node; 688 stack.push( node ); // link fields at beginning of node 689 \end{c++} 690 691 Simultaneity cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance. 692 Instead, a special type is require that contains the link fields and points at the node. 693 \begin{cquote} 694 \setlength{\tabcolsep}{10pt} 695 \begin{tabular}{@{}ll@{}} 696 \begin{c++} 697 struct NodeDL : public uSeqable { 698 @Node & node;@ // node pointer 699 NodeDL( Node & node ) : node( node ) {} 700 Node & get() const { return node; } 701 }; 702 \end{c++} 703 & 704 \begin{c++} 705 struct Node : public uColable { 706 int i; // data 707 @NodeDL nodeseq;@ // embedded intrusive links 708 Node( int i ) : i{ i }, @nodeseq{ this }@ {} 709 }; 710 \end{c++} 711 \end{tabular} 712 \end{cquote} 713 This node can now be inserted into a doubly-linked list through the embedded intrusive links. 714 \begin{c++} 715 uSequence<NodeDL> sequence; 716 sequence.add_front( node.nodeseq ); $\C{// link fields in embedded type}$ 717 NodeDL nodedl = sequence.remove( node.nodeseq ); 718 int i = nodedl.get().i; $\C{// indirection to node}$ 719 \end{c++} 720 Hence, the \uCpp approach optimizes one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links. 721 However, \uCpp cannot apply the LQ trick for finding the links and node. 722 723 The major ergonomic difference among the approaches is naming and name usage. 724 The intrusive model requires naming each set of intrusive links, \eg @by_pri@ and @by_rqr@ in \VRef[Figure]{fig:lst-issues-multi-static}. 725 \uCpp cheats by using inheritance for the first intrusive links, after which explicit naming of intrusive links is required. 726 Furthermore, these link names must be used in all list operations, including iterating, whereas wrapped reference and value hide the list names in the implicit dynamically-allocated node-structure. 727 At issue is whether an API for simultaneity can support one list (when several are not wanted) to be \emph{implicit}. 728 \uCpp allows it, LQ does not, and the STL does not have this question. 724 729 725 730 … … 737 742 \label{toc:lst:issue:ident} 738 743 739 All examples so far have used distinct user-facing types: 744 All examples so far have used distinct user-facing types: 740 745 an item found in a list (type @req@, of variables like @r1@), and 741 746 a list (type @reql@ or @list<req>@, of variables like @reqs@ or @reqs_rqr_42@). … … 781 786 A C character constant is an ASCII/Latin-1 character enclosed in single-quotes, \eg @'x'@, @'@\textsterling@'@. 782 787 A wide C character constant is the same, except prefixed by the letter @L@, @u@, or @U@, \eg @u'\u25A0'@ (black square), where the @\u@ identifies a universal character name. 783 A character can be formed from an escape sequence, which expresses a non-typable character (@'\n'@), a delimiter character @'\''@, or a raw character @'\x2f'@.784 785 A character sequenceis zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.788 A character can be formed from an escape sequence, which expresses a non-typable character @'\f'@ form feed, a delimiter character @'\''@ embedded single quote, or a raw character @'\xa3'@ \textsterling. 789 790 A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@. 786 791 The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@. 787 792 788 793 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multibyte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO). 789 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding of the multibyte character sequence, \eg @L"abc@$\mu$@"@ and read/printusing @wsanf@/@wprintf@.794 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multibyte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wsanf@/@wprintf@. 790 795 The value of a wide-character is implementation-defined, usually a UTF-16 character. 791 796 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multibyte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@. … … 802 807 Unfortunately, this design decision is both unsafe and inefficient. 803 808 It is common error in C to forget the space in a character array for the terminator or overwrite the terminator, resulting in array overruns in string operations. 804 The need to repeatedly scan an entire string to determine its length can result in significant cost, as it is notpossible to cache the length in many cases.809 The need to repeatedly scan an entire string to determine its length can result in significant cost, as it is impossible to cache the length in many cases. 805 810 806 811 C strings are fixed size because arrays are used for the implementation. … … 808 813 As a result, storage management for C strings is a nightmare, quickly resulting in array overruns and incorrect results. 809 814 810 Collectively, these design decisions make working with strings in C, awkward, time consuming, and veryunsafe.815 Collectively, these design decisions make working with strings in C, awkward, time consuming, and unsafe. 811 816 While there are companion string routines that take the maximum lengths of strings to prevent array overruns, that means the semantics of the operation can fail because strings are truncated. 812 817 Suffice it to say, C is not a go-to language for string applications, which is why \CC introduced the @string@ type. -
doc/theses/mike_brooks_MMath/intro.tex
r76425bc rc721105 3 3 All modern programming languages provide three high-level containers (collection): array, linked-list, and string. 4 4 Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list. 5 For all three types, there is some corresponding mechanism for iterating through the structure, where the iterator flexibility varies with the kind of structure and ingenuity of the iterator implementor. 5 6 6 \cite{Blache19}7 \cite{Oorschot23}8 \cite{Ruef19}9 7 10 8 \section{Array} 11 9 12 A rray provides a homogeneous container with $O(1)$ access to elements using subscripting.10 An array provides a homogeneous container with $O(1)$ access to elements using subscripting (some form of pointer arithmetic). 13 11 The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation. 14 12 For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap. 13 Because array layout has contiguous components, subscripting is a computation. 14 However, the computation can exceed the array bounds resulting in programming errors and security violations~\cite{Elliott18, Blache19, Ruef19, Oorschot23}. 15 The goal is to provide good performance with safety. 15 16 16 17 17 18 \section{Linked List} 18 19 19 Linked-list provides a homogeneous containerwith $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.20 A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations. 20 21 Subscripting by value is sometimes available, \eg hash table. 21 22 Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes). … … 26 27 \section{String} 27 28 28 String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.29 What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements.30 Nevertheless, subscriptingis often available.31 The cost of string operations is less important than the power of the block operation to accomplish complex manipulation.32 The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.29 A string provides a dynamic array of homogeneous elements, where the elements are often human-readable characters. 30 What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements, \eg @index@ and @substr@. 31 Subscripting individual elements is often available. 32 Often the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing. 33 The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages. 33 34 34 35 35 36 \section{Motivation} 36 37 37 The goal of this work is to introduce safe and complex versions of array, link -lists, and stringinto the programming language \CFA~\cite{Cforall}, which is based on C.38 The goal of this work is to introduce safe and complex versions of array, link lists, and strings into the programming language \CFA~\cite{Cforall}, which is based on C. 38 39 Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design. 39 40 Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built. … … 45 46 However, most programming languages are only partially explained by standard's manuals. 46 47 When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}. 47 Often other C compilers must \emph{ape}@gcc@ because a large part of the C library (runtime) system contains @gcc@ features.48 Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system contains @gcc@ features. 48 49 While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour. 49 50 These example programs show … … 55 56 This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures. 56 57 Any discovered anomalies among compilers or versions is discussed. 57 In thiscase, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.58 In all case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard. 58 59 59 60 … … 76 77 \end{cfa} 77 78 with a segmentation fault at runtime. 78 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness.79 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings .79 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate. 80 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unsed variable. 80 81 In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing. 81 82 Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors. … … 99 100 100 101 \subsection{String} 102 103 \subsection{Iterator}
Note: See TracChangeset
for help on using the changeset viewer.