# Changeset 1716e1c

Ignore:
Timestamp:
May 4, 2021, 12:31:26 PM (17 months ago)
Branches:
arm-eh, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
0c4df43
Parents:
8cd34e9 (diff), c0c940a (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' into andrew-mmath, implementation updates.

Files:
10 edited
2 moved

Unmodified
Removed
• ## doc/theses/andrew_beach_MMath/cfalab.sty

 r8cd34e9 numberstyle=\footnotesize\sf, % Replace/adjust listing characters that look bad in sanserif. literate={-}{\makebox[1ex][c]{\raisebox{0.4ex}{\rule{0.75ex}{0.1ex}}}}1 literate={-}{\makebox[1ex][c]{\raisebox{0.7ex}{\rule{0.75ex}{0.1ex}}}}1 {^}{\raisebox{0.6ex}{$\scriptscriptstyle\land\,$}}1 {~}{\raisebox{0.3ex}{$\scriptstyle\sim\,$}}1 {}{\ttfamily\upshape\hspace*{-0.1ex}}1
• ## doc/theses/andrew_beach_MMath/implement.tex

 r8cd34e9 \subsection{Virtual Type} Virtual types only have one change to their structure, the addition of a pointer to the virtual table. This is always the first field so that if it is cast to a supertype the field's location is still known. This field is set as part of all new generated constructors. Virtual types only have one change to their structure: the addition of a pointer to the virtual table, called the \emph{virtual-table pointer}. Internally, the field is called @virtual_table@. This constant pointer is always the first field of the table so when casting to a supertype, the field's location is always known. The field is initialized as part of all generated constructors. \todo{They only come as part exceptions and don't work.} After the object is created the field is constant. However it can be read from, internally it is just a regular field called @virtual_table@. Dereferencing it gives the virtual table and access to the %After the object is created the field is constant. Dereferencing it gives the virtual table and access to the type's virtual members. \subsection{Virtual Table} Every time a virtual type is defined the new virtual table type must also be defined. The unique instance is important because the address of the virtual table instance is used as the identifier for the virtual type. So a pointer to the virtual table and the ID for the virtual type are interchangable. % PAB: These 2 paragraphs are repeated below, and maybe some of the paragraph above, too. \begin{comment} Every time a virtual type is defined, a new virtual table-type is instantiated. The uniqueness of the virtual-table instance is important because its address is used as the identifier for the virtual type. Hence, a pointer to the virtual table and the ID for the virtual type are interchangeable. \todo{Unique instances might be going so we will have to talk about the new system instead.} The first step in putting it all together is to create the virtual table type. The virtual table type is just a structure and can be described in terms of its fields. The first field is always the parent type ID (or a pointer to the parent virtual table) or 0 (the null pointer). Next are other fields on the parent virtual table are repeated. Finally are the fields used to store any new virtual members of the new The virtual type The virtual system is accessed through a private constant field inserted at the beginning of every virtual type, called the virtual-table pointer. This field points at a type's virtual table and is assigned during the object's construction. The address of a virtual table acts as the unique identifier for The first step is creating the virtual-table type. The virtual-table type is a structure and is described in terms of its fields. The first field contains the parent-type ID (or a pointer to the parent virtual-table) or 0 (null pointer). Next are repeated fields from on the parent virtual-table. Finally, the fields used to store any new virtual members of the new the virtual type. \end{comment} %The virtual system is accessed through a private constant field inserted at the %beginning of every virtual type. This field The virtual-table pointer points at a type's virtual table (see Figure~\vref{f:VirtualTableLayout}). %and is assigned during the object's %construction. The address of a virtual table acts as the unique identifier for the virtual type, and the first field of a virtual table is a pointer to the parent virtual-table or @0p@. The remaining fields are duplicated from the parent virtual-table or @0p@ (null pointer). The remaining fields are duplicated from the parent tables in this type's inheritance chain, followed by any fields this type introduces. Parent fields are duplicated so they can be changed (all virtual members are overridable), so that references to the dispatched type introduces. Parent fields are duplicated so they can be changed, \ie all virtual members are overridable, while the parent pointer allows access to the original values. Hence, references to the dispatched type are replaced with the current virtual type. % These are always taken by pointer or reference. \begin{figure} % Simple ascii diragram: \begin{verbatim} parent_pointer  \ parent_field0   | ...             | Same layout as parent. parent_fieldN   / child_field0 \begin{cfa} parent_pointer  // \C{parent pointer to access its fields} parent_field0   // \C{same layout as parent to allow replacement} ... parent_fieldN child_field0    // \C{new types for this virtual table} ... child_fieldN \end{verbatim} \todo{Refine the diagram} size alignment \end{cfa} %\todo{Refine the diagram} \caption{Virtual Table Layout} \label{f:VirtualTableLayout} \end{figure} % For each virtual type, a virtual table is constructed. This is both a new type % the type and the instance. A virtual table is created when the virtual type is created. The name of the \begin{comment} PAB: seems to be said already. A virtual table is created when a virtual type is created. The name of the type is created by mangling the name of the base type. The name of the instance is also generated by name mangling. The fields are initialized automatically. The parent field is initialized by getting the type of the parent field and using that to calculate the mangled name of the parent's virtual table type. using that to calculate the mangled name of the parent's virtual-table type. \end{comment} There are two special fields that are included like normal fields but have special initialization rules: the @size@ field is the type's size and is These operations are split up into several groups depending on where they take place which varies for monomorphic and polymorphic types. The first devision is place, which varies for monomorphic and polymorphic types. The first devision is between the declarations and the definitions. Declarations, such as a function signature or a aggregate's name, must always be visible but may be repeated in signature or an aggregate's name, must always be visible but may be repeated in the form of forward declarations in headers. Definitions, such as function bodies and a aggregate's layout, can be separately compiled but must occur exactly once in a source file. \begin{sloppypar} The declarations include the virtual type definition and forward declarations of the virtual table instance, constructor, message function and The declarations include the virtual-type definition and forward declarations of the virtual-table instance, constructor, message function and @get_exception_vtable@. The definition includes the storage and initialization of the virtual table instance and the bodies of the three functions. \end{sloppypar} Monomorphic instances put all of these two groups in one place each. Polymorphic instances also split out the core declarations and definitions from the per-instance information. The virtual table type and most of the functions are polymorphic so they are all part of the core. The virtual table instance and the @get_exception_vtable@ function. \begin{sloppypar} Monomorphic instances put all of these two groups in one place. Polymorphic instances split out the core declarations and definitions from the per-instance information. The virtual-table type and most of the functions are polymorphic so they are all part of the core. The virtual-table instance and the @get_exception_vtable@ function \PAB{ are ...}. Coroutines and threads need instances of @CoroutineCancelled@ and @ThreadCancelled@ respectively to use all of their functionality. When a new data type is declared with @coroutine@ or @thread@ the forward declaration for data type is declared with @coroutine@ or @thread@, the forward declaration for the instance is created as well. The definition of the virtual table is created at the definition of the main function. \end{sloppypar} \PAB{You need an example here to show what happens for this case.} \subsection{Virtual Cast} The function is \begin{cfa} void * __cfa__virtual_cast( struct __cfa__parent_vtable const * parent, void * __cfa__virtual_cast( struct __cfa__parent_vtable const * parent, struct __cfa__parent_vtable const * const * child ); \end{cfa} and it is implemented in the standard library. The structure reperents the head of a vtable which is the pointer to the parent virtual table. The @parent@ points directly at the parent type virtual table while the @child@ points at the object of the (possibe) child type. In terms of the virtual cast expression, @parent@ comes from looking up the and it is implemented in the standard library. The structure represents the head of a virtual table, which is the pointer to the parent virtual table. The @parent@ points directly at the parent-type virtual-table, while the @child@ points at the object of the (possible) child type. \PAB{Need a figure to show this relationship.} In terms of the virtual-cast expression, @parent@ comes from looking up the type being cast to and @child@ is the result of the expression being cast. Because the complier outputs C code, some type C type casts are also used. The last bit of glue is an map that saves every virtual type the compiler sees. This is used to check the type used in a virtual cast is a virtual Because the complier outputs C code, some C-type casts are also used. The last bit of glue is a map that saves every virtual type the compiler sees. This table is used to check the type used in a virtual cast is a virtual type and to get its virtual table. (It also checks for conflicting definitions.) Inside the function it is a simple conditional. If the type repersented by @parent@ is or is an ancestor of the type repersented by @*child@ (it requires one more level of derefence to pass through the object) then @child@ \PAB{Can this be rolled into the figure above?} Inside the function is a simple conditional. If the type represented by @parent@ is an ancestor of the type represented by @*child@ (it requires one more level of dereference to pass through the object) then @child@ is returned, otherwise the null pointer is returned. The check itself is preformed is a simple linear search. If the child virtual table or any of its ancestors (which are retreved through the first field of every virtual table) are the same as the parent virtual table then The check is a simple linear search (like \Cpp RTTI). If the child virtual table or any of its ancestors (which are retrieved through the first field of every virtual table) are the same as the parent virtual-table then the cast succeeds. % resumption doesn't as well. % Many modern languages work with an interal stack that function push and pop % Many modern languages work with an internal stack that function push and pop % their local data to. Stack unwinding removes large sections of the stack, % often across functions. Stack unwinding is the process of removing stack frames (activations) from the stack. On function entry and return, unwinding is handled directly by the code embedded in the function. Usually, the stack-frame size is known statically stack. On function entry and return, unwinding is handled directly by the call/return code embedded in a function. Usually, the stack-frame size is known statically based on parameter and local variable declarations. For dynamically-sized local variables, a runtime computation is necessary to know the frame local variables. (Often called a variable-length array or VLA, even when the variable type is an aggregate.) For VLAs, a runtime computation is necessary to know the frame size. Finally, a function's frame-size may change during execution as local variables (static or dynamic sized) go in and out of scope. variables (static or dynamic sized) go in and out of scope, which is a form of VLA. Allocating/deallocating stack space is usually an $O(1)$ operation achieved by bumping the hardware stack-pointer up or down as needed. Unwinding across multiple stack frames is more complex because individual stack management code associated with each frame is bypassed. That is, the location Unwinding across multiple stack frames is more complex because individual stack-management code associated with each frame can be bypassed. That is, the location of a function's frame-management code is largely unknown and dispersed throughout the function, hence the current frame size managed by that code is reseting to a snap-shot of an arbitrary but existing function frame on the stack. It is up to the programmer to ensure the snap-shot is valid when it is reset, making this unwinding approach fragile with potential errors that are reset and that unwound frames do not have side-effects. Hence, this unwinding approach is fragile with potential errors that are difficult to debug because the stack becomes corrupted. However, many languages define cleanup actions that must be taken when objects are deallocated from the stack or blocks end, such as running a variable's destructor or a @try@ statement's @finally@ clause. Handling these mechanisms With respect to stack side-effects, many languages define cleanup actions that must be taken when objects are deallocated from the stack, when the function of blocks within the function end, such as running a variable's destructor or a @try@ statement's @finally@ clause. The purpose of these side-effects is to reestablish the global state of the program, such as dynamic memory-allocation or file access. Handling these side-effect mechanisms requires walking the stack and checking each stack frame for these potential actions. For exceptions, it must be possible to walk the stack frames in search of @try@ actions, where a frame can be any block with declarations. In languages like \Cpp and Java, it must be possible to walk the stack frames in search of @try@ statements to match and execute a handler. For termination exceptions, it must also be possible to unwind all stack frames from the throw to the matching catch, and each of these frames must be checked for cleanup actions. Stack catch (including the @try@ block), and each of these frames must be checked for cleanup actions. Stack walking is where most of the complexity and expense of exception handling appears. LSDA can contain any information but conventionally it is a table with entries representing regions of the function and what has to be done there during unwinding. These regions are bracketed by the instruction pointer. If the unwinding. These regions are bracketed by instruction addresses. If the instruction pointer is within a region's start/end, then execution is currently executing in that region. Regions are used to mark out the scopes of objects The GCC compilation flag @-fexceptions@ causes the generation of an LSDA and attaches its personality function. However, this attaches its personality function. It attaches a series of opaque directives (@.cfi_personality@ directive) used internally and not part of this work. However, this flag only handles the cleanup attribute: \todo{Peter: What is attached? Andrew: It uses the .cfi\_personality directive and that's all I know.} \begin{cfa} void clean_up( int * var ) { ... } int avar __attribute__(( cleanup(clean_up) )); \end{cfa} which is used on a variable and specifies a function, in this case @clean_up@, run when the variable goes out of scope. The function is passed a pointer to the object being removed from the stack so it can be used to mimic destructors. that is used on a variable and specifies a function, in this case @clean_up@, run when the variable goes out of scope, which is used to mimic destructors. However, this feature cannot be used to mimic @try@ statements as it cannot control the unwinding. section covers some of the important parts of the interface. A personality function can preform different actions depending on how it is A personality function can perform different actions depending on how it is called. \begin{lstlisting}[language=C,{moredelim=**[is][\color{red}]{@}{@}}] \end{lstlisting} The @action@ argument is a bitmask of possible actions: \begin{enumerate} \begin{enumerate}[topsep=5pt] \item @_UA_SEARCH_PHASE@ specifies a search phase and tells the personality function @_UA_FORCE_UNWIND@ specifies a forced unwind call. Forced unwind only performs the cleanup phase and uses a different means to decide when to stop (see \vref{s:ForcedUnwind}). (see Section~\vref{s:ForcedUnwind}). \end{enumerate} The @exception_class@ argument is a copy of the \lstinline[language=C]|exception|'s @exception_class@ field. \PAB{Say more.} The \lstinline[language=C]|exception| argument is a pointer to the user provided storage object. It has two public fields, the exception class, which is actually just a number, identifying the exception handling mechanism that is just a number, identifying the exception handling mechanism that created it, and the cleanup function. The cleanup function is called if required by the exception. that can be passed several places in libunwind. It includes a number of messages for special cases (some of which should never be used by the personality function) and error codes but unless otherwise noted the personality function) and error codes. However, unless otherwise noted, the personality function should always return @_URC_CONTINUE_UNWIND@. @_URC_END_OF_STACK@. Second, when a handler is matched, raise exception continues onto the cleanup Second, when a handler is matched, raise exception walks the stack again performing the cleanup phase. Once again, it calls the personality functions of each stack frame from newest Forced Unwind is the other central function in libunwind. \begin{cfa} _Unwind_Reason_Code _Unwind_ForcedUnwind( _Unwind_Exception *, _Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *, _Unwind_Stop_Fn, void *); \end{cfa} Each stack must have its own exception context. In a sequential \CFA program, there is only one stack with a single global exception-context. However, when the library @libcfathread@ is linked, there are multiple stacks where each the library @libcfathread@ is linked, there are multiple stacks, where each needs its own exception context. General access to the exception context is provided by function @this_exception_context@. For sequential execution, this function is defined as The function @this_exception_context@ provides general access to the exception context. For sequential execution, this function is defined as a weak symbol in the \CFA system-library, @libcfa@. When a \CFA program is concurrent, it links with @libcfathread@, where this function is defined with a The sequential @this_exception_context@ returns a hard-coded pointer to the global execption context. global exception context. The concurrent version adds the exception context to the data stored at the base of each stack. When @this_exception_context@ is called it retrieves the base of each stack. When @this_exception_context@ is called, it retrieves the active stack and returns the address of the context saved there. % catches. Talk about GCC nested functions. Termination exceptions use libunwind heavily because it matches the intended use from \Cpp exceptions closely. The main complication for \CFA is that the Termination exceptions use libunwind heavily because \CFA termination exceptions match \Cpp exceptions closely. The main complication for \CFA is that the compiler generates C code, making it very difficult to generate the assembly to form the LSDA for try blocks or destructors. The first step of a termination raise is to copy the exception into memory managed by the exception system. Currently, the system uses @malloc@, rather than reserved memory or the stack top. The exception handling mechanism manages than reserved memory or the stack top. The exception-handling mechanism manages memory for the exception as well as memory for libunwind and the system's own per-exception storage. [Quick ASCII diagram to get started.] \begin{figure} \begin{verbatim} Fixed Header  | _Unwind_Exception   <- pointer target V ... \end{verbatim} Exceptions are stored in variable-sized blocks. The first component is a fixed sized data structure that contains the \caption{Exception Layout} \label{f:ExceptionLayout} \end{figure} Exceptions are stored in variable-sized blocks (see Figure~\vref{f:ExceptionLayout}). The first component is a fixed-sized data-structure that contains the information for libunwind and the exception system. The second component is an area of memory big enough to store the exception. Macros with pointer arthritic @_Unwind_Exception@ to the entire node. All of these nodes are linked together in a list, one list per stack, with the Multiple exceptions can exist because handlers can call functions that raise exceptions.  Figure~\vref{f:MultipleExceptions} shows a \Cpp program where exceptions are handled, and then a function is called from the handler that raises a new exception. The previous exception must persist because it is unhandled, and hence, control can return to the handler and that exception is reraised. \begin{figure} \centering \newsavebox{\myboxA} \newsavebox{\myboxB} \begin{lrbox}{\myboxA} \begin{lstlisting}[language=C++,{moredelim=**[is][\color{red}]{@}{@}}] struct E {}; int cnt = 3; void f( int i ) { if ( i == 0 ) @throw E();@ try { @f( i - 1 );@ } catch( E ) { // handler h cnt -= 1; if ( cnt > 0 ) @f( 2 );@ } } int main() { @f( 2 );@ } \end{lstlisting} \end{lrbox} \begin{lrbox}{\myboxB} \begin{lstlisting} h  $\makebox[0pt][l]{\textbackslash}f$ f f h  $\makebox[0pt][l]{\textbackslash}f$  throw E$$$_2$$$ f f h  $\makebox[0pt][l]{\textbackslash}f$  throw E$$$_1$$$ f f \end{lstlisting} \end{lrbox} {\usebox\myboxA} \hspace{25pt} {\usebox\myboxB} \caption{Multiple Exceptions} \label{f:MultipleExceptions} \end{figure} In this case, the exception nodes are linked together in a list, one list per stack, with the list head stored in the exception context. Within each linked list, the most recently thrown exception is at the head followed by older thrown exception, the copy function, and the free function, so they are specific to an exception type. The size and copy function are used immediately to copy an exception into managed memory. After the exception is handled the free function exception into managed memory. After the exception is handled, the free function is used to clean up the exception and then the entire node is passed to free so the memory can be given back to the heap. \subsection{Try Statements and Catch Clauses} The try statement with termination handlers is complex because it must compensate for the lack of assembly-code generated from \CFA. Libunwind compensate for the lack of assembly code generated from \CFA. Libunwind requires an LSDA and personality function for control to unwind across a function. The LSDA in particular is hard to mimic in generated C code. calls them. Because this function is known and fixed (and not an arbitrary function that happens to contain a try statement) this means the LSDA can be generated ahead happens to contain a try statement), this means the LSDA can be generated ahead of time. Both the LSDA and the personality function are set ahead of time using embedded assembly. This is handcrafted using C @asm@ statements and contains enough information for the single try statement the function repersents. embedded assembly. This assembly code is handcrafted using C @asm@ statements and contains enough information for the single try statement the function represents. The three functions passed to try terminate are: nested functions and all other functions besides @__cfaehm_try_terminate@ in \CFA use the GCC personality function and the @-fexceptions@ flag to generate the LSDA. This allows destructors to be implemented with the cleanup attribute. the LSDA. Through this mechanism, \CFA destructors are implemented via the cleanup attribute. \PAB{Try to put together an example try statement illustrating these components.} \section{Resumption} % The stack-local data, the linked list of nodes. Resumption simple to implement because there is no stack unwinding. The Resumption is simpler to implement than termination because there is no stack unwinding.  \PAB{You need to explain how the \lstinline{catchResume} clauses are handled. Do you use the personality mechanism in libunwind or do you roll your own mechanism?} The resumption raise uses a list of nodes for its stack traversal. The head of the list is stored in the exception context. The nodes in the list have a pointer to the next node and a pointer to the handler function. A resumption raise traverses this list. At each node the handler function is called, passing the exception by pointer. It returns true if the exception is the stack already examined, is accomplished by updating the front of the list as the search continues. Before the handler at a node is called the head of the list search continues. Before the handler at a node is called, the head of the list is updated to the next node of the current node. After the search is complete, successful or not, the head of the list is reset. the other handler checked up to this point are not checked again. This structure also supports new handler added while the resumption is being This structure also supports new handlers added while the resumption is being handled. These are added to the front of the list, pointing back along the stack -- the first one points over all the checked handlers -- and the ordering is maintained. \PAB{Again, a figure to show how this works would be helpful.} \label{p:zero-cost} % that unwind is required knowledge for that chapter. \PAB{This paragraph needs to be moved to the start of this Section, where I have have my other comment.} \section{Finally} % Uses destructors and GCC nested functions. Finally clauses is placed into a GCC nested-function with a unique name, and no A finally clause is placed into a GCC nested-function with a unique mangled name, and no arguments or return values. This nested function is then set as the cleanup function of an empty object that is declared at the beginning of a block placed around the context of the associated @try@ statement. The rest is handled by GCC. The try block and all handlers are inside the around the context of an associated @try@ statement. The rest is handled by GCC. The try block and all handlers are inside this block. At completion, control exits the block and the empty object is cleaned up, which runs the function that contains the finally code. Cancellation also uses libunwind to do its stack traversal and unwinding, however it uses a different primary function @_Unwind_ForcedUnwind@. Details of its interface can be found in the \vref{s:ForcedUnwind}. however it uses a different primary function, @_Unwind_ForcedUnwind@. Details of its interface can be found in Section~\vref{s:ForcedUnwind}. The first step of cancellation is to find the cancelled stack and its type: coroutine or thread. Fortunately, the thread library stores the main thread pointer and the current thread pointer, and every thread stores a pointer to its main coroutine and the coroutine it is currently executing. So if the active thread's main and current coroutine are the same. If they are then the current stack is a thread stack, otherwise it is a coroutine stack. If it is a thread stack then an equality check with the stored main thread pointer and current thread pointer is enough to tell if the current thread is the main thread or not. coroutine or thread. Fortunately, the thread library stores the program-main thread pointer and the current-thread pointer, and every thread stores a pointer to the current coroutine it is executing. \PAB{I don't know if my corrections in the previous paragraph are correct.} When the active thread and coroutine are the same, the current stack is the thread stack, otherwise it is a coroutine stack. % PAB: repeated? % If it is a thread stack, then an equality check with the stored main % thread pointer and current thread pointer is enough to tell if the current % thread is the main thread or not. However, if the threading library is not linked, the sequential execution is on the main stack. Hence, the entire check is skipped because the weak-symbol Regardless of how the stack is chosen, the stop function and parameter are passed to the forced-unwind function. The general pattern of all three stop functions is the same: they continue unwinding until the end of stack when they do there primary work. functions is the same: continue unwinding until the end of stack. %when they %do there primary work. For main stack cancellation, the transfer is just a program abort. For coroutine cancellation, the exception is stored on the coroutine's stack, For coroutine cancellation, the exception is stored in the coroutine's stack, and the coroutine context switches to its last resumer. The rest is handled on the backside of the resume, which check if the resumed coroutine is
• ## doc/theses/andrew_beach_MMath/uw-ethesis.tex

 r8cd34e9 % cfa macros used in the document \usepackage{cfalab} % allow global and individual modification of spacing \usepackage{enumitem} % Improved reference tools. \usepackage[nospace]{varioref}
• ## libcfa/src/concurrency/invoke.h

 r8cd34e9 struct __thread_desc_link { struct $thread * next; struct$thread * prev; volatile unsigned long long ts; unsigned preferred; }; // context that is switch during a __cfactx_switch struct __stack_context_t context; // Link lists fields // instrusive link field for threads struct __thread_desc_link link; // current execution status for coroutine struct cluster * curr_cluster; // Link lists fields // instrusive link field for threads struct __thread_desc_link link; // preferred ready-queue unsigned preferred; // coroutine body used to store context

• ## libcfa/src/concurrency/kernel/startup.cfa

 r8cd34e9 self_mon_p = &self_mon; link.next = 0p; link.prev = 0p; link.preferred = -1u; link.ts   = 0; preferred = -1u; last_proc = 0p; #if defined( __CFA_WITH_VERIFY__ )
 r8cd34e9 // Intrusives lanes which are used by the relaxed ready queue struct __attribute__((aligned(128))) __intrusive_lane_t { #if defined(USE_MPSC) mpsc_queue(thread) queue; __attribute__((aligned(128))) #else // anchor for the head and the tail of the queue __attribute__((aligned(128))) struct __sentinel_t { // Link lists fields // instrusive link field for threads // must be exactly as inthread __thread_desc_link link; } before, after; #endif struct thread * prev; // spin lock protecting the queue volatile bool lock; // Optional statistic counters #if !defined(__CFA_NO_SCHED_STATS__) struct __attribute__((aligned(64))) { // difference between number of push and pops ssize_t diff; // total number of pushes and pops size_t push; size_t pop ; } stat; #endif __thread_desc_link anchor; }; void ?{}(__intrusive_lane_t & this); void ^?{}(__intrusive_lane_t & this); // Get the head pointer (one before the first element) from the anchor static inlinethread * head(const __intrusive_lane_t & this) { #if defined(USE_MPSC) return this.queue.head; #else $thread * rhead = ($thread *)( (uintptr_t)( &this.before ) - offsetof( $thread, link ) ); /* paranoid */ verify(rhead); return rhead; #endif } // Get the tail pointer (one after the last element) from the anchor static inline$thread * tail(const __intrusive_lane_t & this) { #if defined(USE_MPSC) return this.queue.tail; #else $thread * rtail = ($thread *)( (uintptr_t)( &this.after ) - offsetof( $thread, link ) ); /* paranoid */ verify(rtail); return rtail; #endif static inline$thread * mock_head(const __intrusive_lane_t & this) { $thread * rhead = ($thread *)( (uintptr_t)( &this.anchor ) - __builtin_offsetof( $thread, link ) ); return rhead; } void ?{}( __intrusive_lane_t & this ) { this.lock = false; this.prev = mock_head(this); this.anchor.next = 0p; this.anchor.ts = 0; #if !defined(USE_MPSC) this.before.link.prev = 0p; this.before.link.next = tail(this); this.before.link.ts = 0; this.after .link.prev = head(this); this.after .link.next = 0p; this.after .link.ts = 0; #if !defined(__CFA_NO_SCHED_STATS__) this.stat.diff = 0; this.stat.push = 0; this.stat.pop = 0; #endif // We add a boat-load of assertions here because the anchor code is very fragile /* paranoid */ verify(((uintptr_t)( head(this) ) + offsetof($thread, link )) == (uintptr_t)(&this.before)); /* paranoid */ verify(((uintptr_t)( tail(this) ) + offsetof( thread, link )) == (uintptr_t)(&this.after )); /* paranoid */ verify(head(this)->link.prev == 0p ); /* paranoid */ verify(head(this)->link.next == tail(this) ); /* paranoid */ verify(tail(this)->link.next == 0p ); /* paranoid */ verify(tail(this)->link.prev == head(this) ); /* paranoid */ verify(&head(this)->link.prev == &this.before.link.prev ); /* paranoid */ verify(&head(this)->link.next == &this.before.link.next ); /* paranoid */ verify(&tail(this)->link.prev == &this.after .link.prev ); /* paranoid */ verify(&tail(this)->link.next == &this.after .link.next ); /* paranoid */ verify(__alignof__(__intrusive_lane_t) == 128); /* paranoid */ verify(__alignof__(this) == 128); /* paranoid */ verifyf(((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", &this, ((intptr_t)(&this) % 128)); #endif // We add a boat-load of assertions here because the anchor code is very fragile /* paranoid */ verify( offsetof(thread, link ) == offsetof(__intrusive_lane_t, anchor) ); /* paranoid */ verify( ((uintptr_t)( mock_head(this) ) + offsetof( thread, link )) == (uintptr_t)(&this.anchor) ); /* paranoid */ verify( &mock_head(this)->link.next == &this.anchor.next ); /* paranoid */ verify( &mock_head(this)->link.ts == &this.anchor.ts ); /* paranoid */ verify( mock_head(this)->link.next == 0p ); /* paranoid */ verify( mock_head(this)->link.ts == 0 ); /* paranoid */ verify( mock_head(this) == this.prev ); /* paranoid */ verify( __alignof__(__intrusive_lane_t) == 128 ); /* paranoid */ verify( __alignof__(this) == 128 ); /* paranoid */ verifyf( ((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", &this, ((intptr_t)(&this) % 128) ); } // Dtor is trivial void ^?{}( __intrusive_lane_t & this ) { #if !defined(USE_MPSC) // Make sure the list is empty /* paranoid */ verify(head(this)->link.prev == 0p ); /* paranoid */ verify(head(this)->link.next == tail(this) ); /* paranoid */ verify(tail(this)->link.next == 0p ); /* paranoid */ verify(tail(this)->link.prev == head(this) ); #endif // Make sure the list is empty /* paranoid */ verify( this.anchor.next == 0p ); /* paranoid */ verify( this.anchor.ts == 0 ); /* paranoid */ verify( mock_head(this) == this.prev ); } // Push a thread onto this lane // returns true of lane was empty before push, false otherwise bool push(__intrusive_lane_t & this,thread * node) { #if defined(USE_MPSC) inline $thread * volatile & ?next ($thread * this )  __attribute__((const)) { return this->link.next; } push(this.queue, node); #else #if defined(__CFA_WITH_VERIFY__) /* paranoid */ verify(this.lock); /* paranoid */ verify(node->link.ts != 0); /* paranoid */ verify(node->link.next == 0p); /* paranoid */ verify(node->link.prev == 0p); /* paranoid */ verify(tail(this)->link.next == 0p); /* paranoid */ verify(head(this)->link.prev == 0p); void push( __intrusive_lane_t & this, $thread * node ) { /* paranoid */ verify( node->link.next == 0p ); /* paranoid */ verify( node->link.ts == 0 ); /* paranoid */ verify( this.prev->link.next == 0p ); /* paranoid */ verify( this.prev->link.ts == 0 ); if( this.anchor.next == 0p ) { /* paranoid */ verify( this.anchor.next == 0p ); /* paranoid */ verify( this.anchor.ts == 0 ); /* paranoid */ verify( this.prev == mock_head( this ) ); } else { /* paranoid */ verify( this.anchor.next != 0p ); /* paranoid */ verify( this.anchor.ts != 0 ); /* paranoid */ verify( this.prev != mock_head( this ) ); } if(this.before.link.ts == 0l) { /* paranoid */ verify(tail(this)->link.prev == head(this)); /* paranoid */ verify(head(this)->link.next == tail(this)); } else { /* paranoid */ verify(tail(this)->link.prev != head(this)); /* paranoid */ verify(head(this)->link.next != tail(this)); } #endif // Get the relevant nodes locally$thread * tail = tail(this); $thread * prev = tail->link.prev; // Do the push node->link.next = tail; node->link.prev = prev; prev->link.next = node; tail->link.prev = node; // Update stats #if !defined(__CFA_NO_SCHED_STATS__) this.stat.diff++; this.stat.push++; #endif verify(node->link.next == tail(this)); // Check if the queue used to be empty if(this.before.link.ts == 0l) { this.before.link.ts = node->link.ts; /* paranoid */ verify(node->link.prev == head(this)); return true; } return false; #endif // Get the relevant nodes locally this.prev->link.next = node; this.prev->link.ts = rdtscl(); this.prev = node; } // returns popped // returns true of lane was empty before push, false otherwise$thread * pop(__intrusive_lane_t & this) { /* paranoid */ verify(this.lock); #if defined(USE_MPSC) inline $thread * volatile & ?next ($thread * this )  __attribute__((const)) { return this->link.next; } return pop(this.queue); #else /* paranoid */ verify(this.before.link.ts != 0ul); $thread * pop( __intrusive_lane_t & this ) { /* paranoid */ verify( this.anchor.next != 0p ); /* paranoid */ verify( this.anchor.ts != 0 ); // Get anchors locally$thread * head = head(this); $thread * tail = tail(this); // Get the relevant nodes locally$thread * node = this.anchor.next; this.anchor.next = node->link.next; this.anchor.ts   = node->link.ts; bool is_empty = this.anchor.ts == 0; node->link.next = 0p; node->link.ts   = 0; // Get the relevant nodes locally $thread * node = head->link.next;$thread * next = node->link.next; // Update head time stamp if(is_empty) this.prev = mock_head( this ); /* paranoid */ verify(node != tail); /* paranoid */ verify(node); // Do the pop head->link.next = next; next->link.prev = head; node->link.next = 0p; node->link.prev = 0p; // Update head time stamp this.before.link.ts = next->link.ts; // Update stats #ifndef __CFA_NO_SCHED_STATS__ this.stat.diff--; this.stat.pop ++; #endif // Check if we emptied list and return accordingly /* paranoid */ verify(tail(this)->link.next == 0p); /* paranoid */ verify(head(this)->link.prev == 0p); if(next == tail) { /* paranoid */ verify(this.before.link.ts == 0); /* paranoid */ verify(tail(this)->link.prev == head(this)); /* paranoid */ verify(head(this)->link.next == tail(this)); return node; } else { /* paranoid */ verify(next->link.ts != 0); /* paranoid */ verify(tail(this)->link.prev != head(this)); /* paranoid */ verify(head(this)->link.next != tail(this)); /* paranoid */ verify(this.before.link.ts != 0); return node; } #endif /* paranoid */ verify( node->link.next == 0p ); /* paranoid */ verify( node->link.ts   == 0  ); return node; } // Check whether or not list is empty static inline bool is_empty(__intrusive_lane_t & this) { #if defined(USE_MPSC) return this.queue.head == 0p; #else // Cannot verify here since it may not be locked return this.before.link.ts == 0; #endif return this.anchor.ts == 0; } // Return the timestamp static inline unsigned long long ts(__intrusive_lane_t & this) { #if defined(USE_MPSC) \$thread * tl = this.queue.head; if(!tl) return -1ull; return tl->link.ts; #else // Cannot verify here since it may not be locked return this.before.link.ts; #endif // Cannot verify here since it may not be locked return this.anchor.ts; }
 r8cd34e9 }; Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, ptrdiff_t i ) { // About the choice of integral types offered as subscript overloads: // Intent is to cover these use cases: //    float foo( ptrdiff_t i ) { return a[i]; }           // i : ptrdiff_t //    forall( [N] ) ... for( i; N ) { total += a[i]; }    // i : typeof( sizeof(42) ) //    for( i; 5 ) { total += a[i]; }                      // i : int // It gets complicated by: // -  CFA does overloading on concrete types, like int and unsigned int, not on typedefed //    types like size_t.  So trying to overload on ptrdiff_t vs int works in 64-bit mode //    but not in 32-bit mode. // -  Given bug of Trac #247, CFA gives sizeof expressions type unsigned long int, when it //    should give them type size_t. // //                          gcc -m32         cfa -m32 given bug         gcc -m64 // ptrdiff_t                int              int                        long int // size_t                   unsigned int     unsigned int               unsigned long int // typeof( sizeof(42) )     unsigned int     unsigned long int          unsigned long int // int                      int              int                        int static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, int i ) { return (Timmed &) a.strides[i]; } Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, int i ) { static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned int i ) { return (Timmed &) a.strides[i]; } Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, size_t i ) { static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, long int i ) { return (Timmed &) a.strides[i]; } size_t ?len( arpk(N, S, Timmed, Tbase) & a ) { static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned long int i ) { return (Timmed &) a.strides[i]; } static inline size_t ?len( arpk(N, S, Timmed, Tbase) & a ) { return z(N); } // workaround #226 (and array relevance thereof demonstrated in mike102/otype-slow-ndims.cfa) void ?{}( arpk(N, S, Timmed, Tbase) & this ) { static inline void ?{}( arpk(N, S, Timmed, Tbase) & this ) { void ?{}( S (&inner)[z(N)] ) {} ?{}(this.strides); } void ^?{}( arpk(N, S, Timmed, Tbase) & this ) { static inline void ^?{}( arpk(N, S, Timmed, Tbase) & this ) { void ^?{}( S (&inner)[z(N)] ) {} ^?{}(this.strides); forall( Te ) Te mkar_( tag(Te) ) {} static inline Te mkar_( tag(Te) ) {} forall( [N], ZTags ... , Trslt &, Tatom & | { Trslt mkar_( tag(Tatom), ZTags ); } ) arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {} static inline arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {} // based on https://stackoverflow.com/questions/1872220/is-it-possible-to-iterate-over-arguments-in-variadic-macros forall( TA &, TB &, TC &, IxAB, IxBC ... | { TB & ?[?]( TA &, IxAB ); TC & ?[?]( TB &, IxBC ); } ) TC & ?[?]( TA & this, IxAB ab, IxBC bc ) { static inline TC & ?[?]( TA & this, IxAB ab, IxBC bc ) { return this[ab][bc]; } forall( TA &, TB &, TC &, IxAB_0, IxBC | { TB & ?[?]( TA &, IxAB_0 ); TC & ?[?]( TB &, IxBC ); } ) TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) { static inline TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) { return this[ab][bc]; } forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1 ); TC & ?[?]( TB &, IxBC ); } ) TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) { static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) { return this[[ab0,ab1]][bc]; } forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxAB_2, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1, IxAB_2 ); TC & ?[?]( TB &, IxBC ); } ) TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) { static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) { return this[[ab0,ab1,ab2]][bc]; } // Base forall( [Nq], [Sq], Tbase & ) tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {} static inline tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {} // Rec forall( [Nq], [Sq], [N], [S], recq &, recr &, Tbase & | { tag(recr) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(recq) ); } ) tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {} static inline tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {} // Wrapper struct all_t {} all; forall( [N], [S], Te &, result &, Tbase & | { tag(result) enq_( tag(Tbase), tag(N), tag(S), tag(Te) ); } ) result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) { static inline result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) { return (result&) this; }