Context Navigation

← Previous Changeset
Next Changeset →

Changeset 1716e1c

Timestamp:

May 4, 2021, 12:31:26 PM (5 years ago)

Author:

Andrew Beach <ajbeach@…>

Branches:

ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum, stuck-waitfor-destruct

Children:

Parents:

8cd34e9 (diff), c0c940a (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' into andrew-mmath, implementation updates.

Files:

: 10 edited
: 2 moved

doc/theses/andrew_beach_MMath/cfalab.sty (modified) (1 diff)
doc/theses/andrew_beach_MMath/implement.tex (modified) (29 diffs)
doc/theses/andrew_beach_MMath/uw-ethesis.tex (modified) (1 diff)
libcfa/src/concurrency/invoke.h (modified) (3 diffs)
libcfa/src/concurrency/kernel.cfa (modified) (2 diffs)
libcfa/src/concurrency/kernel/startup.cfa (modified) (1 diff)
libcfa/src/concurrency/ready_queue.cfa (modified) (9 diffs)
libcfa/src/concurrency/ready_subqueue.hfa (modified) (3 diffs)
libcfa/src/concurrency/thread.cfa (modified) (1 diff)
libcfa/src/containers/array.hfa (modified) (5 diffs)
tests/array-container/.expect/array-basic.txt (moved) (moved from tests/array-container/.expect/array-basic.x64.txt )
tests/array-container/.expect/array-md-sbscr-cases.txt (moved) (moved from tests/array-container/.expect/array-md-sbscr-cases.x64.txt )

Legend:

: Unmodified
: Added
: Removed

doc/theses/andrew_beach_MMath/cfalab.sty

r8cd34e9	r1716e1c
123	123	numberstyle=\footnotesize\sf,
124	124	% Replace/adjust listing characters that look bad in sanserif.
125		literate={-}{\makebox[1ex][c]{\raisebox{0.4ex}{\rule{0.75ex}{0.1ex}}}}1
	125	literate={-}{\makebox[1ex][c]{\raisebox{0.7ex}{\rule{0.75ex}{0.1ex}}}}1
126	126	{^}{\raisebox{0.6ex}{$\scriptscriptstyle\land\,$}}1
127	127	{~}{\raisebox{0.3ex}{$\scriptstyle\sim\,$}}1 {`}{\ttfamily\upshape\hspace*{-0.1ex}`}1

doc/theses/andrew_beach_MMath/implement.tex

-              r8cd34e9
+              r1716e1c
 \subsection{Virtual Type}
+Virtual types only have one change to their structure, the addition of a
+pointer to the virtual table. This is always the first field so that
+if it is cast to a supertype the field's location is still known.
+This field is set as part of all new generated constructors.
+Virtual types only have one change to their structure: the addition of a
+pointer to the virtual table, called the \emph{virtual-table pointer}.
+Internally, the field is called
+@virtual_table@.
+This constant pointer is always the first field of the table so when
+casting to a supertype, the field's location is always known.
+The field is initialized as part of all generated constructors.
 \todo{They only come as part exceptions and don't work.}
+After the object is created the field is constant.
+However it can be read from, internally it is just a regular field called
+@virtual_table@. Dereferencing it gives the virtual table and access to the
+%After the object is created the field is constant.
+Dereferencing it gives the virtual table and access to the
 type's virtual members.
 \subsection{Virtual Table}
+Every time a virtual type is defined the new virtual table type must also be
+defined.
+The unique instance is important because the address of the virtual table
+instance is used as the identifier for the virtual type. So a pointer to the
+virtual table and the ID for the virtual type are interchangable.
+% PAB: These 2 paragraphs are repeated below, and maybe some of the paragraph above, too.
+\begin{comment}
+Every time a virtual type is defined, a new virtual table-type is
+instantiated.
+The uniqueness of the virtual-table
+instance is important because its address
+is used as the identifier for the virtual type. Hence, a pointer to the
+virtual table and the ID for the virtual type are interchangeable.
 \todo{Unique instances might be going so we will have to talk about the new
 system instead.}
+The first step in putting it all together is to create the virtual table type.
+The virtual table type is just a structure and can be described in terms of
+its fields. The first field is always the parent type ID (or a pointer to
+the parent virtual table) or 0 (the null pointer).
+Next are other fields on the parent virtual table are repeated.
+Finally are the fields used to store any new virtual members of the new
+The virtual type
+The virtual system is accessed through a private constant field inserted at the
+beginning of every virtual type, called the virtual-table pointer. This field
+points at a type's virtual table and is assigned during the object's
+construction. The address of a virtual table acts as the unique identifier for
+The first step is creating the virtual-table type.
+The virtual-table type is a structure and is described in terms of
+its fields. The first field contains the parent-type ID (or a pointer to
+the parent virtual-table) or 0 (null pointer).
+Next are repeated fields from on the parent virtual-table.
+Finally, the fields used to store any new virtual members of the new
+the virtual type.
+\end{comment}
+%The virtual system is accessed through a private constant field inserted at the
+%beginning of every virtual type. This field
+The virtual-table pointer
+points at a type's virtual table (see Figure~\vref{f:VirtualTableLayout}).
+%and is assigned during the object's
+%construction.
+The address of a virtual table acts as the unique identifier for
 the virtual type, and the first field of a virtual table is a pointer to the
 parent virtual-table or @0p@. The remaining fields are duplicated from the
+parent virtual-table or @0p@ (null pointer). The remaining fields are duplicated from the
 parent tables in this type's inheritance chain, followed by any fields this type
+introduces. Parent fields are duplicated so they can be changed (all virtual
+members are overridable), so that references to the dispatched type
+introduces. Parent fields are duplicated so they can be changed, \ie all virtual
+members are overridable, while the parent pointer allows access to the original values.
+Hence, references to the dispatched type
 are replaced with the current virtual type.
 % These are always taken by pointer or reference.
+\begin{figure}
 % Simple ascii diragram:
 \begin{verbatim}
 parent_pointer  \
 parent_field0   |
 ...             | Same layout as parent.
 parent_fieldN   /
 child_field0
+\begin{cfa}
+parent_pointer  // \C{parent pointer to access its fields}
+parent_field0   // \C{same layout as parent to allow replacement}
+...
+parent_fieldN
+child_field0    // \C{new types for this virtual table}
 ...
 child_fieldN
+\end{verbatim}
+\todo{Refine the diagram}
+size
+alignment
+\end{cfa}
+%\todo{Refine the diagram}
+\caption{Virtual Table Layout}
+\label{f:VirtualTableLayout}
+\end{figure}
 % For each virtual type, a virtual table is constructed. This is both a new type
 …
 % the type and the instance.
+A virtual table is created when the virtual type is created. The name of the
+\begin{comment}
+PAB: seems to be said already.
+A virtual table is created when a virtual type is created. The name of the
 type is created by mangling the name of the base type. The name of the instance
 is also generated by name mangling. The fields are initialized automatically.
 The parent field is initialized by getting the type of the parent field and
+using that to calculate the mangled name of the parent's virtual table type.
+using that to calculate the mangled name of the parent's virtual-table type.
+\end{comment}
 There are two special fields that are included like normal fields but have
 special initialization rules: the @size@ field is the type's size and is
 …
 These operations are split up into several groups depending on where they take
 place which varies for monomorphic and polymorphic types. The first devision is
+place, which varies for monomorphic and polymorphic types. The first devision is
 between the declarations and the definitions. Declarations, such as a function
 signature or a aggregate's name, must always be visible but may be repeated in
+signature or an aggregate's name, must always be visible but may be repeated in
 the form of forward declarations in headers. Definitions, such as function
 bodies and a aggregate's layout, can be separately compiled but must occur
 exactly once in a source file.
+\begin{sloppypar}
+The declarations include the virtual type definition and forward declarations
+of the virtual table instance, constructor, message function and
+The declarations include the virtual-type definition and forward declarations
+of the virtual-table instance, constructor, message function and
 @get_exception_vtable@. The definition includes the storage and initialization
 of the virtual table instance and the bodies of the three functions.
+\end{sloppypar}
+Monomorphic instances put all of these two groups in one place each.
+Polymorphic instances also split out the core declarations and definitions from
+the per-instance information. The virtual table type and most of the functions
+are polymorphic so they are all part of the core. The virtual table instance
+and the @get_exception_vtable@ function.
+\begin{sloppypar}
+Monomorphic instances put all of these two groups in one place.
+Polymorphic instances split out the core declarations and definitions from
+the per-instance information. The virtual-table type and most of the functions
+are polymorphic so they are all part of the core. The virtual-table instance
+and the @get_exception_vtable@ function \PAB{ are ...}.
 Coroutines and threads need instances of @CoroutineCancelled@ and
 @ThreadCancelled@ respectively to use all of their functionality. When a new
 data type is declared with @coroutine@ or @thread@ the forward declaration for
+data type is declared with @coroutine@ or @thread@, the forward declaration for
 the instance is created as well. The definition of the virtual table is created
 at the definition of the main function.
+\end{sloppypar}
+\PAB{You need an example here to show what happens for this case.}
 \subsection{Virtual Cast}
 …
 The function is
 \begin{cfa}
+void * __cfa__virtual_cast(
+        struct __cfa__parent_vtable const * parent,
+void * __cfa__virtual_cast( struct __cfa__parent_vtable const * parent,
         struct __cfa__parent_vtable const * const * child );
 \end{cfa}
+and it is implemented in the standard library. The structure reperents the
+head of a vtable which is the pointer to the parent virtual table. The
+@parent@ points directly at the parent type virtual table while the @child@
+points at the object of the (possibe) child type.
+In terms of the virtual cast expression, @parent@ comes from looking up the
+and it is implemented in the standard library. The structure represents the
+head of a virtual table, which is the pointer to the parent virtual table. The
+@parent@ points directly at the parent-type virtual-table, while the @child@
+points at the object of the (possible) child type.
+\PAB{Need a figure to show this relationship.}
+In terms of the virtual-cast expression, @parent@ comes from looking up the
 type being cast to and @child@ is the result of the expression being cast.
 Because the complier outputs C code, some type C type casts are also used.
 The last bit of glue is an map that saves every virtual type the compiler
 sees. This is used to check the type used in a virtual cast is a virtual
+Because the complier outputs C code, some C-type casts are also used.
+The last bit of glue is a map that saves every virtual type the compiler
+sees. This table is used to check the type used in a virtual cast is a virtual
 type and to get its virtual table.
 (It also checks for conflicting definitions.)
+Inside the function it is a simple conditional. If the type repersented by
+@parent@ is or is an ancestor of the type repersented by @*child@ (it
+requires one more level of derefence to pass through the object) then @child@
+\PAB{Can this be rolled into the figure above?}
+Inside the function is a simple conditional. If the type represented by
+@parent@ is an ancestor of the type represented by @*child@ (it
+requires one more level of dereference to pass through the object) then @child@
 is returned, otherwise the null pointer is returned.
 The check itself is preformed is a simple linear search. If the child
 virtual table or any of its ancestors (which are retreved through the first
 field of every virtual table) are the same as the parent virtual table then
+The check is a simple linear search (like \Cpp RTTI). If the child
+virtual table or any of its ancestors (which are retrieved through the first
+field of every virtual table) are the same as the parent virtual-table then
 the cast succeeds.
 …
 % resumption doesn't as well.
 % Many modern languages work with an interal stack that function push and pop
+% Many modern languages work with an internal stack that function push and pop
 % their local data to. Stack unwinding removes large sections of the stack,
 % often across functions.
 Stack unwinding is the process of removing stack frames (activations) from the
 stack. On function entry and return, unwinding is handled directly by the code
 embedded in the function. Usually, the stack-frame size is known statically
+stack. On function entry and return, unwinding is handled directly by the call/return code
+embedded in a function. Usually, the stack-frame size is known statically
 based on parameter and local variable declarations. For dynamically-sized
+local variables, a runtime computation is necessary to know the frame
+local variables.
+(Often called a variable-length array or VLA, even when the variable type is an aggregate.)
+For VLAs, a runtime computation is necessary to know the frame
 size. Finally, a function's frame-size may change during execution as local
 variables (static or dynamic sized) go in and out of scope.
+variables (static or dynamic sized) go in and out of scope, which is a form of VLA.
 Allocating/deallocating stack space is usually an $O(1)$ operation achieved by
 bumping the hardware stack-pointer up or down as needed.
 Unwinding across multiple stack frames is more complex because individual stack
 management code associated with each frame is bypassed. That is, the location
+Unwinding across multiple stack frames is more complex because individual stack-management
+code associated with each frame can be bypassed. That is, the location
 of a function's frame-management code is largely unknown and dispersed
 throughout the function, hence the current frame size managed by that code is
 …
 reseting to a snap-shot of an arbitrary but existing function frame on the
 stack. It is up to the programmer to ensure the snap-shot is valid when it is
+reset, making this unwinding approach fragile with potential errors that are
+reset and that unwound frames do not have side-effects.
+Hence, this unwinding approach is fragile with potential errors that are
 difficult to debug because the stack becomes corrupted.
+However, many languages define cleanup actions that must be taken when objects
+are deallocated from the stack or blocks end, such as running a variable's
+destructor or a @try@ statement's @finally@ clause. Handling these mechanisms
+With respect to stack side-effects, many languages define cleanup actions that must be taken when objects
+are deallocated from the stack, when the function of blocks within the function end, such as running a variable's
+destructor or a @try@ statement's @finally@ clause.
+The purpose of these side-effects is to reestablish the global state of the program, such as dynamic memory-allocation or file access.
+Handling these side-effect mechanisms
 requires walking the stack and checking each stack frame for these potential
 actions.
 For exceptions, it must be possible to walk the stack frames in search of @try@
+actions, where a frame can be any block with declarations.
+In languages like \Cpp and Java, it must be possible to walk the stack frames in search of @try@
 statements to match and execute a handler. For termination exceptions, it must
 also be possible to unwind all stack frames from the throw to the matching
 catch, and each of these frames must be checked for cleanup actions. Stack
+catch (including the @try@ block), and each of these frames must be checked for cleanup actions. Stack
 walking is where most of the complexity and expense of exception handling
 appears.
 …
 LSDA can contain any information but conventionally it is a table with entries
 representing regions of the function and what has to be done there during
 unwinding. These regions are bracketed by the instruction pointer. If the
+unwinding. These regions are bracketed by instruction addresses. If the
 instruction pointer is within a region's start/end, then execution is currently
 executing in that region. Regions are used to mark out the scopes of objects
 …
 The GCC compilation flag @-fexceptions@ causes the generation of an LSDA and
+attaches its personality function. However, this
+attaches its personality function.
+It attaches a series of opaque directives (@.cfi_personality@ directive)
+used internally and not part of this work.
+However, this
 flag only handles the cleanup attribute:
-\todo{Peter: What is attached? Andrew: It uses the .cfi\_personality directive
-and that's all I know.}
 \begin{cfa}
 void clean_up( int * var ) { ... }
 int avar __attribute__(( cleanup(clean_up) ));
 \end{cfa}
+which is used on a variable and specifies a function, in this case @clean_up@,
+run when the variable goes out of scope.
+The function is passed a pointer to the object being removed from the stack
+so it can be used to mimic destructors.
+that is used on a variable and specifies a function, in this case @clean_up@,
+run when the variable goes out of scope, which is used to mimic destructors.
 However, this feature cannot be used to mimic @try@ statements as it cannot
 control the unwinding.
 …
 section covers some of the important parts of the interface.
 A personality function can preform different actions depending on how it is
+A personality function can perform different actions depending on how it is
 called.
 \begin{lstlisting}[language=C,{moredelim=**[is][\color{red}]{@}{@}}]
 …
 \end{lstlisting}
 The @action@ argument is a bitmask of possible actions:
 \begin{enumerate}
+\begin{enumerate}[topsep=5pt]
 \item
 @_UA_SEARCH_PHASE@ specifies a search phase and tells the personality function
 …
 @_UA_FORCE_UNWIND@ specifies a forced unwind call. Forced unwind only performs
 the cleanup phase and uses a different means to decide when to stop
 (see \vref{s:ForcedUnwind}).
+(see Section~\vref{s:ForcedUnwind}).
 \end{enumerate}
 The @exception_class@ argument is a copy of the
 \lstinline[language=C]|exception|'s @exception_class@ field.
+\PAB{Say more.}
 The \lstinline[language=C]|exception| argument is a pointer to the user
 provided storage object. It has two public fields, the exception class, which
 is actually just a number, identifying the exception handling mechanism that
+is just a number, identifying the exception handling mechanism that
 created it, and the cleanup function. The cleanup function is called if
 required by the exception.
 …
 that can be passed several places in libunwind. It includes a number of
 messages for special cases (some of which should never be used by the
 personality function) and error codes but unless otherwise noted the
+personality function) and error codes. However, unless otherwise noted, the
 personality function should always return @_URC_CONTINUE_UNWIND@.
 …
 @_URC_END_OF_STACK@.
 Second, when a handler is matched, raise exception continues onto the cleanup
+Second, when a handler is matched, raise exception walks the stack again performing the cleanup
 phase.
 Once again, it calls the personality functions of each stack frame from newest
 …
 Forced Unwind is the other central function in libunwind.
 \begin{cfa}
 _Unwind_Reason_Code _Unwind_ForcedUnwind( _Unwind_Exception *,
+_Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *,
         _Unwind_Stop_Fn, void *);
 \end{cfa}
 …
 Each stack must have its own exception context. In a sequential \CFA program,
 there is only one stack with a single global exception-context. However, when
 the library @libcfathread@ is linked, there are multiple stacks where each
+the library @libcfathread@ is linked, there are multiple stacks, where each
 needs its own exception context.
+General access to the exception context is provided by function
 @this_exception_context@. For sequential execution, this function is defined as
+The function @this_exception_context@ provides general access to the exception context.
+For sequential execution, this function is defined as
 a weak symbol in the \CFA system-library, @libcfa@. When a \CFA program is
 concurrent, it links with @libcfathread@, where this function is defined with a
 …
 The sequential @this_exception_context@ returns a hard-coded pointer to the
 global execption context.
+global exception context.
 The concurrent version adds the exception context to the data stored at the
 base of each stack. When @this_exception_context@ is called it retrieves the
+base of each stack. When @this_exception_context@ is called, it retrieves the
 active stack and returns the address of the context saved there.
 …
 % catches. Talk about GCC nested functions.
 Termination exceptions use libunwind heavily because it matches the intended
 use from \Cpp exceptions closely. The main complication for \CFA is that the
+Termination exceptions use libunwind heavily because \CFA termination exceptions match
+\Cpp exceptions closely. The main complication for \CFA is that the
 compiler generates C code, making it very difficult to generate the assembly to
 form the LSDA for try blocks or destructors.
 …
 The first step of a termination raise is to copy the exception into memory
 managed by the exception system. Currently, the system uses @malloc@, rather
 than reserved memory or the stack top. The exception handling mechanism manages
+than reserved memory or the stack top. The exception-handling mechanism manages
 memory for the exception as well as memory for libunwind and the system's own
 per-exception storage.
+[Quick ASCII diagram to get started.]
+\begin{figure}
 \begin{verbatim}
 Fixed Header  | _Unwind_Exception   <- pointer target
 …
               V ...
 \end{verbatim}
+Exceptions are stored in variable-sized blocks.
+The first component is a fixed sized data structure that contains the
+\caption{Exception Layout}
+\label{f:ExceptionLayout}
+\end{figure}
+Exceptions are stored in variable-sized blocks (see Figure~\vref{f:ExceptionLayout}).
+The first component is a fixed-sized data-structure that contains the
 information for libunwind and the exception system. The second component is an
 area of memory big enough to store the exception. Macros with pointer arthritic
 …
 @_Unwind_Exception@ to the entire node.
+All of these nodes are linked together in a list, one list per stack, with the
+Multiple exceptions can exist because handlers can call functions that raise
+exceptions.  Figure~\vref{f:MultipleExceptions} shows a \Cpp program where
+exceptions are handled, and then a function is called from the handler that
+raises a new exception. The previous exception must persist because it is
+unhandled, and hence, control can return to the handler and that exception is
+reraised.
+\begin{figure}
+\centering
+\newsavebox{\myboxA}
+\newsavebox{\myboxB}
+\begin{lrbox}{\myboxA}
+\begin{lstlisting}[language=C++,{moredelim=**[is][\color{red}]{@}{@}}]
+struct E {};
+int cnt = 3;
+void f( int i ) {
+        if ( i == 0 ) @throw E();@
+        try {
+                @f( i - 1 );@
+        } catch( E ) { // handler h
+                cnt -= 1;
+                if ( cnt > 0 ) @f( 2 );@
+        }
+}
+int main() { @f( 2 );@ }
+\end{lstlisting}
+\end{lrbox}
+\begin{lrbox}{\myboxB}
+\begin{lstlisting}
+h  $\makebox[0pt][l]{\textbackslash}f$
+   f
+   f
+h  $\makebox[0pt][l]{\textbackslash}f$  throw E$\(_2\)$
+   f
+   f
+h  $\makebox[0pt][l]{\textbackslash}f$  throw E$\(_1\)$
+   f
+   f
+\end{lstlisting}
+\end{lrbox}
+{\usebox\myboxA}
+\hspace{25pt}
+{\usebox\myboxB}
+\caption{Multiple Exceptions}
+\label{f:MultipleExceptions}
+\end{figure}
+In this case, the exception nodes are linked together in a list, one list per stack, with the
 list head stored in the exception context. Within each linked list, the most
 recently thrown exception is at the head followed by older thrown
 …
 exception, the copy function, and the free function, so they are specific to an
 exception type. The size and copy function are used immediately to copy an
 exception into managed memory. After the exception is handled the free function
+exception into managed memory. After the exception is handled, the free function
 is used to clean up the exception and then the entire node is passed to free
 so the memory can be given back to the heap.
 …
 \subsection{Try Statements and Catch Clauses}
 The try statement with termination handlers is complex because it must
 compensate for the lack of assembly-code generated from \CFA. Libunwind
+compensate for the lack of assembly code generated from \CFA. Libunwind
 requires an LSDA and personality function for control to unwind across a
 function. The LSDA in particular is hard to mimic in generated C code.
 …
 calls them.
 Because this function is known and fixed (and not an arbitrary function that
 happens to contain a try statement) this means the LSDA can be generated ahead
+happens to contain a try statement), this means the LSDA can be generated ahead
 of time.
 Both the LSDA and the personality function are set ahead of time using
 embedded assembly. This is handcrafted using C @asm@ statements and contains
 enough information for the single try statement the function repersents.
+embedded assembly. This assembly code is handcrafted using C @asm@ statements and contains
+enough information for the single try statement the function represents.
 The three functions passed to try terminate are:
 …
 nested functions and all other functions besides @__cfaehm_try_terminate@ in
 \CFA use the GCC personality function and the @-fexceptions@ flag to generate
+the LSDA. This allows destructors to be implemented with the cleanup attribute.
+the LSDA. Through this mechanism, \CFA destructors are implemented via the cleanup attribute.
+\PAB{Try to put together an example try statement illustrating these components.}
 \section{Resumption}
 % The stack-local data, the linked list of nodes.
+Resumption simple to implement because there is no stack unwinding. The
+Resumption is simpler to implement than termination because there is no stack
+unwinding.  \PAB{You need to explain how the \lstinline{catchResume} clauses are
+handled. Do you use the personality mechanism in libunwind or do you roll your
+own mechanism?}
+The
 resumption raise uses a list of nodes for its stack traversal. The head of the
 list is stored in the exception context. The nodes in the list have a pointer
 to the next node and a pointer to the handler function.
 A resumption raise traverses this list. At each node the handler function is
 called, passing the exception by pointer. It returns true if the exception is
 …
 the stack
 already examined, is accomplished by updating the front of the list as the
 search continues. Before the handler at a node is called the head of the list
+search continues. Before the handler at a node is called, the head of the list
 is updated to the next node of the current node. After the search is complete,
 successful or not, the head of the list is reset.
 …
 the other handler checked up to this point are not checked again.
 This structure also supports new handler added while the resumption is being
+This structure also supports new handlers added while the resumption is being
 handled. These are added to the front of the list, pointing back along the
 stack -- the first one points over all the checked handlers -- and the ordering
 is maintained.
+\PAB{Again, a figure to show how this works would be helpful.}
 \label{p:zero-cost}
 …
 % that unwind is required knowledge for that chapter.
+\PAB{This paragraph needs to be moved to the start of this Section, where I have have my other comment.}
 \section{Finally}
 % Uses destructors and GCC nested functions.
 Finally clauses is placed into a GCC nested-function with a unique name, and no
+A finally clause is placed into a GCC nested-function with a unique mangled name, and no
 arguments or return values. This nested function is then set as the cleanup
 function of an empty object that is declared at the beginning of a block placed
 around the context of the associated @try@ statement.
 The rest is handled by GCC. The try block and all handlers are inside the
+around the context of an associated @try@ statement.
+The rest is handled by GCC. The try block and all handlers are inside this
 block. At completion, control exits the block and the empty object is cleaned
 up, which runs the function that contains the finally code.
 …
 Cancellation also uses libunwind to do its stack traversal and unwinding,
 however it uses a different primary function @_Unwind_ForcedUnwind@. Details
 of its interface can be found in the \vref{s:ForcedUnwind}.
+however it uses a different primary function, @_Unwind_ForcedUnwind@. Details
+of its interface can be found in Section~\vref{s:ForcedUnwind}.
 The first step of cancellation is to find the cancelled stack and its type:
+coroutine or thread. Fortunately, the thread library stores the main thread
+pointer and the current thread pointer, and every thread stores a pointer to
+its main coroutine and the coroutine it is currently executing.
+So if the active thread's main and current coroutine are the same. If they
+are then the current stack is a thread stack, otherwise it is a coroutine
+stack. If it is a thread stack then an equality check with the stored main
+thread pointer and current thread pointer is enough to tell if the current
+thread is the main thread or not.
+coroutine or thread. Fortunately, the thread library stores the program-main thread
+pointer and the current-thread pointer, and every thread stores a pointer to
+the current coroutine it is executing.
+\PAB{I don't know if my corrections in the previous paragraph are correct.}
+When the active thread and coroutine are the same, the current stack is the thread stack, otherwise it is a coroutine
+stack.
+% PAB: repeated?
+% If it is a thread stack, then an equality check with the stored main
+% thread pointer and current thread pointer is enough to tell if the current
+% thread is the main thread or not.
 However, if the threading library is not linked, the sequential execution is on
 the main stack. Hence, the entire check is skipped because the weak-symbol
 …
 Regardless of how the stack is chosen, the stop function and parameter are
 passed to the forced-unwind function. The general pattern of all three stop
 functions is the same: they continue unwinding until the end of stack when they
+do there primary work.
+functions is the same: continue unwinding until the end of stack.
+%when they
+%do there primary work.
 For main stack cancellation, the transfer is just a program abort.
 For coroutine cancellation, the exception is stored on the coroutine's stack,
+For coroutine cancellation, the exception is stored in the coroutine's stack,
 and the coroutine context switches to its last resumer. The rest is handled on
 the backside of the resume, which check if the resumed coroutine is

doc/theses/andrew_beach_MMath/uw-ethesis.tex

r8cd34e9	r1716e1c
97	97	% cfa macros used in the document
98	98	\usepackage{cfalab}
	99	% allow global and individual modification of spacing
	100	\usepackage{enumitem}
99	101	% Improved reference tools.
100	102	\usepackage[nospace]{varioref}

libcfa/src/concurrency/invoke.h

-              r8cd34e9
+              r1716e1c
         struct __thread_desc_link {
                 struct $thread * next;
-                struct $thread * prev;
                 volatile unsigned long long ts;
-                unsigned preferred;
         };
 …
                 // context that is switch during a __cfactx_switch
                 struct __stack_context_t context;
+                // Link lists fields
+                // instrusive link field for threads
+                struct __thread_desc_link link;
                 // current execution status for coroutine
 …
                 struct cluster * curr_cluster;
+                // Link lists fields
+                // instrusive link field for threads
+                struct __thread_desc_link link;
+                // preferred ready-queue
+                unsigned preferred;
                 // coroutine body used to store context

libcfa/src/concurrency/kernel.cfa

-              r8cd34e9
+              r1716e1c
                 MAIN_LOOP:
                 for() {
+                        #if 1
                         // Check if there is pending io
                         __maybe_io_drain( this );
 …
+                        }
+                //      SEARCH: {
+                //              /* paranoid */ verify( ! __preemption_enabled() );
+                //              /* paranoid */ verify( kernelTLS().this_proc_id );
+                //              // First, lock the scheduler since we are searching for a thread
+                //              // Try to get the next thread
+                //              ready_schedule_lock();
+                //              readyThread = pop_fast( this->cltr );
+                //              ready_schedule_unlock();
+                //              if(readyThread) {  break SEARCH; }
+                //              // If we can't find a thread, might as well flush any outstanding I/O
+                //              if(this->io.pending) { __cfa_io_flush( this ); }
+                //              // Spin a little on I/O, just in case
+                //              for(25) {
+                //                      __maybe_io_drain( this );
+                //                      ready_schedule_lock();
+                //                      readyThread = pop_fast( this->cltr );
+                //                      ready_schedule_unlock();
+                //                      if(readyThread) {  break SEARCH; }
+                //              }
+                //              // no luck, try stealing a few times
+                //              for(25) {
+                //                      if( __maybe_io_drain( this ) ) {
+                //                              ready_schedule_lock();
+                //                              readyThread = pop_fast( this->cltr );
+                //                      } else {
+                //                              ready_schedule_lock();
+                //                              readyThread = pop_slow( this->cltr );
+                //                      }
+                //                      ready_schedule_unlock();
+                //                      if(readyThread) {  break SEARCH; }
+                //              }
+                //              // still no luck, search for a thread
+                //              ready_schedule_lock();
+                //              readyThread = pop_search( this->cltr );
+                //              ready_schedule_unlock();
+                //              if(readyThread) { break SEARCH; }
+                //              // Don't block if we are done
+                //              if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
+                //              __STATS( __tls_stats()->ready.sleep.halts++; )
+                //              // Push self to idle stack
+                //              mark_idle(this->cltr->procs, * this);
+                //              // Confirm the ready-queue is empty
+                //              __maybe_io_drain( this );
+                //              ready_schedule_lock();
+                //              readyThread = pop_search( this->cltr );
+                //              ready_schedule_unlock();
+                //              if( readyThread ) {
+                //                      // A thread was found, cancel the halt
+                //                      mark_awake(this->cltr->procs, * this);
+                //                      __STATS( __tls_stats()->ready.sleep.cancels++; )
+                //                      // continue the main loop
+                //                      break SEARCH;
+                //              }
+                //              __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->id, rdtscl()); )
+                //              __cfadbg_print_safe(runtime_core, "Kernel : core %p waiting on eventfd %d\n", this, this->idle);
+                //              // __disable_interrupts_hard();
+                //              eventfd_t val;
+                //              eventfd_read( this->idle, &val );
+                //              // __enable_interrupts_hard();
+                //              __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 1\n", this->id, rdtscl()); )
+                //              // We were woken up, remove self from idle
+                //              mark_awake(this->cltr->procs, * this);
+                //              // DON'T just proceed, start looking again
+                //              continue MAIN_LOOP;
+                //      }
+                // RUN_THREAD:
+                //      /* paranoid */ verify( kernelTLS().this_proc_id );
+                //      /* paranoid */ verify( ! __preemption_enabled() );
+                //      /* paranoid */ verify( readyThread );
+                //      // Reset io dirty bit
+                //      this->io.dirty = false;
+                //      // We found a thread run it
+                //      __run_thread(this, readyThread);
+                //      // Are we done?
+                //      if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
+                //      #if !defined(__CFA_NO_STATISTICS__)
+                //              unsigned long long curr = rdtscl();
+                //              if(curr > (last_tally + 500000000)) {
+                //                      __tally_stats(this->cltr->stats, __cfaabi_tls.this_stats);
+                //                      last_tally = curr;
+                //              }
+                //      #endif
+                //      if(this->io.pending && !this->io.dirty) {
+                //              __cfa_io_flush( this );
+                //      }
+                //      // Check if there is pending io
+                //      __maybe_io_drain( this );
+                        #else
+                        SEARCH: {
+                                /* paranoid */ verify( ! __preemption_enabled() );
+                                /* paranoid */ verify( kernelTLS().this_proc_id );
+                                // First, lock the scheduler since we are searching for a thread
+                                ready_schedule_lock();
+                                // Try to get the next thread
+                                readyThread = pop_fast( this->cltr );
+                                if(readyThread) { ready_schedule_unlock(); break SEARCH; }
+                                // If we can't find a thread, might as well flush any outstanding I/O
+                                if(this->io.pending) { __cfa_io_flush( this ); }
+                                // Spin a little on I/O, just in case
+                                for(25) {
+                                        __maybe_io_drain( this );
+                                        readyThread = pop_fast( this->cltr );
+                                        if(readyThread) { ready_schedule_unlock(); break SEARCH; }
+                                }
+                                // no luck, try stealing a few times
+                                for(25) {
+                                        if( __maybe_io_drain( this ) ) {
+                                                readyThread = pop_fast( this->cltr );
+                                        } else {
+                                                readyThread = pop_slow( this->cltr );
+                                        }
+                                        if(readyThread) { ready_schedule_unlock(); break SEARCH; }
+                                }
+                                // still no luck, search for a thread
+                                readyThread = pop_search( this->cltr );
+                                if(readyThread) { ready_schedule_unlock(); break SEARCH; }
+                                // Don't block if we are done
+                                if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
+                                __STATS( __tls_stats()->ready.sleep.halts++; )
+                                // Push self to idle stack
+                                ready_schedule_unlock();
+                                mark_idle(this->cltr->procs, * this);
+                                ready_schedule_lock();
+                                // Confirm the ready-queue is empty
+                                __maybe_io_drain( this );
+                                readyThread = pop_search( this->cltr );
+                                ready_schedule_unlock();
+                                if( readyThread ) {
+                                        // A thread was found, cancel the halt
+                                        mark_awake(this->cltr->procs, * this);
+                                        __STATS( __tls_stats()->ready.sleep.cancels++; )
+                                        // continue the main loop
+                                        break SEARCH;
+                                }
+                                __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->id, rdtscl()); )
+                                __cfadbg_print_safe(runtime_core, "Kernel : core %p waiting on eventfd %d\n", this, this->idle);
+                                // __disable_interrupts_hard();
+                                eventfd_t val;
+                                eventfd_read( this->idle, &val );
+                                // __enable_interrupts_hard();
+                                __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 1\n", this->id, rdtscl()); )
+                                // We were woken up, remove self from idle
+                                mark_awake(this->cltr->procs, * this);
+                                // DON'T just proceed, start looking again
+                                continue MAIN_LOOP;
+                        }
+                RUN_THREAD:
+                        /* paranoid */ verify( kernelTLS().this_proc_id );
+                        /* paranoid */ verify( ! __preemption_enabled() );
+                        /* paranoid */ verify( readyThread );
+                        // Reset io dirty bit
+                        this->io.dirty = false;
+                        // We found a thread run it
+                        __run_thread(this, readyThread);
+                        // Are we done?
+                        if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
+                        #if !defined(__CFA_NO_STATISTICS__)
+                                unsigned long long curr = rdtscl();
+                                if(curr > (last_tally + 500000000)) {
+                                        __tally_stats(this->cltr->stats, __cfaabi_tls.this_stats);
+                                        last_tally = curr;
+                                }
+                        #endif
+                        if(this->io.pending && !this->io.dirty) {
+                                __cfa_io_flush( this );
+                        }
+                        ready_schedule_lock();
+                        __maybe_io_drain( this );
+                        ready_schedule_unlock();
+                        #endif
+                }

libcfa/src/concurrency/kernel/startup.cfa

-              r8cd34e9
+              r1716e1c
         self_mon_p = &self_mon;
         link.next = 0p;
         link.prev = 0p;
         link.preferred = -1u;
+        link.ts   = 0;
+        preferred = -1u;
         last_proc = 0p;
         #if defined( __CFA_WITH_VERIFY__ )

libcfa/src/concurrency/ready_queue.cfa

-              r8cd34e9
+              r1716e1c
 // #define __CFA_DEBUG_PRINT_READY_QUEUE__
-// #define USE_MPSC
 #define USE_RELAXED_FIFO
 …
                 /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count );
-                // write timestamp
-                thrd->link.ts = rdtscl();
                 bool local;
                 int preferred = external ? -1 : kernelTLS().this_processor->rdq.id;
 …
                         #endif
-                #if defined(USE_MPSC)
-                        // mpsc always succeeds
-                } while( false );
-                #else
                         // If we can't lock it retry
                 } while( !__atomic_try_acquire( &lanes.data[i].lock ) );
-                #endif
                 // Actually push it
                 push(lanes.data[i], thrd);
-                #if !defined(USE_MPSC)
                         // Unlock and return
                         __atomic_unlock( &lanes.data[i].lock );
-                #endif
                 // Mark the current index in the tls rng instance as having an item
 …
                 __cfadbg_print_safe(ready_queue, "Kernel : Pushing %p on cluster %p\n", thrd, cltr);
+                // #define USE_PREFERRED
+                #if !defined(USE_PREFERRED)
                 const bool external = (!kernelTLS().this_processor) || (cltr != kernelTLS().this_processor->cltr);
                 /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count );
+                // write timestamp
+                thrd->link.ts = rdtscl();
+                #else
+                        unsigned preferred = thrd->preferred;
+                        const bool external = (!kernelTLS().this_processor) || preferred == -1u || thrd->curr_cluster != cltr;
+                        /* paranoid */ verifyf(external || preferred < lanes.count, "Invalid preferred queue %u for %u lanes", preferred, lanes.count );
+                        unsigned r = preferred % READYQ_SHARD_FACTOR;
+                        const unsigned start = preferred - r;
+                #endif
                 // Try to pick a lane and lock it
 …
+                        }
                         else {
+                                #if !defined(USE_PREFERRED)
                                 processor * proc = kernelTLS().this_processor;
                                 unsigned r = proc->rdq.its++;
                                 i =  proc->rdq.id + (r % READYQ_SHARD_FACTOR);
+                #else
+                                        i = start + (r++ % READYQ_SHARD_FACTOR);
+                                #endif
+                        }
-                #if defined(USE_MPSC)
-                        // mpsc always succeeds
-                } while( false );
-                #else
                         // If we can't lock it retry
                 } while( !__atomic_try_acquire( &lanes.data[i].lock ) );
-                #endif
                 // Actually push it
                 push(lanes.data[i], thrd);
-                #if !defined(USE_MPSC)
                         // Unlock and return
                         __atomic_unlock( &lanes.data[i].lock );
-                #endif
                 #if !defined(__CFA_NO_STATISTICS__)
 …
                 lanes.tscs[w].tv = thrd->link.ts;
         #endif
+        thrd->preferred = w;
         // return the popped thread
 …
 // Check that all the intrusive queues in the data structure are still consistent
 static void check( __ready_queue_t & q ) with (q) {
         #if defined(__CFA_WITH_VERIFY__) && !defined(USE_MPSC)
+        #if defined(__CFA_WITH_VERIFY__)
+                {
                         for( idx ; lanes.count ) {
 …
                                 assert(!lanes.data[idx].lock);
+                                assert(head(sl)->link.prev == 0p );
+                                assert(head(sl)->link.next->link.prev == head(sl) );
+                                assert(tail(sl)->link.next == 0p );
+                                assert(tail(sl)->link.prev->link.next == tail(sl) );
+                                if(is_empty(sl)) {
+                                        assert(tail(sl)->link.prev == head(sl));
+                                        assert(head(sl)->link.next == tail(sl));
+                                } else {
+                                        assert(tail(sl)->link.prev != head(sl));
+                                        assert(head(sl)->link.next != tail(sl));
+                                }
+                                        if(is_empty(sl)) {
+                                                assert( sl.anchor.next == 0p );
+                                                assert( sl.anchor.ts   == 0  );
+                                                assert( mock_head(sl)  == sl.prev );
+                                        } else {
+                                                assert( sl.anchor.next != 0p );
+                                                assert( sl.anchor.ts   != 0  );
+                                                assert( mock_head(sl)  != sl.prev );
+                                        }
+                        }
+                }
 …
 // fixes the list so that the pointers back to anchors aren't left dangling
 static inline void fix(__intrusive_lane_t & ll) {
+        #if !defined(USE_MPSC)
+                // if the list is not empty then follow he pointer and fix its reverse
+                if(!is_empty(ll)) {
+                        head(ll)->link.next->link.prev = head(ll);
+                        tail(ll)->link.prev->link.next = tail(ll);
+                }
+                // Otherwise just reset the list
+                else {
+                        verify(tail(ll)->link.next == 0p);
+                        tail(ll)->link.prev = head(ll);
+                        head(ll)->link.next = tail(ll);
+                        verify(head(ll)->link.prev == 0p);
+                }
+        #endif
+                        if(is_empty(ll)) {
+                                verify(ll.anchor.next == 0p);
+                                ll.prev = mock_head(ll);
+                        }
+}

libcfa/src/concurrency/ready_subqueue.hfa

-              r8cd34e9
+              r1716e1c
 // Intrusives lanes which are used by the relaxed ready queue
 struct __attribute__((aligned(128))) __intrusive_lane_t {
+        #if defined(USE_MPSC)
+                mpsc_queue($thread) queue;
+                __attribute__((aligned(128)))
+        #else
+                // anchor for the head and the tail of the queue
+                __attribute__((aligned(128))) struct __sentinel_t {
+                        // Link lists fields
+                        // instrusive link field for threads
+                        // must be exactly as in $thread
+                        __thread_desc_link link;
+                } before, after;
+        #endif
+        struct $thread * prev;
         // spin lock protecting the queue
         volatile bool lock;
+        // Optional statistic counters
+        #if !defined(__CFA_NO_SCHED_STATS__)
+                struct __attribute__((aligned(64))) {
+                        // difference between number of push and pops
+                        ssize_t diff;
+                        // total number of pushes and pops
+                        size_t  push;
+                        size_t  pop ;
+                } stat;
+        #endif
+        __thread_desc_link anchor;
 };
-void  ?{}(__intrusive_lane_t & this);
-void ^?{}(__intrusive_lane_t & this);
 // Get the head pointer (one before the first element) from the anchor
+static inline $thread * head(const __intrusive_lane_t & this) {
+        #if defined(USE_MPSC)
+                return this.queue.head;
+        #else
+                $thread * rhead = ($thread *)(
+                        (uintptr_t)( &this.before ) - offsetof( $thread, link )
+                );
+                /* paranoid */ verify(rhead);
+                return rhead;
+        #endif
+}
+// Get the tail pointer (one after the last element) from the anchor
+static inline $thread * tail(const __intrusive_lane_t & this) {
+        #if defined(USE_MPSC)
+                return this.queue.tail;
+        #else
+                $thread * rtail = ($thread *)(
+                        (uintptr_t)( &this.after ) - offsetof( $thread, link )
+                );
+                /* paranoid */ verify(rtail);
+                return rtail;
+        #endif
+static inline $thread * mock_head(const __intrusive_lane_t & this) {
+        $thread * rhead = ($thread *)(
+                (uintptr_t)( &this.anchor ) - __builtin_offsetof( $thread, link )
+        );
+        return rhead;
+}
 …
 void ?{}( __intrusive_lane_t & this ) {
         this.lock = false;
+        this.prev = mock_head(this);
+        this.anchor.next = 0p;
+        this.anchor.ts   = 0;
+        #if !defined(USE_MPSC)
+                this.before.link.prev = 0p;
+                this.before.link.next = tail(this);
+                this.before.link.ts   = 0;
+                this.after .link.prev = head(this);
+                this.after .link.next = 0p;
+                this.after .link.ts   = 0;
+                #if !defined(__CFA_NO_SCHED_STATS__)
+                        this.stat.diff = 0;
+                        this.stat.push = 0;
+                        this.stat.pop  = 0;
+                #endif
+                // We add a boat-load of assertions here because the anchor code is very fragile
+                /* paranoid */ verify(((uintptr_t)( head(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.before));
+                /* paranoid */ verify(((uintptr_t)( tail(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.after ));
+                /* paranoid */ verify(head(this)->link.prev == 0p );
+                /* paranoid */ verify(head(this)->link.next == tail(this) );
+                /* paranoid */ verify(tail(this)->link.next == 0p );
+                /* paranoid */ verify(tail(this)->link.prev == head(this) );
+                /* paranoid */ verify(&head(this)->link.prev == &this.before.link.prev );
+                /* paranoid */ verify(&head(this)->link.next == &this.before.link.next );
+                /* paranoid */ verify(&tail(this)->link.prev == &this.after .link.prev );
+                /* paranoid */ verify(&tail(this)->link.next == &this.after .link.next );
+                /* paranoid */ verify(__alignof__(__intrusive_lane_t) == 128);
+                /* paranoid */ verify(__alignof__(this) == 128);
+                /* paranoid */ verifyf(((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", &this, ((intptr_t)(&this) % 128));
+        #endif
+        // We add a boat-load of assertions here because the anchor code is very fragile
+        /* paranoid */ verify( offsetof( $thread, link ) == offsetof(__intrusive_lane_t, anchor) );
+        /* paranoid */ verify( ((uintptr_t)( mock_head(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.anchor) );
+        /* paranoid */ verify( &mock_head(this)->link.next == &this.anchor.next );
+        /* paranoid */ verify( &mock_head(this)->link.ts   == &this.anchor.ts   );
+        /* paranoid */ verify( mock_head(this)->link.next == 0p );
+        /* paranoid */ verify( mock_head(this)->link.ts   == 0  );
+        /* paranoid */ verify( mock_head(this) == this.prev );
+        /* paranoid */ verify( __alignof__(__intrusive_lane_t) == 128 );
+        /* paranoid */ verify( __alignof__(this) == 128 );
+        /* paranoid */ verifyf( ((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", &this, ((intptr_t)(&this) % 128) );
+}
 // Dtor is trivial
 void ^?{}( __intrusive_lane_t & this ) {
+        #if !defined(USE_MPSC)
+                // Make sure the list is empty
+                /* paranoid */ verify(head(this)->link.prev == 0p );
+                /* paranoid */ verify(head(this)->link.next == tail(this) );
+                /* paranoid */ verify(tail(this)->link.next == 0p );
+                /* paranoid */ verify(tail(this)->link.prev == head(this) );
+        #endif
+        // Make sure the list is empty
+        /* paranoid */ verify( this.anchor.next == 0p );
+        /* paranoid */ verify( this.anchor.ts   == 0  );
+        /* paranoid */ verify( mock_head(this)  == this.prev );
+}
 // Push a thread onto this lane
 // returns true of lane was empty before push, false otherwise
 bool push(__intrusive_lane_t & this, $thread * node) {
         #if defined(USE_MPSC)
                 inline $thread * volatile & ?`next ( $thread * this )  __attribute__((const)) {
                         return this->link.next;
+                }
                 push(this.queue, node);
         #else
                 #if defined(__CFA_WITH_VERIFY__)
                         /* paranoid */ verify(this.lock);
                         /* paranoid */ verify(node->link.ts != 0);
                         /* paranoid */ verify(node->link.next == 0p);
                         /* paranoid */ verify(node->link.prev == 0p);
                         /* paranoid */ verify(tail(this)->link.next == 0p);
                         /* paranoid */ verify(head(this)->link.prev == 0p);
+void push( __intrusive_lane_t & this, $thread * node ) {
+        /* paranoid */ verify( node->link.next == 0p );
+        /* paranoid */ verify( node->link.ts   == 0  );
+        /* paranoid */ verify( this.prev->link.next == 0p );
+        /* paranoid */ verify( this.prev->link.ts   == 0  );
+        if( this.anchor.next == 0p ) {
+                /* paranoid */ verify( this.anchor.next == 0p );
+                /* paranoid */ verify( this.anchor.ts   == 0  );
+                /* paranoid */ verify( this.prev == mock_head( this ) );
+        } else {
+                /* paranoid */ verify( this.anchor.next != 0p );
+                /* paranoid */ verify( this.anchor.ts   != 0  );
+                /* paranoid */ verify( this.prev != mock_head( this ) );
+        }
+                        if(this.before.link.ts == 0l) {
+                                /* paranoid */ verify(tail(this)->link.prev == head(this));
+                                /* paranoid */ verify(head(this)->link.next == tail(this));
+                        } else {
+                                /* paranoid */ verify(tail(this)->link.prev != head(this));
+                                /* paranoid */ verify(head(this)->link.next != tail(this));
+                        }
+                #endif
+                // Get the relevant nodes locally
+                $thread * tail = tail(this);
+                $thread * prev = tail->link.prev;
+                // Do the push
+                node->link.next = tail;
+                node->link.prev = prev;
+                prev->link.next = node;
+                tail->link.prev = node;
+                // Update stats
+                #if !defined(__CFA_NO_SCHED_STATS__)
+                        this.stat.diff++;
+                        this.stat.push++;
+                #endif
+                verify(node->link.next == tail(this));
+                // Check if the queue used to be empty
+                if(this.before.link.ts == 0l) {
+                        this.before.link.ts = node->link.ts;
+                        /* paranoid */ verify(node->link.prev == head(this));
+                        return true;
+                }
+                return false;
+        #endif
+        // Get the relevant nodes locally
+        this.prev->link.next = node;
+        this.prev->link.ts   = rdtscl();
+        this.prev = node;
+}
 …
 // returns popped
 // returns true of lane was empty before push, false otherwise
+$thread * pop(__intrusive_lane_t & this) {
+        /* paranoid */ verify(this.lock);
+        #if defined(USE_MPSC)
+                inline $thread * volatile & ?`next ( $thread * this )  __attribute__((const)) {
+                        return this->link.next;
+                }
+                return pop(this.queue);
+        #else
+                /* paranoid */ verify(this.before.link.ts != 0ul);
+$thread * pop( __intrusive_lane_t & this ) {
+        /* paranoid */ verify( this.anchor.next != 0p );
+        /* paranoid */ verify( this.anchor.ts   != 0  );
+                // Get anchors locally
+                $thread * head = head(this);
+                $thread * tail = tail(this);
+        // Get the relevant nodes locally
+        $thread * node = this.anchor.next;
+        this.anchor.next = node->link.next;
+        this.anchor.ts   = node->link.ts;
+        bool is_empty = this.anchor.ts == 0;
+        node->link.next = 0p;
+        node->link.ts   = 0;
+                // Get the relevant nodes locally
+                $thread * node = head->link.next;
+                $thread * next = node->link.next;
+        // Update head time stamp
+        if(is_empty) this.prev = mock_head( this );
+                /* paranoid */ verify(node != tail);
+                /* paranoid */ verify(node);
+                // Do the pop
+                head->link.next = next;
+                next->link.prev = head;
+                node->link.next = 0p;
+                node->link.prev = 0p;
+                // Update head time stamp
+                this.before.link.ts = next->link.ts;
+                // Update stats
+                #ifndef __CFA_NO_SCHED_STATS__
+                        this.stat.diff--;
+                        this.stat.pop ++;
+                #endif
+                // Check if we emptied list and return accordingly
+                /* paranoid */ verify(tail(this)->link.next == 0p);
+                /* paranoid */ verify(head(this)->link.prev == 0p);
+                if(next == tail) {
+                        /* paranoid */ verify(this.before.link.ts == 0);
+                        /* paranoid */ verify(tail(this)->link.prev == head(this));
+                        /* paranoid */ verify(head(this)->link.next == tail(this));
+                        return node;
+                }
+                else {
+                        /* paranoid */ verify(next->link.ts != 0);
+                        /* paranoid */ verify(tail(this)->link.prev != head(this));
+                        /* paranoid */ verify(head(this)->link.next != tail(this));
+                        /* paranoid */ verify(this.before.link.ts != 0);
+                        return node;
+                }
+        #endif
+        /* paranoid */ verify( node->link.next == 0p );
+        /* paranoid */ verify( node->link.ts   == 0  );
+        return node;
+}
 // Check whether or not list is empty
 static inline bool is_empty(__intrusive_lane_t & this) {
+        #if defined(USE_MPSC)
+                return this.queue.head == 0p;
+        #else
+                // Cannot verify here since it may not be locked
+                return this.before.link.ts == 0;
+        #endif
+        return this.anchor.ts == 0;
+}
 // Return the timestamp
 static inline unsigned long long ts(__intrusive_lane_t & this) {
+        #if defined(USE_MPSC)
+                $thread * tl = this.queue.head;
+                if(!tl) return -1ull;
+                return tl->link.ts;
+        #else
+                // Cannot verify here since it may not be locked
+                return this.before.link.ts;
+        #endif
+        // Cannot verify here since it may not be locked
+        return this.anchor.ts;
+}

libcfa/src/concurrency/thread.cfa

-              r8cd34e9
+              r1716e1c
         curr_cluster = &cl;
         link.next = 0p;
         link.prev = 0p;
         link.preferred = -1u;
+        link.ts   = 0;
+        preferred = -1u;
         last_proc = 0p;
         #if defined( __CFA_WITH_VERIFY__ )

libcfa/src/containers/array.hfa

-              r8cd34e9
+              r1716e1c
     };
+    Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, ptrdiff_t i ) {
+    // About the choice of integral types offered as subscript overloads:
+    // Intent is to cover these use cases:
+    //    float foo( ptrdiff_t i ) { return a[i]; }           // i : ptrdiff_t
+    //    forall( [N] ) ... for( i; N ) { total += a[i]; }    // i : typeof( sizeof(42) )
+    //    for( i; 5 ) { total += a[i]; }                      // i : int
+    // It gets complicated by:
+    // -  CFA does overloading on concrete types, like int and unsigned int, not on typedefed
+    //    types like size_t.  So trying to overload on ptrdiff_t vs int works in 64-bit mode
+    //    but not in 32-bit mode.
+    // -  Given bug of Trac #247, CFA gives sizeof expressions type unsigned long int, when it
+    //    should give them type size_t.
+    //
+    //                          gcc -m32         cfa -m32 given bug         gcc -m64
+    // ptrdiff_t                int              int                        long int
+    // size_t                   unsigned int     unsigned int               unsigned long int
+    // typeof( sizeof(42) )     unsigned int     unsigned long int          unsigned long int
+    // int                      int              int                        int
+    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, int i ) {
         return (Timmed &) a.strides[i];
+    }
     Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, int i ) {
+    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned int i ) {
         return (Timmed &) a.strides[i];
+    }
     Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, size_t i ) {
+    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, long int i ) {
         return (Timmed &) a.strides[i];
+    }
+    size_t ?`len( arpk(N, S, Timmed, Tbase) & a ) {
+    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned long int i ) {
+        return (Timmed &) a.strides[i];
+    }
+    static inline size_t ?`len( arpk(N, S, Timmed, Tbase) & a ) {
         return z(N);
+    }
     // workaround #226 (and array relevance thereof demonstrated in mike102/otype-slow-ndims.cfa)
     void ?{}( arpk(N, S, Timmed, Tbase) & this ) {
+    static inline void ?{}( arpk(N, S, Timmed, Tbase) & this ) {
         void ?{}( S (&inner)[z(N)] ) {}
         ?{}(this.strides);
+    }
     void ^?{}( arpk(N, S, Timmed, Tbase) & this ) {
+    static inline void ^?{}( arpk(N, S, Timmed, Tbase) & this ) {
         void ^?{}( S (&inner)[z(N)] ) {}
         ^?{}(this.strides);
 …
 forall( Te )
 Te mkar_( tag(Te) ) {}
+static inline Te mkar_( tag(Te) ) {}
 forall( [N], ZTags ... , Trslt &, Tatom & | { Trslt mkar_( tag(Tatom), ZTags ); } )
 arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {}
+static inline arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {}
 // based on https://stackoverflow.com/questions/1872220/is-it-possible-to-iterate-over-arguments-in-variadic-macros
 …
 forall( TA &, TB &, TC &, IxAB, IxBC ... | { TB & ?[?]( TA &, IxAB ); TC & ?[?]( TB &, IxBC ); } )
 TC & ?[?]( TA & this, IxAB ab, IxBC bc ) {
+static inline TC & ?[?]( TA & this, IxAB ab, IxBC bc ) {
     return this[ab][bc];
+}
 …
 forall( TA &, TB &, TC &, IxAB_0, IxBC | { TB & ?[?]( TA &, IxAB_0 ); TC & ?[?]( TB &, IxBC ); } )
 TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) {
+static inline TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) {
     return this[ab][bc];
+}
 forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1 ); TC & ?[?]( TB &, IxBC ); } )
 TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) {
+static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) {
     return this[[ab0,ab1]][bc];
+}
 forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxAB_2, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1, IxAB_2 ); TC & ?[?]( TB &, IxBC ); } )
 TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) {
+static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) {
     return this[[ab0,ab1,ab2]][bc];
+}
 …
 // Base
 forall( [Nq], [Sq], Tbase & )
 tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {}
+static inline tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {}
 // Rec
 forall( [Nq], [Sq], [N], [S], recq &, recr &, Tbase & | { tag(recr) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(recq) ); } )
 tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {}
+static inline tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {}
 // Wrapper
 struct all_t {} all;
 forall( [N], [S], Te &, result &, Tbase & | { tag(result) enq_( tag(Tbase), tag(N), tag(S), tag(Te) ); } )
 result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) {
+static inline result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) {
     return (result&) this;
+}

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: