Changeset 1716e1c


Ignore:
Timestamp:
May 4, 2021, 12:31:26 PM (4 years ago)
Author:
Andrew Beach <ajbeach@…>
Branches:
ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
0c4df43
Parents:
8cd34e9 (diff), c0c940a (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' into andrew-mmath, implementation updates.

Files:
10 edited
2 moved

Legend:

Unmodified
Added
Removed
  • doc/theses/andrew_beach_MMath/cfalab.sty

    r8cd34e9 r1716e1c  
    123123  numberstyle=\footnotesize\sf,
    124124  % Replace/adjust listing characters that look bad in sanserif.
    125   literate={-}{\makebox[1ex][c]{\raisebox{0.4ex}{\rule{0.75ex}{0.1ex}}}}1
     125  literate={-}{\makebox[1ex][c]{\raisebox{0.7ex}{\rule{0.75ex}{0.1ex}}}}1
    126126    {^}{\raisebox{0.6ex}{$\scriptscriptstyle\land\,$}}1
    127127    {~}{\raisebox{0.3ex}{$\scriptstyle\sim\,$}}1 {`}{\ttfamily\upshape\hspace*{-0.1ex}`}1
  • doc/theses/andrew_beach_MMath/implement.tex

    r8cd34e9 r1716e1c  
    1414
    1515\subsection{Virtual Type}
    16 Virtual types only have one change to their structure, the addition of a
    17 pointer to the virtual table. This is always the first field so that
    18 if it is cast to a supertype the field's location is still known.
    19 
    20 This field is set as part of all new generated constructors.
     16Virtual types only have one change to their structure: the addition of a
     17pointer to the virtual table, called the \emph{virtual-table pointer}.
     18Internally, the field is called
     19@virtual_table@.
     20This constant pointer is always the first field of the table so when
     21casting to a supertype, the field's location is always known.
     22The field is initialized as part of all generated constructors.
    2123\todo{They only come as part exceptions and don't work.}
    22 After the object is created the field is constant.
    23 
    24 However it can be read from, internally it is just a regular field called
    25 @virtual_table@. Dereferencing it gives the virtual table and access to the
     24%After the object is created the field is constant.
     25Dereferencing it gives the virtual table and access to the
    2626type's virtual members.
    2727
    2828\subsection{Virtual Table}
    29 Every time a virtual type is defined the new virtual table type must also be
    30 defined.
    31 
    32 The unique instance is important because the address of the virtual table
    33 instance is used as the identifier for the virtual type. So a pointer to the
    34 virtual table and the ID for the virtual type are interchangable.
     29% PAB: These 2 paragraphs are repeated below, and maybe some of the paragraph above, too.
     30\begin{comment}
     31Every time a virtual type is defined, a new virtual table-type is
     32instantiated.
     33The uniqueness of the virtual-table
     34instance is important because its address
     35is used as the identifier for the virtual type. Hence, a pointer to the
     36virtual table and the ID for the virtual type are interchangeable.
    3537\todo{Unique instances might be going so we will have to talk about the new
    3638system instead.}
    3739
    38 The first step in putting it all together is to create the virtual table type.
    39 The virtual table type is just a structure and can be described in terms of
    40 its fields. The first field is always the parent type ID (or a pointer to
    41 the parent virtual table) or 0 (the null pointer).
    42 Next are other fields on the parent virtual table are repeated.
    43 Finally are the fields used to store any new virtual members of the new
    44 The virtual type
    45 
    46 The virtual system is accessed through a private constant field inserted at the
    47 beginning of every virtual type, called the virtual-table pointer. This field
    48 points at a type's virtual table and is assigned during the object's
    49 construction. The address of a virtual table acts as the unique identifier for
     40The first step is creating the virtual-table type.
     41The virtual-table type is a structure and is described in terms of
     42its fields. The first field contains the parent-type ID (or a pointer to
     43the parent virtual-table) or 0 (null pointer).
     44Next are repeated fields from on the parent virtual-table.
     45Finally, the fields used to store any new virtual members of the new
     46the virtual type.
     47\end{comment}
     48
     49%The virtual system is accessed through a private constant field inserted at the
     50%beginning of every virtual type. This field
     51The virtual-table pointer
     52points at a type's virtual table (see Figure~\vref{f:VirtualTableLayout}).
     53%and is assigned during the object's
     54%construction.
     55The address of a virtual table acts as the unique identifier for
    5056the virtual type, and the first field of a virtual table is a pointer to the
    51 parent virtual-table or @0p@. The remaining fields are duplicated from the
     57parent virtual-table or @0p@ (null pointer). The remaining fields are duplicated from the
    5258parent tables in this type's inheritance chain, followed by any fields this type
    53 introduces. Parent fields are duplicated so they can be changed (all virtual
    54 members are overridable), so that references to the dispatched type
     59introduces. Parent fields are duplicated so they can be changed, \ie all virtual
     60members are overridable, while the parent pointer allows access to the original values.
     61Hence, references to the dispatched type
    5562are replaced with the current virtual type.
    5663% These are always taken by pointer or reference.
    5764
     65\begin{figure}
    5866% Simple ascii diragram:
    59 \begin{verbatim}
    60 parent_pointer  \
    61 parent_field0   |
    62 ...             | Same layout as parent.
    63 parent_fieldN   /
    64 child_field0
     67\begin{cfa}
     68parent_pointer  // \C{parent pointer to access its fields}
     69parent_field0   // \C{same layout as parent to allow replacement}
     70...
     71parent_fieldN
     72child_field0    // \C{new types for this virtual table}
    6573...
    6674child_fieldN
    67 \end{verbatim}
    68 \todo{Refine the diagram}
     75size
     76alignment
     77\end{cfa}
     78%\todo{Refine the diagram}
     79\caption{Virtual Table Layout}
     80\label{f:VirtualTableLayout}
     81\end{figure}
    6982
    7083% For each virtual type, a virtual table is constructed. This is both a new type
     
    7386% the type and the instance.
    7487
    75 A virtual table is created when the virtual type is created. The name of the
     88\begin{comment}
     89PAB: seems to be said already.
     90A virtual table is created when a virtual type is created. The name of the
    7691type is created by mangling the name of the base type. The name of the instance
    7792is also generated by name mangling. The fields are initialized automatically.
    7893The parent field is initialized by getting the type of the parent field and
    79 using that to calculate the mangled name of the parent's virtual table type.
     94using that to calculate the mangled name of the parent's virtual-table type.
     95\end{comment}
    8096There are two special fields that are included like normal fields but have
    8197special initialization rules: the @size@ field is the type's size and is
     
    86102
    87103These operations are split up into several groups depending on where they take
    88 place which varies for monomorphic and polymorphic types. The first devision is
     104place, which varies for monomorphic and polymorphic types. The first devision is
    89105between the declarations and the definitions. Declarations, such as a function
    90 signature or a aggregate's name, must always be visible but may be repeated in
     106signature or an aggregate's name, must always be visible but may be repeated in
    91107the form of forward declarations in headers. Definitions, such as function
    92108bodies and a aggregate's layout, can be separately compiled but must occur
    93109exactly once in a source file.
    94110
    95 \begin{sloppypar}
    96 The declarations include the virtual type definition and forward declarations
    97 of the virtual table instance, constructor, message function and
     111The declarations include the virtual-type definition and forward declarations
     112of the virtual-table instance, constructor, message function and
    98113@get_exception_vtable@. The definition includes the storage and initialization
    99114of the virtual table instance and the bodies of the three functions.
    100 \end{sloppypar}
    101 
    102 Monomorphic instances put all of these two groups in one place each.
    103 Polymorphic instances also split out the core declarations and definitions from
    104 the per-instance information. The virtual table type and most of the functions
    105 are polymorphic so they are all part of the core. The virtual table instance
    106 and the @get_exception_vtable@ function.
    107 
    108 \begin{sloppypar}
     115
     116Monomorphic instances put all of these two groups in one place.
     117Polymorphic instances split out the core declarations and definitions from
     118the per-instance information. The virtual-table type and most of the functions
     119are polymorphic so they are all part of the core. The virtual-table instance
     120and the @get_exception_vtable@ function \PAB{ are ...}.
     121
    109122Coroutines and threads need instances of @CoroutineCancelled@ and
    110123@ThreadCancelled@ respectively to use all of their functionality. When a new
    111 data type is declared with @coroutine@ or @thread@ the forward declaration for
     124data type is declared with @coroutine@ or @thread@, the forward declaration for
    112125the instance is created as well. The definition of the virtual table is created
    113126at the definition of the main function.
    114 \end{sloppypar}
     127
     128\PAB{You need an example here to show what happens for this case.}
     129
    115130
    116131\subsection{Virtual Cast}
     
    121136The function is
    122137\begin{cfa}
    123 void * __cfa__virtual_cast(
    124         struct __cfa__parent_vtable const * parent,
     138void * __cfa__virtual_cast( struct __cfa__parent_vtable const * parent,
    125139        struct __cfa__parent_vtable const * const * child );
    126140\end{cfa}
    127 and it is implemented in the standard library. The structure reperents the
    128 head of a vtable which is the pointer to the parent virtual table. The
    129 @parent@ points directly at the parent type virtual table while the @child@
    130 points at the object of the (possibe) child type.
    131 
    132 In terms of the virtual cast expression, @parent@ comes from looking up the
     141and it is implemented in the standard library. The structure represents the
     142head of a virtual table, which is the pointer to the parent virtual table. The
     143@parent@ points directly at the parent-type virtual-table, while the @child@
     144points at the object of the (possible) child type.
     145
     146\PAB{Need a figure to show this relationship.}
     147
     148In terms of the virtual-cast expression, @parent@ comes from looking up the
    133149type being cast to and @child@ is the result of the expression being cast.
    134 Because the complier outputs C code, some type C type casts are also used.
    135 The last bit of glue is an map that saves every virtual type the compiler
    136 sees. This is used to check the type used in a virtual cast is a virtual
     150Because the complier outputs C code, some C-type casts are also used.
     151The last bit of glue is a map that saves every virtual type the compiler
     152sees. This table is used to check the type used in a virtual cast is a virtual
    137153type and to get its virtual table.
    138154(It also checks for conflicting definitions.)
    139155
    140 Inside the function it is a simple conditional. If the type repersented by
    141 @parent@ is or is an ancestor of the type repersented by @*child@ (it
    142 requires one more level of derefence to pass through the object) then @child@
     156\PAB{Can this be rolled into the figure above?}
     157
     158Inside the function is a simple conditional. If the type represented by
     159@parent@ is an ancestor of the type represented by @*child@ (it
     160requires one more level of dereference to pass through the object) then @child@
    143161is returned, otherwise the null pointer is returned.
    144162
    145 The check itself is preformed is a simple linear search. If the child
    146 virtual table or any of its ancestors (which are retreved through the first
    147 field of every virtual table) are the same as the parent virtual table then
     163The check is a simple linear search (like \Cpp RTTI). If the child
     164virtual table or any of its ancestors (which are retrieved through the first
     165field of every virtual table) are the same as the parent virtual-table then
    148166the cast succeeds.
    149167
     
    156174% resumption doesn't as well.
    157175
    158 % Many modern languages work with an interal stack that function push and pop
     176% Many modern languages work with an internal stack that function push and pop
    159177% their local data to. Stack unwinding removes large sections of the stack,
    160178% often across functions.
    161179
    162180Stack unwinding is the process of removing stack frames (activations) from the
    163 stack. On function entry and return, unwinding is handled directly by the code
    164 embedded in the function. Usually, the stack-frame size is known statically
     181stack. On function entry and return, unwinding is handled directly by the call/return code
     182embedded in a function. Usually, the stack-frame size is known statically
    165183based on parameter and local variable declarations. For dynamically-sized
    166 local variables, a runtime computation is necessary to know the frame
     184local variables.
     185(Often called a variable-length array or VLA, even when the variable type is an aggregate.)
     186For VLAs, a runtime computation is necessary to know the frame
    167187size. Finally, a function's frame-size may change during execution as local
    168 variables (static or dynamic sized) go in and out of scope.
     188variables (static or dynamic sized) go in and out of scope, which is a form of VLA.
    169189Allocating/deallocating stack space is usually an $O(1)$ operation achieved by
    170190bumping the hardware stack-pointer up or down as needed.
    171191
    172 Unwinding across multiple stack frames is more complex because individual stack
    173 management code associated with each frame is bypassed. That is, the location
     192Unwinding across multiple stack frames is more complex because individual stack-management
     193code associated with each frame can be bypassed. That is, the location
    174194of a function's frame-management code is largely unknown and dispersed
    175195throughout the function, hence the current frame size managed by that code is
     
    191211reseting to a snap-shot of an arbitrary but existing function frame on the
    192212stack. It is up to the programmer to ensure the snap-shot is valid when it is
    193 reset, making this unwinding approach fragile with potential errors that are
     213reset and that unwound frames do not have side-effects.
     214Hence, this unwinding approach is fragile with potential errors that are
    194215difficult to debug because the stack becomes corrupted.
    195216
    196 However, many languages define cleanup actions that must be taken when objects
    197 are deallocated from the stack or blocks end, such as running a variable's
    198 destructor or a @try@ statement's @finally@ clause. Handling these mechanisms
     217With respect to stack side-effects, many languages define cleanup actions that must be taken when objects
     218are deallocated from the stack, when the function of blocks within the function end, such as running a variable's
     219destructor or a @try@ statement's @finally@ clause.
     220The purpose of these side-effects is to reestablish the global state of the program, such as dynamic memory-allocation or file access.
     221Handling these side-effect mechanisms
    199222requires walking the stack and checking each stack frame for these potential
    200 actions.
    201 
    202 For exceptions, it must be possible to walk the stack frames in search of @try@
     223actions, where a frame can be any block with declarations.
     224
     225In languages like \Cpp and Java, it must be possible to walk the stack frames in search of @try@
    203226statements to match and execute a handler. For termination exceptions, it must
    204227also be possible to unwind all stack frames from the throw to the matching
    205 catch, and each of these frames must be checked for cleanup actions. Stack
     228catch (including the @try@ block), and each of these frames must be checked for cleanup actions. Stack
    206229walking is where most of the complexity and expense of exception handling
    207230appears.
     
    226249LSDA can contain any information but conventionally it is a table with entries
    227250representing regions of the function and what has to be done there during
    228 unwinding. These regions are bracketed by the instruction pointer. If the
     251unwinding. These regions are bracketed by instruction addresses. If the
    229252instruction pointer is within a region's start/end, then execution is currently
    230253executing in that region. Regions are used to mark out the scopes of objects
     
    238261
    239262The GCC compilation flag @-fexceptions@ causes the generation of an LSDA and
    240 attaches its personality function. However, this
     263attaches its personality function.
     264It attaches a series of opaque directives (@.cfi_personality@ directive)
     265used internally and not part of this work.
     266However, this
    241267flag only handles the cleanup attribute:
    242 \todo{Peter: What is attached? Andrew: It uses the .cfi\_personality directive
    243 and that's all I know.}
    244268\begin{cfa}
    245269void clean_up( int * var ) { ... }
    246270int avar __attribute__(( cleanup(clean_up) ));
    247271\end{cfa}
    248 which is used on a variable and specifies a function, in this case @clean_up@,
    249 run when the variable goes out of scope.
    250 The function is passed a pointer to the object being removed from the stack
    251 so it can be used to mimic destructors.
     272that is used on a variable and specifies a function, in this case @clean_up@,
     273run when the variable goes out of scope, which is used to mimic destructors.
    252274However, this feature cannot be used to mimic @try@ statements as it cannot
    253275control the unwinding.
     
    257279section covers some of the important parts of the interface.
    258280
    259 A personality function can preform different actions depending on how it is
     281A personality function can perform different actions depending on how it is
    260282called.
    261283\begin{lstlisting}[language=C,{moredelim=**[is][\color{red}]{@}{@}}]
     
    268290\end{lstlisting}
    269291The @action@ argument is a bitmask of possible actions:
    270 \begin{enumerate}
     292\begin{enumerate}[topsep=5pt]
    271293\item
    272294@_UA_SEARCH_PHASE@ specifies a search phase and tells the personality function
     
    291313@_UA_FORCE_UNWIND@ specifies a forced unwind call. Forced unwind only performs
    292314the cleanup phase and uses a different means to decide when to stop
    293 (see \vref{s:ForcedUnwind}).
     315(see Section~\vref{s:ForcedUnwind}).
    294316\end{enumerate}
    295317
    296318The @exception_class@ argument is a copy of the
    297319\lstinline[language=C]|exception|'s @exception_class@ field.
     320\PAB{Say more.}
    298321
    299322The \lstinline[language=C]|exception| argument is a pointer to the user
    300323provided storage object. It has two public fields, the exception class, which
    301 is actually just a number, identifying the exception handling mechanism that
     324is just a number, identifying the exception handling mechanism that
    302325created it, and the cleanup function. The cleanup function is called if
    303326required by the exception.
     
    309332that can be passed several places in libunwind. It includes a number of
    310333messages for special cases (some of which should never be used by the
    311 personality function) and error codes but unless otherwise noted the
     334personality function) and error codes. However, unless otherwise noted, the
    312335personality function should always return @_URC_CONTINUE_UNWIND@.
    313336
     
    324347@_URC_END_OF_STACK@.
    325348
    326 Second, when a handler is matched, raise exception continues onto the cleanup
     349Second, when a handler is matched, raise exception walks the stack again performing the cleanup
    327350phase.
    328351Once again, it calls the personality functions of each stack frame from newest
     
    338361Forced Unwind is the other central function in libunwind.
    339362\begin{cfa}
    340 _Unwind_Reason_Code _Unwind_ForcedUnwind( _Unwind_Exception *,
     363_Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *,
    341364        _Unwind_Stop_Fn, void *);
    342365\end{cfa}
     
    380403Each stack must have its own exception context. In a sequential \CFA program,
    381404there is only one stack with a single global exception-context. However, when
    382 the library @libcfathread@ is linked, there are multiple stacks where each
     405the library @libcfathread@ is linked, there are multiple stacks, where each
    383406needs its own exception context.
    384407
    385 General access to the exception context is provided by function
    386 @this_exception_context@. For sequential execution, this function is defined as
     408The function @this_exception_context@ provides general access to the exception context.
     409For sequential execution, this function is defined as
    387410a weak symbol in the \CFA system-library, @libcfa@. When a \CFA program is
    388411concurrent, it links with @libcfathread@, where this function is defined with a
     
    390413
    391414The sequential @this_exception_context@ returns a hard-coded pointer to the
    392 global execption context.
     415global exception context.
    393416The concurrent version adds the exception context to the data stored at the
    394 base of each stack. When @this_exception_context@ is called it retrieves the
     417base of each stack. When @this_exception_context@ is called, it retrieves the
    395418active stack and returns the address of the context saved there.
    396419
     
    399422% catches. Talk about GCC nested functions.
    400423
    401 Termination exceptions use libunwind heavily because it matches the intended
    402 use from \Cpp exceptions closely. The main complication for \CFA is that the
     424Termination exceptions use libunwind heavily because \CFA termination exceptions match
     425\Cpp exceptions closely. The main complication for \CFA is that the
    403426compiler generates C code, making it very difficult to generate the assembly to
    404427form the LSDA for try blocks or destructors.
     
    407430The first step of a termination raise is to copy the exception into memory
    408431managed by the exception system. Currently, the system uses @malloc@, rather
    409 than reserved memory or the stack top. The exception handling mechanism manages
     432than reserved memory or the stack top. The exception-handling mechanism manages
    410433memory for the exception as well as memory for libunwind and the system's own
    411434per-exception storage.
    412435
    413 [Quick ASCII diagram to get started.]
     436\begin{figure}
    414437\begin{verbatim}
    415438Fixed Header  | _Unwind_Exception   <- pointer target
     
    420443              V ...
    421444\end{verbatim}
    422 
    423 Exceptions are stored in variable-sized blocks.
    424 The first component is a fixed sized data structure that contains the
     445\caption{Exception Layout}
     446\label{f:ExceptionLayout}
     447\end{figure}
     448
     449Exceptions are stored in variable-sized blocks (see Figure~\vref{f:ExceptionLayout}).
     450The first component is a fixed-sized data-structure that contains the
    425451information for libunwind and the exception system. The second component is an
    426452area of memory big enough to store the exception. Macros with pointer arthritic
     
    428454@_Unwind_Exception@ to the entire node.
    429455
    430 All of these nodes are linked together in a list, one list per stack, with the
     456Multiple exceptions can exist because handlers can call functions that raise
     457exceptions.  Figure~\vref{f:MultipleExceptions} shows a \Cpp program where
     458exceptions are handled, and then a function is called from the handler that
     459raises a new exception. The previous exception must persist because it is
     460unhandled, and hence, control can return to the handler and that exception is
     461reraised.
     462
     463\begin{figure}
     464\centering
     465\newsavebox{\myboxA}
     466\newsavebox{\myboxB}
     467\begin{lrbox}{\myboxA}
     468\begin{lstlisting}[language=C++,{moredelim=**[is][\color{red}]{@}{@}}]
     469struct E {};
     470int cnt = 3;
     471void f( int i ) {
     472        if ( i == 0 ) @throw E();@
     473        try {
     474                @f( i - 1 );@
     475        } catch( E ) { // handler h
     476                cnt -= 1;
     477                if ( cnt > 0 ) @f( 2 );@
     478        }
     479}
     480int main() { @f( 2 );@ }
     481\end{lstlisting}
     482\end{lrbox}
     483
     484\begin{lrbox}{\myboxB}
     485\begin{lstlisting}
     486h  $\makebox[0pt][l]{\textbackslash}f$
     487   f
     488   f
     489h  $\makebox[0pt][l]{\textbackslash}f$  throw E$\(_2\)$
     490   f
     491   f
     492h  $\makebox[0pt][l]{\textbackslash}f$  throw E$\(_1\)$
     493   f
     494   f
     495\end{lstlisting}
     496\end{lrbox}
     497
     498{\usebox\myboxA}
     499\hspace{25pt}
     500{\usebox\myboxB}
     501
     502\caption{Multiple Exceptions}
     503\label{f:MultipleExceptions}
     504\end{figure}
     505
     506In this case, the exception nodes are linked together in a list, one list per stack, with the
    431507list head stored in the exception context. Within each linked list, the most
    432508recently thrown exception is at the head followed by older thrown
     
    439515exception, the copy function, and the free function, so they are specific to an
    440516exception type. The size and copy function are used immediately to copy an
    441 exception into managed memory. After the exception is handled the free function
     517exception into managed memory. After the exception is handled, the free function
    442518is used to clean up the exception and then the entire node is passed to free
    443519so the memory can be given back to the heap.
     
    445521\subsection{Try Statements and Catch Clauses}
    446522The try statement with termination handlers is complex because it must
    447 compensate for the lack of assembly-code generated from \CFA. Libunwind
     523compensate for the lack of assembly code generated from \CFA. Libunwind
    448524requires an LSDA and personality function for control to unwind across a
    449525function. The LSDA in particular is hard to mimic in generated C code.
     
    454530calls them.
    455531Because this function is known and fixed (and not an arbitrary function that
    456 happens to contain a try statement) this means the LSDA can be generated ahead
     532happens to contain a try statement), this means the LSDA can be generated ahead
    457533of time.
    458534
    459535Both the LSDA and the personality function are set ahead of time using
    460 embedded assembly. This is handcrafted using C @asm@ statements and contains
    461 enough information for the single try statement the function repersents.
     536embedded assembly. This assembly code is handcrafted using C @asm@ statements and contains
     537enough information for the single try statement the function represents.
    462538
    463539The three functions passed to try terminate are:
     
    487563nested functions and all other functions besides @__cfaehm_try_terminate@ in
    488564\CFA use the GCC personality function and the @-fexceptions@ flag to generate
    489 the LSDA. This allows destructors to be implemented with the cleanup attribute.
     565the LSDA. Through this mechanism, \CFA destructors are implemented via the cleanup attribute.
     566
     567\PAB{Try to put together an example try statement illustrating these components.}
    490568
    491569\section{Resumption}
    492570% The stack-local data, the linked list of nodes.
    493571
    494 Resumption simple to implement because there is no stack unwinding. The
     572Resumption is simpler to implement than termination because there is no stack
     573unwinding.  \PAB{You need to explain how the \lstinline{catchResume} clauses are
     574handled. Do you use the personality mechanism in libunwind or do you roll your
     575own mechanism?}
     576
     577The
    495578resumption raise uses a list of nodes for its stack traversal. The head of the
    496579list is stored in the exception context. The nodes in the list have a pointer
    497580to the next node and a pointer to the handler function.
    498 
    499581A resumption raise traverses this list. At each node the handler function is
    500582called, passing the exception by pointer. It returns true if the exception is
     
    512594the stack
    513595already examined, is accomplished by updating the front of the list as the
    514 search continues. Before the handler at a node is called the head of the list
     596search continues. Before the handler at a node is called, the head of the list
    515597is updated to the next node of the current node. After the search is complete,
    516598successful or not, the head of the list is reset.
     
    521603the other handler checked up to this point are not checked again.
    522604
    523 This structure also supports new handler added while the resumption is being
     605This structure also supports new handlers added while the resumption is being
    524606handled. These are added to the front of the list, pointing back along the
    525607stack -- the first one points over all the checked handlers -- and the ordering
    526608is maintained.
     609
     610\PAB{Again, a figure to show how this works would be helpful.}
    527611
    528612\label{p:zero-cost}
     
    539623% that unwind is required knowledge for that chapter.
    540624
     625\PAB{This paragraph needs to be moved to the start of this Section, where I have have my other comment.}
     626
    541627\section{Finally}
    542628% Uses destructors and GCC nested functions.
    543 Finally clauses is placed into a GCC nested-function with a unique name, and no
     629A finally clause is placed into a GCC nested-function with a unique mangled name, and no
    544630arguments or return values. This nested function is then set as the cleanup
    545631function of an empty object that is declared at the beginning of a block placed
    546 around the context of the associated @try@ statement.
    547 
    548 The rest is handled by GCC. The try block and all handlers are inside the
     632around the context of an associated @try@ statement.
     633
     634The rest is handled by GCC. The try block and all handlers are inside this
    549635block. At completion, control exits the block and the empty object is cleaned
    550636up, which runs the function that contains the finally code.
     
    554640
    555641Cancellation also uses libunwind to do its stack traversal and unwinding,
    556 however it uses a different primary function @_Unwind_ForcedUnwind@. Details
    557 of its interface can be found in the \vref{s:ForcedUnwind}.
     642however it uses a different primary function, @_Unwind_ForcedUnwind@. Details
     643of its interface can be found in Section~\vref{s:ForcedUnwind}.
    558644
    559645The first step of cancellation is to find the cancelled stack and its type:
    560 coroutine or thread. Fortunately, the thread library stores the main thread
    561 pointer and the current thread pointer, and every thread stores a pointer to
    562 its main coroutine and the coroutine it is currently executing.
    563 
    564 So if the active thread's main and current coroutine are the same. If they
    565 are then the current stack is a thread stack, otherwise it is a coroutine
    566 stack. If it is a thread stack then an equality check with the stored main
    567 thread pointer and current thread pointer is enough to tell if the current
    568 thread is the main thread or not.
    569 
     646coroutine or thread. Fortunately, the thread library stores the program-main thread
     647pointer and the current-thread pointer, and every thread stores a pointer to
     648the current coroutine it is executing.
     649
     650\PAB{I don't know if my corrections in the previous paragraph are correct.}
     651
     652When the active thread and coroutine are the same, the current stack is the thread stack, otherwise it is a coroutine
     653stack.
     654% PAB: repeated?
     655% If it is a thread stack, then an equality check with the stored main
     656% thread pointer and current thread pointer is enough to tell if the current
     657% thread is the main thread or not.
    570658However, if the threading library is not linked, the sequential execution is on
    571659the main stack. Hence, the entire check is skipped because the weak-symbol
     
    575663Regardless of how the stack is chosen, the stop function and parameter are
    576664passed to the forced-unwind function. The general pattern of all three stop
    577 functions is the same: they continue unwinding until the end of stack when they
    578 do there primary work.
    579 
     665functions is the same: continue unwinding until the end of stack.
     666%when they
     667%do there primary work.
    580668For main stack cancellation, the transfer is just a program abort.
    581669
    582 For coroutine cancellation, the exception is stored on the coroutine's stack,
     670For coroutine cancellation, the exception is stored in the coroutine's stack,
    583671and the coroutine context switches to its last resumer. The rest is handled on
    584672the backside of the resume, which check if the resumed coroutine is
  • doc/theses/andrew_beach_MMath/uw-ethesis.tex

    r8cd34e9 r1716e1c  
    9797% cfa macros used in the document
    9898\usepackage{cfalab}
     99% allow global and individual modification of spacing
     100\usepackage{enumitem}
    99101% Improved reference tools.
    100102\usepackage[nospace]{varioref}
  • libcfa/src/concurrency/invoke.h

    r8cd34e9 r1716e1c  
    146146        struct __thread_desc_link {
    147147                struct $thread * next;
    148                 struct $thread * prev;
    149148                volatile unsigned long long ts;
    150                 unsigned preferred;
    151149        };
    152150
     
    155153                // context that is switch during a __cfactx_switch
    156154                struct __stack_context_t context;
     155
     156                // Link lists fields
     157                // instrusive link field for threads
     158                struct __thread_desc_link link;
    157159
    158160                // current execution status for coroutine
     
    170172                struct cluster * curr_cluster;
    171173
    172                 // Link lists fields
    173                 // instrusive link field for threads
    174                 struct __thread_desc_link link;
     174                // preferred ready-queue
     175                unsigned preferred;
    175176
    176177                // coroutine body used to store context
  • libcfa/src/concurrency/kernel.cfa

    r8cd34e9 r1716e1c  
    184184                MAIN_LOOP:
    185185                for() {
     186                        #if 1
    186187                        // Check if there is pending io
    187188                        __maybe_io_drain( this );
     
    270271                        }
    271272
    272                 //      SEARCH: {
    273                 //              /* paranoid */ verify( ! __preemption_enabled() );
    274                 //              /* paranoid */ verify( kernelTLS().this_proc_id );
    275 
    276                 //              // First, lock the scheduler since we are searching for a thread
    277 
    278                 //              // Try to get the next thread
    279                 //              ready_schedule_lock();
    280                 //              readyThread = pop_fast( this->cltr );
    281                 //              ready_schedule_unlock();
    282                 //              if(readyThread) {  break SEARCH; }
    283 
    284                 //              // If we can't find a thread, might as well flush any outstanding I/O
    285                 //              if(this->io.pending) { __cfa_io_flush( this ); }
    286 
    287                 //              // Spin a little on I/O, just in case
    288                 //              for(25) {
    289                 //                      __maybe_io_drain( this );
    290                 //                      ready_schedule_lock();
    291                 //                      readyThread = pop_fast( this->cltr );
    292                 //                      ready_schedule_unlock();
    293                 //                      if(readyThread) {  break SEARCH; }
    294                 //              }
    295 
    296                 //              // no luck, try stealing a few times
    297                 //              for(25) {
    298                 //                      if( __maybe_io_drain( this ) ) {
    299                 //                              ready_schedule_lock();
    300                 //                              readyThread = pop_fast( this->cltr );
    301                 //                      } else {
    302                 //                              ready_schedule_lock();
    303                 //                              readyThread = pop_slow( this->cltr );
    304                 //                      }
    305                 //                      ready_schedule_unlock();
    306                 //                      if(readyThread) {  break SEARCH; }
    307                 //              }
    308 
    309                 //              // still no luck, search for a thread
    310                 //              ready_schedule_lock();
    311                 //              readyThread = pop_search( this->cltr );
    312                 //              ready_schedule_unlock();
    313                 //              if(readyThread) { break SEARCH; }
    314 
    315                 //              // Don't block if we are done
    316                 //              if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
    317 
    318                 //              __STATS( __tls_stats()->ready.sleep.halts++; )
    319 
    320                 //              // Push self to idle stack
    321                 //              mark_idle(this->cltr->procs, * this);
    322 
    323                 //              // Confirm the ready-queue is empty
    324                 //              __maybe_io_drain( this );
    325                 //              ready_schedule_lock();
    326                 //              readyThread = pop_search( this->cltr );
    327                 //              ready_schedule_unlock();
    328 
    329                 //              if( readyThread ) {
    330                 //                      // A thread was found, cancel the halt
    331                 //                      mark_awake(this->cltr->procs, * this);
    332 
    333                 //                      __STATS( __tls_stats()->ready.sleep.cancels++; )
    334 
    335                 //                      // continue the main loop
    336                 //                      break SEARCH;
    337                 //              }
    338 
    339                 //              __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->id, rdtscl()); )
    340                 //              __cfadbg_print_safe(runtime_core, "Kernel : core %p waiting on eventfd %d\n", this, this->idle);
    341 
    342                 //              // __disable_interrupts_hard();
    343                 //              eventfd_t val;
    344                 //              eventfd_read( this->idle, &val );
    345                 //              // __enable_interrupts_hard();
    346 
    347                 //              __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 1\n", this->id, rdtscl()); )
    348 
    349                 //              // We were woken up, remove self from idle
    350                 //              mark_awake(this->cltr->procs, * this);
    351 
    352                 //              // DON'T just proceed, start looking again
    353                 //              continue MAIN_LOOP;
    354                 //      }
    355 
    356                 // RUN_THREAD:
    357                 //      /* paranoid */ verify( kernelTLS().this_proc_id );
    358                 //      /* paranoid */ verify( ! __preemption_enabled() );
    359                 //      /* paranoid */ verify( readyThread );
    360 
    361                 //      // Reset io dirty bit
    362                 //      this->io.dirty = false;
    363 
    364                 //      // We found a thread run it
    365                 //      __run_thread(this, readyThread);
    366 
    367                 //      // Are we done?
    368                 //      if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
    369 
    370                 //      #if !defined(__CFA_NO_STATISTICS__)
    371                 //              unsigned long long curr = rdtscl();
    372                 //              if(curr > (last_tally + 500000000)) {
    373                 //                      __tally_stats(this->cltr->stats, __cfaabi_tls.this_stats);
    374                 //                      last_tally = curr;
    375                 //              }
    376                 //      #endif
    377 
    378                 //      if(this->io.pending && !this->io.dirty) {
    379                 //              __cfa_io_flush( this );
    380                 //      }
    381 
    382                 //      // Check if there is pending io
    383                 //      __maybe_io_drain( this );
     273                        #else
     274
     275                        SEARCH: {
     276                                /* paranoid */ verify( ! __preemption_enabled() );
     277                                /* paranoid */ verify( kernelTLS().this_proc_id );
     278
     279                                // First, lock the scheduler since we are searching for a thread
     280                                ready_schedule_lock();
     281
     282                                // Try to get the next thread
     283                                readyThread = pop_fast( this->cltr );
     284                                if(readyThread) { ready_schedule_unlock(); break SEARCH; }
     285
     286                                // If we can't find a thread, might as well flush any outstanding I/O
     287                                if(this->io.pending) { __cfa_io_flush( this ); }
     288
     289                                // Spin a little on I/O, just in case
     290                                for(25) {
     291                                        __maybe_io_drain( this );
     292                                        readyThread = pop_fast( this->cltr );
     293                                        if(readyThread) { ready_schedule_unlock(); break SEARCH; }
     294                                }
     295
     296                                // no luck, try stealing a few times
     297                                for(25) {
     298                                        if( __maybe_io_drain( this ) ) {
     299                                                readyThread = pop_fast( this->cltr );
     300                                        } else {
     301                                                readyThread = pop_slow( this->cltr );
     302                                        }
     303                                        if(readyThread) { ready_schedule_unlock(); break SEARCH; }
     304                                }
     305
     306                                // still no luck, search for a thread
     307                                readyThread = pop_search( this->cltr );
     308                                if(readyThread) { ready_schedule_unlock(); break SEARCH; }
     309
     310                                // Don't block if we are done
     311                                if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
     312
     313                                __STATS( __tls_stats()->ready.sleep.halts++; )
     314
     315                                // Push self to idle stack
     316                                ready_schedule_unlock();
     317                                mark_idle(this->cltr->procs, * this);
     318                                ready_schedule_lock();
     319
     320                                // Confirm the ready-queue is empty
     321                                __maybe_io_drain( this );
     322                                readyThread = pop_search( this->cltr );
     323                                ready_schedule_unlock();
     324
     325                                if( readyThread ) {
     326                                        // A thread was found, cancel the halt
     327                                        mark_awake(this->cltr->procs, * this);
     328
     329                                        __STATS( __tls_stats()->ready.sleep.cancels++; )
     330
     331                                        // continue the main loop
     332                                        break SEARCH;
     333                                }
     334
     335                                __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->id, rdtscl()); )
     336                                __cfadbg_print_safe(runtime_core, "Kernel : core %p waiting on eventfd %d\n", this, this->idle);
     337
     338                                // __disable_interrupts_hard();
     339                                eventfd_t val;
     340                                eventfd_read( this->idle, &val );
     341                                // __enable_interrupts_hard();
     342
     343                                __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 1\n", this->id, rdtscl()); )
     344
     345                                // We were woken up, remove self from idle
     346                                mark_awake(this->cltr->procs, * this);
     347
     348                                // DON'T just proceed, start looking again
     349                                continue MAIN_LOOP;
     350                        }
     351
     352                RUN_THREAD:
     353                        /* paranoid */ verify( kernelTLS().this_proc_id );
     354                        /* paranoid */ verify( ! __preemption_enabled() );
     355                        /* paranoid */ verify( readyThread );
     356
     357                        // Reset io dirty bit
     358                        this->io.dirty = false;
     359
     360                        // We found a thread run it
     361                        __run_thread(this, readyThread);
     362
     363                        // Are we done?
     364                        if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP;
     365
     366                        #if !defined(__CFA_NO_STATISTICS__)
     367                                unsigned long long curr = rdtscl();
     368                                if(curr > (last_tally + 500000000)) {
     369                                        __tally_stats(this->cltr->stats, __cfaabi_tls.this_stats);
     370                                        last_tally = curr;
     371                                }
     372                        #endif
     373
     374                        if(this->io.pending && !this->io.dirty) {
     375                                __cfa_io_flush( this );
     376                        }
     377
     378                        ready_schedule_lock();
     379                        __maybe_io_drain( this );
     380                        ready_schedule_unlock();
     381                        #endif
    384382                }
    385383
  • libcfa/src/concurrency/kernel/startup.cfa

    r8cd34e9 r1716e1c  
    461461        self_mon_p = &self_mon;
    462462        link.next = 0p;
    463         link.prev = 0p;
    464         link.preferred = -1u;
     463        link.ts   = 0;
     464        preferred = -1u;
    465465        last_proc = 0p;
    466466        #if defined( __CFA_WITH_VERIFY__ )
  • libcfa/src/concurrency/ready_queue.cfa

    r8cd34e9 r1716e1c  
    1717// #define __CFA_DEBUG_PRINT_READY_QUEUE__
    1818
    19 // #define USE_MPSC
    2019
    2120#define USE_RELAXED_FIFO
     
    256255                /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count );
    257256
    258                 // write timestamp
    259                 thrd->link.ts = rdtscl();
    260 
    261257                bool local;
    262258                int preferred = external ? -1 : kernelTLS().this_processor->rdq.id;
     
    277273                        #endif
    278274
    279                 #if defined(USE_MPSC)
    280                         // mpsc always succeeds
    281                 } while( false );
    282                 #else
    283275                        // If we can't lock it retry
    284276                } while( !__atomic_try_acquire( &lanes.data[i].lock ) );
    285                 #endif
    286277
    287278                // Actually push it
    288279                push(lanes.data[i], thrd);
    289280
    290                 #if !defined(USE_MPSC)
    291281                        // Unlock and return
    292282                        __atomic_unlock( &lanes.data[i].lock );
    293                 #endif
    294283
    295284                // Mark the current index in the tls rng instance as having an item
     
    350339                __cfadbg_print_safe(ready_queue, "Kernel : Pushing %p on cluster %p\n", thrd, cltr);
    351340
     341                // #define USE_PREFERRED
     342                #if !defined(USE_PREFERRED)
    352343                const bool external = (!kernelTLS().this_processor) || (cltr != kernelTLS().this_processor->cltr);
    353344                /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count );
    354 
    355                 // write timestamp
    356                 thrd->link.ts = rdtscl();
     345                #else
     346                        unsigned preferred = thrd->preferred;
     347                        const bool external = (!kernelTLS().this_processor) || preferred == -1u || thrd->curr_cluster != cltr;
     348                        /* paranoid */ verifyf(external || preferred < lanes.count, "Invalid preferred queue %u for %u lanes", preferred, lanes.count );
     349
     350                        unsigned r = preferred % READYQ_SHARD_FACTOR;
     351                        const unsigned start = preferred - r;
     352                #endif
    357353
    358354                // Try to pick a lane and lock it
     
    368364                        }
    369365                        else {
     366                                #if !defined(USE_PREFERRED)
    370367                                processor * proc = kernelTLS().this_processor;
    371368                                unsigned r = proc->rdq.its++;
    372369                                i =  proc->rdq.id + (r % READYQ_SHARD_FACTOR);
     370                #else
     371                                        i = start + (r++ % READYQ_SHARD_FACTOR);
     372                                #endif
    373373                        }
    374 
    375 
    376                 #if defined(USE_MPSC)
    377                         // mpsc always succeeds
    378                 } while( false );
    379                 #else
    380374                        // If we can't lock it retry
    381375                } while( !__atomic_try_acquire( &lanes.data[i].lock ) );
    382                 #endif
    383376
    384377                // Actually push it
    385378                push(lanes.data[i], thrd);
    386379
    387                 #if !defined(USE_MPSC)
    388380                        // Unlock and return
    389381                        __atomic_unlock( &lanes.data[i].lock );
    390                 #endif
    391382
    392383                #if !defined(__CFA_NO_STATISTICS__)
     
    492483                lanes.tscs[w].tv = thrd->link.ts;
    493484        #endif
     485
     486        thrd->preferred = w;
    494487
    495488        // return the popped thread
     
    519512// Check that all the intrusive queues in the data structure are still consistent
    520513static void check( __ready_queue_t & q ) with (q) {
    521         #if defined(__CFA_WITH_VERIFY__) && !defined(USE_MPSC)
     514        #if defined(__CFA_WITH_VERIFY__)
    522515                {
    523516                        for( idx ; lanes.count ) {
     
    525518                                assert(!lanes.data[idx].lock);
    526519
    527                                 assert(head(sl)->link.prev == 0p );
    528                                 assert(head(sl)->link.next->link.prev == head(sl) );
    529                                 assert(tail(sl)->link.next == 0p );
    530                                 assert(tail(sl)->link.prev->link.next == tail(sl) );
    531 
    532                                 if(is_empty(sl)) {
    533                                         assert(tail(sl)->link.prev == head(sl));
    534                                         assert(head(sl)->link.next == tail(sl));
    535                                 } else {
    536                                         assert(tail(sl)->link.prev != head(sl));
    537                                         assert(head(sl)->link.next != tail(sl));
    538                                 }
     520                                        if(is_empty(sl)) {
     521                                                assert( sl.anchor.next == 0p );
     522                                                assert( sl.anchor.ts   == 0  );
     523                                                assert( mock_head(sl)  == sl.prev );
     524                                        } else {
     525                                                assert( sl.anchor.next != 0p );
     526                                                assert( sl.anchor.ts   != 0  );
     527                                                assert( mock_head(sl)  != sl.prev );
     528                                        }
    539529                        }
    540530                }
     
    557547// fixes the list so that the pointers back to anchors aren't left dangling
    558548static inline void fix(__intrusive_lane_t & ll) {
    559         #if !defined(USE_MPSC)
    560                 // if the list is not empty then follow he pointer and fix its reverse
    561                 if(!is_empty(ll)) {
    562                         head(ll)->link.next->link.prev = head(ll);
    563                         tail(ll)->link.prev->link.next = tail(ll);
    564                 }
    565                 // Otherwise just reset the list
    566                 else {
    567                         verify(tail(ll)->link.next == 0p);
    568                         tail(ll)->link.prev = head(ll);
    569                         head(ll)->link.next = tail(ll);
    570                         verify(head(ll)->link.prev == 0p);
    571                 }
    572         #endif
     549                        if(is_empty(ll)) {
     550                                verify(ll.anchor.next == 0p);
     551                                ll.prev = mock_head(ll);
     552                        }
    573553}
    574554
  • libcfa/src/concurrency/ready_subqueue.hfa

    r8cd34e9 r1716e1c  
    77// Intrusives lanes which are used by the relaxed ready queue
    88struct __attribute__((aligned(128))) __intrusive_lane_t {
    9 
    10         #if defined(USE_MPSC)
    11                 mpsc_queue($thread) queue;
    12                 __attribute__((aligned(128)))
    13         #else
    14                 // anchor for the head and the tail of the queue
    15                 __attribute__((aligned(128))) struct __sentinel_t {
    16                         // Link lists fields
    17                         // instrusive link field for threads
    18                         // must be exactly as in $thread
    19                         __thread_desc_link link;
    20                 } before, after;
    21         #endif
     9        struct $thread * prev;
    2210
    2311        // spin lock protecting the queue
    2412        volatile bool lock;
    2513
    26         // Optional statistic counters
    27         #if !defined(__CFA_NO_SCHED_STATS__)
    28                 struct __attribute__((aligned(64))) {
    29                         // difference between number of push and pops
    30                         ssize_t diff;
    31 
    32                         // total number of pushes and pops
    33                         size_t  push;
    34                         size_t  pop ;
    35                 } stat;
    36         #endif
     14        __thread_desc_link anchor;
    3715};
    3816
    39 void  ?{}(__intrusive_lane_t & this);
    40 void ^?{}(__intrusive_lane_t & this);
    41 
    4217// Get the head pointer (one before the first element) from the anchor
    43 static inline $thread * head(const __intrusive_lane_t & this) {
    44         #if defined(USE_MPSC)
    45                 return this.queue.head;
    46         #else
    47                 $thread * rhead = ($thread *)(
    48                         (uintptr_t)( &this.before ) - offsetof( $thread, link )
    49                 );
    50                 /* paranoid */ verify(rhead);
    51                 return rhead;
    52         #endif
    53 }
    54 
    55 // Get the tail pointer (one after the last element) from the anchor
    56 static inline $thread * tail(const __intrusive_lane_t & this) {
    57         #if defined(USE_MPSC)
    58                 return this.queue.tail;
    59         #else
    60                 $thread * rtail = ($thread *)(
    61                         (uintptr_t)( &this.after ) - offsetof( $thread, link )
    62                 );
    63                 /* paranoid */ verify(rtail);
    64                 return rtail;
    65         #endif
     18static inline $thread * mock_head(const __intrusive_lane_t & this) {
     19        $thread * rhead = ($thread *)(
     20                (uintptr_t)( &this.anchor ) - __builtin_offsetof( $thread, link )
     21        );
     22        return rhead;
    6623}
    6724
     
    6926void ?{}( __intrusive_lane_t & this ) {
    7027        this.lock = false;
     28        this.prev = mock_head(this);
     29        this.anchor.next = 0p;
     30        this.anchor.ts   = 0;
    7131
    72         #if !defined(USE_MPSC)
    73                 this.before.link.prev = 0p;
    74                 this.before.link.next = tail(this);
    75                 this.before.link.ts   = 0;
    76 
    77                 this.after .link.prev = head(this);
    78                 this.after .link.next = 0p;
    79                 this.after .link.ts   = 0;
    80 
    81                 #if !defined(__CFA_NO_SCHED_STATS__)
    82                         this.stat.diff = 0;
    83                         this.stat.push = 0;
    84                         this.stat.pop  = 0;
    85                 #endif
    86 
    87                 // We add a boat-load of assertions here because the anchor code is very fragile
    88                 /* paranoid */ verify(((uintptr_t)( head(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.before));
    89                 /* paranoid */ verify(((uintptr_t)( tail(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.after ));
    90                 /* paranoid */ verify(head(this)->link.prev == 0p );
    91                 /* paranoid */ verify(head(this)->link.next == tail(this) );
    92                 /* paranoid */ verify(tail(this)->link.next == 0p );
    93                 /* paranoid */ verify(tail(this)->link.prev == head(this) );
    94                 /* paranoid */ verify(&head(this)->link.prev == &this.before.link.prev );
    95                 /* paranoid */ verify(&head(this)->link.next == &this.before.link.next );
    96                 /* paranoid */ verify(&tail(this)->link.prev == &this.after .link.prev );
    97                 /* paranoid */ verify(&tail(this)->link.next == &this.after .link.next );
    98                 /* paranoid */ verify(__alignof__(__intrusive_lane_t) == 128);
    99                 /* paranoid */ verify(__alignof__(this) == 128);
    100                 /* paranoid */ verifyf(((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", &this, ((intptr_t)(&this) % 128));
    101         #endif
     32        // We add a boat-load of assertions here because the anchor code is very fragile
     33        /* paranoid */ verify( offsetof( $thread, link ) == offsetof(__intrusive_lane_t, anchor) );
     34        /* paranoid */ verify( ((uintptr_t)( mock_head(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.anchor) );
     35        /* paranoid */ verify( &mock_head(this)->link.next == &this.anchor.next );
     36        /* paranoid */ verify( &mock_head(this)->link.ts   == &this.anchor.ts   );
     37        /* paranoid */ verify( mock_head(this)->link.next == 0p );
     38        /* paranoid */ verify( mock_head(this)->link.ts   == 0  );
     39        /* paranoid */ verify( mock_head(this) == this.prev );
     40        /* paranoid */ verify( __alignof__(__intrusive_lane_t) == 128 );
     41        /* paranoid */ verify( __alignof__(this) == 128 );
     42        /* paranoid */ verifyf( ((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", &this, ((intptr_t)(&this) % 128) );
    10243}
    10344
    10445// Dtor is trivial
    10546void ^?{}( __intrusive_lane_t & this ) {
    106         #if !defined(USE_MPSC)
    107                 // Make sure the list is empty
    108                 /* paranoid */ verify(head(this)->link.prev == 0p );
    109                 /* paranoid */ verify(head(this)->link.next == tail(this) );
    110                 /* paranoid */ verify(tail(this)->link.next == 0p );
    111                 /* paranoid */ verify(tail(this)->link.prev == head(this) );
    112         #endif
     47        // Make sure the list is empty
     48        /* paranoid */ verify( this.anchor.next == 0p );
     49        /* paranoid */ verify( this.anchor.ts   == 0  );
     50        /* paranoid */ verify( mock_head(this)  == this.prev );
    11351}
    11452
    11553// Push a thread onto this lane
    11654// returns true of lane was empty before push, false otherwise
    117 bool push(__intrusive_lane_t & this, $thread * node) {
    118         #if defined(USE_MPSC)
    119                 inline $thread * volatile & ?`next ( $thread * this )  __attribute__((const)) {
    120                         return this->link.next;
    121                 }
    122                 push(this.queue, node);
    123         #else
    124                 #if defined(__CFA_WITH_VERIFY__)
    125                         /* paranoid */ verify(this.lock);
    126                         /* paranoid */ verify(node->link.ts != 0);
    127                         /* paranoid */ verify(node->link.next == 0p);
    128                         /* paranoid */ verify(node->link.prev == 0p);
    129                         /* paranoid */ verify(tail(this)->link.next == 0p);
    130                         /* paranoid */ verify(head(this)->link.prev == 0p);
     55void push( __intrusive_lane_t & this, $thread * node ) {
     56        /* paranoid */ verify( node->link.next == 0p );
     57        /* paranoid */ verify( node->link.ts   == 0  );
     58        /* paranoid */ verify( this.prev->link.next == 0p );
     59        /* paranoid */ verify( this.prev->link.ts   == 0  );
     60        if( this.anchor.next == 0p ) {
     61                /* paranoid */ verify( this.anchor.next == 0p );
     62                /* paranoid */ verify( this.anchor.ts   == 0  );
     63                /* paranoid */ verify( this.prev == mock_head( this ) );
     64        } else {
     65                /* paranoid */ verify( this.anchor.next != 0p );
     66                /* paranoid */ verify( this.anchor.ts   != 0  );
     67                /* paranoid */ verify( this.prev != mock_head( this ) );
     68        }
    13169
    132                         if(this.before.link.ts == 0l) {
    133                                 /* paranoid */ verify(tail(this)->link.prev == head(this));
    134                                 /* paranoid */ verify(head(this)->link.next == tail(this));
    135                         } else {
    136                                 /* paranoid */ verify(tail(this)->link.prev != head(this));
    137                                 /* paranoid */ verify(head(this)->link.next != tail(this));
    138                         }
    139                 #endif
    140 
    141                 // Get the relevant nodes locally
    142                 $thread * tail = tail(this);
    143                 $thread * prev = tail->link.prev;
    144 
    145                 // Do the push
    146                 node->link.next = tail;
    147                 node->link.prev = prev;
    148                 prev->link.next = node;
    149                 tail->link.prev = node;
    150 
    151                 // Update stats
    152                 #if !defined(__CFA_NO_SCHED_STATS__)
    153                         this.stat.diff++;
    154                         this.stat.push++;
    155                 #endif
    156 
    157                 verify(node->link.next == tail(this));
    158 
    159                 // Check if the queue used to be empty
    160                 if(this.before.link.ts == 0l) {
    161                         this.before.link.ts = node->link.ts;
    162                         /* paranoid */ verify(node->link.prev == head(this));
    163                         return true;
    164                 }
    165                 return false;
    166         #endif
     70        // Get the relevant nodes locally
     71        this.prev->link.next = node;
     72        this.prev->link.ts   = rdtscl();
     73        this.prev = node;
    16774}
    16875
     
    17077// returns popped
    17178// returns true of lane was empty before push, false otherwise
    172 $thread * pop(__intrusive_lane_t & this) {
    173         /* paranoid */ verify(this.lock);
    174         #if defined(USE_MPSC)
    175                 inline $thread * volatile & ?`next ( $thread * this )  __attribute__((const)) {
    176                         return this->link.next;
    177                 }
    178                 return pop(this.queue);
    179         #else
    180                 /* paranoid */ verify(this.before.link.ts != 0ul);
     79$thread * pop( __intrusive_lane_t & this ) {
     80        /* paranoid */ verify( this.anchor.next != 0p );
     81        /* paranoid */ verify( this.anchor.ts   != 0  );
    18182
    182                 // Get anchors locally
    183                 $thread * head = head(this);
    184                 $thread * tail = tail(this);
     83        // Get the relevant nodes locally
     84        $thread * node = this.anchor.next;
     85        this.anchor.next = node->link.next;
     86        this.anchor.ts   = node->link.ts;
     87        bool is_empty = this.anchor.ts == 0;
     88        node->link.next = 0p;
     89        node->link.ts   = 0;
    18590
    186                 // Get the relevant nodes locally
    187                 $thread * node = head->link.next;
    188                 $thread * next = node->link.next;
     91        // Update head time stamp
     92        if(is_empty) this.prev = mock_head( this );
    18993
    190                 /* paranoid */ verify(node != tail);
    191                 /* paranoid */ verify(node);
    192 
    193                 // Do the pop
    194                 head->link.next = next;
    195                 next->link.prev = head;
    196                 node->link.next = 0p;
    197                 node->link.prev = 0p;
    198 
    199                 // Update head time stamp
    200                 this.before.link.ts = next->link.ts;
    201 
    202                 // Update stats
    203                 #ifndef __CFA_NO_SCHED_STATS__
    204                         this.stat.diff--;
    205                         this.stat.pop ++;
    206                 #endif
    207 
    208                 // Check if we emptied list and return accordingly
    209                 /* paranoid */ verify(tail(this)->link.next == 0p);
    210                 /* paranoid */ verify(head(this)->link.prev == 0p);
    211                 if(next == tail) {
    212                         /* paranoid */ verify(this.before.link.ts == 0);
    213                         /* paranoid */ verify(tail(this)->link.prev == head(this));
    214                         /* paranoid */ verify(head(this)->link.next == tail(this));
    215                         return node;
    216                 }
    217                 else {
    218                         /* paranoid */ verify(next->link.ts != 0);
    219                         /* paranoid */ verify(tail(this)->link.prev != head(this));
    220                         /* paranoid */ verify(head(this)->link.next != tail(this));
    221                         /* paranoid */ verify(this.before.link.ts != 0);
    222                         return node;
    223                 }
    224         #endif
     94        /* paranoid */ verify( node->link.next == 0p );
     95        /* paranoid */ verify( node->link.ts   == 0  );
     96        return node;
    22597}
    22698
    22799// Check whether or not list is empty
    228100static inline bool is_empty(__intrusive_lane_t & this) {
    229         #if defined(USE_MPSC)
    230                 return this.queue.head == 0p;
    231         #else
    232                 // Cannot verify here since it may not be locked
    233                 return this.before.link.ts == 0;
    234         #endif
     101        return this.anchor.ts == 0;
    235102}
    236103
    237104// Return the timestamp
    238105static inline unsigned long long ts(__intrusive_lane_t & this) {
    239         #if defined(USE_MPSC)
    240                 $thread * tl = this.queue.head;
    241                 if(!tl) return -1ull;
    242                 return tl->link.ts;
    243         #else
    244                 // Cannot verify here since it may not be locked
    245                 return this.before.link.ts;
    246         #endif
     106        // Cannot verify here since it may not be locked
     107        return this.anchor.ts;
    247108}
    248109
  • libcfa/src/concurrency/thread.cfa

    r8cd34e9 r1716e1c  
    3838        curr_cluster = &cl;
    3939        link.next = 0p;
    40         link.prev = 0p;
    41         link.preferred = -1u;
     40        link.ts   = 0;
     41        preferred = -1u;
    4242        last_proc = 0p;
    4343        #if defined( __CFA_WITH_VERIFY__ )
  • libcfa/src/containers/array.hfa

    r8cd34e9 r1716e1c  
    2121    };
    2222
    23     Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, ptrdiff_t i ) {
     23    // About the choice of integral types offered as subscript overloads:
     24    // Intent is to cover these use cases:
     25    //    float foo( ptrdiff_t i ) { return a[i]; }           // i : ptrdiff_t
     26    //    forall( [N] ) ... for( i; N ) { total += a[i]; }    // i : typeof( sizeof(42) )
     27    //    for( i; 5 ) { total += a[i]; }                      // i : int
     28    // It gets complicated by:
     29    // -  CFA does overloading on concrete types, like int and unsigned int, not on typedefed
     30    //    types like size_t.  So trying to overload on ptrdiff_t vs int works in 64-bit mode
     31    //    but not in 32-bit mode.
     32    // -  Given bug of Trac #247, CFA gives sizeof expressions type unsigned long int, when it
     33    //    should give them type size_t.
     34    //   
     35    //                          gcc -m32         cfa -m32 given bug         gcc -m64
     36    // ptrdiff_t                int              int                        long int
     37    // size_t                   unsigned int     unsigned int               unsigned long int
     38    // typeof( sizeof(42) )     unsigned int     unsigned long int          unsigned long int
     39    // int                      int              int                        int
     40
     41    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, int i ) {
    2442        return (Timmed &) a.strides[i];
    2543    }
    2644
    27     Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, int i ) {
     45    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned int i ) {
    2846        return (Timmed &) a.strides[i];
    2947    }
    3048
    31     Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, size_t i ) {
     49    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, long int i ) {
    3250        return (Timmed &) a.strides[i];
    3351    }
    3452
    35     size_t ?`len( arpk(N, S, Timmed, Tbase) & a ) {
     53    static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned long int i ) {
     54        return (Timmed &) a.strides[i];
     55    }
     56
     57    static inline size_t ?`len( arpk(N, S, Timmed, Tbase) & a ) {
    3658        return z(N);
    3759    }
    3860
    3961    // workaround #226 (and array relevance thereof demonstrated in mike102/otype-slow-ndims.cfa)
    40     void ?{}( arpk(N, S, Timmed, Tbase) & this ) {
     62    static inline void ?{}( arpk(N, S, Timmed, Tbase) & this ) {
    4163        void ?{}( S (&inner)[z(N)] ) {}
    4264        ?{}(this.strides);
    4365    }
    44     void ^?{}( arpk(N, S, Timmed, Tbase) & this ) {
     66    static inline void ^?{}( arpk(N, S, Timmed, Tbase) & this ) {
    4567        void ^?{}( S (&inner)[z(N)] ) {}
    4668        ^?{}(this.strides);
     
    5375
    5476forall( Te )
    55 Te mkar_( tag(Te) ) {}
     77static inline Te mkar_( tag(Te) ) {}
    5678
    5779forall( [N], ZTags ... , Trslt &, Tatom & | { Trslt mkar_( tag(Tatom), ZTags ); } )
    58 arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {}
     80static inline arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {}
    5981
    6082// based on https://stackoverflow.com/questions/1872220/is-it-possible-to-iterate-over-arguments-in-variadic-macros
     
    90112
    91113forall( TA &, TB &, TC &, IxAB, IxBC ... | { TB & ?[?]( TA &, IxAB ); TC & ?[?]( TB &, IxBC ); } )
    92 TC & ?[?]( TA & this, IxAB ab, IxBC bc ) {
     114static inline TC & ?[?]( TA & this, IxAB ab, IxBC bc ) {
    93115    return this[ab][bc];
    94116}
     
    99121
    100122forall( TA &, TB &, TC &, IxAB_0, IxBC | { TB & ?[?]( TA &, IxAB_0 ); TC & ?[?]( TB &, IxBC ); } )
    101 TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) {
     123static inline TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) {
    102124    return this[ab][bc];
    103125}
    104126
    105127forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1 ); TC & ?[?]( TB &, IxBC ); } )
    106 TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) {
     128static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) {
    107129    return this[[ab0,ab1]][bc];
    108130}
    109131
    110132forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxAB_2, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1, IxAB_2 ); TC & ?[?]( TB &, IxBC ); } )
    111 TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) {
     133static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) {
    112134    return this[[ab0,ab1,ab2]][bc];
    113135}
     
    121143// Base
    122144forall( [Nq], [Sq], Tbase & )
    123 tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {}
     145static inline tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {}
    124146
    125147// Rec
    126148forall( [Nq], [Sq], [N], [S], recq &, recr &, Tbase & | { tag(recr) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(recq) ); } )
    127 tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {}
     149static inline tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {}
    128150
    129151// Wrapper
    130152struct all_t {} all;
    131153forall( [N], [S], Te &, result &, Tbase & | { tag(result) enq_( tag(Tbase), tag(N), tag(S), tag(Te) ); } )
    132 result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) {
     154static inline result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) {
    133155    return (result&) this;
    134156}
Note: See TracChangeset for help on using the changeset viewer.