Changeset 1716e1c
- Timestamp: May 4, 2021, 12:31:26 PM (4 years ago)
- Branches: ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children: 0c4df43
- Parents: 8cd34e9 (diff), c0c940a (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself. Use the (diff) links above to see all the changes relative to each parent.
- Files:
- 10 edited
- 2 moved
- doc/theses/andrew_beach_MMath/cfalab.sty
r8cd34e9 r1716e1c 123 123 numberstyle=\footnotesize\sf, 124 124 % Replace/adjust listing characters that look bad in sanserif. 125 literate={-}{\makebox[1ex][c]{\raisebox{0. 4ex}{\rule{0.75ex}{0.1ex}}}}1125 literate={-}{\makebox[1ex][c]{\raisebox{0.7ex}{\rule{0.75ex}{0.1ex}}}}1 126 126 {^}{\raisebox{0.6ex}{$\scriptscriptstyle\land\,$}}1 127 127 {~}{\raisebox{0.3ex}{$\scriptstyle\sim\,$}}1 {`}{\ttfamily\upshape\hspace*{-0.1ex}`}1 -
doc/theses/andrew_beach_MMath/implement.tex
r8cd34e9 r1716e1c 14 14 15 15 \subsection{Virtual Type} 16 Virtual types only have one change to their structure, the addition of a 17 pointer to the virtual table. This is always the first field so that 18 if it is cast to a supertype the field's location is still known. 19 20 This field is set as part of all new generated constructors. 16 Virtual types only have one change to their structure: the addition of a 17 pointer to the virtual table, called the \emph{virtual-table pointer}. 18 Internally, the field is called 19 @virtual_table@. 20 This constant pointer is always the first field of the table so when 21 casting to a supertype, the field's location is always known. 22 The field is initialized as part of all generated constructors. 21 23 \todo{They only come as part exceptions and don't work.} 22 After the object is created the field is constant. 23 24 However it can be read from, internally it is just a regular field called 25 @virtual_table@. Dereferencing it gives the virtual table and access to the 24 %After the object is created the field is constant. 25 Dereferencing it gives the virtual table and access to the 26 26 type's virtual members. 27 27 28 28 \subsection{Virtual Table} 29 Every time a virtual type is defined the new virtual table type must also be 30 defined. 31 32 The unique instance is important because the address of the virtual table 33 instance is used as the identifier for the virtual type. So a pointer to the 34 virtual table and the ID for the virtual type are interchangable. 29 % PAB: These 2 paragraphs are repeated below, and maybe some of the paragraph above, too. 30 \begin{comment} 31 Every time a virtual type is defined, a new virtual table-type is 32 instantiated. 33 The uniqueness of the virtual-table 34 instance is important because its address 35 is used as the identifier for the virtual type. Hence, a pointer to the 36 virtual table and the ID for the virtual type are interchangeable. 35 37 \todo{Unique instances might be going so we will have to talk about the new 36 38 system instead.} 37 39 38 The first step in putting it all together is to create the virtual table type. 39 The virtual table type is just a structure and can be described in terms of 40 its fields. The first field is always the parent type ID (or a pointer to 41 the parent virtual table) or 0 (the null pointer). 42 Next are other fields on the parent virtual table are repeated. 43 Finally are the fields used to store any new virtual members of the new 44 The virtual type 45 46 The virtual system is accessed through a private constant field inserted at the 47 beginning of every virtual type, called the virtual-table pointer. This field 48 points at a type's virtual table and is assigned during the object's 49 construction. The address of a virtual table acts as the unique identifier for 40 The first step is creating the virtual-table type. 41 The virtual-table type is a structure and is described in terms of 42 its fields. The first field contains the parent-type ID (or a pointer to 43 the parent virtual-table) or 0 (null pointer). 44 Next are repeated fields from on the parent virtual-table. 45 Finally, the fields used to store any new virtual members of the new 46 the virtual type. 47 \end{comment} 48 49 %The virtual system is accessed through a private constant field inserted at the 50 %beginning of every virtual type. This field 51 The virtual-table pointer 52 points at a type's virtual table (see Figure~\vref{f:VirtualTableLayout}). 
53 %and is assigned during the object's 54 %construction. 55 The address of a virtual table acts as the unique identifier for 50 56 the virtual type, and the first field of a virtual table is a pointer to the 51 parent virtual-table or @0p@ . The remaining fields are duplicated from the57 parent virtual-table or @0p@ (null pointer). The remaining fields are duplicated from the 52 58 parent tables in this type's inheritance chain, followed by any fields this type 53 introduces. Parent fields are duplicated so they can be changed (all virtual 54 members are overridable), so that references to the dispatched type 59 introduces. Parent fields are duplicated so they can be changed, \ie all virtual 60 members are overridable, while the parent pointer allows access to the original values. 61 Hence, references to the dispatched type 55 62 are replaced with the current virtual type. 56 63 % These are always taken by pointer or reference. 57 64 65 \begin{figure} 58 66 % Simple ascii diragram: 59 \begin{ verbatim}60 parent_pointer \61 parent_field0 |62 ... | Same layout as parent.63 parent_fieldN /64 child_field0 67 \begin{cfa} 68 parent_pointer // \C{parent pointer to access its fields} 69 parent_field0 // \C{same layout as parent to allow replacement} 70 ... 71 parent_fieldN 72 child_field0 // \C{new types for this virtual table} 65 73 ... 66 74 child_fieldN 67 \end{verbatim} 68 \todo{Refine the diagram} 75 size 76 alignment 77 \end{cfa} 78 %\todo{Refine the diagram} 79 \caption{Virtual Table Layout} 80 \label{f:VirtualTableLayout} 81 \end{figure} 69 82 70 83 % For each virtual type, a virtual table is constructed. This is both a new type … … 73 86 % the type and the instance. 74 87 75 A virtual table is created when the virtual type is created. The name of the 88 \begin{comment} 89 PAB: seems to be said already. 90 A virtual table is created when a virtual type is created. The name of the 76 91 type is created by mangling the name of the base type. The name of the instance 77 92 is also generated by name mangling. The fields are initialized automatically. 78 93 The parent field is initialized by getting the type of the parent field and 79 using that to calculate the mangled name of the parent's virtual table type. 94 using that to calculate the mangled name of the parent's virtual-table type. 95 \end{comment} 80 96 There are two special fields that are included like normal fields but have 81 97 special initialization rules: the @size@ field is the type's size and is … … 86 102 87 103 These operations are split up into several groups depending on where they take 88 place which varies for monomorphic and polymorphic types. The first devision is104 place, which varies for monomorphic and polymorphic types. The first devision is 89 105 between the declarations and the definitions. Declarations, such as a function 90 signature or a aggregate's name, must always be visible but may be repeated in106 signature or an aggregate's name, must always be visible but may be repeated in 91 107 the form of forward declarations in headers. Definitions, such as function 92 108 bodies and a aggregate's layout, can be separately compiled but must occur 93 109 exactly once in a source file. 
94 110 95 \begin{sloppypar} 96 The declarations include the virtual type definition and forward declarations 97 of the virtual table instance, constructor, message function and 111 The declarations include the virtual-type definition and forward declarations 112 of the virtual-table instance, constructor, message function and 98 113 @get_exception_vtable@. The definition includes the storage and initialization 99 114 of the virtual table instance and the bodies of the three functions. 100 \end{sloppypar} 101 102 Monomorphic instances put all of these two groups in one place each. 103 Polymorphic instances also split out the core declarations and definitions from 104 the per-instance information. The virtual table type and most of the functions 105 are polymorphic so they are all part of the core. The virtual table instance 106 and the @get_exception_vtable@ function. 107 108 \begin{sloppypar} 115 116 Monomorphic instances put all of these two groups in one place. 117 Polymorphic instances split out the core declarations and definitions from 118 the per-instance information. The virtual-table type and most of the functions 119 are polymorphic so they are all part of the core. The virtual-table instance 120 and the @get_exception_vtable@ function \PAB{ are ...}. 121 109 122 Coroutines and threads need instances of @CoroutineCancelled@ and 110 123 @ThreadCancelled@ respectively to use all of their functionality. When a new 111 data type is declared with @coroutine@ or @thread@ the forward declaration for124 data type is declared with @coroutine@ or @thread@, the forward declaration for 112 125 the instance is created as well. The definition of the virtual table is created 113 126 at the definition of the main function. 114 \end{sloppypar} 127 128 \PAB{You need an example here to show what happens for this case.} 129 115 130 116 131 \subsection{Virtual Cast} … … 121 136 The function is 122 137 \begin{cfa} 123 void * __cfa__virtual_cast( 124 struct __cfa__parent_vtable const * parent, 138 void * __cfa__virtual_cast( struct __cfa__parent_vtable const * parent, 125 139 struct __cfa__parent_vtable const * const * child ); 126 140 \end{cfa} 127 and it is implemented in the standard library. The structure reperents the 128 head of a vtable which is the pointer to the parent virtual table. The 129 @parent@ points directly at the parent type virtual table while the @child@ 130 points at the object of the (possibe) child type. 131 132 In terms of the virtual cast expression, @parent@ comes from looking up the 141 and it is implemented in the standard library. The structure represents the 142 head of a virtual table, which is the pointer to the parent virtual table. The 143 @parent@ points directly at the parent-type virtual-table, while the @child@ 144 points at the object of the (possible) child type. 145 146 \PAB{Need a figure to show this relationship.} 147 148 In terms of the virtual-cast expression, @parent@ comes from looking up the 133 149 type being cast to and @child@ is the result of the expression being cast. 134 Because the complier outputs C code, some type Ctype casts are also used.135 The last bit of glue is a nmap that saves every virtual type the compiler136 sees. This is used to check the type used in a virtual cast is a virtual150 Because the complier outputs C code, some C-type casts are also used. 151 The last bit of glue is a map that saves every virtual type the compiler 152 sees. 
This table is used to check the type used in a virtual cast is a virtual 137 153 type and to get its virtual table. 138 154 (It also checks for conflicting definitions.) 139 155 140 Inside the function it is a simple conditional. If the type repersented by 141 @parent@ is or is an ancestor of the type repersented by @*child@ (it 142 requires one more level of derefence to pass through the object) then @child@ 156 \PAB{Can this be rolled into the figure above?} 157 158 Inside the function is a simple conditional. If the type represented by 159 @parent@ is an ancestor of the type represented by @*child@ (it 160 requires one more level of dereference to pass through the object) then @child@ 143 161 is returned, otherwise the null pointer is returned. 144 162 145 The check i tself is preformed is a simple linear search. If the child146 virtual table or any of its ancestors (which are retr eved through the first147 field of every virtual table) are the same as the parent virtual 163 The check is a simple linear search (like \Cpp RTTI). If the child 164 virtual table or any of its ancestors (which are retrieved through the first 165 field of every virtual table) are the same as the parent virtual-table then 148 166 the cast succeeds. 149 167 … … 156 174 % resumption doesn't as well. 157 175 158 % Many modern languages work with an inter al stack that function push and pop176 % Many modern languages work with an internal stack that function push and pop 159 177 % their local data to. Stack unwinding removes large sections of the stack, 160 178 % often across functions. 161 179 162 180 Stack unwinding is the process of removing stack frames (activations) from the 163 stack. On function entry and return, unwinding is handled directly by the c ode164 embedded in thefunction. Usually, the stack-frame size is known statically181 stack. On function entry and return, unwinding is handled directly by the call/return code 182 embedded in a function. Usually, the stack-frame size is known statically 165 183 based on parameter and local variable declarations. For dynamically-sized 166 local variables, a runtime computation is necessary to know the frame 184 local variables. 185 (Often called a variable-length array or VLA, even when the variable type is an aggregate.) 186 For VLAs, a runtime computation is necessary to know the frame 167 187 size. Finally, a function's frame-size may change during execution as local 168 variables (static or dynamic sized) go in and out of scope .188 variables (static or dynamic sized) go in and out of scope, which is a form of VLA. 169 189 Allocating/deallocating stack space is usually an $O(1)$ operation achieved by 170 190 bumping the hardware stack-pointer up or down as needed. 171 191 172 Unwinding across multiple stack frames is more complex because individual stack 173 management code associated with each frame isbypassed. That is, the location192 Unwinding across multiple stack frames is more complex because individual stack-management 193 code associated with each frame can be bypassed. That is, the location 174 194 of a function's frame-management code is largely unknown and dispersed 175 195 throughout the function, hence the current frame size managed by that code is … … 191 211 reseting to a snap-shot of an arbitrary but existing function frame on the 192 212 stack. 
It is up to the programmer to ensure the snap-shot is valid when it is 193 reset, making this unwinding approach fragile with potential errors that are 213 reset and that unwound frames do not have side-effects. 214 Hence, this unwinding approach is fragile with potential errors that are 194 215 difficult to debug because the stack becomes corrupted. 195 216 196 However, many languages define cleanup actions that must be taken when objects 197 are deallocated from the stack or blocks end, such as running a variable's 198 destructor or a @try@ statement's @finally@ clause. Handling these mechanisms 217 With respect to stack side-effects, many languages define cleanup actions that must be taken when objects 218 are deallocated from the stack, when the function of blocks within the function end, such as running a variable's 219 destructor or a @try@ statement's @finally@ clause. 220 The purpose of these side-effects is to reestablish the global state of the program, such as dynamic memory-allocation or file access. 221 Handling these side-effect mechanisms 199 222 requires walking the stack and checking each stack frame for these potential 200 actions .201 202 For exceptions, it must be possible to walk the stack frames in search of @try@223 actions, where a frame can be any block with declarations. 224 225 In languages like \Cpp and Java, it must be possible to walk the stack frames in search of @try@ 203 226 statements to match and execute a handler. For termination exceptions, it must 204 227 also be possible to unwind all stack frames from the throw to the matching 205 catch , and each of these frames must be checked for cleanup actions. Stack228 catch (including the @try@ block), and each of these frames must be checked for cleanup actions. Stack 206 229 walking is where most of the complexity and expense of exception handling 207 230 appears. … … 226 249 LSDA can contain any information but conventionally it is a table with entries 227 250 representing regions of the function and what has to be done there during 228 unwinding. These regions are bracketed by the instruction pointer. If the251 unwinding. These regions are bracketed by instruction addresses. If the 229 252 instruction pointer is within a region's start/end, then execution is currently 230 253 executing in that region. Regions are used to mark out the scopes of objects … … 238 261 239 262 The GCC compilation flag @-fexceptions@ causes the generation of an LSDA and 240 attaches its personality function. However, this 263 attaches its personality function. 264 It attaches a series of opaque directives (@.cfi_personality@ directive) 265 used internally and not part of this work. 266 However, this 241 267 flag only handles the cleanup attribute: 242 \todo{Peter: What is attached? Andrew: It uses the .cfi\_personality directive243 and that's all I know.}244 268 \begin{cfa} 245 269 void clean_up( int * var ) { ... } 246 270 int avar __attribute__(( cleanup(clean_up) )); 247 271 \end{cfa} 248 which is used on a variable and specifies a function, in this case @clean_up@, 249 run when the variable goes out of scope. 250 The function is passed a pointer to the object being removed from the stack 251 so it can be used to mimic destructors. 272 that is used on a variable and specifies a function, in this case @clean_up@, 273 run when the variable goes out of scope, which is used to mimic destructors. 252 274 However, this feature cannot be used to mimic @try@ statements as it cannot 253 275 control the unwinding. 
… … 257 279 section covers some of the important parts of the interface. 258 280 259 A personality function can p reform different actions depending on how it is281 A personality function can perform different actions depending on how it is 260 282 called. 261 283 \begin{lstlisting}[language=C,{moredelim=**[is][\color{red}]{@}{@}}] … … 268 290 \end{lstlisting} 269 291 The @action@ argument is a bitmask of possible actions: 270 \begin{enumerate} 292 \begin{enumerate}[topsep=5pt] 271 293 \item 272 294 @_UA_SEARCH_PHASE@ specifies a search phase and tells the personality function … … 291 313 @_UA_FORCE_UNWIND@ specifies a forced unwind call. Forced unwind only performs 292 314 the cleanup phase and uses a different means to decide when to stop 293 (see \vref{s:ForcedUnwind}).315 (see Section~\vref{s:ForcedUnwind}). 294 316 \end{enumerate} 295 317 296 318 The @exception_class@ argument is a copy of the 297 319 \lstinline[language=C]|exception|'s @exception_class@ field. 320 \PAB{Say more.} 298 321 299 322 The \lstinline[language=C]|exception| argument is a pointer to the user 300 323 provided storage object. It has two public fields, the exception class, which 301 is actuallyjust a number, identifying the exception handling mechanism that324 is just a number, identifying the exception handling mechanism that 302 325 created it, and the cleanup function. The cleanup function is called if 303 326 required by the exception. … … 309 332 that can be passed several places in libunwind. It includes a number of 310 333 messages for special cases (some of which should never be used by the 311 personality function) and error codes but unless otherwise notedthe334 personality function) and error codes. However, unless otherwise noted, the 312 335 personality function should always return @_URC_CONTINUE_UNWIND@. 313 336 … … 324 347 @_URC_END_OF_STACK@. 325 348 326 Second, when a handler is matched, raise exception continues ontothe cleanup349 Second, when a handler is matched, raise exception walks the stack again performing the cleanup 327 350 phase. 328 351 Once again, it calls the personality functions of each stack frame from newest … … 338 361 Forced Unwind is the other central function in libunwind. 339 362 \begin{cfa} 340 _Unwind_Reason_Code _Unwind_ForcedUnwind( 363 _Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *, 341 364 _Unwind_Stop_Fn, void *); 342 365 \end{cfa} … … 380 403 Each stack must have its own exception context. In a sequential \CFA program, 381 404 there is only one stack with a single global exception-context. However, when 382 the library @libcfathread@ is linked, there are multiple stacks where each405 the library @libcfathread@ is linked, there are multiple stacks, where each 383 406 needs its own exception context. 384 407 385 General access to the exception context is provided by function 386 @this_exception_context@.For sequential execution, this function is defined as408 The function @this_exception_context@ provides general access to the exception context. 409 For sequential execution, this function is defined as 387 410 a weak symbol in the \CFA system-library, @libcfa@. When a \CFA program is 388 411 concurrent, it links with @libcfathread@, where this function is defined with a … … 390 413 391 414 The sequential @this_exception_context@ returns a hard-coded pointer to the 392 global ex ecption context.415 global exception context. 393 416 The concurrent version adds the exception context to the data stored at the 394 base of each stack. 
When @this_exception_context@ is called it retrieves the417 base of each stack. When @this_exception_context@ is called, it retrieves the 395 418 active stack and returns the address of the context saved there. 396 419 … … 399 422 % catches. Talk about GCC nested functions. 400 423 401 Termination exceptions use libunwind heavily because it matches the intended402 use from\Cpp exceptions closely. The main complication for \CFA is that the424 Termination exceptions use libunwind heavily because \CFA termination exceptions match 425 \Cpp exceptions closely. The main complication for \CFA is that the 403 426 compiler generates C code, making it very difficult to generate the assembly to 404 427 form the LSDA for try blocks or destructors. … … 407 430 The first step of a termination raise is to copy the exception into memory 408 431 managed by the exception system. Currently, the system uses @malloc@, rather 409 than reserved memory or the stack top. The exception 432 than reserved memory or the stack top. The exception-handling mechanism manages 410 433 memory for the exception as well as memory for libunwind and the system's own 411 434 per-exception storage. 412 435 413 [Quick ASCII diagram to get started.] 436 \begin{figure} 414 437 \begin{verbatim} 415 438 Fixed Header | _Unwind_Exception <- pointer target … … 420 443 V ... 421 444 \end{verbatim} 422 423 Exceptions are stored in variable-sized blocks. 424 The first component is a fixed sized data structure that contains the 445 \caption{Exception Layout} 446 \label{f:ExceptionLayout} 447 \end{figure} 448 449 Exceptions are stored in variable-sized blocks (see Figure~\vref{f:ExceptionLayout}). 450 The first component is a fixed-sized data-structure that contains the 425 451 information for libunwind and the exception system. The second component is an 426 452 area of memory big enough to store the exception. Macros with pointer arthritic … … 428 454 @_Unwind_Exception@ to the entire node. 429 455 430 All of these nodes are linked together in a list, one list per stack, with the 456 Multiple exceptions can exist because handlers can call functions that raise 457 exceptions. Figure~\vref{f:MultipleExceptions} shows a \Cpp program where 458 exceptions are handled, and then a function is called from the handler that 459 raises a new exception. The previous exception must persist because it is 460 unhandled, and hence, control can return to the handler and that exception is 461 reraised. 462 463 \begin{figure} 464 \centering 465 \newsavebox{\myboxA} 466 \newsavebox{\myboxB} 467 \begin{lrbox}{\myboxA} 468 \begin{lstlisting}[language=C++,{moredelim=**[is][\color{red}]{@}{@}}] 469 struct E {}; 470 int cnt = 3; 471 void f( int i ) { 472 if ( i == 0 ) @throw E();@ 473 try { 474 @f( i - 1 );@ 475 } catch( E ) { // handler h 476 cnt -= 1; 477 if ( cnt > 0 ) @f( 2 );@ 478 } 479 } 480 int main() { @f( 2 );@ } 481 \end{lstlisting} 482 \end{lrbox} 483 484 \begin{lrbox}{\myboxB} 485 \begin{lstlisting} 486 h $\makebox[0pt][l]{\textbackslash}f$ 487 f 488 f 489 h $\makebox[0pt][l]{\textbackslash}f$ throw E$\(_2\)$ 490 f 491 f 492 h $\makebox[0pt][l]{\textbackslash}f$ throw E$\(_1\)$ 493 f 494 f 495 \end{lstlisting} 496 \end{lrbox} 497 498 {\usebox\myboxA} 499 \hspace{25pt} 500 {\usebox\myboxB} 501 502 \caption{Multiple Exceptions} 503 \label{f:MultipleExceptions} 504 \end{figure} 505 506 In this case, the exception nodes are linked together in a list, one list per stack, with the 431 507 list head stored in the exception context. 
Within each linked list, the most 432 508 recently thrown exception is at the head followed by older thrown … … 439 515 exception, the copy function, and the free function, so they are specific to an 440 516 exception type. The size and copy function are used immediately to copy an 441 exception into managed memory. After the exception is handled the free function517 exception into managed memory. After the exception is handled, the free function 442 518 is used to clean up the exception and then the entire node is passed to free 443 519 so the memory can be given back to the heap. … … 445 521 \subsection{Try Statements and Catch Clauses} 446 522 The try statement with termination handlers is complex because it must 447 compensate for the lack of assembly -code generated from \CFA. Libunwind523 compensate for the lack of assembly code generated from \CFA. Libunwind 448 524 requires an LSDA and personality function for control to unwind across a 449 525 function. The LSDA in particular is hard to mimic in generated C code. … … 454 530 calls them. 455 531 Because this function is known and fixed (and not an arbitrary function that 456 happens to contain a try statement) this means the LSDA can be generated ahead532 happens to contain a try statement), this means the LSDA can be generated ahead 457 533 of time. 458 534 459 535 Both the LSDA and the personality function are set ahead of time using 460 embedded assembly. This is handcrafted using C @asm@ statements and contains461 enough information for the single try statement the function rep ersents.536 embedded assembly. This assembly code is handcrafted using C @asm@ statements and contains 537 enough information for the single try statement the function represents. 462 538 463 539 The three functions passed to try terminate are: … … 487 563 nested functions and all other functions besides @__cfaehm_try_terminate@ in 488 564 \CFA use the GCC personality function and the @-fexceptions@ flag to generate 489 the LSDA. This allows destructors to be implemented with the cleanup attribute. 565 the LSDA. Through this mechanism, \CFA destructors are implemented via the cleanup attribute. 566 567 \PAB{Try to put together an example try statement illustrating these components.} 490 568 491 569 \section{Resumption} 492 570 % The stack-local data, the linked list of nodes. 493 571 494 Resumption simple to implement because there is no stack unwinding. The 572 Resumption is simpler to implement than termination because there is no stack 573 unwinding. \PAB{You need to explain how the \lstinline{catchResume} clauses are 574 handled. Do you use the personality mechanism in libunwind or do you roll your 575 own mechanism?} 576 577 The 495 578 resumption raise uses a list of nodes for its stack traversal. The head of the 496 579 list is stored in the exception context. The nodes in the list have a pointer 497 580 to the next node and a pointer to the handler function. 498 499 581 A resumption raise traverses this list. At each node the handler function is 500 582 called, passing the exception by pointer. It returns true if the exception is … … 512 594 the stack 513 595 already examined, is accomplished by updating the front of the list as the 514 search continues. Before the handler at a node is called the head of the list596 search continues. Before the handler at a node is called, the head of the list 515 597 is updated to the next node of the current node. After the search is complete, 516 598 successful or not, the head of the list is reset. 
… … 521 603 the other handler checked up to this point are not checked again. 522 604 523 This structure also supports new handler added while the resumption is being605 This structure also supports new handlers added while the resumption is being 524 606 handled. These are added to the front of the list, pointing back along the 525 607 stack -- the first one points over all the checked handlers -- and the ordering 526 608 is maintained. 609 610 \PAB{Again, a figure to show how this works would be helpful.} 527 611 528 612 \label{p:zero-cost} … … 539 623 % that unwind is required knowledge for that chapter. 540 624 625 \PAB{This paragraph needs to be moved to the start of this Section, where I have have my other comment.} 626 541 627 \section{Finally} 542 628 % Uses destructors and GCC nested functions. 543 Finally clauses is placed into a GCC nested-function with a uniquename, and no629 A finally clause is placed into a GCC nested-function with a unique mangled name, and no 544 630 arguments or return values. This nested function is then set as the cleanup 545 631 function of an empty object that is declared at the beginning of a block placed 546 around the context of theassociated @try@ statement.547 548 The rest is handled by GCC. The try block and all handlers are inside th e632 around the context of an associated @try@ statement. 633 634 The rest is handled by GCC. The try block and all handlers are inside this 549 635 block. At completion, control exits the block and the empty object is cleaned 550 636 up, which runs the function that contains the finally code. … … 554 640 555 641 Cancellation also uses libunwind to do its stack traversal and unwinding, 556 however it uses a different primary function @_Unwind_ForcedUnwind@. Details557 of its interface can be found in the\vref{s:ForcedUnwind}.642 however it uses a different primary function, @_Unwind_ForcedUnwind@. Details 643 of its interface can be found in Section~\vref{s:ForcedUnwind}. 558 644 559 645 The first step of cancellation is to find the cancelled stack and its type: 560 coroutine or thread. Fortunately, the thread library stores the main thread 561 pointer and the current thread pointer, and every thread stores a pointer to 562 its main coroutine and the coroutine it is currently executing. 563 564 So if the active thread's main and current coroutine are the same. If they 565 are then the current stack is a thread stack, otherwise it is a coroutine 566 stack. If it is a thread stack then an equality check with the stored main 567 thread pointer and current thread pointer is enough to tell if the current 568 thread is the main thread or not. 569 646 coroutine or thread. Fortunately, the thread library stores the program-main thread 647 pointer and the current-thread pointer, and every thread stores a pointer to 648 the current coroutine it is executing. 649 650 \PAB{I don't know if my corrections in the previous paragraph are correct.} 651 652 When the active thread and coroutine are the same, the current stack is the thread stack, otherwise it is a coroutine 653 stack. 654 % PAB: repeated? 655 % If it is a thread stack, then an equality check with the stored main 656 % thread pointer and current thread pointer is enough to tell if the current 657 % thread is the main thread or not. 570 658 However, if the threading library is not linked, the sequential execution is on 571 659 the main stack. 
Hence, the entire check is skipped because the weak-symbol … … 575 663 Regardless of how the stack is chosen, the stop function and parameter are 576 664 passed to the forced-unwind function. The general pattern of all three stop 577 functions is the same: they continue unwinding until the end of stack when they578 do there primary work. 579 665 functions is the same: continue unwinding until the end of stack. 666 %when they 667 %do there primary work. 580 668 For main stack cancellation, the transfer is just a program abort. 581 669 582 For coroutine cancellation, the exception is stored on the coroutine's stack,670 For coroutine cancellation, the exception is stored in the coroutine's stack, 583 671 and the coroutine context switches to its last resumer. The rest is handled on 584 672 the backside of the resume, which check if the resumed coroutine is -
doc/theses/andrew_beach_MMath/uw-ethesis.tex
r8cd34e9 r1716e1c 97 97 % cfa macros used in the document 98 98 \usepackage{cfalab} 99 % allow global and individual modification of spacing 100 \usepackage{enumitem} 99 101 % Improved reference tools. 100 102 \usepackage[nospace]{varioref} -
libcfa/src/concurrency/invoke.h
r8cd34e9 r1716e1c 146 146 struct __thread_desc_link { 147 147 struct $thread * next; 148 struct $thread * prev;149 148 volatile unsigned long long ts; 150 unsigned preferred;151 149 }; 152 150 … … 155 153 // context that is switch during a __cfactx_switch 156 154 struct __stack_context_t context; 155 156 // Link lists fields 157 // instrusive link field for threads 158 struct __thread_desc_link link; 157 159 158 160 // current execution status for coroutine … … 170 172 struct cluster * curr_cluster; 171 173 172 // Link lists fields 173 // instrusive link field for threads 174 struct __thread_desc_link link; 174 // preferred ready-queue 175 unsigned preferred; 175 176 176 177 // coroutine body used to store context -
libcfa/src/concurrency/kernel.cfa
r8cd34e9 r1716e1c 184 184 MAIN_LOOP: 185 185 for() { 186 #if 1 186 187 // Check if there is pending io 187 188 __maybe_io_drain( this ); … … 270 271 } 271 272 272 // SEARCH: { 273 // /* paranoid */ verify( ! __preemption_enabled() ); 274 // /* paranoid */ verify( kernelTLS().this_proc_id ); 275 276 // // First, lock the scheduler since we are searching for a thread 277 278 // // Try to get the next thread 279 // ready_schedule_lock(); 280 // readyThread = pop_fast( this->cltr ); 281 // ready_schedule_unlock(); 282 // if(readyThread) { break SEARCH; } 283 284 // // If we can't find a thread, might as well flush any outstanding I/O 285 // if(this->io.pending) { __cfa_io_flush( this ); } 286 287 // // Spin a little on I/O, just in case 288 // for(25) { 289 // __maybe_io_drain( this ); 290 // ready_schedule_lock(); 291 // readyThread = pop_fast( this->cltr ); 292 // ready_schedule_unlock(); 293 // if(readyThread) { break SEARCH; } 294 // } 295 296 // // no luck, try stealing a few times 297 // for(25) { 298 // if( __maybe_io_drain( this ) ) { 299 // ready_schedule_lock(); 300 // readyThread = pop_fast( this->cltr ); 301 // } else { 302 // ready_schedule_lock(); 303 // readyThread = pop_slow( this->cltr ); 304 // } 305 // ready_schedule_unlock(); 306 // if(readyThread) { break SEARCH; } 307 // } 308 309 // // still no luck, search for a thread 310 // ready_schedule_lock(); 311 // readyThread = pop_search( this->cltr ); 312 // ready_schedule_unlock(); 313 // if(readyThread) { break SEARCH; } 314 315 // // Don't block if we are done 316 // if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP; 317 318 // __STATS( __tls_stats()->ready.sleep.halts++; ) 319 320 // // Push self to idle stack 321 // mark_idle(this->cltr->procs, * this); 322 323 // // Confirm the ready-queue is empty 324 // __maybe_io_drain( this ); 325 // ready_schedule_lock(); 326 // readyThread = pop_search( this->cltr ); 327 // ready_schedule_unlock(); 328 329 // if( readyThread ) { 330 // // A thread was found, cancel the halt 331 // mark_awake(this->cltr->procs, * this); 332 333 // __STATS( __tls_stats()->ready.sleep.cancels++; ) 334 335 // // continue the main loop 336 // break SEARCH; 337 // } 338 339 // __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->id, rdtscl()); ) 340 // __cfadbg_print_safe(runtime_core, "Kernel : core %p waiting on eventfd %d\n", this, this->idle); 341 342 // // __disable_interrupts_hard(); 343 // eventfd_t val; 344 // eventfd_read( this->idle, &val ); 345 // // __enable_interrupts_hard(); 346 347 // __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 1\n", this->id, rdtscl()); ) 348 349 // // We were woken up, remove self from idle 350 // mark_awake(this->cltr->procs, * this); 351 352 // // DON'T just proceed, start looking again 353 // continue MAIN_LOOP; 354 // } 355 356 // RUN_THREAD: 357 // /* paranoid */ verify( kernelTLS().this_proc_id ); 358 // /* paranoid */ verify( ! __preemption_enabled() ); 359 // /* paranoid */ verify( readyThread ); 360 361 // // Reset io dirty bit 362 // this->io.dirty = false; 363 364 // // We found a thread run it 365 // __run_thread(this, readyThread); 366 367 // // Are we done? 
368 // if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP; 369 370 // #if !defined(__CFA_NO_STATISTICS__) 371 // unsigned long long curr = rdtscl(); 372 // if(curr > (last_tally + 500000000)) { 373 // __tally_stats(this->cltr->stats, __cfaabi_tls.this_stats); 374 // last_tally = curr; 375 // } 376 // #endif 377 378 // if(this->io.pending && !this->io.dirty) { 379 // __cfa_io_flush( this ); 380 // } 381 382 // // Check if there is pending io 383 // __maybe_io_drain( this ); 273 #else 274 275 SEARCH: { 276 /* paranoid */ verify( ! __preemption_enabled() ); 277 /* paranoid */ verify( kernelTLS().this_proc_id ); 278 279 // First, lock the scheduler since we are searching for a thread 280 ready_schedule_lock(); 281 282 // Try to get the next thread 283 readyThread = pop_fast( this->cltr ); 284 if(readyThread) { ready_schedule_unlock(); break SEARCH; } 285 286 // If we can't find a thread, might as well flush any outstanding I/O 287 if(this->io.pending) { __cfa_io_flush( this ); } 288 289 // Spin a little on I/O, just in case 290 for(25) { 291 __maybe_io_drain( this ); 292 readyThread = pop_fast( this->cltr ); 293 if(readyThread) { ready_schedule_unlock(); break SEARCH; } 294 } 295 296 // no luck, try stealing a few times 297 for(25) { 298 if( __maybe_io_drain( this ) ) { 299 readyThread = pop_fast( this->cltr ); 300 } else { 301 readyThread = pop_slow( this->cltr ); 302 } 303 if(readyThread) { ready_schedule_unlock(); break SEARCH; } 304 } 305 306 // still no luck, search for a thread 307 readyThread = pop_search( this->cltr ); 308 if(readyThread) { ready_schedule_unlock(); break SEARCH; } 309 310 // Don't block if we are done 311 if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP; 312 313 __STATS( __tls_stats()->ready.sleep.halts++; ) 314 315 // Push self to idle stack 316 ready_schedule_unlock(); 317 mark_idle(this->cltr->procs, * this); 318 ready_schedule_lock(); 319 320 // Confirm the ready-queue is empty 321 __maybe_io_drain( this ); 322 readyThread = pop_search( this->cltr ); 323 ready_schedule_unlock(); 324 325 if( readyThread ) { 326 // A thread was found, cancel the halt 327 mark_awake(this->cltr->procs, * this); 328 329 __STATS( __tls_stats()->ready.sleep.cancels++; ) 330 331 // continue the main loop 332 break SEARCH; 333 } 334 335 __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->id, rdtscl()); ) 336 __cfadbg_print_safe(runtime_core, "Kernel : core %p waiting on eventfd %d\n", this, this->idle); 337 338 // __disable_interrupts_hard(); 339 eventfd_t val; 340 eventfd_read( this->idle, &val ); 341 // __enable_interrupts_hard(); 342 343 __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 1\n", this->id, rdtscl()); ) 344 345 // We were woken up, remove self from idle 346 mark_awake(this->cltr->procs, * this); 347 348 // DON'T just proceed, start looking again 349 continue MAIN_LOOP; 350 } 351 352 RUN_THREAD: 353 /* paranoid */ verify( kernelTLS().this_proc_id ); 354 /* paranoid */ verify( ! __preemption_enabled() ); 355 /* paranoid */ verify( readyThread ); 356 357 // Reset io dirty bit 358 this->io.dirty = false; 359 360 // We found a thread run it 361 __run_thread(this, readyThread); 362 363 // Are we done? 
364 if( __atomic_load_n(&this->do_terminate, __ATOMIC_SEQ_CST) ) break MAIN_LOOP; 365 366 #if !defined(__CFA_NO_STATISTICS__) 367 unsigned long long curr = rdtscl(); 368 if(curr > (last_tally + 500000000)) { 369 __tally_stats(this->cltr->stats, __cfaabi_tls.this_stats); 370 last_tally = curr; 371 } 372 #endif 373 374 if(this->io.pending && !this->io.dirty) { 375 __cfa_io_flush( this ); 376 } 377 378 ready_schedule_lock(); 379 __maybe_io_drain( this ); 380 ready_schedule_unlock(); 381 #endif 384 382 } 385 383 -
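The idle path in the scheduler loop above parks a processor on an eventfd until another processor wakes it. A standalone C sketch of that park/unpark pattern (Linux-only; the helper names are illustrative, not from the CFA runtime):

    #include <sys/eventfd.h>

    int make_idle_fd( void ) {
        return eventfd( 0, 0 );          /* counting semantics, blocking reads */
    }

    /* park: block until some other processor writes the eventfd */
    void park( int idle_fd ) {
        eventfd_t val;
        eventfd_read( idle_fd, &val );   /* blocks while the counter is zero */
    }

    /* unpark: wake one parked processor */
    void unpark( int idle_fd ) {
        eventfd_write( idle_fd, 1 );
    }

The mark_idle/mark_awake bookkeeping and the re-check of the ready queue before sleeping guard against lost wake-ups between the last pop attempt and the blocking read. -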
libcfa/src/concurrency/kernel/startup.cfa
r8cd34e9 r1716e1c 461 461 self_mon_p = &self_mon; 462 462 link.next = 0p; 463 link. prev = 0p;464 link.preferred = -1u;463 link.ts = 0; 464 preferred = -1u; 465 465 last_proc = 0p; 466 466 #if defined( __CFA_WITH_VERIFY__ ) -
libcfa/src/concurrency/ready_queue.cfa
r8cd34e9 r1716e1c 17 17 // #define __CFA_DEBUG_PRINT_READY_QUEUE__ 18 18 19 // #define USE_MPSC20 19 21 20 #define USE_RELAXED_FIFO … … 256 255 /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count ); 257 256 258 // write timestamp259 thrd->link.ts = rdtscl();260 261 257 bool local; 262 258 int preferred = external ? -1 : kernelTLS().this_processor->rdq.id; … … 277 273 #endif 278 274 279 #if defined(USE_MPSC)280 // mpsc always succeeds281 } while( false );282 #else283 275 // If we can't lock it retry 284 276 } while( !__atomic_try_acquire( &lanes.data[i].lock ) ); 285 #endif286 277 287 278 // Actually push it 288 279 push(lanes.data[i], thrd); 289 280 290 #if !defined(USE_MPSC)291 281 // Unlock and return 292 282 __atomic_unlock( &lanes.data[i].lock ); 293 #endif294 283 295 284 // Mark the current index in the tls rng instance as having an item … … 350 339 __cfadbg_print_safe(ready_queue, "Kernel : Pushing %p on cluster %p\n", thrd, cltr); 351 340 341 // #define USE_PREFERRED 342 #if !defined(USE_PREFERRED) 352 343 const bool external = (!kernelTLS().this_processor) || (cltr != kernelTLS().this_processor->cltr); 353 344 /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count ); 354 355 // write timestamp 356 thrd->link.ts = rdtscl(); 345 #else 346 unsigned preferred = thrd->preferred; 347 const bool external = (!kernelTLS().this_processor) || preferred == -1u || thrd->curr_cluster != cltr; 348 /* paranoid */ verifyf(external || preferred < lanes.count, "Invalid preferred queue %u for %u lanes", preferred, lanes.count ); 349 350 unsigned r = preferred % READYQ_SHARD_FACTOR; 351 const unsigned start = preferred - r; 352 #endif 357 353 358 354 // Try to pick a lane and lock it … … 368 364 } 369 365 else { 366 #if !defined(USE_PREFERRED) 370 367 processor * proc = kernelTLS().this_processor; 371 368 unsigned r = proc->rdq.its++; 372 369 i = proc->rdq.id + (r % READYQ_SHARD_FACTOR); 370 #else 371 i = start + (r++ % READYQ_SHARD_FACTOR); 372 #endif 373 373 } 374 375 376 #if defined(USE_MPSC)377 // mpsc always succeeds378 } while( false );379 #else380 374 // If we can't lock it retry 381 375 } while( !__atomic_try_acquire( &lanes.data[i].lock ) ); 382 #endif383 376 384 377 // Actually push it 385 378 push(lanes.data[i], thrd); 386 379 387 #if !defined(USE_MPSC)388 380 // Unlock and return 389 381 __atomic_unlock( &lanes.data[i].lock ); 390 #endif391 382 392 383 #if !defined(__CFA_NO_STATISTICS__) … … 492 483 lanes.tscs[w].tv = thrd->link.ts; 493 484 #endif 485 486 thrd->preferred = w; 494 487 495 488 // return the popped thread … … 519 512 // Check that all the intrusive queues in the data structure are still consistent 520 513 static void check( __ready_queue_t & q ) with (q) { 521 #if defined(__CFA_WITH_VERIFY__) && !defined(USE_MPSC)514 #if defined(__CFA_WITH_VERIFY__) 522 515 { 523 516 for( idx ; lanes.count ) { … … 525 518 assert(!lanes.data[idx].lock); 526 519 527 assert(head(sl)->link.prev == 0p ); 528 assert(head(sl)->link.next->link.prev == head(sl) ); 529 assert(tail(sl)->link.next == 0p ); 530 assert(tail(sl)->link.prev->link.next == tail(sl) ); 531 532 if(is_empty(sl)) { 533 assert(tail(sl)->link.prev == head(sl)); 534 assert(head(sl)->link.next == tail(sl)); 535 } else { 536 assert(tail(sl)->link.prev != head(sl)); 537 assert(head(sl)->link.next != tail(sl)); 538 } 520 if(is_empty(sl)) { 521 assert( sl.anchor.next == 0p ); 522 assert( sl.anchor.ts == 0 ); 523 assert( mock_head(sl) == sl.prev ); 524 } else { 525 assert( 
sl.anchor.next != 0p ); 526 assert( sl.anchor.ts != 0 ); 527 assert( mock_head(sl) != sl.prev ); 528 } 539 529 } 540 530 } … … 557 547 // fixes the list so that the pointers back to anchors aren't left dangling 558 548 static inline void fix(__intrusive_lane_t & ll) { 559 #if !defined(USE_MPSC) 560 // if the list is not empty then follow he pointer and fix its reverse 561 if(!is_empty(ll)) { 562 head(ll)->link.next->link.prev = head(ll); 563 tail(ll)->link.prev->link.next = tail(ll); 564 } 565 // Otherwise just reset the list 566 else { 567 verify(tail(ll)->link.next == 0p); 568 tail(ll)->link.prev = head(ll); 569 head(ll)->link.next = tail(ll); 570 verify(head(ll)->link.prev == 0p); 571 } 572 #endif 549 if(is_empty(ll)) { 550 verify(ll.anchor.next == 0p); 551 ll.prev = mock_head(ll); 552 } 573 553 } 574 554 -
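Lane selection in the push path above shards each processor over READYQ_SHARD_FACTOR consecutive lanes, or, under USE_PREFERRED, rotates within the shard of the thread's remembered lane. A hedged C sketch of just the index arithmetic (names are illustrative):

    #define READYQ_SHARD_FACTOR 4

    /* default: rotate through the pushing processor's own shard */
    unsigned pick_lane_local( unsigned proc_id, unsigned * rotor ) {
        return proc_id + ( (*rotor)++ % READYQ_SHARD_FACTOR );
    }

    /* USE_PREFERRED: round the thread's preferred lane down to its shard
       base, then rotate within that shard */
    unsigned pick_lane_preferred( unsigned preferred, unsigned * rotor ) {
        unsigned start = preferred - ( preferred % READYQ_SHARD_FACTOR );
        return start + ( (*rotor)++ % READYQ_SHARD_FACTOR );
    }

Either way the chosen lane is locked with a try-acquire and the pick retried on failure, so contention pushes a thread to a neighbouring lane instead of blocking. -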
libcfa/src/concurrency/ready_subqueue.hfa
r8cd34e9 r1716e1c 7 7 // Intrusives lanes which are used by the relaxed ready queue 8 8 struct __attribute__((aligned(128))) __intrusive_lane_t { 9 10 #if defined(USE_MPSC) 11 mpsc_queue($thread) queue; 12 __attribute__((aligned(128))) 13 #else 14 // anchor for the head and the tail of the queue 15 __attribute__((aligned(128))) struct __sentinel_t { 16 // Link lists fields 17 // instrusive link field for threads 18 // must be exactly as in $thread 19 __thread_desc_link link; 20 } before, after; 21 #endif 9 struct $thread * prev; 22 10 23 11 // spin lock protecting the queue 24 12 volatile bool lock; 25 13 26 // Optional statistic counters 27 #if !defined(__CFA_NO_SCHED_STATS__) 28 struct __attribute__((aligned(64))) { 29 // difference between number of push and pops 30 ssize_t diff; 31 32 // total number of pushes and pops 33 size_t push; 34 size_t pop ; 35 } stat; 36 #endif 14 __thread_desc_link anchor; 37 15 }; 38 16 39 void ?{}(__intrusive_lane_t & this);40 void ^?{}(__intrusive_lane_t & this);41 42 17 // Get the head pointer (one before the first element) from the anchor 43 static inline $thread * head(const __intrusive_lane_t & this) { 44 #if defined(USE_MPSC) 45 return this.queue.head; 46 #else 47 $thread * rhead = ($thread *)( 48 (uintptr_t)( &this.before ) - offsetof( $thread, link ) 49 ); 50 /* paranoid */ verify(rhead); 51 return rhead; 52 #endif 53 } 54 55 // Get the tail pointer (one after the last element) from the anchor 56 static inline $thread * tail(const __intrusive_lane_t & this) { 57 #if defined(USE_MPSC) 58 return this.queue.tail; 59 #else 60 $thread * rtail = ($thread *)( 61 (uintptr_t)( &this.after ) - offsetof( $thread, link ) 62 ); 63 /* paranoid */ verify(rtail); 64 return rtail; 65 #endif 18 static inline $thread * mock_head(const __intrusive_lane_t & this) { 19 $thread * rhead = ($thread *)( 20 (uintptr_t)( &this.anchor ) - __builtin_offsetof( $thread, link ) 21 ); 22 return rhead; 66 23 } 67 24 … … 69 26 void ?{}( __intrusive_lane_t & this ) { 70 27 this.lock = false; 28 this.prev = mock_head(this); 29 this.anchor.next = 0p; 30 this.anchor.ts = 0; 71 31 72 #if !defined(USE_MPSC) 73 this.before.link.prev = 0p; 74 this.before.link.next = tail(this); 75 this.before.link.ts = 0; 76 77 this.after .link.prev = head(this); 78 this.after .link.next = 0p; 79 this.after .link.ts = 0; 80 81 #if !defined(__CFA_NO_SCHED_STATS__) 82 this.stat.diff = 0; 83 this.stat.push = 0; 84 this.stat.pop = 0; 85 #endif 86 87 // We add a boat-load of assertions here because the anchor code is very fragile 88 /* paranoid */ verify(((uintptr_t)( head(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.before)); 89 /* paranoid */ verify(((uintptr_t)( tail(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.after )); 90 /* paranoid */ verify(head(this)->link.prev == 0p ); 91 /* paranoid */ verify(head(this)->link.next == tail(this) ); 92 /* paranoid */ verify(tail(this)->link.next == 0p ); 93 /* paranoid */ verify(tail(this)->link.prev == head(this) ); 94 /* paranoid */ verify(&head(this)->link.prev == &this.before.link.prev ); 95 /* paranoid */ verify(&head(this)->link.next == &this.before.link.next ); 96 /* paranoid */ verify(&tail(this)->link.prev == &this.after .link.prev ); 97 /* paranoid */ verify(&tail(this)->link.next == &this.after .link.next ); 98 /* paranoid */ verify(__alignof__(__intrusive_lane_t) == 128); 99 /* paranoid */ verify(__alignof__(this) == 128); 100 /* paranoid */ verifyf(((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", 
&this, ((intptr_t)(&this) % 128)); 101 #endif 32 // We add a boat-load of assertions here because the anchor code is very fragile 33 /* paranoid */ verify( offsetof( $thread, link ) == offsetof(__intrusive_lane_t, anchor) ); 34 /* paranoid */ verify( ((uintptr_t)( mock_head(this) ) + offsetof( $thread, link )) == (uintptr_t)(&this.anchor) ); 35 /* paranoid */ verify( &mock_head(this)->link.next == &this.anchor.next ); 36 /* paranoid */ verify( &mock_head(this)->link.ts == &this.anchor.ts ); 37 /* paranoid */ verify( mock_head(this)->link.next == 0p ); 38 /* paranoid */ verify( mock_head(this)->link.ts == 0 ); 39 /* paranoid */ verify( mock_head(this) == this.prev ); 40 /* paranoid */ verify( __alignof__(__intrusive_lane_t) == 128 ); 41 /* paranoid */ verify( __alignof__(this) == 128 ); 42 /* paranoid */ verifyf( ((intptr_t)(&this) % 128) == 0, "Expected address to be aligned %p %% 128 == %zd", &this, ((intptr_t)(&this) % 128) ); 102 43 } 103 44 104 45 // Dtor is trivial 105 46 void ^?{}( __intrusive_lane_t & this ) { 106 #if !defined(USE_MPSC) 107 // Make sure the list is empty 108 /* paranoid */ verify(head(this)->link.prev == 0p ); 109 /* paranoid */ verify(head(this)->link.next == tail(this) ); 110 /* paranoid */ verify(tail(this)->link.next == 0p ); 111 /* paranoid */ verify(tail(this)->link.prev == head(this) ); 112 #endif 47 // Make sure the list is empty 48 /* paranoid */ verify( this.anchor.next == 0p ); 49 /* paranoid */ verify( this.anchor.ts == 0 ); 50 /* paranoid */ verify( mock_head(this) == this.prev ); 113 51 } 114 52 115 53 // Push a thread onto this lane 116 54 // returns true of lane was empty before push, false otherwise 117 bool push(__intrusive_lane_t & this, $thread * node) {118 #if defined(USE_MPSC)119 inline $thread * volatile & ?`next ( $thread * this ) __attribute__((const)) {120 return this->link.next;121 }122 push(this.queue, node);123 #else124 #if defined(__CFA_WITH_VERIFY__)125 /* paranoid */ verify(this.lock);126 /* paranoid */ verify(node->link.ts != 0);127 /* paranoid */ verify(node->link.next == 0p);128 /* paranoid */ verify(node->link.prev == 0p);129 /* paranoid */ verify(tail(this)->link.next == 0p);130 /* paranoid */ verify(head(this)->link.prev == 0p);55 void push( __intrusive_lane_t & this, $thread * node ) { 56 /* paranoid */ verify( node->link.next == 0p ); 57 /* paranoid */ verify( node->link.ts == 0 ); 58 /* paranoid */ verify( this.prev->link.next == 0p ); 59 /* paranoid */ verify( this.prev->link.ts == 0 ); 60 if( this.anchor.next == 0p ) { 61 /* paranoid */ verify( this.anchor.next == 0p ); 62 /* paranoid */ verify( this.anchor.ts == 0 ); 63 /* paranoid */ verify( this.prev == mock_head( this ) ); 64 } else { 65 /* paranoid */ verify( this.anchor.next != 0p ); 66 /* paranoid */ verify( this.anchor.ts != 0 ); 67 /* paranoid */ verify( this.prev != mock_head( this ) ); 68 } 131 69 132 if(this.before.link.ts == 0l) { 133 /* paranoid */ verify(tail(this)->link.prev == head(this)); 134 /* paranoid */ verify(head(this)->link.next == tail(this)); 135 } else { 136 /* paranoid */ verify(tail(this)->link.prev != head(this)); 137 /* paranoid */ verify(head(this)->link.next != tail(this)); 138 } 139 #endif 140 141 // Get the relevant nodes locally 142 $thread * tail = tail(this); 143 $thread * prev = tail->link.prev; 144 145 // Do the push 146 node->link.next = tail; 147 node->link.prev = prev; 148 prev->link.next = node; 149 tail->link.prev = node; 150 151 // Update stats 152 #if !defined(__CFA_NO_SCHED_STATS__) 153 this.stat.diff++; 154 this.stat.push++; 
155 #endif 156 157 verify(node->link.next == tail(this)); 158 159 // Check if the queue used to be empty 160 if(this.before.link.ts == 0l) { 161 this.before.link.ts = node->link.ts; 162 /* paranoid */ verify(node->link.prev == head(this)); 163 return true; 164 } 165 return false; 166 #endif 70 // Get the relevant nodes locally 71 this.prev->link.next = node; 72 this.prev->link.ts = rdtscl(); 73 this.prev = node; 167 74 } 168 75 … … 170 77 // returns popped 171 78 // returns true of lane was empty before push, false otherwise 172 $thread * pop(__intrusive_lane_t & this) { 173 /* paranoid */ verify(this.lock); 174 #if defined(USE_MPSC) 175 inline $thread * volatile & ?`next ( $thread * this ) __attribute__((const)) { 176 return this->link.next; 177 } 178 return pop(this.queue); 179 #else 180 /* paranoid */ verify(this.before.link.ts != 0ul); 79 $thread * pop( __intrusive_lane_t & this ) { 80 /* paranoid */ verify( this.anchor.next != 0p ); 81 /* paranoid */ verify( this.anchor.ts != 0 ); 181 82 182 // Get anchors locally 183 $thread * head = head(this); 184 $thread * tail = tail(this); 83 // Get the relevant nodes locally 84 $thread * node = this.anchor.next; 85 this.anchor.next = node->link.next; 86 this.anchor.ts = node->link.ts; 87 bool is_empty = this.anchor.ts == 0; 88 node->link.next = 0p; 89 node->link.ts = 0; 185 90 186 // Get the relevant nodes locally 187 $thread * node = head->link.next; 188 $thread * next = node->link.next; 91 // Update head time stamp 92 if(is_empty) this.prev = mock_head( this ); 189 93 190 /* paranoid */ verify(node != tail); 191 /* paranoid */ verify(node); 192 193 // Do the pop 194 head->link.next = next; 195 next->link.prev = head; 196 node->link.next = 0p; 197 node->link.prev = 0p; 198 199 // Update head time stamp 200 this.before.link.ts = next->link.ts; 201 202 // Update stats 203 #ifndef __CFA_NO_SCHED_STATS__ 204 this.stat.diff--; 205 this.stat.pop ++; 206 #endif 207 208 // Check if we emptied list and return accordingly 209 /* paranoid */ verify(tail(this)->link.next == 0p); 210 /* paranoid */ verify(head(this)->link.prev == 0p); 211 if(next == tail) { 212 /* paranoid */ verify(this.before.link.ts == 0); 213 /* paranoid */ verify(tail(this)->link.prev == head(this)); 214 /* paranoid */ verify(head(this)->link.next == tail(this)); 215 return node; 216 } 217 else { 218 /* paranoid */ verify(next->link.ts != 0); 219 /* paranoid */ verify(tail(this)->link.prev != head(this)); 220 /* paranoid */ verify(head(this)->link.next != tail(this)); 221 /* paranoid */ verify(this.before.link.ts != 0); 222 return node; 223 } 224 #endif 94 /* paranoid */ verify( node->link.next == 0p ); 95 /* paranoid */ verify( node->link.ts == 0 ); 96 return node; 225 97 } 226 98 227 99 // Check whether or not list is empty 228 100 static inline bool is_empty(__intrusive_lane_t & this) { 229 #if defined(USE_MPSC) 230 return this.queue.head == 0p; 231 #else 232 // Cannot verify here since it may not be locked 233 return this.before.link.ts == 0; 234 #endif 101 return this.anchor.ts == 0; 235 102 } 236 103 237 104 // Return the timestamp 238 105 static inline unsigned long long ts(__intrusive_lane_t & this) { 239 #if defined(USE_MPSC) 240 $thread * tl = this.queue.head; 241 if(!tl) return -1ull; 242 return tl->link.ts; 243 #else 244 // Cannot verify here since it may not be locked 245 return this.before.link.ts; 246 #endif 106 // Cannot verify here since it may not be locked 107 return this.anchor.ts; 247 108 } 248 109 -
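The new subqueue replaces the doubly-linked sentinel pair with a singly-linked design: an anchor holding the head's next/ts fields, a prev pointer to the tail, and an offsetof trick (mock_head) that lets the anchor masquerade as the link field of a fake head node. A minimal self-contained C sketch of that layout, with illustrative names:

    #include <stddef.h>

    struct node;
    struct link { struct node * next; unsigned long long ts; };
    struct node { struct link link; /* payload ... */ };

    struct lane {
        struct node * prev;   /* tail node, or the mock head when empty */
        struct link anchor;   /* stands in for the head node's link field */
    };

    /* treat the anchor as if it were the link field of a fake head node */
    struct node * mock_head( struct lane * l ) {
        return (struct node *)((char *)&l->anchor - offsetof( struct node, link ));
    }

    void push( struct lane * l, struct node * n, unsigned long long now ) {
        n->link.next = 0; n->link.ts = 0;
        l->prev->link.next = n;          /* writes anchor.next when lane is empty */
        l->prev->link.ts = now;          /* predecessor's ts marks the arrival */
        l->prev = n;
    }

    struct node * pop( struct lane * l ) {   /* caller checks non-empty */
        struct node * n = l->anchor.next;
        l->anchor.next = n->link.next;
        l->anchor.ts = n->link.ts;
        if ( l->anchor.ts == 0 ) l->prev = mock_head( l );  /* lane drained */
        n->link.next = 0; n->link.ts = 0;
        return n;
    }

Because a node's timestamp is written on its predecessor, anchor.ts doubling as the emptiness flag is exactly the invariant the diff's paranoid verifies assert: an empty lane has anchor.next == 0p, anchor.ts == 0, and prev == mock_head. -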
libcfa/src/concurrency/thread.cfa
r8cd34e9 r1716e1c 38 38 curr_cluster = &cl; 39 39 link.next = 0p; 40 link. prev = 0p;41 link.preferred = -1u;40 link.ts = 0; 41 preferred = -1u; 42 42 last_proc = 0p; 43 43 #if defined( __CFA_WITH_VERIFY__ ) -
libcfa/src/containers/array.hfa
r8cd34e9 r1716e1c 21 21 }; 22 22 23 Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, ptrdiff_t i ) { 23 // About the choice of integral types offered as subscript overloads: 24 // Intent is to cover these use cases: 25 // float foo( ptrdiff_t i ) { return a[i]; } // i : ptrdiff_t 26 // forall( [N] ) ... for( i; N ) { total += a[i]; } // i : typeof( sizeof(42) ) 27 // for( i; 5 ) { total += a[i]; } // i : int 28 // It gets complicated by: 29 // - CFA does overloading on concrete types, like int and unsigned int, not on typedefed 30 // types like size_t. So trying to overload on ptrdiff_t vs int works in 64-bit mode 31 // but not in 32-bit mode. 32 // - Given bug of Trac #247, CFA gives sizeof expressions type unsigned long int, when it 33 // should give them type size_t. 34 // 35 // gcc -m32 cfa -m32 given bug gcc -m64 36 // ptrdiff_t int int long int 37 // size_t unsigned int unsigned int unsigned long int 38 // typeof( sizeof(42) ) unsigned int unsigned long int unsigned long int 39 // int int int int 40 41 static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, int i ) { 24 42 return (Timmed &) a.strides[i]; 25 43 } 26 44 27 Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a,int i ) {45 static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned int i ) { 28 46 return (Timmed &) a.strides[i]; 29 47 } 30 48 31 Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, size_t i ) {49 static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, long int i ) { 32 50 return (Timmed &) a.strides[i]; 33 51 } 34 52 35 size_t ?`len( arpk(N, S, Timmed, Tbase) & a ) { 53 static inline Timmed & ?[?]( arpk(N, S, Timmed, Tbase) & a, unsigned long int i ) { 54 return (Timmed &) a.strides[i]; 55 } 56 57 static inline size_t ?`len( arpk(N, S, Timmed, Tbase) & a ) { 36 58 return z(N); 37 59 } 38 60 39 61 // workaround #226 (and array relevance thereof demonstrated in mike102/otype-slow-ndims.cfa) 40 void ?{}( arpk(N, S, Timmed, Tbase) & this ) {62 static inline void ?{}( arpk(N, S, Timmed, Tbase) & this ) { 41 63 void ?{}( S (&inner)[z(N)] ) {} 42 64 ?{}(this.strides); 43 65 } 44 void ^?{}( arpk(N, S, Timmed, Tbase) & this ) {66 static inline void ^?{}( arpk(N, S, Timmed, Tbase) & this ) { 45 67 void ^?{}( S (&inner)[z(N)] ) {} 46 68 ^?{}(this.strides); … … 53 75 54 76 forall( Te ) 55 Te mkar_( tag(Te) ) {}77 static inline Te mkar_( tag(Te) ) {} 56 78 57 79 forall( [N], ZTags ... , Trslt &, Tatom & | { Trslt mkar_( tag(Tatom), ZTags ); } ) 58 arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {}80 static inline arpk(N, Trslt, Trslt, Tatom) mkar_( tag(Tatom), tag(N), ZTags ) {} 59 81 60 82 // based on https://stackoverflow.com/questions/1872220/is-it-possible-to-iterate-over-arguments-in-variadic-macros … … 90 112 91 113 forall( TA &, TB &, TC &, IxAB, IxBC ... 
| { TB & ?[?]( TA &, IxAB ); TC & ?[?]( TB &, IxBC ); } ) 92 TC & ?[?]( TA & this, IxAB ab, IxBC bc ) {114 static inline TC & ?[?]( TA & this, IxAB ab, IxBC bc ) { 93 115 return this[ab][bc]; 94 116 } … … 99 121 100 122 forall( TA &, TB &, TC &, IxAB_0, IxBC | { TB & ?[?]( TA &, IxAB_0 ); TC & ?[?]( TB &, IxBC ); } ) 101 TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) {123 static inline TC & ?[?]( TA & this, IxAB_0 ab, IxBC bc ) { 102 124 return this[ab][bc]; 103 125 } 104 126 105 127 forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1 ); TC & ?[?]( TB &, IxBC ); } ) 106 TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) {128 static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxBC bc ) { 107 129 return this[[ab0,ab1]][bc]; 108 130 } 109 131 110 132 forall( TA &, TB &, TC &, IxAB_0, IxAB_1, IxAB_2, IxBC | { TB & ?[?]( TA &, IxAB_0, IxAB_1, IxAB_2 ); TC & ?[?]( TB &, IxBC ); } ) 111 TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) {133 static inline TC & ?[?]( TA & this, IxAB_0 ab0, IxAB_1 ab1, IxAB_2 ab2, IxBC bc ) { 112 134 return this[[ab0,ab1,ab2]][bc]; 113 135 } … … 121 143 // Base 122 144 forall( [Nq], [Sq], Tbase & ) 123 tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {}145 static inline tag(arpk(Nq, Sq, Tbase, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(Tbase) ) {} 124 146 125 147 // Rec 126 148 forall( [Nq], [Sq], [N], [S], recq &, recr &, Tbase & | { tag(recr) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(recq) ); } ) 127 tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {}149 static inline tag(arpk(N, S, recr, Tbase)) enq_( tag(Tbase), tag(Nq), tag(Sq), tag(arpk(N, S, recq, Tbase)) ) {} 128 150 129 151 // Wrapper 130 152 struct all_t {} all; 131 153 forall( [N], [S], Te &, result &, Tbase & | { tag(result) enq_( tag(Tbase), tag(N), tag(S), tag(Te) ); } ) 132 result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) {154 static inline result & ?[?]( arpk(N, S, Te, Tbase) & this, all_t ) { 133 155 return (result&) this; 134 156 }
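The overload-set comment in the array.hfa diff rests on a plain C fact: size_t and ptrdiff_t are typedefs of different builtin types on different targets, so overloading on concrete types must enumerate the builtins. A small C illustration of that target dependence (hypothetical demo; compare -m32 and -m64 builds):

    #include <stddef.h>
    #include <stdio.h>

    int main( void ) {
        /* on 64-bit glibc targets ptrdiff_t is long and size_t is unsigned
           long; on 32-bit targets both collapse to (unsigned) int, so
           overloads on ptrdiff_t and int would collide there */
        printf( "ptrdiff_t: %zu bytes, size_t: %zu bytes, int: %zu bytes\n",
                sizeof(ptrdiff_t), sizeof(size_t), sizeof(int) );
        return 0;
    }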