Changes in / [210c737:4852232]
- Files:
-
- 11 deleted
- 8 edited
-
doc/theses/colby_parsons_MMAth/local.bib (modified) (2 diffs)
-
doc/theses/colby_parsons_MMAth/text/actors.tex (modified) (2 diffs)
-
doc/theses/colby_parsons_MMAth/text/channels.tex (modified) (1 diff)
-
doc/theses/colby_parsons_MMAth/text/frontpgs.tex (modified) (1 diff)
-
doc/theses/colby_parsons_MMAth/text/mutex_stmt.tex (modified) (3 diffs)
-
doc/theses/colby_parsons_MMAth/text/waituntil.tex (modified) (35 diffs)
-
doc/theses/fangren_yu_MMath/background.tex (deleted)
-
doc/theses/fangren_yu_MMath/benchmarks.tex (deleted)
-
doc/theses/fangren_yu_MMath/conclusion.tex (deleted)
-
doc/theses/fangren_yu_MMath/content1.tex (deleted)
-
doc/theses/fangren_yu_MMath/content2.tex (deleted)
-
doc/theses/fangren_yu_MMath/glossary.tex (deleted)
-
doc/theses/fangren_yu_MMath/intro.tex (deleted)
-
doc/theses/fangren_yu_MMath/performance.tex (deleted)
-
doc/theses/fangren_yu_MMath/uw-ethesis-frontpgs.tex (deleted)
-
doc/theses/fangren_yu_MMath/uw-ethesis.bib (deleted)
-
doc/theses/fangren_yu_MMath/uw-ethesis.tex (deleted)
-
libcfa/src/heap.cfa (modified) (8 diffs)
-
libcfa/src/iostream.cfa (modified) (2 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/colby_parsons_MMAth/local.bib
r210c737 r4852232 108 108 } 109 109 110 @misc{java:allof:anyof,111 author = "Java Util Concurrent",112 title = "Class CompletableFuture",113 howpublished = {\url{https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html}},114 note = "[Online; accessed 23-May-2023]"115 }116 117 110 @misc{ocaml:channel, 118 111 author = "The OCaml Manual", … … 209 202 version = {Version 080}, 210 203 organization= {Intel}, 211 month = mar ,204 month = march, 212 205 year = 2023, 213 206 } -
doc/theses/colby_parsons_MMAth/text/actors.tex
r210c737 r4852232 1365 1365 \end{figure} 1366 1366 1367 \begin{figure}1368 \centering1369 \subfloat[AMD \CFA Matrix Benchmark]{1370 \resizebox{0.5\textwidth}{!}{\input{figures/nasusCFAMatrix.pgf}}1371 \label{f:cfaMatrixAMD}1372 }1373 \subfloat[Intel \CFA Matrix Benchmark]{1374 \resizebox{0.5\textwidth}{!}{\input{figures/pykeCFAMatrix.pgf}}1375 \label{f:cfaMatrixIntel}1376 }1377 \caption{The matrix benchmark comparing \CFA stealing heuristics (lower is better).}1378 \end{figure}1379 1380 1367 Figures~\ref{f:cfaRepeatAMD} and~\ref{f:cfaRepeatIntel} show the effects of the stealing heuristics for the repeat benchmark. 1381 1368 This benchmark is a pathological case for work stealing actor systems, as the majority of work is being performed by the single actor conducting the scatter/gather. … … 1402 1389 Hence, it is difficult to attribute the AMD gain to the aggressive work stealing in CAF. 1403 1390 1391 \begin{figure} 1392 \centering 1393 \subfloat[AMD \CFA Matrix Benchmark]{ 1394 \resizebox{0.5\textwidth}{!}{\input{figures/nasusCFAMatrix.pgf}} 1395 \label{f:cfaMatrixAMD} 1396 } 1397 \subfloat[Intel \CFA Matrix Benchmark]{ 1398 \resizebox{0.5\textwidth}{!}{\input{figures/pykeCFAMatrix.pgf}} 1399 \label{f:cfaMatrixIntel} 1400 } 1401 \caption{The matrix benchmark comparing \CFA stealing heuristics (lower is better).} 1402 \end{figure} 1403 1404 1404 % Local Variables: % 1405 1405 % tab-width: 4 % -
doc/theses/colby_parsons_MMAth/text/channels.tex
r210c737 r4852232 196 196 197 197 \section{\CFA / Go channel Examples} 198 To highlight the differences between \CFA's and Go's close semantics, t woexamples are presented.198 To highlight the differences between \CFA's and Go's close semantics, three examples are presented. 199 199 The first example is a simple shutdown case, where there are producer threads and consumer threads operating on a channel for a fixed duration. 200 200 Once the duration ends, producers and consumers terminate immediately leaving unprocessed elements in the channel. -
doc/theses/colby_parsons_MMAth/text/frontpgs.tex
r210c737 r4852232 80 80 Concurrent programs are notoriously hard to program and even harder to debug. Furthermore concurrent programs must be performant, as the introduction of concurrency into a program is often done to achieve some form of speedup. 81 81 82 This thesis presents a suite of high-level concurrent-language features in the new programming language \CFA, all of which are implemented with the aim of improving the performance, productivity, and safety of concurrent programs.82 This thesis presents a suite of high-level concurrent-language features in the new programming-language \CFA, all of which are implemented with the aim of improving the performance, productivity, and safety of concurrent programs. 83 83 \CFA is a non-object-oriented programming language that extends C. 84 84 The foundation for concurrency in \CFA was laid by Thierry Delisle~\cite{Delisle18}, who implemented coroutines, user-level threads, and monitors. -
doc/theses/colby_parsons_MMAth/text/mutex_stmt.tex
r210c737 r4852232 350 350 351 351 The benchmark used to evaluate the avoidance algorithms repeatedly acquires a fixed number of locks in a random order and then releases them. 352 The pseudocode for the deadlock avoidance benchmark is shown in \VRef[ Figure]{l:deadlock_avoid_pseudo}.352 The pseudocode for the deadlock avoidance benchmark is shown in \VRef[Listing]{l:deadlock_avoid_pseudo}. 353 353 To ensure the comparison exercises the implementation of each lock avoidance algorithm, an identical spinlock is implemented in each language using a set of builtin atomics available in both \CC and \CFA. 354 354 The benchmarks are run for a fixed duration of 10 seconds and then terminate. … … 357 357 The median is calculated and is plotted alongside the 95\% confidence intervals for each point. 358 358 359 \begin{ figure}360 \begin{cfa} 359 \begin{cfa}[caption={Deadlock avoidance benchmark pseudocode},label={l:deadlock_avoid_pseudo}] 360 361 361 size_t n_locks; $\C{// number of locks}$ 362 362 size_t n_thds; $\C{// number of threads}$ … … 387 387 printf( "%lu\n", total ); 388 388 } 389 \end{cfa} 390 \caption{Deadlock avoidance benchmark pseudocode} 391 \label{l:deadlock_avoid_pseudo} 392 \end{figure} 389 390 \end{cfa} 393 391 394 392 The performance experiments were run on the following multi-core hardware systems to determine differences across platforms: -
doc/theses/colby_parsons_MMAth/text/waituntil.tex
r210c737 r4852232 123 123 124 124 \begin{figure} 125 \begin{lstlisting}[language=ada,literate= ,{moredelim={**[is][\color{red}]{@}{@}}}]126 task type buffer is -- thread type125 \begin{lstlisting}[language=ada,literate=] 126 task type buffer is -- thread 127 127 ... -- buffer declarations 128 128 count : integer := 0; … … 130 130 loop 131 131 select 132 @when count < Size@=> -- guard133 @accept insert( elem : in ElemType )@do -- method132 when count < Size => -- guard 133 accept insert( elem : in ElemType ) do -- method 134 134 ... -- add to buffer 135 135 count := count + 1; … … 137 137 -- executed if this accept called 138 138 or 139 @when count > 0@=> -- guard140 @accept remove( elem : out ElemType )@do -- method139 when count > 0 => -- guard 140 accept remove( elem : out ElemType ) do -- method 141 141 ... --remove and return from buffer via parameter 142 142 count := count - 1; 143 143 end; 144 144 -- executed if this accept called 145 or @delay 10.0@; -- unblock after 10 seconds without call146 or @else@-- do not block, cannot appear with delay145 or delay 10.0; -- unblock after 10 seconds without call 146 or else -- do not block, cannot appear with delay 147 147 end select; 148 148 end loop; … … 222 222 223 223 \begin{lrbox}{\myboxB} 224 \begin{lstlisting}[language=uC++ ,{moredelim={**[is][\color{red}]{@}{@}}}]224 \begin{lstlisting}[language=uC++=] 225 225 _Task BoundedBuffer { 226 226 int * buffer; … … 243 243 for ( ;; ) { 244 244 _Accept( ~buffer ) break; 245 @or _When ( count < Size ) _Accept( insert )@;246 @or _When ( count > 0 ) _Accept( remove )@;245 or _When ( count < Size ) _Accept( insert ); 246 or _When ( count > 0 ) _Accept( remove ); 247 247 } 248 248 } … … 267 267 Figure~\ref{l:BB_uC++} shows the outline of a bounded buffer administrator implemented with \uC @_Accept@ statements. 268 268 269 There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, Java's @allof@/@anyof@ over futures~\cite{java:allof:anyof},OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.269 There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}. 270 270 Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources. 271 271 … … 276 276 Similar, actor systems circumvent the \gls{synch_multiplex} problem as actors only block when waiting for the next message never in a behaviour. 277 277 While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues. 278 Consider the case where a thread has a single source of communication and it wants a set of $N$resources.279 It must sequentially request the $N$resources and wait for each response.280 During the receives for the $N$resources, it can receive other communication, and has to save and postpone these communications, or discard them.278 Consider the case where a thread has a single source of communication and it wants a set of @N@ resources. 279 It must sequentially request the @N@ resources and wait for each response. 280 During the receives for the @N@ resources, it can receive other communication, and has to save and postpone these communications, or discard them. 281 281 % If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered. 282 282 … … 320 320 int i = 0; 321 321 322 waituntil( @Lock@) { ... }323 or when( i == 0 ) waituntil( i << @Channel@) { ... }324 and waituntil( @Future@) { ... }325 or waituntil( @timeout( 1`s )@) { ... }322 waituntil( Lock ) { ... } 323 or when( i == 0 ) waituntil( i << Channel ) { ... } 324 and waituntil( Future ) { ... } 325 or waituntil( timeout( 1`s ) ) { ... } 326 326 // else { ... } 327 327 \end{cfa} … … 342 342 \begin{cfa} 343 343 future(int) bar, foo; 344 waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo345 \end{cfa} 346 The reason for this semantics is that prioritizing resources can be useful in certain problems , such as shutdown.347 In the rare case where there is a starvation problem with the ordering, it possible to follow a @waituntil@ with its reverse form , alternating which resource has the highest priority:344 waituntil( foo ) { ... } or waituntil( bar ) { ... } 345 \end{cfa} 346 The reason for this semantics is that prioritizing resources can be useful in certain problems. 347 In the rare case where there is a starvation problem with the ordering, it possible to follow a @waituntil@ with its reverse form: 348 348 \begin{cfa} 349 349 waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo 350 350 waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar 351 351 \end{cfa} 352 While this approach is not general for many resources, it handles many basic cases.353 352 354 353 The \CFA @and@ semantics match the @and@ semantics of \uC \lstinline[language=uC++]{_Select}. 355 354 When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}. 356 When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block, and then the thread waits again for the next available clause or the @waituntil@ statement is now satisfied.355 When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block, and then the thread waits again for the next available clause or the @waituntil@ statement is now true. 357 356 This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator. 358 357 If the @and@ operator waited for all clauses to be available before running, it is the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements. … … 416 415 \section{\lstinline{waituntil} Implementation} 417 416 418 The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} shows the basic outline of the @waituntil@ algorithm.419 Complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives.420 The following sections use examples to fill in complexitydetails missing in Figure~\ref{f:WU_Impl}.421 After which, the full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.417 The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} only shows the basic outline of the @waituntil@ algorithm. 418 The complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives. 419 The following sections then use examples to fill in details missing in Figure~\ref{f:WU_Impl}. 420 The full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}. 422 421 423 422 \begin{figure} … … 478 477 When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked. 479 478 Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released. 480 Immediately releasing the lock after the code blockprevents the waiting thread from holding multiple locks and potentially introducing a deadlock.479 Immediately releasing the lock prevents the waiting thread from holding multiple locks and potentially introducing a deadlock. 481 480 As such, the only unregistered nodes associated with locks are the ones that have not run. 482 481 483 482 \subsection{Timeouts} 484 483 485 A timeout for the @waituntil@ statement is a duration passed to routine \lstinline[deletekeywords={timeout}]{timeout}\footnote{\lstinline{timeout} is a quasi-keyword in \CFA, allowing it to be used an identifier.}, \eg:484 A timeout for the @waituntil@ statement is a duration passed to \lstinline[deletekeywords={timeout}]{timeout}, \eg: 486 485 \begin{cquote} 487 486 \begin{tabular}{@{}l|l@{}} … … 509 508 Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers. 510 509 Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding. 511 In following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, resp ectively.510 In following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, respctively. 512 511 \begin{cfa}[deletekeywords={timeout}] 513 512 waituntil( i << C1 ); and waituntil( timeout( 1`s ) ); 514 513 or waituntil( i << C2 ); and waituntil( timeout( 3`s ) ); 515 514 \end{cfa} 516 If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second oc curs before 3 seconds.515 If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second ocurs before 3 seconds. 517 516 Note, the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics. 518 517 … … 524 523 525 524 Channels require more complexity to allow synchronous multiplexing. 526 For locks, when an outside thread releases a lock and unblocks the @waituntil@thread (WUT), the lock's MX property is passed to the WUT (no spinning locks).525 For locks, when an outside thread releases a lock and unblocks the waituntil thread (WUT), the lock's MX property is passed to the WUT (no spinning locks). 527 526 For futures, the outside thread deliveries a value to the future and unblocks any waiting threads, including WUTs. 528 In either case, after the WUT unblocks , it is safe to execute itscorresponding code block knowing access to the resource is protected by the lock or the read-only state of the future.527 In either case, after the WUT unblocks it is safe to execute its the corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future. 529 528 Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT. 530 529 However, for channels, there is a race condition that poses an issue. … … 537 536 Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg: 538 537 \begin{cfa} 539 int i;540 538 waituntil( i << A ) {} and waituntil( i << B ) {} 541 539 or waituntil( i << C ) {} and waituntil( i << D ) {} 542 540 \end{cfa} 543 541 If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or the code blocks for @C@ and @D@. 544 However, four outside threads inserting into each channelcan simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks.542 However, four outside threads can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks. 545 543 This case introduces a race with complexity that increases with the size of the @waituntil@ statement. 546 544 However, due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses. 547 545 This approach is a poor solution for two reasons. 548 It is possible that once all the locks are acquired ,the subtree is not satisfied and the locks must be released.546 It is possible that once all the locks are acquired the subtree is not satisfied and the locks must be released. 549 547 This work incurs a high cost for signalling threads and heavily increase contention on internal channel locks. 550 548 Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible. 551 549 As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance. 552 Therefore, the example of reusing variable @i@ by multiple output channels is considered a user error without exclusive-or semantics.553 Given aliasing in C, it is impossible to even warn of the potential race.554 In the future, it would be interesting to support Go-like syntax, \lstinline[language=Go]{case i := <- ...}, defining a new scoped @i@ variable for each clause.555 550 556 551 It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel. … … 558 553 \begin{cquote} 559 554 \begin{tabular}{@{}l|l|l@{}} 560 \multicolumn{3}{@{}l@{}}{\lstinline{channel (int)A, B; // zero size channels}} \\555 \multicolumn{3}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\ 561 556 thread 1 & thread 2 & thread 3 \\ 562 557 \begin{cfa} … … 587 582 Channels introduce another interesting implementation issue. 588 583 Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause. 589 This conjunctionposes a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.584 This poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel. 590 585 Consider the following example, alongside a described ordering of events to highlight the race. 591 586 \begin{cquote} 592 587 \begin{tabular}{@{}l|l@{}} 593 \multicolumn{2}{@{}l@{}}{\lstinline{channel (int)A, B; // zero size channels}} \\588 \multicolumn{2}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\ 594 589 thread 1 & thread 2 \\ 595 590 \begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}] … … 604 599 \end{tabular} 605 600 \end{cquote} 606 Assume thread 1 executes first, registers with channel @A@ and proceeds in the @waituntil@. 607 Since @A@ is empty, thread 1 cannot remove, and then thread 1 is interrupted before registering with @B@. 608 Thread 2 similarly registers with channel @B@, and proceeds in the @waituntil@. 609 Since @B@ is zero size there is no space to insert, and then thread 2 is interrupted before registering with @A@. 601 Assume thread 1 executes first, registers with channel @A@ and proceeds, since it is empty, and then is interrupted before registering with @B@. 602 Thread 2 similarly registers with channel @B@, and proceeds, since it does not have space to insert, and then is interrupted before registering with @A@. 610 603 At this point, thread 1 and 2 resume execution. 611 Remember from above, each exclusive-or @waituntil@ holds a race to set the winning clause of the statement. 612 The issue that arises is that these two @waituntil@ statements must have matching winning clauses (both @A@ clauses or both @B@ clauses) to preserve the exclusive-or semantics, since a zero-sized channel needs an insert/remove pair for an operation to occur. 613 If threads 1 and 2 race to set a winner only in their own @waituntil@, thread 1 can think it successfully removed from @B@, and thread 2 can think it successfully inserted into @A@, which is an error. 614 Hence, there is a race on two fronts. 615 If thread 1 wins the race and sees that @B@ has a waiting insertion, then thread 2 must execute the first clause of its @waituntil@ and thread 1 execute its second. 616 Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 execute its second. 617 Any other execution scenario is incorrect for exclusive-or semantics. 618 Note, that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread. 604 There is now a race that must be dealt with on two fronts. 605 If thread 1 and 2 only race to the \gls{cas}, \ie a clause in their own @waituntil@, thread 1 can think that it successfully removed from @B@, and thread 2 may think it successfully inserted into @A@, when only one of these operations occurs. 619 606 620 607 The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels. 621 608 This approach eliminates the race shown above since thread 1 and 2 cannot both be registering at the same time. 622 609 However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic. 623 Not all types in a @waituntil@ have an internal lock, and when using non-channel types ,acquiring all the locks incurs extra unneeded overhead.610 Not all types in a @waituntil@ have an internal lock, and when using non-channel types acquiring all the locks incurs extra unneeded overhead. 624 611 Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race. 625 612 This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to be pending. … … 639 626 640 627 // Try to set other status, if we succeed break and return true 641 while( ! CAS( other.clause_status, &cmp_status, SAT ) ) {628 while( !CAS( other.clause_status, &cmp_status, SAT ) ) { 642 629 if ( cmp_status == SAT ) 643 630 return false; // If other status is SAT we lost so return false … … 650 637 651 638 // Attempt to set own status flag back to PENDING to retry 652 if ( ! CAS( mine.clause_status, &cmp_status, PENDING ) )639 if ( !CAS( mine.clause_status, &cmp_status, PENDING ) ) 653 640 return false; // If we fail then we lost so return false 654 641 … … 666 653 These exception mechanisms are supported through the @on_selected@ routine. 667 654 This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown. 668 Hence, the channel close-down mechanism is handled correctly.669 655 670 656 \subsection{Guards and Statement Predicate}\label{s:wu_guards} … … 672 658 It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished. 673 659 In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which along with guards, make the problem of checking for completion of the statement difficult. 674 Consider the following @waituntil@. 660 Consider the @waituntil@ in Figure~\ref{f:WU_ComplexPredicate}. 661 When the @waituntil@ thread wakes up, checking if the statement is complete is non-trivial. 662 The predicate that will return if the statement in Figure~\ref{f:WU_ComplexPredicate} is satisfied is the following. 663 \begin{cfa} 664 A && B || C || !GA && B || !GB && A || !GA && !GB && !GC 665 \end{cfa} 666 Which simplifies to: 667 \begin{cfa} 668 ( A || !GA ) && ( B || !GB ) || C || !GA && !GB && !GC 669 \end{cfa} 670 Checking a predicate this large with each iteration is expensive so \uC and \CFA both take steps to simplify checking statement completion. 671 672 \begin{figure} 675 673 \begin{cfa} 676 674 when( GA ) waituntil( A ) {} … … 678 676 or when( GC ) waituntil( C ) {} 679 677 \end{cfa} 680 When the @waituntil@ thread wakes up, the following predicate represents satisfaction: 681 \begin{cfa} 682 A && B || C || ! GA && B || ! GB && A || ! GA && ! GB && ! GC 683 \end{cfa} 684 which can be simplified to: 685 \begin{cfa} 686 ( A || ! GA ) && ( B || ! GB ) || C || ! GA && ! GB && ! GC 687 \end{cfa} 688 Checking the complete predicate on each iteration of the pending @waituntil@ is expensive so \uC and \CFA both take steps to simplify checking for statement completion. 678 \caption{\lstinline{waituntil} with a non-trivial predicate} 679 \label{f:WU_ComplexPredicate} 680 \end{figure} 689 681 690 682 In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource. 691 A diagram of the tree for the complete predicate above is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@. 692 Each internal node stores the statuses of the two subtrees beneath it. 683 The internal nodes also store the statuses of the two subtrees beneath them. 693 684 When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement. 694 685 Once the root of the tree has both subtrees marked as @true@ then the statement is complete. 695 686 As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again. 696 687 To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation. 688 An diagram of the tree for the statement in Figure~\ref{f:WU_ComplexPredicate} is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@. 697 689 698 690 \begin{figure} … … 700 692 \input{diagrams/uCpp_select_tree.tikz} 701 693 \end{center} 702 \caption{\uC \lstinline[language=uC++]{select}tree modification}694 \caption{\uC select tree modification} 703 695 \label{f:uC_select_tree} 704 696 \end{figure} 705 697 706 The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the complete predicate of a @waituntil@.707 The waiting condition of the @waituntil@ statement is implemented as the completepredicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.708 This complete predicate is used as the mechanism to check if a thread is done waiting on a@waituntil@.698 The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate. 699 The waiting condition of the @waituntil@ statement can be represented as a predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise. 700 In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the @waituntil@. 709 701 Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passes the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise. 710 702 To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate. 711 703 The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values. 712 For example, the following is generated for the complete predicate above:704 For example, the following would be generated for the @waituntil@ shown in Figure~\ref{f:WU_ComplexPredicate}. 713 705 \begin{cfa} 714 706 // statement completion predicate … … 716 708 return nodes[0].status && nodes[1].status || nodes[2].status; 717 709 } 718 if ( GA || GB || GC ) { $\C{// skip statement if all guards false}$ 710 711 // skip statement if all guards false 712 if ( GA || GB || GC ) { 719 713 select_node nodes[3]; 720 nodes[0].status = ! GA && GB; $\C{// A's status}$ 721 nodes[1].status = ! GB && GA; $\C{// B's status}$ 722 nodes[2].status = ! GC; $\C{// C's status}$ 714 nodes[0].status = !GA && GB; // A's status 715 nodes[1].status = !GB && GA; // B's status 716 nodes[2].status = !GC; // C's status 717 723 718 // ... rest of waituntil codegen ... 719 724 720 } 725 721 \end{cfa} … … 727 723 \uC's @_Select@, supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses. 728 724 In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied. 725 729 726 \begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}] 730 727 Future_ISM<int> A, B, C, D; … … 732 729 and _Select( @D && E@ ) { ... } 733 730 \end{lstlisting} 734 This feature is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after \emph{both}resources are available.731 This is more expressive that the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after both resources are available. 735 732 736 733 In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons. … … 741 738 or waituntil( C && D ) { ... } 742 739 \end{cfa} 743 If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold -and-wait situation.740 If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold and wait situation. 744 741 Other semantics are needed to ensure this operation is safe. 745 742 One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance}; … … 747 744 Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally. 748 745 One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach. 749 This approach does not work due to \gls{toctou} issues, \ie it is impossible to ensure that the full set of resources are available without holding them all first. 750 Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems. 751 Finally, the problem of operators inside clauses is also a difficult to handle when supporting channels. 752 It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run. 753 However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}. 754 Ultimately, I did not deemed this feature to be crucial enough when taking into account the complexity and runtime cost. 746 This approach does not work due to \gls{toctou} issues; 747 it is impossible to ensure that the full set of resources are available without holding them all first. 748 Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems involved, but it was not deemed an important feature when taking into account the runtime cost need paid to handle this situation. 749 The problem of operators inside clauses also becomes a difficult issue to handle when supporting channels. 750 It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run, but that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}. 755 751 756 752 \subsection{The full \lstinline{waituntil} picture} 757 753 758 Given all the complex details handled by the @waituntil@, its full pseudocode is presented in Figure\ref{f:WU_Full_Impl}.754 Now the details have been discussed, the full pseudocode of the @waituntil@ is presented in Figure~\ref{f:WU_Full_Impl}. 759 755 Some things to note are as follows. 760 756 The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}. … … 804 800 \end{figure} 805 801 806 \section{ \lstinline{waituntil}Performance}802 \section{Waituntil Performance} 807 803 808 804 Similar facilities to @waituntil@ are discussed in Section~\ref{s:History} covering C, Ada, Rust, \CC, and OCaml. 809 805 However, these facilities are either not meaningful or feasible to benchmark against. 810 806 The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space. 811 Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement.807 Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operates on method calls, which is done in \CFA via the @waitfor@ statement, so it is not meaningful to benchmark against the @waituntil@, which cannot wait on this resource. 812 808 Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach. 813 809 OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@. … … 821 817 The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels. 822 818 The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads. 823 The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that i twaits on.819 The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that is waits on. 824 820 Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels. 825 821 For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is: … … 900 896 Go then is holding a lock for every channel, resulting in worse performance as the number of channels increase. 901 897 In \CFA, since races are consolidated without holding all locks, it scales much better both with cores and clauses since more work can occur in parallel. 902 This scalability difference is more significant on the Intel machine than the AMD machine since the Intel has lower cache-contention costs.898 This scalability difference is more significant on the Intel machine than the AMD machine since the Intel machine has lower cache contention costs. 903 899 904 900 The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks. … … 932 928 In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired. 933 929 Interesting, this case may not be as pathological as it seems. 934 If the set of channels belonging to a \lstinline[language=Go]{select} have channels that overlap with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel.930 If the set of channels belonging to a select have channels that overlap with the set of another select, these statements lose the ability to operate in parallel. 935 931 The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism. 936 932 Comparison of this pathological case is shown in Table~\ref{t:pathGo}. 937 The AMD results highlight the worst -case scenario for Go since contention is more costly on this machine than the Intel machine.933 The AMD results highlight the worst case scenario for Go since contention is more costly on this machine than the Intel machine. 938 934 939 935 \begin{table}[t] … … 966 962 The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate. 967 963 964 \begin{figure} 965 \centering 966 \subfloat[AMD Future Synchronization Benchmark]{ 967 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}} 968 \label{f:futureAMD} 969 } 970 \subfloat[Intel Future Synchronization Benchmark]{ 971 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}} 972 \label{f:futureIntel} 973 } 974 \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).} 975 \caption{} 976 \label{f:futurePerf} 977 \end{figure} 978 968 979 This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements. 969 980 The benchmark is indirect since the performance of futures in \CFA and \uC differ by a significant margin. … … 1004 1015 1005 1016 Results of this benchmark are shown in Figure~\ref{f:futurePerf}. 1006 Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation. 1017 Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation.. 1007 1018 In detail, \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation. 1008 1019 However, the bars for both systems have similar height patterns across the experiments. … … 1011 1022 Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special OR case), whereas \uC has lower variation on the Intel. 1012 1023 Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates. 1013 1014 \begin{figure}1015 \centering1016 \subfloat[AMD Future Synchronization Benchmark]{1017 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}1018 \label{f:futureAMD}1019 }1020 \subfloat[Intel Future Synchronization Benchmark]{1021 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}1022 \label{f:futureIntel}1023 }1024 \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}1025 \label{f:futurePerf}1026 %%%%%%%%%%%%%%%1027 \vspace*{5.5in}1028 \end{figure} -
libcfa/src/heap.cfa
r210c737 r4852232 10 10 // Created On : Tue Dec 19 21:58:35 2017 11 11 // Last Modified By : Peter A. Buhr 12 // Last Modified On : Fri Jul 28 18:27:53 202313 // Update Count : 16 1212 // Last Modified On : Fri Dec 30 08:37:37 2022 13 // Update Count : 1605 14 14 // 15 15 … … 369 369 static __thread size_t PAD1 CALIGN TLSMODEL __attribute__(( unused )); // protect false sharing 370 370 static __thread Heap * heapManager CALIGN TLSMODEL; 371 static __thread bool heapManagerBootFlag CALIGN TLSMODEL = false;372 371 static __thread size_t PAD2 CALIGN TLSMODEL __attribute__(( unused )); // protect further false sharing 373 372 … … 489 488 allocUnfreed = 0; 490 489 #endif // __CFA_DEBUG__ 491 heapManagerBootFlag = true;492 490 } // with 493 491 } // if … … 501 499 502 500 lock( heapMaster.mgrLock ); // protect heapMaster counters 503 504 assert( ! heapManagerBootFlag );505 501 506 502 // get storage for heap manager … … 518 514 519 515 void heapManagerDtor() libcfa_public { 520 if ( unlikely( ! heapManagerBootFlag ) ) return; // thread never used ?521 522 516 lock( heapMaster.mgrLock ); 523 517 … … 532 526 // Do not set heapManager to NULL because it is used after Cforall is shutdown but before the program shuts down. 533 527 534 heapManagerBootFlag = false;535 528 unlock( heapMaster.mgrLock ); 536 529 } // heapManagerDtor … … 542 535 extern int cfa_main_returned; // from interpose.cfa 543 536 extern "C" { 544 void memory_startup( void ) { // singleton => called once at start of program537 void memory_startup( void ) { 545 538 if ( ! heapMasterBootFlag ) heapManagerCtor(); // sanity check 546 539 } // memory_startup … … 902 895 #endif // __STATISTICS__ 903 896 904 // Uncomment to get allocation addresses for a 0-sized allocation rather than a null pointer.905 //#define __NONNULL_0_ALLOC__906 #if ! defined( __NONNULL_0_ALLOC__ )907 #define __NULL_0_ALLOC__ unlikely( size == 0 ) || /* 0 BYTE ALLOCATION RETURNS NULL POINTER */908 #else909 #define __NULL_0_ALLOC__910 #endif // __NONNULL_0_ALLOC__911 912 897 #define PROLOG( counter, ... ) \ 913 898 BOOT_HEAP_MANAGER; \ 914 if ( \ 915 __NULL_0_ALLOC__ \ 899 if ( unlikely( size == 0 ) || /* 0 BYTE ALLOCATION RETURNS NULL POINTER */ \ 916 900 unlikely( size > ULONG_MAX - sizeof(Heap.Storage) ) ) { /* error check */ \ 917 901 STAT_0_CNT( counter ); \ -
libcfa/src/iostream.cfa
r210c737 r4852232 10 10 // Created On : Wed May 27 17:56:53 2015 11 11 // Last Modified By : Peter A. Buhr 12 // Last Modified On : Sun Jul 30 23:01:00202313 // Update Count : 140 612 // Last Modified On : Tue Jul 18 13:56:01 2023 13 // Update Count : 1403 14 14 // 15 15 … … 230 230 ostype & ?|?( ostype & os, const char s[] ) { 231 231 enum { Open = 1, Close, OpenClose }; 232 static const unsigned char mask[256] @= { // 256 covers all Latin-1 characters232 static const unsigned char mask[256] @= { 233 233 // opening delimiters, no space after 234 234 ['('] : Open, ['['] : Open, ['{'] : Open,
Note:
See TracChangeset
for help on using the changeset viewer.