Timestamp:
Sep 5, 2023, 5:48:31 PM
Author:
caparsons <caparson@…>
Branches:
master
Children:
1f10959, efe39bb
Parents:
f54e6ec
Message:

edited thesis to incorporate Gregor's comments

File:
1 edited

  • doc/theses/colby_parsons_MMAth/text/actors.tex

    rf54e6ec → raae9c17
    88Hence, actors are in the realm of \gls{impl_concurrency}, where programmers write concurrent code without dealing with explicit thread creation or interaction.
    99Actor message-passing is similar to channels, but with more abstraction, so there is no shared data to protect, making actors amenable to a distributed environment.
     10Actors are often used for high-performance computing and other data-centric problems, where the ease of use and scalability of an actor system provide an advantage over channels.
     11
    1012The study of actors can be broken into two concepts: the \gls{actor_model}, which describes the model of computation, and the \gls{actor_system}, which refers to the implementation of the model.
    1113Before discussing \CFA's actor system in detail, it is important to first describe the actor model, and the classic approach to implementing an actor system.
     
    1517An actor is composed of a \Newterm{mailbox} (message queue) and a set of \Newterm{behaviours} that receive from the mailbox to perform work.
    1618Actors execute asynchronously upon receiving a message and can modify their own state, make decisions, spawn more actors, and send messages to other actors.
     19Conceptually, actor systems can be thought of in terms of channels, where each actor's mailbox is a channel.
     20However, a mailbox behaves like an unbounded channel, which differs from the fixed size channels discussed in the previous chapter.
    1721Because the actor model provides implicit concurrency, its strength is that it abstracts away many details and concerns needed in other concurrent paradigms.
    1822For example, mutual exclusion and locking are rarely relevant concepts in an actor model, as actors typically only operate on local state.
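To make the preceding description concrete, the following sketch shows the conceptual shape of an actor: a loop that receives from the actor's (unbounded) mailbox and runs a behaviour over local state only. The types and routine names are purely illustrative and are not part of any actor library.
\begin{cfa}
// Conceptual sketch only: an actor is an (unbounded) mailbox plus behaviours that
// touch only the actor's local state, so no locks on shared data are needed.
struct toy_actor { mailbox inbox; int local_state; };   // illustrative types
void actor_loop( toy_actor & this ) {
	for ( ;; ) {
		message * msg = take( this.inbox );              // block until a message arrives
		behaviour( this, *msg );                         // may update local_state, send messages, or spawn actors
	}
}
\end{cfa}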
     
    6771% The arrows from the message queues to the actors in the diagram indicate interleaved messages addressed to each actor.
    6872
    69 The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{Buhr22} but implemented in \CFA, and adds the following new \CFA contributions:
     73The actor system in \CFA uses a message-centric design and adopts several features from my prior actor work in \uC~\cite{Buhr22}, but is implemented in \CFA. My contributions to the prior actor work include introducing queue gulping, developing an actor benchmark suite, and extending promise support for actors. Furthermore, I improved the design and implementation of the \uC actor system to greatly increase its performance. As such, the actor system in \CFA started as a copy of the \uC implementation, which was then refined. This work adds the following new \CFA contributions:
    7074\begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]
    7175\item
     
    237241tells the executor to mark the respective actor as finished executing, but neither call the object's destructor nor deallocate the object.
    238242This status is used when actors or messages are global or stack allocated, or a programmer wants to manage deallocation themselves.
    239 Note, for messages, there is no difference between allocations @Nodelete@ and @Finished@ because both tell the executor to do nothing to the message.
     243Note that for messages there is no difference between allocations @Nodelete@ and @Finished@ because both tell the executor to do nothing to the message.
    240244Hence, @Finished@ is implicitly changed to @Nodelete@ in a message constructor, and @Nodelete@ is used internally for message error-checking \see{Section~\ref{s:SafetyProductivity}}.
    241245Therefore, reading a message's allocation status after setting to @Finished@ may be either @Nodelete@ (after construction) or @Finished@ (after explicitly setting using @set_allocation@).
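The intent of the allocation actions can be summarized by the following sketch of the executor's clean-up step after a behaviour returns; the names @actor_base@, @alloc@, @live@, and @run_derived_dtor@ are hypothetical stand-ins, not the library's actual interface.
\begin{cfa}
// Illustrative clean-up after a behaviour returns (not the library implementation).
void reclaim( actor_base * a ) {
	if ( a->alloc == Nodelete ) return;                       // executor leaves the actor untouched
	if ( a->alloc == Finished ) { a->live = false; return; }  // done executing: no destructor, no free
	run_derived_dtor( a );                                    // Delete and Destroy both run the derived destructor
	if ( a->alloc == Delete ) free( a );                      // Delete additionally releases the storage
}
\end{cfa}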
     
    244248After an actor is terminated, it is erroneous to send messages to it.
    245249Similarly,  after a message is terminated, it cannot be sent to an actor.
    246 Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action \emph{after} a behaviour returns.
     250Note that it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action \emph{after} a behaviour returns.
    247251
    248252\subsection{Actor Envelopes}\label{s:envelope}
     
    377381Poison-pill messages, used to suggest or force actor termination, are common across actor systems, including Akka and ProtoActor~\cite{Akka,ProtoActor}.
    378382For example, in Figure~\ref{f:CFAActor}, the builtin @finished_msg@ message and receive are used to terminate the actor because the actor is allocated on the stack, so no deallocation actions are performed by the executor.
    379 Note, assignment is used to initialize these messages rather than constructors because the constructor changes the allocation to @Nodelete@ for error checking
     383Note that assignment rather than a constructor is used to initialize these messages, because the constructor changes the allocation to @Nodelete@ for error checking.
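The usage pattern is roughly the following; the actor and message types and the @|@ send operator are assumptions based on the surrounding discussion, not verbatim library code.
\begin{cfa}
filter_actor a;            // stack allocated, so the executor must not deallocate it
work_msg m;
a | m;                     // normal processing
a | finished_msg;          // builtin poison pill: the actor is marked finished,
                           // no destructor is run and nothing is freed
\end{cfa}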
    380384
    381385\begin{figure}
     
    399403After the receive routine is done, the executor must clean up the actor and message according to their allocation status.
    400404If the allocation status is @Delete@ or @Destroy@, the appropriate destructor must be called by the executor.
    401 This requirement poses a problem;
    402 the derived type of the actor or message is not available to the executor, but it needs to call the derived destructor.
     405This requirement poses a problem: the derived type of the actor or message is not available to the executor, but it needs to call the derived destructor.
    403406This requires downcasting from the base type to the derived type, which necessitates a virtual system.
    404407To accomplish the downcast, a rudimentary destructor-only virtual system was implemented in \CFA as part of this work.
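One plausible shape for such a destructor-only virtual system is sketched below; the field and routine names are hypothetical and the real implementation may differ.
\begin{cfa}
// Sketch: the base type carries a single function pointer, filled in by the derived
// type's constructor, so the executor can invoke the derived destructor through the base.
struct actor_base {
	allocation alloc;
	void (*virtual_dtor)( actor_base * );      // the only "virtual" entry required
};
struct my_actor {
	actor_base base;                           // base first, so the downcast is a simple cast
	char * scratch;                            // derived state needing clean-up
};
void my_actor_dtor( actor_base * b ) {
	my_actor * d = (my_actor *)b;              // recover the derived type
	free( d->scratch );
}
void cleanup( actor_base * b ) {               // executor sees only actor_base
	if ( b->alloc == Delete || b->alloc == Destroy ) b->virtual_dtor( b );   // derived destructor runs
}
\end{cfa}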
     
    477480The only extra overhead is each executor cycling (usually round-robin) through its $M$/$N$ queues.
    478481The goal is to achieve better performance and scalability for certain kinds of actor applications by reducing executor locking.
    479 Note, lock-free queues do not help because busy waiting on any atomic instruction is the source of the slowdown whether it is a lock or lock-free.
     482Note that lock-free queues do not help because busy waiting on any atomic instruction is the source of the slowdown whether it is a lock or lock-free.
    480483
    481484\begin{figure}
     
    489492Each executor thread iterates over its own message queues until it finds one with messages.
    490493At this point, the executor thread atomically \gls{gulp}s the queue, meaning it moves the contents of the message queue to a local queue of the executor thread.
    491 An example of the queue gulping operation is shown in the right side of Figure \ref{f:gulp}, where a executor thread gulps queue 0 and begins to process it locally.
     494An example of the queue gulping operation is shown in the right side of Figure \ref{f:gulp}, where an executor thread gulps queue 0 and begins to process it locally.
    492495This step allows the executor thread to process the local queue without any atomics until the next gulp.
    493496Other executor threads can continue adding to the ends of the executor thread's message queues.
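A gulp can be pictured as the following sketch, where the only synchronized step is a brief pointer exchange under the mailbox's lock; the field names and lock type are illustrative, not the library code.
\begin{cfa}
struct work_queue { lock_type l; copy_queue * owned; };   // lock_type: placeholder for the mutual-exclusion lock used
copy_queue * gulp( work_queue & mailbox, copy_queue * empty ) {
	lock( mailbox.l );                   // the only locked step of the gulp
	copy_queue * full = mailbox.owned;   // O(1): swap buffer pointers, no element copies
	mailbox.owned = empty;               // senders keep appending to the fresh buffer
	unlock( mailbox.l );
	return full;                         // executor drains this locally with plain pops
}
\end{cfa}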
     
    515518Instead, an index is stored in the copy-queue data-structure that keeps track of which element to pop next, allowing pop to be $O(1)$.
    516519Push operations are amortized $O(1)$ since pushes may cause doubling reallocations of the underlying dynamic-sized array (like \CC @vector@).
     520Note that the copy queue is similar to a circular buffer, but has a key difference.
     521After all elements are popped, the end of the queue is set back to index zero, whereas a standard circular buffer would leave the end in the same location.
     522Using a standard circular buffer would incur an additional $O(n)$ cost of fixing the buffer after a doubling reallocation.
    517523
    518524Since the copy queue is an array, envelopes are allocated first on the stack and then copied into the copy queue to persist until they are no longer needed.
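An illustrative core of the copy queue follows (a sketch of the idea, not the library data-structure); the element type @request@ stands in for an envelope.
\begin{cfa}
// Sketch: growable array with a separate pop index; when the queue drains, both
// indexes reset to zero, so a doubling reallocation never needs to unwrap a ring.
struct copy_queue {
	request * buffer;                    // dynamically sized array of envelopes
	size_t cap, end, next;               // capacity, push position, next element to pop
};
void push( copy_queue & q, request r ) {
	if ( q.end == q.cap ) {              // amortized O(1): double on overflow
		q.cap *= 2;
		q.buffer = realloc( q.buffer, q.cap * sizeof(request) );
	}
	q.buffer[q.end++] = r;               // copy the envelope into the array
}
bool pop( copy_queue & q, request * r ) {
	if ( q.next == q.end ) return false;         // empty
	*r = q.buffer[q.next++];                     // O(1): no shifting of elements
	if ( q.next == q.end ) q.next = q.end = 0;   // drained: reset so pushes restart at index 0
	return true;
}
\end{cfa}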
     
    547553To steal, a thief takes work from a victim's ready queue, so work stealing always results in a potential increase in contention on ready queues between the victim gulping from a queue and the thief stealing the queue.
    548554This contention can reduce the victim's throughput.
    549 Note, the data structure used for the ready queue is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations.
     555Note that the data structure used for the ready queue is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations.
    550556
    551557The stealing mechanism in this work differs from most work-stealing actor-systems because of the message-centric (inverted) actor-system.
     
    561567Achieving this goal requires work stealing to have minimal (practically no) effect on the performance of the victim.
    562568The implication is that thieves cannot contend with a victim, and that a victim should perform no stealing related work unless it becomes a thief.
    563 In theory, this goal is not achievable, but practical results show the goal is virtually achieved.
     569In theory, this goal is not achievable, but practical results show the goal is nearly achieved.
    564570
    565571One important lesson learned while working on \uC actors~\cite{Buhr22} and through discussions with fellow student Thierry Delisle, who examined work-stealing for user-threads in his Ph.D.~\cite{Delisle22}, is \emph{not} to aggressively steal.
    566572With reasonable workloads, being a thief should be a temporary state, \ie eventually work appears on the thief's ready-queues and it returns to normal operation.
    567 Furthermore, the act of \emph{looking} to find work is invasive (Heisenberg uncertainty principle), possibly disrupting multiple victims.
     573Furthermore, the act of \emph{looking} to find work is invasive, possibly disrupting multiple victims.
    568574Therefore, stealing should be done lazily in case work appears for the thief and to minimize disruption of victims.
    569 Note, the cost of stealing is not crucial for the thief because it does not have anything else to do except poll or block.
     575Note that the cost of stealing is not crucial for the thief because it does not have anything else to do except poll or block.
    570576
    571577The outline for lazy-stealing by a thief is: select a victim, scan its queues once, and return immediately if a queue is stolen.
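In code form, the outline looks roughly like the following, reusing @try_swap_queues@ from Figure~\ref{f:qpcasImpl}; the victim-selection and scanning helpers (@choose_victim@, @nonempty@, @queues_per_worker@) and the @my_empty_idx@ parameter are hypothetical placeholders.
\begin{cfa}
// Sketch of lazy stealing: one victim, one scan, stop after the first successful swap.
void try_steal( worker & this, uint my_empty_idx ) with( this ) {
	uint victim = choose_victim( this );                   // heuristic: random or longest victim
	for ( uint i = 0; i < queues_per_worker; i += 1 ) {    // scan the victim's queues once
		uint vic_idx = victim * queues_per_worker + i;
		if ( nonempty( mailboxes[vic_idx] ) && try_swap_queues( this, vic_idx, my_empty_idx ) )
			return;                                        // stole a queue: stop immediately
	}
}                                                          // nothing stolen: return to normal operation
\end{cfa}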
     
    628634% The swap races to interchange the appropriate victim's mail-queue pointer with an empty mail-queue pointer in the thief's @worker_queues@ range.
    629635This swap can fail if another thief steals the queue, which is discussed further in Section~\ref{s:swap}.
    630 % Note, a thief never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers.
     636% Note that a thief never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers.
    631637If no appropriate victim mailbox is found, no swap is attempted.
    632638
     
    764770Figure~\ref{f:qpcasImpl} shows the \CFA pseudocode for the \gls{qpcas}.
    765771In detail, a thief performs the following steps to swap two pointers:
    766 \begin{enumerate}[start=0]
    767 \item
    768 stores local copies of the two pointers to be swapped.
    769 \item
    770 verifies the stored copy of the victim queue pointer, @vic_queue@, is valid.
     772\begin{enumerate}
     773\item
     774Stores local copies of the two pointers to be swapped.
     775\item
     776Verifies the stored copy of the victim queue pointer, @vic_queue@, is valid.
    771777If @vic_queue@ is null, then the victim queue is part of another swap so the operation fails.
    772778No state has changed at this point so the thief just returns.
    773 Note, thieves only set their own queues pointers to null when stealing, and queue pointers are not set to null anywhere else.
     779Note that thieves only set their own queue pointers to null when stealing, and queue pointers are not set to null anywhere else.
    774780As such, it is impossible for @my_queue@ to be null since each worker owns a disjoint range of the queue array.
    775781Hence, only @vic_queue@ is checked for null.
    776782\item
    777 attempts to atomically set the thief's queue pointer to null.
     783Attempts to atomically set the thief's queue pointer to null.
    778784The @CAS@ only fails if the thief's queue pointer is no longer equal to @my_queue@, which implies this thief has become a victim and its queue has been stolen.
    779785At this point, the thief-turned-victim fails, and since it has not changed any state, it just returns false.
     
    784790
    785791\item
    786 attempts to atomically set the victim's queue pointer to @my_queue@.
     792Attempts to atomically set the victim's queue pointer to @my_queue@.
    787793If the @CAS@ succeeds, the victim's queue pointer has been set and the swap can no longer fail.
    788794If the @CAS@ fails, the thief's queue pointer must be restored to its previous value before returning.
    789795\item
    790 sets the thief's queue pointer to @vic_queue@ completing the swap.
     796Sets the thief's queue pointer to @vic_queue@ completing the swap.
    791797\end{enumerate}
    792798
     
    794800\begin{cfa}
    795801bool try_swap_queues( worker & this, uint victim_idx, uint my_idx ) with(this) {
    796         // Step 0: mailboxes is the shared array of all sharded queues
     802        // Step 1: mailboxes is the shared array of all sharded queues
    797803        work_queue * my_queue = mailboxes[my_idx];
    798804        work_queue * vic_queue = mailboxes[victim_idx];
    799805
    800         // Step 1 If the victim queue is 0p then they are in the process of stealing so return false
     806        // Step 2 If the victim queue is 0p then they are in the process of stealing so return false
    801807        // 0p is Cforall's equivalent of C++'s nullptr
    802808        if ( vic_queue == 0p ) return false;
    803809
    804         // Step 2 Try to set our own (thief's) queue ptr to be 0p.
     810        // Step 3 Try to set our own (thief's) queue ptr to be 0p.
    805811        // If this CAS fails someone stole our (thief's) queue so return false
    806812        if ( !CAS( &mailboxes[my_idx], &my_queue, 0p ) )
    807813                return false;
    808814
    809         // Step 3: Try to set victim queue ptr to be our (thief's) queue ptr.
     815        // Step 4: Try to set victim queue ptr to be our (thief's) queue ptr.
    810816        // If it fails someone stole the other queue, so fix up then return false
    811817        if ( !CAS( &mailboxes[victim_idx], &vic_queue, my_queue ) ) {
     
    813819                return false;
    814820        }
    815         // Step 4: Successfully swapped.
     821        // Step 5: Successfully swapped.
    816822        // Thief's ptr is 0p so no one touches it
    817823        // Write back without CAS is safe
     
    828834\end{theorem}
    829835To verify sequential correctness, Figure~\ref{f:seqSwap} shows a simplified \gls{qpcas}.
    830 Step 1 is missing in the sequential example since it only matters in the concurrent context.
     836Step 2 is missing in the sequential example since it only matters in the concurrent context.
    831837By inspection, the sequential swap copies each pointer being swapped, and then the original values of each pointer are reset using the copy of the other pointer.
    832838
     
    834840\begin{cfa}
    835841void swap( uint victim_idx, uint my_idx ) {
    836         // Step 0:
     842        // Step 1:
    837843        work_queue * my_queue = mailboxes[my_idx];
    838844        work_queue * vic_queue = mailboxes[victim_idx];
    839         // Step 2:
     845        // Step 3:
    840846        mailboxes[my_idx] = 0p;
    841         // Step 3:
     847        // Step 4:
    842848        mailboxes[victim_idx] = my_queue;
    843         // Step 4:
     849        // Step 5:
    844850        mailboxes[my_idx] = vic_queue;
    845851}
     
    859865\begin{itemize}
    860866\item
    861 Step 0 and 1 do not write, and as such, they cannot invalidate the invariant of any other thieves.
    862 \item
    863 In step 2, a thief attempts to write @0p@ to one of their queue pointers.
     867Steps 1 and 2 do not write, and as such, they cannot invalidate the invariant of any other thieves.
     868\item
     869In step 3, a thief attempts to write @0p@ to one of their queue pointers.
    864870This queue pointer cannot be @0p@.
    865871As stated above, @my_queue@ is never equal to @0p@ since thieves only write @0p@ to queue pointers from their own queue range and all workers' queue ranges are disjoint.
    866 As such, step 2 upholds the invariant, since in a failure case no write occurs, and in the success case, the value of the queue pointer is guaranteed to not be 0p.
    867 \item
    868 In step 3, the thief attempts to write @my_queue@ to the victim's queue pointer.
    869 If the current value of the victim's queue pointer is @0p@, then the @CAS@ fails since @vic_queue@ cannot be equal to @0p@ because of the check in step 1.
     872As such, step 3 upholds the invariant, since in a failure case no write occurs, and in the success case, the value of the queue pointer is guaranteed not to be @0p@.
     873\item
     874In step 4, the thief attempts to write @my_queue@ to the victim's queue pointer.
     875If the current value of the victim's queue pointer is @0p@, then the @CAS@ fails since @vic_queue@ cannot be equal to @0p@ because of the check in step 2.
    870876Therefore, when the @CAS@ succeeds, the value of the victim's queue pointer must not be @0p@.
    871 As such, the write never overwrites a value of @0p@, hence the invariant is held in the @CAS@ of step 3.
    872 \item
    873 The write back to the thief's queue pointer that happens in the failure case of step 3 and in step 4 hold the invariant since they are the subsequent write to a @0p@ queue pointer and are being set by the same thief that set the pointer to @0p@.
     877As such, the write never overwrites a value of @0p@, hence the invariant is held in the @CAS@ of step 4.
     878\item
     879The write back to the thief's queue pointer in the failure case of step 4 and in step 5 upholds the invariant, since it is the subsequent write to a @0p@ queue pointer and is performed by the same thief that set the pointer to @0p@.
    874880\end{itemize}
    875881
    876882Given this informal proof of invariance, it can be shown that the successful swap is correct.
    877 Once a thief atomically sets their queue pointer to be @0p@ in step 2, the invariant guarantees that that pointer does not change.
    878 In the success case of step 3, it is known the value of the victim's queue-pointer, which is not overwritten, must be @vic_queue@ due to the use of @CAS@.
     883Once a thief atomically sets its queue pointer to @0p@ in step 3, the invariant guarantees that the pointer does not change.
     884In the success case of step 4, it is known that the value of the victim's queue-pointer, which is not overwritten, must be @vic_queue@ due to the use of @CAS@.
    879885Given that the pointers all have unique memory locations (a pointer is never swapped with itself), this first write of the successful swap is correct since it can only occur when the pointer has not changed.
    880886By the invariant, the write back in the successful case is correct since no other worker can write to the @0p@ pointer.
    881 In the failed case of step 3, the outcome is correct in steps 1 and 2 since no writes have occurred so the program state is unchanged.
    882 Therefore, the program state is safely restored to the state it had prior to the @0p@ write in step 2, because the invariant makes the write back to the @0p@ pointer safe.
     887In the failed case of step 4, the outcome is correct in steps 2 and 3 since no writes have occurred, so the program state is unchanged.
     888Therefore, the program state is safely restored to the state it had prior to the @0p@ write in step 3, because the invariant makes the write back to the @0p@ pointer safe.
    883889Note that the pointers having unique memory locations prevents the ABA problem.
    884890
     
    906912As such, the victim here is not also a thief.
    907913Stepping through the code in Figure~\ref{f:qpcasImpl}, for all thieves, steps 1-2 succeed since the victim is not stealing and has no queue pointers set to be @0p@.
    908 Similarly, for all thieves, step 2 succeed since no one is stealing from any of the thieves.
    909 In step 3, the first thief to @CAS@ wins the race and successfully swaps the queue pointer.
     914Similarly, for all thieves, step 3 succeeds since no one is stealing from any of the thieves.
     915In step 4, the first thief to @CAS@ wins the race and successfully swaps the queue pointer.
    910916Since it is the first one to @CAS@ and @CAS@ is atomic, the @CAS@ cannot fail: no other thief could have written to the victim's queue pointer, and the victim did not write to the pointer since it is not stealing.
    911917Hence at least one swap is guaranteed to succeed in this case.
     
    926932This is a proof by contradiction.
    927933Assume no swaps occur.
    928 Then all thieves must have failed at step 1, step 2 or step 3.
    929 For a given thief $b$ to fail at step 1, thief $b + 1$ must have succeeded at step 2 before $b$ executes step 0.
    930 Hence, not all thieves can fail at step 1.
    931 Furthermore if a thief $b$ fails at step 1 it logically splits the chain into two subchains $0 <- b$ and $b + 1 <- M - 1$, where $b$ has become solely a victim since its swap has failed and it did not modify any state.
    932 There must exist at least one chain containing two or more queues after since it is impossible for a split to occur both before and after a thief, since that requires failing at step 1 and succeeding at step 2.
    933 Hence, without loss of generality, whether thieves succeed or fail at step 1, this proof can proceed inductively.
    934 
    935 For a given thief $i$ to fail at step 2, it means that another thief $j$ had to have written to $i$'s queue pointer between $i$'s step 0 and step 2.
    936 The only way for $j$ to write to $i$'s queue pointer would be if $j$ was stealing from $i$ and had successfully finished step 3.
    937 If $j$ finished step 3 then the at least one swap was successful.
    938 Therefore all thieves did not fail at step 2.
    939 Hence all thieves must successfully complete step 2 and fail at step 3.
     934Then all thieves must have failed at step 2, step 3 or step 4.
     935For a given thief $b$ to fail at step 2, thief $b + 1$ must have succeeded at step 3 before $b$ executes step 1.
     936Hence, not all thieves can fail at step 2.
     937Furthermore, if a thief $b$ fails at step 2, it logically splits the chain into two subchains $0 \leftarrow b$ and $b + 1 \leftarrow M - 1$, where $b$ has become solely a victim since its swap has failed and it did not modify any state.
     938There must exist at least one chain containing two or more queues after the split, since it is impossible for a split to occur both before and after a thief, as that requires failing at step 2 and succeeding at step 3.
     939Hence, without loss of generality, whether thieves succeed or fail at step 2, this proof can proceed inductively.
     940
     941For a given thief $i$ to fail at step 3, it means that another thief $j$ had to have written to $i$'s queue pointer between $i$'s step 1 and step 3.
     942The only way for $j$ to write to $i$'s queue pointer would be if $j$ was stealing from $i$ and had successfully finished step 4.
     943If $j$ finished step 4, then at least one swap was successful.
     944Therefore, no thief fails at step 3.
     945Hence all thieves must successfully complete step 3 and fail at step 4.
    940946However, since the first worker, thief $0$, is solely a victim and not a thief, it does not change the state of any of its queue pointers.
    941 Hence, in this case thief $1$ always succeeds in step 3 if all thieves succeed in step 2.
     947Hence, in this case thief $1$ always succeeds in step 4 if all thieves succeed in step 3.
    942948Thus, by contradiction with the earlier assumption that no swaps occur, at least one swap must succeed.
    943949
     
    964970The forward direction is proven by contradiction in a similar fashion to \ref{t:vic_chain}.
    965971Assume no swaps occur.
    966 Similar to \ref{t:vic_chain}, this graph can be inductively split into subgraphs of the same type by failure at step 1, so the proof proceeds without loss of generality.
    967 Similar to \ref{t:vic_chain} the conclusion is drawn that all thieves must successfully complete step 2 for no swaps to occur, since for step 2 to fail, a different thief has to successfully complete step 3, which would imply a successful swap.
    968 Hence, the only way forward is to assume all thieves successfully complete step 2.
    969 Hence for there to be no swaps all thieves must fail step 3.
     972Similar to \ref{t:vic_chain}, this graph can be inductively split into subgraphs of the same type by failure at step 2, so the proof proceeds without loss of generality.
     973Similar to \ref{t:vic_chain}, the conclusion is drawn that all thieves must successfully complete step 3 for no swaps to occur, since for step 3 to fail, a different thief has to successfully complete step 4, which would imply a successful swap.
     974Hence, the only way forward is to assume all thieves successfully complete step 3.
     975Therefore, for there to be no swaps, all thieves must fail step 4.
    970976However, since $A$ has no outgoing edges and the graph is connected, there must be some $K$, with $K < M - 1$, such that $K$ thieves are attempting to swap with $A$.
    971 Since all $K$ thieves have passed step 2, similar to \ref{t:one_vic} the first one of the $K$ thieves to attempt step 3 is guaranteed to succeed.
     977Since all $K$ thieves have passed step 3, similar to \ref{t:one_vic} the first one of the $K$ thieves to attempt step 4 is guaranteed to succeed.
    972978Thus, by contradiction with the earlier assumption that no swaps occur, if the graph does not contain a cycle, at least one swap must succeed.
    973979
     
    983989Thus, by induction all vertices in the graph must have exactly one outgoing edge.
    984990Hence all vertices are thief queues.
    985 Now consider the case where all thieves successfully complete step 0-1, and then they all complete step 2.
     991Now consider the case where all thieves successfully complete steps 1-2, and then they all complete step 3.
    986992At this point all thieves are attempting to swap with a queue pointer whose value has changed to @0p@.
    987993If all thieves attempt the @CAS@ before any write backs, then they all fail.
     
    9971003There is no one selection heuristic known to be best for all workloads.
    9981004Recent work focuses on locality-aware scheduling in actor systems~\cite{barghi18,wolke17}.
    999 However, while locality-aware scheduling provides good performance on some workloads, sometime randomized selection performs better on other workloads~\cite{barghi18}.
     1005However, while locality-aware scheduling provides good performance on some workloads, randomized selection performs better on other workloads~\cite{barghi18}.
    10001006Since locality-aware scheduling has been explored recently, this work introduces a heuristic called \Newterm{longest victim} and compares it to randomized work stealing.
    10011007
     
    10211027
    10221028\CFA's actor system comes with a suite of safety and productivity features.
    1023 Most of these features are only present in \CFA's debug mode, and hence, have zero-cost in nodebug mode.
     1029Most of these features are only present in \CFA's debug mode, and hence, have zero-cost in no-debug mode.
    10241030The suite of features includes the following.
    10251031\begin{itemize}
     
    11941200The copy queue both reduces and consolidates allocations, greatly lowering contention on the memory allocator.
    11951201Additionally, due to the static typing in \CFA's actor system, there are none of the expensive dynamic (RTTI) casts that occur in \uC to discriminate message types.
    1196 Note, while dynamic cast is relatively inexpensive, the remaining send cost in both \uC and \CFA is small;
     1202Note that while dynamic cast is relatively inexpensive, the remaining send cost in both \uC and \CFA is small;
    11971203hence, the relative cost for the RTTI in \uC is significant.
    11981204
     
    12641270Figures~\ref{f:MatrixAMD} and \ref{f:MatrixIntel} show the matrix-multiply results.
    12651271There are two groupings, with Akka and ProtoActor being slightly slower than \uC, \CFA, and CAF.
    1266 On the Intel, there is an unknown divergence between \uC and \CFA/CAF at 24 cores.
     1272On the Intel, there is an unexplained divergence between \uC and \CFA/CAF at 24 cores.
    12671273Given that the bottleneck of this benchmark is the computation of the result matrix, all executors perform well on this embarrassingly parallel application.
    12681274Hence, the results are tightly clustered across all actor systems.
     
    12881294
    12891295Two pathological unbalanced cases are created and compared using vanilla and randomized work stealing in \CFA.
    1290 These benchmarks adversarially takes advantage of the round-robin assignment of actors to workers by loading actors only on specific cores (there is one worker per core).
     1296These benchmarks adversarially take advantage of the round-robin assignment of actors to workers by loading actors only on specific cores (there is one worker per core).
    12911297The workload on the loaded cores is the same as the executor benchmark described in \ref{s:executorPerf}, but with fewer rounds.
    12921298
     
    12941300Given this layout, the ideal speedup of work stealing in the balance-one case should be $N / (N - 1)$, where $N$ is the number of threads;
    12951301in the balance-multi case, the ideal speedup is 0.5.
    1296 Note, in the balance-one benchmark, the workload is fixed so decreasing runtime is expected;
     1302Note that in the balance-one benchmark, the workload is fixed, so decreasing runtime is expected;
    12971303in the balance-multi experiment, the workload increases with the number of cores, so an increasing or constant runtime is expected.
    12981304
     
    13281334On Intel in Figure~\ref{f:BalanceOneIntel}, above 32 cores the performance gets worse for all variants due to hyperthreading.
    13291335Here, the longest-victim and random heuristics perform the same.
    1330 Note, the non-stealing variation of balance-one slows down slightly (no decrease in graph) as the cores increase, since a few \emph{dummy} actors are created for each of the extra cores beyond the first to adversarially layout all loaded actors on the first core.
     1336Note that the non-stealing variation of balance-one slows down slightly (no decrease in the graph) as the cores increase, since a few \emph{dummy} actors are created for each of the extra cores beyond the first to adversarially lay out all loaded actors on the first core.
    13311337
    13321338For the balance-multi benchmark in Figures~\ref{f:BalanceMultiAMD} and~\ref{f:BalanceMultiIntel}, the random heuristic outperforms the longest victim.