Changes in / [96ea77a:70f97c8]
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/colby_parsons_MMAth/text/actors.tex
r96ea77a r70f97c8 11 11 12 12 \section{Actor Model} 13 The \Newterm{actor model}is a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}.13 The actor model is a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}. 14 14 An actor is composed of a \Newterm{mailbox} (message queue) and a set of \Newterm{behaviours} that receive from the mailbox to perform work. 15 15 Actors execute asynchronously upon receiving a message and can modify their own state, make decisions, spawn more actors, and send messages to other actors. … … 20 20 An actor is executed by an underlying \Newterm{executor} (kernel thread-pool) that fairly invokes each actor, where an actor invocation processes one or more messages from its mailbox. 21 21 The default number of executor threads is often proportional to the number of computer cores to achieve good performance. 22 An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s: ActorSystem}}.22 An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s:CFAActor}}. 23 23 24 24 \subsection{Classic Actor System} 25 An implementation of the actor model with a community of actors is called an \Newterm{actor system}.25 An implementation of the actor model with a community of actors is called an actor system. 26 26 Actor systems largely follow the actor model, but can differ in some ways. 27 27 While the semantics of message \emph{send} is asynchronous, the implementation may be synchronous or a combination. 28 The default semantics for message \emph{receive} is \gls{fifo}, so an actor receives messages from its mailbox in temporal (arrival) order;28 The default semantics for message \emph{receive} is FIFO, so an actor receives messages from its mailbox in temporal (arrival) order; 29 29 however, messages sent among actors arrive in any order. 30 30 Some actor systems provide priority-based mailboxes and/or priority-based message-selection within a mailbox, where custom message dispatchers search among or within a mailbox(es) with a predicate for specific kinds of actors and/or messages. 31 31 Some actor systems provide a shared mailbox where multiple actors receive from a common mailbox~\cite{Akka}, which is contrary to the no-sharing design of the basic actor-model (and requires additional locking). 32 For non- \gls{fifo}service, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received.33 Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism \see{Section~\ref{s:SafetyProductivity}}).32 For non-FIFO service, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received. 33 Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism (see Section~\ref{s:SafetyProductivity}). 34 34 %While the definition of the actor model provides no restrictions on message ordering, actor systems tend to guarantee that messages sent from a given actor $i$ to actor $j$ will arrive at actor $j$ in the order they were sent. 35 35 Another way an actor system varies from the model is allowing access to shared global-state. … … 49 49 The more common design is to \Newterm{shard} the single queue among the executor threads, where actors are permanently assigned or can float among the queues. 50 50 Sharding significantly decreases contention among executor threads adding and removing actors to/from a queue. 51 Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors addmessages.52 When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current behaviouron the message(s).51 Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors can attach messages. 52 When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current work unit on the message(s). 53 53 54 54 % cite parallel theatre and our paper 55 55 Figure \ref{f:inverted_actor} shows an actor system designed as \Newterm{message-centric}, where a set of messages are scheduled and run on underlying executor threads~\cite{uC++,Nigro21}. 56 This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor.57 Now a message send must queries the actor to know which message queue to post the message.58 56 Again, the simplest design has a single global queue of messages accessed by the executor threads, but this approach has the same contention problem by the executor threads. 59 57 Therefore, the messages (mailboxes) are sharded and executor threads schedule each message, which points to its corresponding actor. 60 Here, an actor's messages are permanently assigned to one queue to ensure \gls{fifo} receiving and/or reduce searching for specific actor/messages. 61 Since multiple actors belong to each message queue, actor messages are interleaved on a queue, but individually in FIFO order. 58 Here, an actor's messages are permanently assigned to one queue to ensure FIFO receiving and/or reduce searching for specific actor/messages. 59 Since multiple actors belong to each message queue, actor messages are interleaved on a queue. 60 This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor. 62 61 % In this inverted actor system instead of each executor threads owning a queue of actors, they each own a queue of messages. 63 62 % In this scheme work is consumed from their queue and executed by underlying threads. … … 66 65 % The arrows from the message queues to the actors in the diagram indicate interleaved messages addressed to each actor. 67 66 68 The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{ Buhr22} but implemented in \CFA, and adds the following new \CFA contributions:67 The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{}, and adds the following contributions related to \CFA: 69 68 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt] 70 69 \item 71 Provide insight into the impact of envelope allocation in actor systems \see{Section~\ref{s:envelope}}.70 Provide insight into the impact of envelope allocation in actor systems. 72 71 In all actor systems, dynamic allocation is needed to ensure the lifetime of a unit of work persists from its creation until the unit of work is executed. 73 72 This allocation is often called an \Newterm{envelope} as it ``packages'' the information needed to run the unit of work, alongside any other information needed to send the unit of work, such as an actor's address or link fields. … … 127 126 \end{cfa} 128 127 \end{lrbox} 129 130 128 \subfloat[dynamic typing]{\label{l:dynamic_style}\usebox\myboxA} 131 129 \hspace*{10pt} … … 179 177 \end{figure} 180 178 181 Figure~\ref{f:CFAActor} shows a complete \CFA actor example, which is discussed in detail. 182 The actor type @my_actor@ is a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword. 179 Figure~\ref{f:CFAActor} shows a complete \CFA actor example starting with the actor type @my_actor@ created by defining a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword. 183 180 This inheritance style is the Plan-9 C-style inheritance discussed in Section~\ref{s:Inheritance}. 184 Similarly, the message types @str_msg@ and @int_msg@ are @struct@sthat inherits from the base @message@ @struct@ via the @inline@ keyword.181 Similarly, the message types @str_msg@ and @int_msg@ are created by defining a @struct@ that inherits from the base @message@ @struct@ via the @inline@ keyword. 185 182 Only @str_msg@ needs a constructor to copy the C string; 186 183 @int_msg@ is initialized using its \CFA auto-generated constructors. … … 190 187 The program main begins by creating two messages on the stack. 191 188 Then the executor system is started by calling @start_actor_system@. 192 Now an actor is created on the stack and four messages are sent toit using operator @?|?@.193 The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to remove the actor from the actor system \see{Section~\ref{s:ActorBehaviours}}.189 Now an actor is created on the stack and four messages are sent it using operator @?|?@. 190 The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to removes the actor from the actor system \see{Section~\ref{s:ActorBehaviours}}. 194 191 The call to @stop_actor_system@ blocks the program main until all actors are finished and removed from the actor system. 195 192 The program main ends by deleting the actor and two messages from the stack. … … 202 199 203 200 \subsection{Actor Behaviours}\label{s:ActorBehaviours} 204 In general, a behaviour for some derived actor and derived message type is defined with thefollowing signature:201 In general, a behaviour for some derived actor and derived message type is defined with following signature: 205 202 \begin{cfa} 206 203 allocation receive( my_actor & receiver, my_msg & msg ) … … 216 213 Message state is updated via a call to: 217 214 \begin{cfa} 218 void set_allocation( message & this, allocation state ) ;215 void set_allocation( message & this, allocation state ) 219 216 \end{cfa} 220 217 … … 223 220 \noindent@Nodelete@ 224 221 tells the executor that no action is to be taken with regard to an actor or message. 225 This status is used when an actor continues receiving messages or a message isreused.222 This status is used when an actor continues receiving messages or a message may be reused. 226 223 227 224 \noindent@Delete@ … … 236 233 tells the executor to mark the respective actor as finished executing, but not call the object's destructor nor deallocate the object. 237 234 This status is used when actors or messages are global or stack allocated, or a programmer wants to manage deallocation themselves. 238 Note, for messages, there is no difference between allocations @Nodelete@ and @Finished@ because both tell the executor to do nothing to the message.239 Hence, @Finished@ is implicitly changed to @Nodelete@ in a message constructor, and @Nodelete@ is used internally for message error-checking \see{Section~\ref{s:SafetyProductivity}}.240 Therefore, reading a message's allocation status after setting to @Finished@ may be either @Nodelete@ (after construction) or @Finished@ (after explicitly setting using @set_allocation@).241 235 242 236 For the actor system to terminate, all actors must have returned a status other than @Nodelete@. 243 237 After an actor is terminated, it is erroneous to send messages to it. 244 238 Similarly, after a message is terminated, it cannot be sent to an actor. 245 Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action \emph{after}a behaviour returns.239 Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action after a behaviour returns. 246 240 247 241 \subsection{Actor Envelopes}\label{s:envelope} 248 242 As stated, each message, regardless of where it is allocated, can be sent to an arbitrary number of actors, and hence, appear on an arbitrary number of message queues. 249 Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies ?250 Therefore, it isup to the actor program to manage message life-time across receives.243 Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies. 244 Therefore, it up to the actor program to manage message life-time across receives. 251 245 However, for a message to appear on multiple message queues, it needs an arbitrary number of associated destination behaviours. 252 246 Hence, there is the concept of an envelop, which is dynamically allocated on each send, that wraps a message with any extra implementation fields needed to persist between send and receive. 253 247 Managing the envelop is straightforward because it is created at the send and deleted after the receive, \ie there is 1:1 relationship for an envelop and a many to one relationship for a message. 254 255 \PAB{Do you need to say more here?}256 248 257 249 % In actor systems, messages are sent and received by actors. … … 283 275 \noindent@void start_actor_system( executor & this )@ 284 276 allows the programmer to explicitly create and configure an executor for use by the actor system. 285 Executor configuration options are discussed in Section~\ref{s:executor}.277 Executor configuration options include are discussed in Section~\ref{s:executor}. 286 278 287 279 \noindent … … 290 282 \subsection{Actor Send}\label{s:ActorSend} 291 283 All message sends are done using the vertical-bar (bit-or) operator, @?|?@, similar to the syntax of the \CFA stream I/O. 292 One way to provide this operator is through the \CFA type system: 293 \begin{cfa} 294 actor & ?|?( actor &, message & ) { // base actor and message types 295 // boilerplate to send message to executor mail queue 296 } 297 actor | str_msg | int_msg; // rewritten: ?|?( ?|?( actor, int_msg ), str_msg ) 298 \end{cfa} 299 In the \CFA type system, calls to this routine work for any pair of parameters that inherit from the @actor@ and @message@ types via Plan-9 inheritance. 300 However, within the body the routine, all type information about the derived actor and message is lost (type erasure), so this approach is unable to find the right @receive@ routine to put in the envelope. 301 302 If \CFA had a fully-fledged virtual system, the generic @?|?@ routine would work, since the virtual system could dynamically select the derived @receive@ routine via virtual dispatch. 303 \CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work, so a different approach is needed. 304 305 Without virtuals, the idiomatic \CFA way to create the generic @?|?@ routine is using @forall@: 306 \begin{cfa} 307 // forall types A, M that have a receive that returns allocation 308 forall( A &, M & | { allocation receive( A &, M & ); } ) 309 A & ?|?( A &, M & ) { // actor and message types 310 // boilerplate to send message to executor mail queue 311 } 312 \end{cfa} 313 This approach should work. 314 However, the \CFA type system is still a work in progress, and there is a nontrivial bug where inherited routines are not recognized by @forall@. 315 For example, Figure~\ref{f:type_problem} shows type @B@ has an inherited @foo@ routine through type @A@ and should find the @bar@ routine defined via the @forall@, but does not due the type-system bug. 316 317 \begin{figure} 318 \begin{cfa} 319 struct A {}; 320 struct B { inline A; } 321 void foo( A & a ) { ... } 322 323 // for all types that have a foo routine here is a bar routine 324 forall( T & | { void foo( T & ); } ) 325 void bar( T & t ) { ... } 326 327 int main() { 328 B b; 329 foo( b ); // B has a foo so it should find a bar via the forall 330 bar( b ); // compilation error, no bar found for type B 331 } 332 \end{cfa} 333 \caption{\CFA Type-System Problem} 334 \label{f:type_problem} 335 \end{figure} 336 337 Users could be expected to write the @?|?@ routines, but this approach is error prone and creates maintenance issues. 338 Until the \CFA type-system matures, I created a workaround using a template-like approach, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature. 339 This workaround is outside of the type system, but performing a type-system like action. 340 The workaround requires no annotation or additional code to be written by users; 341 thus, it resolves the maintenance and error problems. 342 It should be possible to seamlessly transition the workaround into any updated version of the \CFA type-system. 284 Hence, programmers must write a matching @?|?@ routine for each @receive@ routine, which is awkward and generates a maintenance problem that must be solved. 285 \CFA's current best approach for creating a generic @?|?@ routine requires users to create routines for their actor and message types that access the base type. 286 Since these routines are not complex, they can be generated using macros that are used to annotate the user-defined actor and message types. 287 This approach is used in \CFA's intrusive list data structure. 288 This is not much better than asking users to write the @?|?@ routine themselves in terms of maintenance, since the user needs to remember the boilerplate macro annotation. 289 290 As stated, \CFA does not have named inheritance with RTTI. 291 \CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work. 292 Virtuals would provide a mechanism to write a single generic @?|?@ routine taking a base actor and message type, that would dynamically select the @receive@ routine from the actor argument. 293 Note, virtuals are not needed for the send; Plan-9 inheritance is sufficient because only the inherited fields are needed during the message send (only upcasting is needed). 294 295 Therefore, a template-like approach was chosen, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature. 296 This approach requires no annotation or additional code to be written by users, thus it resolves the maintenance problem. 297 (When the \CFA virtual routines mature, it should be possible to seamlessly transition to it from the template approach.) 343 298 344 299 Figure~\ref{f:send_gen} shows the generated send routine for the @int_msg@ receive in Figure~\ref{f:CFAActor}. 345 300 Operator @?|?@ has the same parameter signature as the corresponding @receive@ routine and returns an @actor@ so the operator can be cascaded. 346 301 The routine sets @rec_fn@ to the matching @receive@ routine using the left-hand type to perform the selection. 347 Then the routine packages the actor and message, along with the receive routine into an envelope.302 Then the routine packages the base and derived actor and message and actor, along with the receive routine into an \hyperref[s:envelope]{envelope}. 348 303 Finally, the envelop is added to the executor queue designated by the actor using the executor routine @send@. 349 304 … … 351 306 \begin{cfa} 352 307 $\LstCommentStyle{// from Figure~\ref{f:CFAActor}}$ 353 struct my_actor { inline actor; }; $\C[3.75in]{// actor}$354 struct int_msg { inline message; int i; }; $\C{// message}$308 struct my_actor { inline actor; }; $\C[3.75in]{// actor}$ 309 struct int_msg { inline message; int i; }; $\C{// message}$ 355 310 allocation receive( @my_actor &, int_msg & msg@ ) {...} $\C{// receiver}$ 356 311 … … 359 314 actor & ?|?( @my_actor & receiver, int_msg & msg@ ) { 360 315 allocation (*rec_fn)( my_actor &, int_msg & ) = @receive@; // deduce receive routine 361 request req{ (actor *)&receiver, (message *)&msg, (receive_t)rec_fn };362 send( receiver, req ); 316 request req{ &receiver, (actor *)&receiver, &msg, (message *)&msg, (receive_t)rec_fn }; 317 send( receiver, req ); $\C{// queue message for execution}\CRT$ 363 318 return receiver; 364 319 } … … 368 323 \end{figure} 369 324 325 \subsection{Actor Termination}\label{s:ActorTerm} 326 During a message send, the receiving actor and message being sent are stored via pointers in the envelope. 327 These pointers have the base actor and message types, so type information of the actor and message is lost and then recovered later when the typed receive routine is called. 328 After the receive routine is done, the executor must clean up the actor and message according to their allocation status. 329 If the allocation status is @Delete@ or @Destroy@, the appropriate destructor must be called by the executor. 330 This poses a problem; the correct type of the actor or message is not available to the executor, but it needs to call the right destructor! 331 This requires downcasting from the base type to derived type, which requires a virtual system. 332 Thus, a rudimentary destructor-only virtual system was added to \CFA as part of this work. 333 This virtual system is used via Plan-9 inheritance of the @virtual_dtor@ type, shown in Figure~\ref{f:VirtDtor}. 334 The @virtual_dtor@ type maintains a pointer to the start of the object, and a pointer to the correct destructor. 335 When a type inherits the @virtual_dtor@ type, the compiler adds code to its destructor to make sure that any destructor calls along this segment of the inheritance tree is called are intercepted, and restart at the appropriate destructor for that object. 336 337 \begin{figure} 338 \begin{cfa} 339 struct base_type { inline virtual_dtor; }; 340 struct intermediate_type { inline base_type; }; 341 struct derived_type { inline intermediate_type; }; 342 343 int main() { 344 derived_type d1, d2, d3; 345 intermediate_type & i = d2; 346 base_type & b = d3; 347 348 // these will call the destructors in the correct order 349 ^d1{}; ^i{}; ^b{}; 350 } 351 352 \end{cfa} 353 \caption{\CFA Virtual Destructor} 354 \label{f:VirtDtor} 355 \end{figure} 356 357 This virtual destructor system was built for this work, but is general and can be used in any type in \CFA. 358 Actors and messages opt into this system by inheriting the @virtual_dtor@ type, which allows the executor to call the right destructor without knowing the derived actor or message type. 359 370 360 Figure~\ref{f:ConvenienceMessages} shows three builtin convenience messages and receive routines used to terminate actors, depending on how an actor is allocated: @Delete@, @Destroy@ or @Finished@. 371 361 For example, in Figure~\ref{f:CFAActor}, the builtin @finished_msg@ message and receive are used to terminate the actor because the actor is allocated on the stack, so no deallocation actions are performed by the executor. 372 Note, assignment is used to initialize these messages rather than constructors because the constructor changes the allocation to @Nodelete@ for error checking373 362 374 363 \begin{figure} 375 364 \begin{cfa} 376 message __base_msg_finished $@$= { .allocation_ : Finished }; 377 struct delete_msg_t { inline message; } delete_msg = __base_msg_finished;378 struct destroy_msg_t { inline message; } destroy_msg = __base_msg_finished;379 struct finished_msg_t { inline message; } finished_msg = __base_msg_finished;380 381 allocation receive( actor & this, delete_msg_t & msg ) { return Delete; }382 allocation receive( actor & this, destroy_msg_t & msg ) { return Destroy; }383 allocation receive( actor & this, finished_msg_t & msg ) { return Finished; }365 message __base_msg_finished $@$= { .allocation_ : Finished }; // no auto-gen constructors 366 struct __delete_msg_t { inline message; } delete_msg = __base_msg_finished; 367 struct __destroy_msg_t { inline message; } destroy_msg = __base_msg_finished; 368 struct __finished_msg_t { inline message; } finished_msg = __base_msg_finished; 369 370 allocation receive( actor & this, __delete_msg_t & msg ) { return Delete; } 371 allocation receive( actor & this, __destroy_msg_t & msg ) { return Destroy; } 372 allocation receive( actor & this, __finished_msg_t & msg ) { return Finished; } 384 373 \end{cfa} 385 374 \caption{Builtin Convenience Messages} 386 375 \label{f:ConvenienceMessages} 387 376 \end{figure} 388 389 \subsection{Actor Termination}\label{s:ActorTerm}390 During a message send, the receiving actor and message being sent are stored via pointers in the envelope.391 These pointers are the base actor and message types, so type information of the derived actor and message is lost and must be recovered later when the typed receive routine is called.392 After the receive routine is done, the executor must clean up the actor and message according to their allocation status.393 If the allocation status is @Delete@ or @Destroy@, the appropriate destructor must be called by the executor.394 This poses a problem;395 the derived type of the actor or message is not available to the executor, but it needs to call the derived destructor.!396 This requires downcasting from the base type to the derived type, which requires a virtual system.397 To accomplish the dowcast, I implemented a rudimentary destructor-only virtual system in \CFA.398 This virtual system is used via Plan-9 inheritance of the @virtual_dtor@ type, shown in Figure~\ref{f:VirtDtor}.399 The @virtual_dtor@ type maintains a pointer to the start of the object, and a pointer to the correct destructor.400 When a type inherits @virtual_dtor@, the compiler adds code to its destructor to intercepted any destructor calls along this segment of the inheritance tree and restart at the appropriate destructor for that object.401 402 \begin{figure}403 \centering404 405 \begin{lrbox}{\myboxA}406 \begin{cfa}407 struct base { inline virtual_dtor; };408 void ^?{}( base & ) { sout | "^base"; }409 struct intermediate { inline base; };410 void ^?{}( intermediate & ) { sout | "^intermediate"; }411 struct derived { inline intermediate; };412 void ^?{}( derived & ) { sout | "^derived"; }413 414 int main() {415 base & b;416 intermediate i;417 derived d1, d2, d3;418 intermediate & ri = d2;419 base & rb = d3;420 // explicit destructor calls421 ^d1{}; sout | nl;422 ^ri{}; sout | nl;423 ^rb{}; sout | nl;424 } // ^i, ^b425 \end{cfa}426 \end{lrbox}427 428 \begin{lrbox}{\myboxB}429 \begin{cfa}430 ^derived431 ^intermediate432 ^base433 434 ^derived435 ^intermediate436 ^base437 438 ^derived439 ^intermediate440 ^base441 442 ^intermediate443 ^base444 445 446 447 448 \end{cfa}449 450 \end{lrbox}451 \subfloat[Destructor calls]{\label{l:destructor_calls}\usebox\myboxA}452 \hspace*{10pt}453 \vrule454 \hspace*{10pt}455 \subfloat[Output]{\usebox\myboxB}456 457 \caption{\CFA Virtual Destructor}458 \label{f:VirtDtor}459 \end{figure}460 461 While this virtual destructor system was built for this work, it is general and can be used in any type in \CFA.462 Actors and messages opt into this system by inheriting the @virtual_dtor@ type, which allows the executor to call the right destructor without knowing the derived actor or message type.463 377 464 378 \section{\CFA Executor}\label{s:executor} … … 466 380 An executor of an actor system is the scheduler that organizes where actor behaviours are run and how messages are sent and delivered. 467 381 In \CFA, the executor is message-centric \see{Figure~\ref{f:inverted_actor}}, but extended by over sharding of a message queue \see{left side of Figure~\ref{f:gulp}}, \ie there are $M$ message queues where $M$ is greater than the number of executor threads $N$ (usually a multiple of $N$). 468 This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor \gls{fifo}message-delivery semantics.382 This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor FIFO message-delivery semantics. 469 383 The only extra overhead is each executor cycling (usually round-robin) through its $M$/$N$ queues. 470 384 The goal is to achieve better performance and scalability for certain kinds of actor applications by reducing executor locking. … … 530 444 \section{Work Stealing}\label{s:steal} 531 445 Work stealing is a scheduling strategy to provide \Newterm{load balance}. 532 The goal is to increase resource utilization by having an idle thread steal work from a working thread.533 While there are multiple parts in a work-stealing scheduler, the two important components are the stealing mechanism and victim selection.446 The goal is to increase resource utilization by having idle threads steal work from working threads. 447 While there are multiple parts in work-stealing scheduler, the two important components are victim selection and the stealing mechanism. 534 448 535 449 \subsection{Stealing Mechanism} 536 In work stealing, the stealing worker is called the \Newterm{thief} and the stolen worker is called the \Newterm{victim}. 537 % Workers consume actors from their ready queue and execute their behaviours. 538 % Through these behaviours, a worker inserts messages onto its own and other worker ready-queues. 539 To steal, a thieve takes one or more actors from a victim's ready queue, so work stealing always results in a potential increase in contention on ready queues between the victim gulping from a queue and the thief stealing the queue. 450 In work stealing, the stealing worker is called the \Newterm{thief} and the stolen-from worker is called the \Newterm{victim}. 451 The stealing mechanism presented here differs from existing work-stealing actor-systems because of the message-centric (inverted) actor-system. 452 Other actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF}, have work stealing, but use an actor-centric system where stealing is dequeuing from a non-empty ready-queue to an empty ready-queue. 453 As an example, in CAF, the sharded actor queue is a set of double-ended queues (dequeues). 454 When an actor has messages, it is inserted into a worker's dequeue (ready queue). 455 Workers then consume actors from the dequeue and execute their behaviours. 456 To steal work, thieves take one or more actors from a victim's dequeue. 457 By the pigeon hole principle, there are three dequeue operations (push/victim pop/thief pop) that can occur concurrently and only two ends to a dequeue, so work stealing in a dequeue-based system always results in a potential increase in contention on the dequeues. 540 458 This contention can slows down the victim's throughput. 541 Note, the data structure used for the ready queue is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations. 459 Note, which end of the dequeue is used for stealing, consuming, and inserting is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations. 460 461 Work steal now becomes queue stealing, where an entire actor/message queue is stolen, which trivially preserves message ordering in a queue \see{Section~\ref{s:steal}}. 542 462 543 463 % C_TODO: maybe insert stealing diagram 544 464 545 The stealing mechanism in this work differs from most work-stealing actor-systems because of the message-centric (inverted) actor-system.546 Actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF} using actor-centric systems, steal by dequeuing from a non-empty actor ready-queue and enqueue\-ing to an empty ready-queue.547 % As an example, in CAF, the sharded actor ready queue is a set of double-ended queues (dequeues).548 465 In \CFA, the actor work-stealing implementation is unique because of the message-centric system. 549 With this approach, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue.550 To ensure sequential actor execution and \gls{fifo} message delivery, actor stealing requires finding and removing \emph{all}of an actor's messages, and inserting them consecutively in another message queue.466 In this system, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue. 467 To ensure sequential actor execution and FIFO message delivery, actor stealing requires finding and removing all of an actor's messages, and inserting them consecutively in another message queue. 551 468 This operation is $O(N)$ with a non-trivial constant. 552 The only way for work stealing to become practical is to shard each worker'smessage queue, which also reduces contention, and steal queues to eliminate queue searching.553 554 Given queue stealing, my goal is to have an essentially zero-contention-cost stealing mechanism.555 This goal means work stealing has minimal affect on the performance of the victim.469 The only way for work stealing to become practical is to shard the message queue, which also reduces contention, and steal queues to eliminate queue searching. 470 471 Given queue stealing, the goal is to have a zero-victim-cost stealing mechanism, which does not mean stealing has no cost. 472 It means work stealing does not affect the performance of the victim worker. 556 473 The implication is that thieves cannot contend with a victim, and that a victim should perform no stealing related work unless it becomes a thief. 557 In theory, this goal is not achievable, but practical results show the goal is virtually achieved. 558 559 One important lesson I learned working on \uC actors~\cite{} and talking with fellow student Thierry Delisle, who examined work-stealing for user-threads in his Ph.D., is \emph{not} to aggressively steal. 560 With reasonable workloads, being a thief should be a temporary state, \ie eventually work appears on the thief's ready-queues and it returns to normal operation. 561 Furthermore, the act of \emph{looking} to find work is invasive (Heisenberg uncertainty principle), possibly disrupting multiple victims. 562 Therefore, stealing should be done lazily in case work appears for the thief and to minimize disruption of victims. 563 Note, the cost of stealing is not crucial for the thief because it does not have anything else to do except poll or block. 564 565 My lazy-stealing approach is to select a victim, scan its queues once, and return immediately if a queue is stolen. 566 Then perform a regular scan looking for work, where stolen work is placed at the end of the scan. 567 Hence, only one victim is effected and there is a reasonable delay between stealing events by scanning all the thief's ready queue looking for its own work. 568 If no work is found in the thief's queues because a steal is unsuccessful, it performs another steal from a different victim. 569 This lazy examination by the thief has a low perturbation cost for victims, while still finding work in a moderately loaded system. 570 In all work-stealing algorithms, there is the pathological case where there is too little work and too many workers; 571 this scenario can only be handled by putting workers to sleep or deleting them. 572 This case is not examined in this work. 573 574 In detail, the \CFA work-stealing algorithm begins by iterating over its message queues twice without finding any work before it tries to steal a queue from another worker. 575 Stealing a queue is done wait-free (\ie no busy waiting) with a few atomic instructions that only create contention with other stealing workers, not the victim. 576 The complexity in the implementation is that victim gulping does not take the mailbox queue; 577 rather it atomically transfers the mailbox queue to another list leaving the mailbox empty. 578 Hence, the original list is available for new message deliveries. 579 However, this transfer logically subdivides the mailbox, and during this period, the mailbox cannot be stolen; 580 otherwise there are two threads simultaneously running messages on actors in the two parts of the mailbox queue. 581 To solve this problem, an atomic gulp also marks the mailbox queue as subdivided, making it ineligible for stealing. 582 Hence, a thief checks for ineligible and non-empty before attempting an atomic steal of a queue, 583 584 585 586 To steal a queue, a thief does the following: 474 In theory, this goal is not achievable, but results show the goal is achieved in practice. 475 476 In \CFA's actor system, workers own a set of sharded queues, which they iterate over and gulp. 477 If a worker has iterated over its message queues twice without finding any work, it tries to steal a queue from another worker. 478 Stealing a queue is done wait-free with a few atomic instructions that can only create contention with other stealing workers, not the victim. 479 To steal a queue, a worker does the following: 587 480 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt] 588 481 \item 589 chooses a victim, which is trivial because there is an array of $N$ pointers to the mailboxes (copy queues) that is subdivided into $M$/$N$ worker ranges \see{Section~\ref{s:executor}}. 590 591 \item 592 scan the victim's mailbox-range and test each mail queue in that range for a non-empty queue by performing a test-for-non-empty and swap. 593 The conditional check reduces the number of atomic operations. 594 The swap races to interchange the non-empty mail-queue pointer from the victim's range with an empty mail-queue pointer in the thief's range. 595 Regardless, either a thief swaps or the victim gulps the mail queue; 596 correctness is guaranteed, no matter which thread wins the race, because both threads are cycling through appropriate lists. 597 Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers. 598 599 \item 600 stops searching after a non-empty queue steal or all queues in the random worker's range are examined. 601 The thief then returns to its scheduler and iterates over its messages queues, because new messages may have arrived during stealing, including a stolen queue. 482 The thief chooses a victim, which is trivial because all workers are stored in a shared array. 483 484 \item 485 The thief starts at a random index in the array of the victim's queues and searches for a candidate queue. 486 A candidate queue is any non-empty queue not being processed by the victim and not being stolen by another thief. 487 These rules are not strictly enforced. 488 A candidate is identified non-atomically, and as such, queues that do not satisfy these rules may be stolen. 489 However, steals not meeting the rules do not affect correctness and do not constitute failed steals as the queue is always swapped. 490 491 \item 492 Once a candidate queue is chosen, the thief attempts a wait-free swap of a victim's queue to a random empty thief queue. 493 If the swap successes, the steal is completed. 494 If the swap fails, the victim may have been gulping that message queue or another thief must have attempted to steal the victim's queue. 495 In either case, that message queue is highly likely to be empty. 496 497 \item 498 Once a thief fails or succeeds in stealing a queue, it iterates over its messages queues again because new messages may have arrived during stealing. 602 499 Stealing is only repeated after two consecutive iterations over its owned queues without finding work. 603 500 \end{enumerate} 604 This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue. 605 As well, pushes to the queues by other workers can happen concurrently during the swap since pushes happen via the actor queue references. 606 607 608 609 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt] 610 \item 611 selects a random worker's mailbox-range and tests each mail queue in that range for a non-empty queue. 612 613 A queue can be stolen and gulped at the same time since they are independent operations. 614 However, two workers cannot gulp from the same queue at the same time since that would violate the ordering and mutual exclusion guarantees of actors, so an avail flag is used on queues to stop this from happening. 615 If a worker tries to gulp a queue and sees an avail flag up, the gulp fails, and it will move on to other queues and try to gulp it again later. 616 stops searching after a non-empty queue steal or all queues in the random worker's range are examined. 617 It stops searching after any queue steal (even an empty one) or all queues are examined. 618 619 Stealing is only repeated after two consecutive iterations over its owned queues without finding work. 620 Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers. 621 \end{enumerate} 622 This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue. 623 The first and last sentence here are correct. I'm not sure I understand the second sentence. 624 625 626 627 As an aside I think I need a diagram here since it is clear that the algorithm is not clear through my writing. 628 I made a very quick one that I'll use here to explain, but I will work on one to add to the chapter. 629 630 631 To give some accompanying code there are two arrays, the array of copy queues, and the array of pointers to the copy queues: 632 \begin{cfa} 633 work_queue * request_queues; // master array of work request queues 634 work_queue ** worker_req_queues; // secondary array of work queues to allow for swapping 635 \end{cfa} 636 \includegraphics[width=4in]{../figures/steal_diagram.eps} 637 When a send happens, it inserts a request directly into one of the queues in @request_queues@, when a steal happens two pointers in @worker_req_queues@ are swapped. 638 Each worker owns a section of @worker_req_queues@. 639 When a worker gulps, it dereferences one of the pointers in its section of @worker_req_queues@ and then gulps from the queue it points at. 640 As such, operations can happen on the queues directly independent of stealing, which avoids almost all contention between stealing threads and busy threads. 641 642 If you want to poke some of the code, in @actor.hfa@ @steal_work()@ selects the victim, @choose_queue()@ iterates over the queues looking for non-empty queues and @try_swap_queues()@ performs the thread safe pointer swap. 643 644 % The first key to this is that actors and workers maintain two distinct arrays of references to queues. 645 % Actors will always receive messages via the same queues. 646 % Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own. 647 % Swapping queues is a matter of atomically swapping two pointers in the worker array. 648 % As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references. 649 650 % Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes. 651 % When a worker is not stealing it iterates across its own range of queues and gulps them one by one. 652 653 Correctness of gulping is slightly more complex with stealing. 654 When a worker operates on a queue, it first copies the queue pointer from the mailbox array to a local variable. 501 502 The key to the stealing mechanism is that the queues can still be operated on while they are being swapped. 503 This functionality eliminates any contention among thieves and victims. 504 505 The first key to this is that actors and workers maintain two distinct arrays of references to queues. 506 Actors will always receive messages via the same queues. 507 Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own. 508 Swapping queues is a matter of atomically swapping two pointers in the worker array. 509 As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references. 510 511 Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes. 512 When a worker is not stealing it iterates across its own range of queues and gulps them one by one. 513 When a worker operates on a queue it first copies the current pointer from the worker array of references to a local variable. 655 514 It then uses that local variable for all queue operations until it moves to the next index of its range of the queue array. 656 515 This ensures that any swaps do not interrupt gulping operations, however this introduces a correctness issue. … … 791 650 In the failed case of step 3 the program state is safely restored to its state it had prior to the @0p@ write in step 2, thanks to the invariant that makes the write back to the @0p@ pointer safe. 792 651 793 \begin{comment}794 652 \subsection{Stealing Guarantees} 795 653 Given that the stealing operation can potentially fail, it is important to discuss the guarantees provided by the stealing implementation. … … 896 754 Thus, by contrapositive, if the graph contains a cycle then there exists a situation where no swaps occur. 897 755 Hence, at least one swap is guaranteed to succeed if and only if the graph does not contain a cycle. 898 \end{comment}899 756 900 757 % C_TODO: go through and use \paragraph to format to make it look nicer … … 975 832 976 833 Another productivity feature that is included is a group of poison-pill messages. 977 Poison-pill messages are common across actor systems, including Akka and ProtoActor\cite{Akka,ProtoActor}.834 Poison-pill messages are common across actor systems, and are used in actor libraries Akka and ProtoActor~\cite{Akka,ProtoActor}. 978 835 Poison-pill messages inform an actor to terminate. 979 836 In \CFA, due to the allocation of actors and lack of garbage collection, there needs to be a suite of poison-pills. … … 1019 876 & \multicolumn{1}{c|}{\CFA (100M)} & \multicolumn{1}{c|}{CAF (10M)} & \multicolumn{1}{c|}{Akka (100M)} & \multicolumn{1}{c|}{\uC (100M)} & \multicolumn{1}{c@{}}{ProtoActor (100M)} \\ 1020 877 \hline 1021 AMD & \input{data/ pykeSendStatic} \\878 AMD & \input{data/nasusSendStatic} \\ 1022 879 \hline 1023 Intel & \input{data/ nasusSendStatic}880 Intel & \input{data/pykeSendStatic} 1024 881 \end{tabular} 1025 882 … … 1032 889 & \multicolumn{1}{c|}{\CFA (20M)} & \multicolumn{1}{c|}{CAF (2M)} & \multicolumn{1}{c|}{Akka (2M)} & \multicolumn{1}{c|}{\uC (20M)} & \multicolumn{1}{c@{}}{ProtoActor (2M)} \\ 1033 890 \hline 1034 AMD & \input{data/ pykeSendDynamic} \\891 AMD & \input{data/nasusSendDynamic} \\ 1035 892 \hline 1036 Intel & \input{data/ nasusSendDynamic}893 Intel & \input{data/pykeSendDynamic} 1037 894 \end{tabular} 1038 895 \end{table} … … 1054 911 The results from the static/dynamic send benchmarks are shown in Figures~\ref{t:StaticActorMessagePerformance} and \ref{t:DynamicActorMessagePerformance} respectively. 1055 912 \CFA leads the charts in both benchmarks, largely due to the copy queue removing the majority of the envelope allocations. 1056 Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incur s a compile-time cost.913 Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incurrs a compile-time cost. 1057 914 All the other systems use their virtual system to find the correct behaviour at message send. 1058 915 This requires two virtual dispatch operations, which is an additional runtime send cost that \CFA does not have. … … 1199 1056 1200 1057 This result is shown in Figure~\ref{f:cfaRepeatAMD} and \ref{f:cfaRepeatIntel} where the no-stealing version of \CFA performs better than both stealing variations. 1201 In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and \CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing).1058 In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing). 1202 1059 The shift for CAF is particularly large, which further supports the hypothesis that CAF's work stealing is particularly eager. 1203 1060 In both the executor and the repeat benchmark CAF performs poorly. … … 1226 1083 1227 1084 Figure~\ref{t:ExecutorMemory} shows the high memory watermark of the actor systems when running the executor benchmark on 48 cores. 1228 \CFA has a high watermark relative to the other non-garbage 1085 \CFA has a high watermark relative to the other non-garbage-collected systems \uC, and CAF. 1229 1086 This is a result of the copy queue data structure, as it will over-allocate storage and not clean up eagerly, whereas the per envelope allocations will always allocate exactly the amount of storage needed. 1230 1087 Despite having a higher watermark, the \CFA memory usage remains comparable to other non-garbage-collected systems.
Note:
See TracChangeset
for help on using the changeset viewer.