Changeset 8909a2d for doc/theses/colby_parsons_MMAth/text
- Timestamp:
- Jul 3, 2023, 1:43:48 PM (21 months ago)
- Branches:
- master
- Children:
- 96ea77a
- Parents:
- 1ae3ac46
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
TabularUnified doc/theses/colby_parsons_MMAth/text/actors.tex ¶
r1ae3ac46 r8909a2d 11 11 12 12 \section{Actor Model} 13 The actor modelis a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}.13 The \Newterm{actor model} is a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}. 14 14 An actor is composed of a \Newterm{mailbox} (message queue) and a set of \Newterm{behaviours} that receive from the mailbox to perform work. 15 15 Actors execute asynchronously upon receiving a message and can modify their own state, make decisions, spawn more actors, and send messages to other actors. … … 20 20 An actor is executed by an underlying \Newterm{executor} (kernel thread-pool) that fairly invokes each actor, where an actor invocation processes one or more messages from its mailbox. 21 21 The default number of executor threads is often proportional to the number of computer cores to achieve good performance. 22 An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s: CFAActor}}.22 An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s:ActorSystem}}. 23 23 24 24 \subsection{Classic Actor System} 25 An implementation of the actor model with a community of actors is called an actor system.25 An implementation of the actor model with a community of actors is called an \Newterm{actor system}. 26 26 Actor systems largely follow the actor model, but can differ in some ways. 27 27 While the semantics of message \emph{send} is asynchronous, the implementation may be synchronous or a combination. 28 The default semantics for message \emph{receive} is FIFO, so an actor receives messages from its mailbox in temporal (arrival) order;28 The default semantics for message \emph{receive} is \gls{fifo}, so an actor receives messages from its mailbox in temporal (arrival) order; 29 29 however, messages sent among actors arrive in any order. 30 30 Some actor systems provide priority-based mailboxes and/or priority-based message-selection within a mailbox, where custom message dispatchers search among or within a mailbox(es) with a predicate for specific kinds of actors and/or messages. 31 31 Some actor systems provide a shared mailbox where multiple actors receive from a common mailbox~\cite{Akka}, which is contrary to the no-sharing design of the basic actor-model (and requires additional locking). 32 For non- FIFOservice, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received.33 Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism (see Section~\ref{s:SafetyProductivity}).32 For non-\gls{fifo} service, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received. 33 Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism \see{Section~\ref{s:SafetyProductivity}}). 34 34 %While the definition of the actor model provides no restrictions on message ordering, actor systems tend to guarantee that messages sent from a given actor $i$ to actor $j$ will arrive at actor $j$ in the order they were sent. 35 35 Another way an actor system varies from the model is allowing access to shared global-state. … … 49 49 The more common design is to \Newterm{shard} the single queue among the executor threads, where actors are permanently assigned or can float among the queues. 50 50 Sharding significantly decreases contention among executor threads adding and removing actors to/from a queue. 51 Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors can attachmessages.52 When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current work uniton the message(s).51 Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors add messages. 52 When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current behaviour on the message(s). 53 53 54 54 % cite parallel theatre and our paper 55 55 Figure \ref{f:inverted_actor} shows an actor system designed as \Newterm{message-centric}, where a set of messages are scheduled and run on underlying executor threads~\cite{uC++,Nigro21}. 56 This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor. 57 Now a message send must queries the actor to know which message queue to post the message. 56 58 Again, the simplest design has a single global queue of messages accessed by the executor threads, but this approach has the same contention problem by the executor threads. 57 59 Therefore, the messages (mailboxes) are sharded and executor threads schedule each message, which points to its corresponding actor. 58 Here, an actor's messages are permanently assigned to one queue to ensure FIFO receiving and/or reduce searching for specific actor/messages. 59 Since multiple actors belong to each message queue, actor messages are interleaved on a queue. 60 This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor. 60 Here, an actor's messages are permanently assigned to one queue to ensure \gls{fifo} receiving and/or reduce searching for specific actor/messages. 61 Since multiple actors belong to each message queue, actor messages are interleaved on a queue, but individually in FIFO order. 61 62 % In this inverted actor system instead of each executor threads owning a queue of actors, they each own a queue of messages. 62 63 % In this scheme work is consumed from their queue and executed by underlying threads. … … 65 66 % The arrows from the message queues to the actors in the diagram indicate interleaved messages addressed to each actor. 66 67 67 The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{ }, and adds the following contributions related to \CFA:68 The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{Buhr22} but implemented in \CFA, and adds the following new \CFA contributions: 68 69 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt] 69 70 \item 70 Provide insight into the impact of envelope allocation in actor systems .71 Provide insight into the impact of envelope allocation in actor systems \see{Section~\ref{s:envelope}}. 71 72 In all actor systems, dynamic allocation is needed to ensure the lifetime of a unit of work persists from its creation until the unit of work is executed. 72 73 This allocation is often called an \Newterm{envelope} as it ``packages'' the information needed to run the unit of work, alongside any other information needed to send the unit of work, such as an actor's address or link fields. … … 126 127 \end{cfa} 127 128 \end{lrbox} 129 128 130 \subfloat[dynamic typing]{\label{l:dynamic_style}\usebox\myboxA} 129 131 \hspace*{10pt} … … 177 179 \end{figure} 178 180 179 Figure~\ref{f:CFAActor} shows a complete \CFA actor example starting with the actor type @my_actor@ created by defining a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword. 181 Figure~\ref{f:CFAActor} shows a complete \CFA actor example, which is discussed in detail. 182 The actor type @my_actor@ is a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword. 180 183 This inheritance style is the Plan-9 C-style inheritance discussed in Section~\ref{s:Inheritance}. 181 Similarly, the message types @str_msg@ and @int_msg@ are created by defining a @struct@that inherits from the base @message@ @struct@ via the @inline@ keyword.184 Similarly, the message types @str_msg@ and @int_msg@ are @struct@s that inherits from the base @message@ @struct@ via the @inline@ keyword. 182 185 Only @str_msg@ needs a constructor to copy the C string; 183 186 @int_msg@ is initialized using its \CFA auto-generated constructors. … … 187 190 The program main begins by creating two messages on the stack. 188 191 Then the executor system is started by calling @start_actor_system@. 189 Now an actor is created on the stack and four messages are sent it using operator @?|?@.190 The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to remove sthe actor from the actor system \see{Section~\ref{s:ActorBehaviours}}.192 Now an actor is created on the stack and four messages are sent to it using operator @?|?@. 193 The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to remove the actor from the actor system \see{Section~\ref{s:ActorBehaviours}}. 191 194 The call to @stop_actor_system@ blocks the program main until all actors are finished and removed from the actor system. 192 195 The program main ends by deleting the actor and two messages from the stack. … … 199 202 200 203 \subsection{Actor Behaviours}\label{s:ActorBehaviours} 201 In general, a behaviour for some derived actor and derived message type is defined with following signature:204 In general, a behaviour for some derived actor and derived message type is defined with the following signature: 202 205 \begin{cfa} 203 206 allocation receive( my_actor & receiver, my_msg & msg ) … … 213 216 Message state is updated via a call to: 214 217 \begin{cfa} 215 void set_allocation( message & this, allocation state ) 218 void set_allocation( message & this, allocation state ); 216 219 \end{cfa} 217 220 … … 220 223 \noindent@Nodelete@ 221 224 tells the executor that no action is to be taken with regard to an actor or message. 222 This status is used when an actor continues receiving messages or a message may bereused.225 This status is used when an actor continues receiving messages or a message is reused. 223 226 224 227 \noindent@Delete@ … … 233 236 tells the executor to mark the respective actor as finished executing, but not call the object's destructor nor deallocate the object. 234 237 This status is used when actors or messages are global or stack allocated, or a programmer wants to manage deallocation themselves. 238 Note, for messages, there is no difference between allocations @Nodelete@ and @Finished@ because both tell the executor to do nothing to the message. 239 Hence, @Finished@ is implicitly changed to @Nodelete@ in a message constructor, and @Nodelete@ is used internally for message error-checking \see{Section~\ref{s:SafetyProductivity}}. 240 Therefore, reading a message's allocation status after setting to @Finished@ may be either @Nodelete@ (after construction) or @Finished@ (after explicitly setting using @set_allocation@). 235 241 236 242 For the actor system to terminate, all actors must have returned a status other than @Nodelete@. 237 243 After an actor is terminated, it is erroneous to send messages to it. 238 244 Similarly, after a message is terminated, it cannot be sent to an actor. 239 Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action aftera behaviour returns.245 Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action \emph{after} a behaviour returns. 240 246 241 247 \subsection{Actor Envelopes}\label{s:envelope} 242 248 As stated, each message, regardless of where it is allocated, can be sent to an arbitrary number of actors, and hence, appear on an arbitrary number of message queues. 243 Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies .244 Therefore, it up to the actor program to manage message life-time across receives.249 Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies? 250 Therefore, it is up to the actor program to manage message life-time across receives. 245 251 However, for a message to appear on multiple message queues, it needs an arbitrary number of associated destination behaviours. 246 252 Hence, there is the concept of an envelop, which is dynamically allocated on each send, that wraps a message with any extra implementation fields needed to persist between send and receive. 247 253 Managing the envelop is straightforward because it is created at the send and deleted after the receive, \ie there is 1:1 relationship for an envelop and a many to one relationship for a message. 254 255 \PAB{Do you need to say more here?} 248 256 249 257 % In actor systems, messages are sent and received by actors. … … 275 283 \noindent@void start_actor_system( executor & this )@ 276 284 allows the programmer to explicitly create and configure an executor for use by the actor system. 277 Executor configuration options includeare discussed in Section~\ref{s:executor}.285 Executor configuration options are discussed in Section~\ref{s:executor}. 278 286 279 287 \noindent … … 282 290 \subsection{Actor Send}\label{s:ActorSend} 283 291 All message sends are done using the vertical-bar (bit-or) operator, @?|?@, similar to the syntax of the \CFA stream I/O. 284 Hence, programmers must write a matching @?|?@ routine for each @receive@ routine, which is awkward and generates a maintenance problem that must be solved. 285 \CFA's current best approach for creating a generic @?|?@ routine requires users to create routines for their actor and message types that access the base type. 286 Since these routines are not complex, they can be generated using macros that are used to annotate the user-defined actor and message types. 287 This approach is used in \CFA's intrusive list data structure. 288 This is not much better than asking users to write the @?|?@ routine themselves in terms of maintenance, since the user needs to remember the boilerplate macro annotation. 289 290 As stated, \CFA does not have named inheritance with RTTI. 291 \CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work. 292 Virtuals would provide a mechanism to write a single generic @?|?@ routine taking a base actor and message type, that would dynamically select the @receive@ routine from the actor argument. 293 Note, virtuals are not needed for the send; Plan-9 inheritance is sufficient because only the inherited fields are needed during the message send (only upcasting is needed). 294 295 Therefore, a template-like approach was chosen, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature. 296 This approach requires no annotation or additional code to be written by users, thus it resolves the maintenance problem. 297 (When the \CFA virtual routines mature, it should be possible to seamlessly transition to it from the template approach.) 292 One way to provide this operator is through the \CFA type system: 293 \begin{cfa} 294 actor & ?|?( actor &, message & ) { // base actor and message types 295 // boilerplate to send message to executor mail queue 296 } 297 actor | str_msg | int_msg; // rewritten: ?|?( ?|?( actor, int_msg ), str_msg ) 298 \end{cfa} 299 In the \CFA type system, calls to this routine work for any pair of parameters that inherit from the @actor@ and @message@ types via Plan-9 inheritance. 300 However, within the body the routine, all type information about the derived actor and message is lost (type erasure), so this approach is unable to find the right @receive@ routine to put in the envelope. 301 302 If \CFA had a fully-fledged virtual system, the generic @?|?@ routine would work, since the virtual system could dynamically select the derived @receive@ routine via virtual dispatch. 303 \CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work, so a different approach is needed. 304 305 Without virtuals, the idiomatic \CFA way to create the generic @?|?@ routine is using @forall@: 306 \begin{cfa} 307 // forall types A, M that have a receive that returns allocation 308 forall( A &, M & | { allocation receive( A &, M & ); } ) 309 A & ?|?( A &, M & ) { // actor and message types 310 // boilerplate to send message to executor mail queue 311 } 312 \end{cfa} 313 This approach should work. 314 However, the \CFA type system is still a work in progress, and there is a nontrivial bug where inherited routines are not recognized by @forall@. 315 For example, Figure~\ref{f:type_problem} shows type @B@ has an inherited @foo@ routine through type @A@ and should find the @bar@ routine defined via the @forall@, but does not due the type-system bug. 316 317 \begin{figure} 318 \begin{cfa} 319 struct A {}; 320 struct B { inline A; } 321 void foo( A & a ) { ... } 322 323 // for all types that have a foo routine here is a bar routine 324 forall( T & | { void foo( T & ); } ) 325 void bar( T & t ) { ... } 326 327 int main() { 328 B b; 329 foo( b ); // B has a foo so it should find a bar via the forall 330 bar( b ); // compilation error, no bar found for type B 331 } 332 \end{cfa} 333 \caption{\CFA Type-System Problem} 334 \label{f:type_problem} 335 \end{figure} 336 337 Users could be expected to write the @?|?@ routines, but this approach is error prone and creates maintenance issues. 338 Until the \CFA type-system matures, I created a workaround using a template-like approach, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature. 339 This workaround is outside of the type system, but performing a type-system like action. 340 The workaround requires no annotation or additional code to be written by users; 341 thus, it resolves the maintenance and error problems. 342 It should be possible to seamlessly transition the workaround into any updated version of the \CFA type-system. 298 343 299 344 Figure~\ref{f:send_gen} shows the generated send routine for the @int_msg@ receive in Figure~\ref{f:CFAActor}. 300 345 Operator @?|?@ has the same parameter signature as the corresponding @receive@ routine and returns an @actor@ so the operator can be cascaded. 301 346 The routine sets @rec_fn@ to the matching @receive@ routine using the left-hand type to perform the selection. 302 Then the routine packages the base and derived actor and message and actor, along with the receive routine into an \hyperref[s:envelope]{envelope}.347 Then the routine packages the actor and message, along with the receive routine into an envelope. 303 348 Finally, the envelop is added to the executor queue designated by the actor using the executor routine @send@. 304 349 … … 306 351 \begin{cfa} 307 352 $\LstCommentStyle{// from Figure~\ref{f:CFAActor}}$ 308 struct my_actor { inline actor; }; 309 struct int_msg { inline message; int i; }; 353 struct my_actor { inline actor; }; $\C[3.75in]{// actor}$ 354 struct int_msg { inline message; int i; }; $\C{// message}$ 310 355 allocation receive( @my_actor &, int_msg & msg@ ) {...} $\C{// receiver}$ 311 356 … … 314 359 actor & ?|?( @my_actor & receiver, int_msg & msg@ ) { 315 360 allocation (*rec_fn)( my_actor &, int_msg & ) = @receive@; // deduce receive routine 316 request req{ &receiver, (actor *)&receiver, &msg, (message *)&msg, (receive_t)rec_fn };317 send( receiver, req ); 361 request req{ (actor *)&receiver, (message *)&msg, (receive_t)rec_fn }; 362 send( receiver, req ); $\C{// queue message for execution}\CRT$ 318 363 return receiver; 319 364 } … … 323 368 \end{figure} 324 369 370 Figure~\ref{f:ConvenienceMessages} shows three builtin convenience messages and receive routines used to terminate actors, depending on how an actor is allocated: @Delete@, @Destroy@ or @Finished@. 371 For example, in Figure~\ref{f:CFAActor}, the builtin @finished_msg@ message and receive are used to terminate the actor because the actor is allocated on the stack, so no deallocation actions are performed by the executor. 372 Note, assignment is used to initialize these messages rather than constructors because the constructor changes the allocation to @Nodelete@ for error checking 373 374 \begin{figure} 375 \begin{cfa} 376 message __base_msg_finished $@$= { .allocation_ : Finished }; 377 struct delete_msg_t { inline message; } delete_msg = __base_msg_finished; 378 struct destroy_msg_t { inline message; } destroy_msg = __base_msg_finished; 379 struct finished_msg_t { inline message; } finished_msg = __base_msg_finished; 380 381 allocation receive( actor & this, delete_msg_t & msg ) { return Delete; } 382 allocation receive( actor & this, destroy_msg_t & msg ) { return Destroy; } 383 allocation receive( actor & this, finished_msg_t & msg ) { return Finished; } 384 \end{cfa} 385 \caption{Builtin Convenience Messages} 386 \label{f:ConvenienceMessages} 387 \end{figure} 388 325 389 \subsection{Actor Termination}\label{s:ActorTerm} 326 390 During a message send, the receiving actor and message being sent are stored via pointers in the envelope. 327 These pointers have the base actor and message types, so type information of the actor and message is lost and thenrecovered later when the typed receive routine is called.391 These pointers are the base actor and message types, so type information of the derived actor and message is lost and must be recovered later when the typed receive routine is called. 328 392 After the receive routine is done, the executor must clean up the actor and message according to their allocation status. 329 393 If the allocation status is @Delete@ or @Destroy@, the appropriate destructor must be called by the executor. 330 This poses a problem; the correct type of the actor or message is not available to the executor, but it needs to call the right destructor! 331 This requires downcasting from the base type to derived type, which requires a virtual system. 332 Thus, a rudimentary destructor-only virtual system was added to \CFA as part of this work. 394 This poses a problem; 395 the derived type of the actor or message is not available to the executor, but it needs to call the derived destructor.! 396 This requires downcasting from the base type to the derived type, which requires a virtual system. 397 To accomplish the dowcast, I implemented a rudimentary destructor-only virtual system in \CFA. 333 398 This virtual system is used via Plan-9 inheritance of the @virtual_dtor@ type, shown in Figure~\ref{f:VirtDtor}. 334 399 The @virtual_dtor@ type maintains a pointer to the start of the object, and a pointer to the correct destructor. 335 When a type inherits the @virtual_dtor@ type, the compiler adds code to its destructor to make sure that any destructor calls along this segment of the inheritance tree is called are intercepted, and restart at the appropriate destructor for that object. 336 337 \begin{figure} 338 \begin{cfa} 339 struct base_type { inline virtual_dtor; }; 340 struct intermediate_type { inline base_type; }; 341 struct derived_type { inline intermediate_type; }; 400 When a type inherits @virtual_dtor@, the compiler adds code to its destructor to intercepted any destructor calls along this segment of the inheritance tree and restart at the appropriate destructor for that object. 401 402 \begin{figure} 403 \centering 404 405 \begin{lrbox}{\myboxA} 406 \begin{cfa} 407 struct base { inline virtual_dtor; }; 408 void ^?{}( base & ) { sout | "^base"; } 409 struct intermediate { inline base; }; 410 void ^?{}( intermediate & ) { sout | "^intermediate"; } 411 struct derived { inline intermediate; }; 412 void ^?{}( derived & ) { sout | "^derived"; } 342 413 343 414 int main() { 344 derived_type d1, d2, d3; 345 intermediate_type & i = d2; 346 base_type & b = d3; 347 348 // these will call the destructors in the correct order 349 ^d1{}; ^i{}; ^b{}; 350 } 351 352 \end{cfa} 415 base & b; 416 intermediate i; 417 derived d1, d2, d3; 418 intermediate & ri = d2; 419 base & rb = d3; 420 // explicit destructor calls 421 ^d1{}; sout | nl; 422 ^ri{}; sout | nl; 423 ^rb{}; sout | nl; 424 } // ^i, ^b 425 \end{cfa} 426 \end{lrbox} 427 428 \begin{lrbox}{\myboxB} 429 \begin{cfa} 430 ^derived 431 ^intermediate 432 ^base 433 434 ^derived 435 ^intermediate 436 ^base 437 438 ^derived 439 ^intermediate 440 ^base 441 442 ^intermediate 443 ^base 444 445 446 447 448 \end{cfa} 449 450 \end{lrbox} 451 \subfloat[Destructor calls]{\label{l:destructor_calls}\usebox\myboxA} 452 \hspace*{10pt} 453 \vrule 454 \hspace*{10pt} 455 \subfloat[Output]{\usebox\myboxB} 456 353 457 \caption{\CFA Virtual Destructor} 354 458 \label{f:VirtDtor} 355 459 \end{figure} 356 460 357 This virtual destructor system was built for this work, but is general and can be used in any type in \CFA.461 While this virtual destructor system was built for this work, it is general and can be used in any type in \CFA. 358 462 Actors and messages opt into this system by inheriting the @virtual_dtor@ type, which allows the executor to call the right destructor without knowing the derived actor or message type. 359 360 Figure~\ref{f:ConvenienceMessages} shows three builtin convenience messages and receive routines used to terminate actors, depending on how an actor is allocated: @Delete@, @Destroy@ or @Finished@.361 For example, in Figure~\ref{f:CFAActor}, the builtin @finished_msg@ message and receive are used to terminate the actor because the actor is allocated on the stack, so no deallocation actions are performed by the executor.362 363 \begin{figure}364 \begin{cfa}365 message __base_msg_finished $@$= { .allocation_ : Finished }; // no auto-gen constructors366 struct __delete_msg_t { inline message; } delete_msg = __base_msg_finished;367 struct __destroy_msg_t { inline message; } destroy_msg = __base_msg_finished;368 struct __finished_msg_t { inline message; } finished_msg = __base_msg_finished;369 370 allocation receive( actor & this, __delete_msg_t & msg ) { return Delete; }371 allocation receive( actor & this, __destroy_msg_t & msg ) { return Destroy; }372 allocation receive( actor & this, __finished_msg_t & msg ) { return Finished; }373 \end{cfa}374 \caption{Builtin Convenience Messages}375 \label{f:ConvenienceMessages}376 \end{figure}377 463 378 464 \section{\CFA Executor}\label{s:executor} … … 380 466 An executor of an actor system is the scheduler that organizes where actor behaviours are run and how messages are sent and delivered. 381 467 In \CFA, the executor is message-centric \see{Figure~\ref{f:inverted_actor}}, but extended by over sharding of a message queue \see{left side of Figure~\ref{f:gulp}}, \ie there are $M$ message queues where $M$ is greater than the number of executor threads $N$ (usually a multiple of $N$). 382 This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor FIFOmessage-delivery semantics.468 This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor \gls{fifo} message-delivery semantics. 383 469 The only extra overhead is each executor cycling (usually round-robin) through its $M$/$N$ queues. 384 470 The goal is to achieve better performance and scalability for certain kinds of actor applications by reducing executor locking. … … 444 530 \section{Work Stealing}\label{s:steal} 445 531 Work stealing is a scheduling strategy to provide \Newterm{load balance}. 446 The goal is to increase resource utilization by having idle threads steal work from working threads.447 While there are multiple parts in work-stealing scheduler, the two important components are victim selection and the stealing mechanism.532 The goal is to increase resource utilization by having an idle thread steal work from a working thread. 533 While there are multiple parts in a work-stealing scheduler, the two important components are the stealing mechanism and victim selection. 448 534 449 535 \subsection{Stealing Mechanism} 450 In work stealing, the stealing worker is called the \Newterm{thief} and the stolen-from worker is called the \Newterm{victim}. 451 The stealing mechanism presented here differs from existing work-stealing actor-systems because of the message-centric (inverted) actor-system. 452 Other actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF}, have work stealing, but use an actor-centric system where stealing is dequeuing from a non-empty ready-queue to an empty ready-queue. 453 As an example, in CAF, the sharded actor queue is a set of double-ended queues (dequeues). 454 When an actor has messages, it is inserted into a worker's dequeue (ready queue). 455 Workers then consume actors from the dequeue and execute their behaviours. 456 To steal work, thieves take one or more actors from a victim's dequeue. 457 By the pigeon hole principle, there are three dequeue operations (push/victim pop/thief pop) that can occur concurrently and only two ends to a dequeue, so work stealing in a dequeue-based system always results in a potential increase in contention on the dequeues. 536 In work stealing, the stealing worker is called the \Newterm{thief} and the stolen worker is called the \Newterm{victim}. 537 % Workers consume actors from their ready queue and execute their behaviours. 538 % Through these behaviours, a worker inserts messages onto its own and other worker ready-queues. 539 To steal, a thieve takes one or more actors from a victim's ready queue, so work stealing always results in a potential increase in contention on ready queues between the victim gulping from a queue and the thief stealing the queue. 458 540 This contention can slows down the victim's throughput. 459 Note, which end of the dequeue is used for stealing, consuming, and inserting is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations. 460 461 Work steal now becomes queue stealing, where an entire actor/message queue is stolen, which trivially preserves message ordering in a queue \see{Section~\ref{s:steal}}. 541 Note, the data structure used for the ready queue is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations. 462 542 463 543 % C_TODO: maybe insert stealing diagram 464 544 545 The stealing mechanism in this work differs from most work-stealing actor-systems because of the message-centric (inverted) actor-system. 546 Actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF} using actor-centric systems, steal by dequeuing from a non-empty actor ready-queue and enqueue\-ing to an empty ready-queue. 547 % As an example, in CAF, the sharded actor ready queue is a set of double-ended queues (dequeues). 465 548 In \CFA, the actor work-stealing implementation is unique because of the message-centric system. 466 In this system, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue.467 To ensure sequential actor execution and FIFO message delivery, actor stealing requires finding and removing allof an actor's messages, and inserting them consecutively in another message queue.549 With this approach, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue. 550 To ensure sequential actor execution and \gls{fifo} message delivery, actor stealing requires finding and removing \emph{all} of an actor's messages, and inserting them consecutively in another message queue. 468 551 This operation is $O(N)$ with a non-trivial constant. 469 The only way for work stealing to become practical is to shard themessage queue, which also reduces contention, and steal queues to eliminate queue searching.470 471 Given queue stealing, the goal is to have a zero-victim-cost stealing mechanism, which does not mean stealing has no cost.472 It means work stealing does not affect the performance of the victim worker.552 The only way for work stealing to become practical is to shard each worker's message queue, which also reduces contention, and steal queues to eliminate queue searching. 553 554 Given queue stealing, my goal is to have an essentially zero-contention-cost stealing mechanism. 555 This goal means work stealing has minimal affect on the performance of the victim. 473 556 The implication is that thieves cannot contend with a victim, and that a victim should perform no stealing related work unless it becomes a thief. 474 In theory, this goal is not achievable, but results show the goal is achieved in practice. 475 476 In \CFA's actor system, workers own a set of sharded queues, which they iterate over and gulp. 477 If a worker has iterated over its message queues twice without finding any work, it tries to steal a queue from another worker. 478 Stealing a queue is done wait-free with a few atomic instructions that can only create contention with other stealing workers, not the victim. 479 To steal a queue, a worker does the following: 557 In theory, this goal is not achievable, but practical results show the goal is virtually achieved. 558 559 One important lesson I learned working on \uC actors~\cite{} and talking with fellow student Thierry Delisle, who examined work-stealing for user-threads in his Ph.D., is \emph{not} to aggressively steal. 560 With reasonable workloads, being a thief should be a temporary state, \ie eventually work appears on the thief's ready-queues and it returns to normal operation. 561 Furthermore, the act of \emph{looking} to find work is invasive (Heisenberg uncertainty principle), possibly disrupting multiple victims. 562 Therefore, stealing should be done lazily in case work appears for the thief and to minimize disruption of victims. 563 Note, the cost of stealing is not crucial for the thief because it does not have anything else to do except poll or block. 564 565 My lazy-stealing approach is to select a victim, scan its queues once, and return immediately if a queue is stolen. 566 Then perform a regular scan looking for work, where stolen work is placed at the end of the scan. 567 Hence, only one victim is effected and there is a reasonable delay between stealing events by scanning all the thief's ready queue looking for its own work. 568 If no work is found in the thief's queues because a steal is unsuccessful, it performs another steal from a different victim. 569 This lazy examination by the thief has a low perturbation cost for victims, while still finding work in a moderately loaded system. 570 In all work-stealing algorithms, there is the pathological case where there is too little work and too many workers; 571 this scenario can only be handled by putting workers to sleep or deleting them. 572 This case is not examined in this work. 573 574 In detail, the \CFA work-stealing algorithm begins by iterating over its message queues twice without finding any work before it tries to steal a queue from another worker. 575 Stealing a queue is done wait-free (\ie no busy waiting) with a few atomic instructions that only create contention with other stealing workers, not the victim. 576 The complexity in the implementation is that victim gulping does not take the mailbox queue; 577 rather it atomically transfers the mailbox queue to another list leaving the mailbox empty. 578 Hence, the original list is available for new message deliveries. 579 However, this transfer logically subdivides the mailbox, and during this period, the mailbox cannot be stolen; 580 otherwise there are two threads simultaneously running messages on actors in the two parts of the mailbox queue. 581 To solve this problem, an atomic gulp also marks the mailbox queue as subdivided, making it ineligible for stealing. 582 Hence, a thief checks for ineligible and non-empty before attempting an atomic steal of a queue, 583 584 585 586 To steal a queue, a thief does the following: 480 587 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt] 481 588 \item 482 The thief chooses a victim, which is trivial because all workers are stored in a shared array. 483 484 \item 485 The thief starts at a random index in the array of the victim's queues and searches for a candidate queue. 486 A candidate queue is any non-empty queue not being processed by the victim and not being stolen by another thief. 487 These rules are not strictly enforced. 488 A candidate is identified non-atomically, and as such, queues that do not satisfy these rules may be stolen. 489 However, steals not meeting the rules do not affect correctness and do not constitute failed steals as the queue is always swapped. 490 491 \item 492 Once a candidate queue is chosen, the thief attempts a wait-free swap of a victim's queue to a random empty thief queue. 493 If the swap successes, the steal is completed. 494 If the swap fails, the victim may have been gulping that message queue or another thief must have attempted to steal the victim's queue. 495 In either case, that message queue is highly likely to be empty. 496 497 \item 498 Once a thief fails or succeeds in stealing a queue, it iterates over its messages queues again because new messages may have arrived during stealing. 589 chooses a victim, which is trivial because there is an array of $N$ pointers to the mailboxes (copy queues) that is subdivided into $M$/$N$ worker ranges \see{Section~\ref{s:executor}}. 590 591 \item 592 scan the victim's mailbox-range and test each mail queue in that range for a non-empty queue by performing a test-for-non-empty and swap. 593 The conditional check reduces the number of atomic operations. 594 The swap races to interchange the non-empty mail-queue pointer from the victim's range with an empty mail-queue pointer in the thief's range. 595 Regardless, either a thief swaps or the victim gulps the mail queue; 596 correctness is guaranteed, no matter which thread wins the race, because both threads are cycling through appropriate lists. 597 Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers. 598 599 \item 600 stops searching after a non-empty queue steal or all queues in the random worker's range are examined. 601 The thief then returns to its scheduler and iterates over its messages queues, because new messages may have arrived during stealing, including a stolen queue. 499 602 Stealing is only repeated after two consecutive iterations over its owned queues without finding work. 500 603 \end{enumerate} 501 502 The key to the stealing mechanism is that the queues can still be operated on while they are being swapped. 503 This functionality eliminates any contention among thieves and victims. 504 505 The first key to this is that actors and workers maintain two distinct arrays of references to queues. 506 Actors will always receive messages via the same queues. 507 Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own. 508 Swapping queues is a matter of atomically swapping two pointers in the worker array. 509 As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references. 510 511 Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes. 512 When a worker is not stealing it iterates across its own range of queues and gulps them one by one. 513 When a worker operates on a queue it first copies the current pointer from the worker array of references to a local variable. 604 This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue. 605 As well, pushes to the queues by other workers can happen concurrently during the swap since pushes happen via the actor queue references. 606 607 608 609 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt] 610 \item 611 selects a random worker's mailbox-range and tests each mail queue in that range for a non-empty queue. 612 613 A queue can be stolen and gulped at the same time since they are independent operations. 614 However, two workers cannot gulp from the same queue at the same time since that would violate the ordering and mutual exclusion guarantees of actors, so an avail flag is used on queues to stop this from happening. 615 If a worker tries to gulp a queue and sees an avail flag up, the gulp fails, and it will move on to other queues and try to gulp it again later. 616 stops searching after a non-empty queue steal or all queues in the random worker's range are examined. 617 It stops searching after any queue steal (even an empty one) or all queues are examined. 618 619 Stealing is only repeated after two consecutive iterations over its owned queues without finding work. 620 Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers. 621 \end{enumerate} 622 This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue. 623 The first and last sentence here are correct. I'm not sure I understand the second sentence. 624 625 626 627 As an aside I think I need a diagram here since it is clear that the algorithm is not clear through my writing. 628 I made a very quick one that I'll use here to explain, but I will work on one to add to the chapter. 629 630 631 To give some accompanying code there are two arrays, the array of copy queues, and the array of pointers to the copy queues: 632 \begin{cfa} 633 work_queue * request_queues; // master array of work request queues 634 work_queue ** worker_req_queues; // secondary array of work queues to allow for swapping 635 \end{cfa} 636 \includegraphics[width=4in]{../figures/steal_diagram.eps} 637 When a send happens, it inserts a request directly into one of the queues in @request_queues@, when a steal happens two pointers in @worker_req_queues@ are swapped. 638 Each worker owns a section of @worker_req_queues@. 639 When a worker gulps, it dereferences one of the pointers in its section of @worker_req_queues@ and then gulps from the queue it points at. 640 As such, operations can happen on the queues directly independent of stealing, which avoids almost all contention between stealing threads and busy threads. 641 642 If you want to poke some of the code, in @actor.hfa@ @steal_work()@ selects the victim, @choose_queue()@ iterates over the queues looking for non-empty queues and @try_swap_queues()@ performs the thread safe pointer swap. 643 644 % The first key to this is that actors and workers maintain two distinct arrays of references to queues. 645 % Actors will always receive messages via the same queues. 646 % Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own. 647 % Swapping queues is a matter of atomically swapping two pointers in the worker array. 648 % As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references. 649 650 % Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes. 651 % When a worker is not stealing it iterates across its own range of queues and gulps them one by one. 652 653 Correctness of gulping is slightly more complex with stealing. 654 When a worker operates on a queue, it first copies the queue pointer from the mailbox array to a local variable. 514 655 It then uses that local variable for all queue operations until it moves to the next index of its range of the queue array. 515 656 This ensures that any swaps do not interrupt gulping operations, however this introduces a correctness issue. … … 650 791 In the failed case of step 3 the program state is safely restored to its state it had prior to the @0p@ write in step 2, thanks to the invariant that makes the write back to the @0p@ pointer safe. 651 792 793 \begin{comment} 652 794 \subsection{Stealing Guarantees} 653 795 Given that the stealing operation can potentially fail, it is important to discuss the guarantees provided by the stealing implementation. … … 754 896 Thus, by contrapositive, if the graph contains a cycle then there exists a situation where no swaps occur. 755 897 Hence, at least one swap is guaranteed to succeed if and only if the graph does not contain a cycle. 898 \end{comment} 756 899 757 900 % C_TODO: go through and use \paragraph to format to make it look nicer … … 832 975 833 976 Another productivity feature that is included is a group of poison-pill messages. 834 Poison-pill messages are common across actor systems, and are used in actor libraries Akka and ProtoActor~\cite{Akka,ProtoActor}.977 Poison-pill messages are common across actor systems, including Akka and ProtoActor \cite{Akka,ProtoActor}. 835 978 Poison-pill messages inform an actor to terminate. 836 979 In \CFA, due to the allocation of actors and lack of garbage collection, there needs to be a suite of poison-pills. … … 876 1019 & \multicolumn{1}{c|}{\CFA (100M)} & \multicolumn{1}{c|}{CAF (10M)} & \multicolumn{1}{c|}{Akka (100M)} & \multicolumn{1}{c|}{\uC (100M)} & \multicolumn{1}{c@{}}{ProtoActor (100M)} \\ 877 1020 \hline 878 AMD & \input{data/ nasusSendStatic} \\1021 AMD & \input{data/pykeSendStatic} \\ 879 1022 \hline 880 Intel & \input{data/ pykeSendStatic}1023 Intel & \input{data/nasusSendStatic} 881 1024 \end{tabular} 882 1025 … … 889 1032 & \multicolumn{1}{c|}{\CFA (20M)} & \multicolumn{1}{c|}{CAF (2M)} & \multicolumn{1}{c|}{Akka (2M)} & \multicolumn{1}{c|}{\uC (20M)} & \multicolumn{1}{c@{}}{ProtoActor (2M)} \\ 890 1033 \hline 891 AMD & \input{data/ nasusSendDynamic} \\1034 AMD & \input{data/pykeSendDynamic} \\ 892 1035 \hline 893 Intel & \input{data/ pykeSendDynamic}1036 Intel & \input{data/nasusSendDynamic} 894 1037 \end{tabular} 895 1038 \end{table} … … 911 1054 The results from the static/dynamic send benchmarks are shown in Figures~\ref{t:StaticActorMessagePerformance} and \ref{t:DynamicActorMessagePerformance} respectively. 912 1055 \CFA leads the charts in both benchmarks, largely due to the copy queue removing the majority of the envelope allocations. 913 Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incur rs a compile-time cost.1056 Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incurs a compile-time cost. 914 1057 All the other systems use their virtual system to find the correct behaviour at message send. 915 1058 This requires two virtual dispatch operations, which is an additional runtime send cost that \CFA does not have. … … 1056 1199 1057 1200 This result is shown in Figure~\ref{f:cfaRepeatAMD} and \ref{f:cfaRepeatIntel} where the no-stealing version of \CFA performs better than both stealing variations. 1058 In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing).1201 In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and \CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing). 1059 1202 The shift for CAF is particularly large, which further supports the hypothesis that CAF's work stealing is particularly eager. 1060 1203 In both the executor and the repeat benchmark CAF performs poorly. … … 1083 1226 1084 1227 Figure~\ref{t:ExecutorMemory} shows the high memory watermark of the actor systems when running the executor benchmark on 48 cores. 1085 \CFA has a high watermark relative to the other non-garbage -collected systems \uC, and CAF.1228 \CFA has a high watermark relative to the other non-garbage collected systems \uC, and CAF. 1086 1229 This is a result of the copy queue data structure, as it will over-allocate storage and not clean up eagerly, whereas the per envelope allocations will always allocate exactly the amount of storage needed. 1087 1230 Despite having a higher watermark, the \CFA memory usage remains comparable to other non-garbage-collected systems.
Note: See TracChangeset
for help on using the changeset viewer.