Changes in / [96ea77a:70f97c8]


Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/colby_parsons_MMAth/text/actors.tex

    r96ea77a r70f97c8  
    1111
    1212\section{Actor Model}
    13 The \Newterm{actor model} is a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}.
     13The actor model is a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}.
    1414An actor is composed of a \Newterm{mailbox} (message queue) and a set of \Newterm{behaviours} that receive from the mailbox to perform work.
    1515Actors execute asynchronously upon receiving a message and can modify their own state, make decisions, spawn more actors, and send messages to other actors.
     
    2020An actor is executed by an underlying \Newterm{executor} (kernel thread-pool) that fairly invokes each actor, where an actor invocation processes one or more messages from its mailbox.
    2121The default number of executor threads is often proportional to the number of computer cores to achieve good performance.
    22 An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s:ActorSystem}}.
     22An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s:CFAActor}}.
    2323
    2424\subsection{Classic Actor System}
    25 An implementation of the actor model with a community of actors is called an \Newterm{actor system}.
     25An implementation of the actor model with a community of actors is called an actor system.
    2626Actor systems largely follow the actor model, but can differ in some ways.
    2727While the semantics of message \emph{send} is asynchronous, the implementation may be synchronous or a combination.
    28 The default semantics for message \emph{receive} is \gls{fifo}, so an actor receives messages from its mailbox in temporal (arrival) order;
     28The default semantics for message \emph{receive} is FIFO, so an actor receives messages from its mailbox in temporal (arrival) order;
    2929however, messages sent among actors arrive in any order.
    3030Some actor systems provide priority-based mailboxes and/or priority-based message-selection within a mailbox, where custom message dispatchers search among or within a mailbox(es) with a predicate for specific kinds of actors and/or messages.
    3131Some actor systems provide a shared mailbox where multiple actors receive from a common mailbox~\cite{Akka}, which is contrary to the no-sharing design of the basic actor-model (and requires additional locking).
    32 For non-\gls{fifo} service, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received.
    33 Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism \see{Section~\ref{s:SafetyProductivity}}).
     32For non-FIFO service, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received.
     33Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism (see Section~\ref{s:SafetyProductivity}).
    3434%While the definition of the actor model provides no restrictions on message ordering, actor systems tend to guarantee that messages sent from a given actor $i$ to actor $j$ will arrive at actor $j$ in the order they were sent.
    3535Another way an actor system varies from the model is allowing access to shared global-state.
     
    4949The more common design is to \Newterm{shard} the single queue among the executor threads, where actors are permanently assigned or can float among the queues.
    5050Sharding significantly decreases contention among executor threads adding and removing actors to/from a queue.
    51 Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors add messages.
    52 When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current behaviour on the message(s).
     51Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors can attach messages.
     52When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current work unit on the message(s).
    5353
    5454% cite parallel theatre and our paper
    5555Figure \ref{f:inverted_actor} shows an actor system designed as \Newterm{message-centric}, where a set of messages are scheduled and run on underlying executor threads~\cite{uC++,Nigro21}.
    56 This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor.
    57 Now a message send must queries the actor to know which message queue to post the message.
    5856Again, the simplest design has a single global queue of messages accessed by the executor threads, but this approach has the same contention problem by the executor threads.
    5957Therefore, the messages (mailboxes) are sharded and executor threads schedule each message, which points to its corresponding actor.
    60 Here, an actor's messages are permanently assigned to one queue to ensure \gls{fifo} receiving and/or reduce searching for specific actor/messages.
    61 Since multiple actors belong to each message queue, actor messages are interleaved on a queue, but individually in FIFO order.
     58Here, an actor's messages are permanently assigned to one queue to ensure FIFO receiving and/or reduce searching for specific actor/messages.
     59Since multiple actors belong to each message queue, actor messages are interleaved on a queue.
     60This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor.
    6261% In this inverted actor system instead of each executor threads owning a queue of actors, they each own a queue of messages.
    6362% In this scheme work is consumed from their queue and executed by underlying threads.
     
    6665% The arrows from the message queues to the actors in the diagram indicate interleaved messages addressed to each actor.
    6766
    68 The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{Buhr22} but implemented in \CFA, and adds the following new \CFA contributions:
     67The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{}, and adds the following contributions related to \CFA:
    6968\begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]
    7069\item
    71 Provide insight into the impact of envelope allocation in actor systems \see{Section~\ref{s:envelope}}.
     70Provide insight into the impact of envelope allocation in actor systems.
    7271In all actor systems, dynamic allocation is needed to ensure the lifetime of a unit of work persists from its creation until the unit of work is executed.
    7372This allocation is often called an \Newterm{envelope} as it ``packages'' the information needed to run the unit of work, alongside any other information needed to send the unit of work, such as an actor's address or link fields.
     
    127126\end{cfa}
    128127\end{lrbox}
    129 
    130128\subfloat[dynamic typing]{\label{l:dynamic_style}\usebox\myboxA}
    131129\hspace*{10pt}
     
    179177\end{figure}
    180178
    181 Figure~\ref{f:CFAActor} shows a complete \CFA actor example, which is discussed in detail.
    182 The actor type @my_actor@ is a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword.
     179Figure~\ref{f:CFAActor} shows a complete \CFA actor example starting with the actor type @my_actor@ created by defining a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword.
    183180This inheritance style is the Plan-9 C-style inheritance discussed in Section~\ref{s:Inheritance}.
    184 Similarly, the message types @str_msg@ and @int_msg@ are @struct@s that inherits from the base @message@ @struct@ via the @inline@ keyword.
     181Similarly, the message types @str_msg@ and @int_msg@ are created by defining a @struct@ that inherits from the base @message@ @struct@ via the @inline@ keyword.
    185182Only @str_msg@ needs a constructor to copy the C string;
    186183@int_msg@ is initialized using its \CFA auto-generated constructors.
     
    190187The program main begins by creating two messages on the stack.
    191188Then the executor system is started by calling @start_actor_system@.
    192 Now an actor is created on the stack and four messages are sent to it using operator @?|?@.
    193 The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to remove the actor from the actor system \see{Section~\ref{s:ActorBehaviours}}.
     189Now an actor is created on the stack and four messages are sent it using operator @?|?@.
     190The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to removes the actor from the actor system \see{Section~\ref{s:ActorBehaviours}}.
    194191The call to @stop_actor_system@ blocks the program main until all actors are finished and removed from the actor system.
    195192The program main ends by deleting the actor and two messages from the stack.
     
    202199
    203200\subsection{Actor Behaviours}\label{s:ActorBehaviours}
    204 In general, a behaviour for some derived actor and derived message type is defined with the following signature:
     201In general, a behaviour for some derived actor and derived message type is defined with following signature:
    205202\begin{cfa}
    206203allocation receive( my_actor & receiver, my_msg & msg )
     
    216213Message state is updated via a call to:
    217214\begin{cfa}
    218 void set_allocation( message & this, allocation state );
     215void set_allocation( message & this, allocation state )
    219216\end{cfa}
    220217
     
    223220\noindent@Nodelete@
    224221tells the executor that no action is to be taken with regard to an actor or message.
    225 This status is used when an actor continues receiving messages or a message is reused.
     222This status is used when an actor continues receiving messages or a message may be reused.
    226223
    227224\noindent@Delete@
     
    236233tells the executor to mark the respective actor as finished executing, but not call the object's destructor nor deallocate the object.
    237234This status is used when actors or messages are global or stack allocated, or a programmer wants to manage deallocation themselves.
    238 Note, for messages, there is no difference between allocations @Nodelete@ and @Finished@ because both tell the executor to do nothing to the message.
    239 Hence, @Finished@ is implicitly changed to @Nodelete@ in a message constructor, and @Nodelete@ is used internally for message error-checking \see{Section~\ref{s:SafetyProductivity}}.
    240 Therefore, reading a message's allocation status after setting to @Finished@ may be either @Nodelete@ (after construction) or @Finished@ (after explicitly setting using @set_allocation@).
    241235
    242236For the actor system to terminate, all actors must have returned a status other than @Nodelete@.
    243237After an actor is terminated, it is erroneous to send messages to it.
    244238Similarly,  after a message is terminated, it cannot be sent to an actor.
    245 Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action \emph{after} a behaviour returns.
     239Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action after a behaviour returns.
    246240
    247241\subsection{Actor Envelopes}\label{s:envelope}
    248242As stated, each message, regardless of where it is allocated, can be sent to an arbitrary number of actors, and hence, appear on an arbitrary number of message queues.
    249 Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies?
    250 Therefore, it is up to the actor program to manage message life-time across receives.
     243Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies.
     244Therefore, it up to the actor program to manage message life-time across receives.
    251245However, for a message to appear on multiple message queues, it needs an arbitrary number of associated destination behaviours.
    252246Hence, there is the concept of an envelop, which is dynamically allocated on each send, that wraps a message with any extra implementation fields needed to persist between send and receive.
    253247Managing the envelop is straightforward because it is created at the send and deleted after the receive, \ie there is 1:1 relationship for an envelop and a many to one relationship for a message.
    254 
    255 \PAB{Do you need to say more here?}
    256248
    257249% In actor systems, messages are sent and received by actors.
     
    283275\noindent@void start_actor_system( executor & this )@
    284276allows the programmer to explicitly create and configure an executor for use by the actor system.
    285 Executor configuration options are discussed in Section~\ref{s:executor}.
     277Executor configuration options include are discussed in Section~\ref{s:executor}.
    286278
    287279\noindent
     
    290282\subsection{Actor Send}\label{s:ActorSend}
    291283All message sends are done using the vertical-bar (bit-or) operator, @?|?@, similar to the syntax of the \CFA stream I/O.
    292 One way to provide this operator is through the \CFA type system:
    293 \begin{cfa}
    294 actor & ?|?( actor &, message & ) { // base actor and message types
    295         // boilerplate to send message to executor mail queue
    296 }
    297 actor | str_msg | int_msg;   // rewritten: ?|?( ?|?( actor, int_msg ), str_msg )
    298 \end{cfa}
    299 In the \CFA type system, calls to this routine work for any pair of parameters that inherit from the @actor@ and @message@ types via Plan-9 inheritance.
    300 However, within the body the routine, all type information about the derived actor and message is lost (type erasure), so this approach is unable to find the right @receive@ routine to put in the envelope.
    301 
    302 If \CFA had a fully-fledged virtual system, the generic @?|?@ routine would work, since the virtual system could dynamically select the derived @receive@ routine via virtual dispatch.
    303 \CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work, so a different approach is needed.
    304 
    305 Without virtuals, the idiomatic \CFA way to create the generic @?|?@ routine is using @forall@:
    306 \begin{cfa}
    307 // forall types A, M that have a receive that returns allocation
    308 forall( A &, M & | { allocation receive( A &, M & ); } )
    309 A & ?|?( A &, M & ) { // actor and message types
    310         // boilerplate to send message to executor mail queue
    311 }
    312 \end{cfa}
    313 This approach should work.
    314 However, the \CFA type system is still a work in progress, and there is a nontrivial bug where inherited routines are not recognized by @forall@.
    315 For example, Figure~\ref{f:type_problem} shows type @B@ has an inherited @foo@ routine through type @A@ and should find the @bar@ routine defined via the @forall@, but does not due the type-system bug.
    316 
    317 \begin{figure}
    318 \begin{cfa}
    319 struct A {};
    320 struct B { inline A; }
    321 void foo( A & a ) { ... }
    322 
    323 // for all types that have a foo routine here is a bar routine
    324 forall( T & | { void foo( T & ); } )
    325 void bar( T & t ) { ... }
    326 
    327 int main() {
    328         B b;
    329         foo( b ); // B has a foo so it should find a bar via the forall
    330         bar( b ); // compilation error, no bar found for type B
    331 }
    332 \end{cfa}
    333 \caption{\CFA Type-System Problem}
    334 \label{f:type_problem}
    335 \end{figure}
    336 
    337 Users could be expected to write the @?|?@ routines, but this approach is error prone and creates maintenance issues.
    338 Until the \CFA type-system matures, I created a workaround using a template-like approach, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature.
    339 This workaround is outside of the type system, but performing a type-system like action.
    340 The workaround requires no annotation or additional code to be written by users;
    341 thus, it resolves the maintenance and error problems.
    342 It should be possible to seamlessly transition the workaround into any updated version of the \CFA type-system.
     284Hence, programmers must write a matching @?|?@ routine for each @receive@ routine, which is awkward and generates a maintenance problem that must be solved.
     285\CFA's current best approach for creating a generic @?|?@ routine requires users to create routines for their actor and message types that access the base type.
     286Since these routines are not complex, they can be generated using macros that are used to annotate the user-defined actor and message types.
     287This approach is used in \CFA's intrusive list data structure.
     288This is not much better than asking users to write the @?|?@ routine themselves in terms of maintenance, since the user needs to remember the boilerplate macro annotation.
     289
     290As stated, \CFA does not have named inheritance with RTTI.
     291\CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work.
     292Virtuals would provide a mechanism to write a single generic @?|?@ routine taking a base actor and message type, that would dynamically select the @receive@ routine from the actor argument.
     293Note, virtuals are not needed for the send; Plan-9 inheritance is sufficient because only the inherited fields are needed during the message send (only upcasting is needed).
     294
     295Therefore, a template-like approach was chosen, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature.
     296This approach requires no annotation or additional code to be written by users, thus it resolves the maintenance problem.
     297(When the \CFA virtual routines mature, it should be possible to seamlessly transition to it from the template approach.)
    343298
    344299Figure~\ref{f:send_gen} shows the generated send routine for the @int_msg@ receive in Figure~\ref{f:CFAActor}.
    345300Operator @?|?@ has the same parameter signature as the corresponding @receive@ routine and returns an @actor@ so the operator can be cascaded.
    346301The routine sets @rec_fn@ to the matching @receive@ routine using the left-hand type to perform the selection.
    347 Then the routine packages the actor and message, along with the receive routine into an envelope.
     302Then the routine packages the base and derived actor and message and actor, along with the receive routine into an \hyperref[s:envelope]{envelope}.
    348303Finally, the envelop is added to the executor queue designated by the actor using the executor routine @send@.
    349304
     
    351306\begin{cfa}
    352307$\LstCommentStyle{// from Figure~\ref{f:CFAActor}}$
    353 struct my_actor { inline actor; };                                              $\C[3.75in]{// actor}$
    354 struct int_msg { inline message; int i; };                              $\C{// message}$
     308struct my_actor { inline actor; };                                              $\C[3.75in]{// actor}$
     309struct int_msg { inline message; int i; };                              $\C{// message}$
    355310allocation receive( @my_actor &, int_msg & msg@ ) {...} $\C{// receiver}$
    356311
     
    359314actor & ?|?( @my_actor & receiver, int_msg & msg@ ) {
    360315        allocation (*rec_fn)( my_actor &, int_msg & ) = @receive@; // deduce receive routine
    361         request req{ (actor *)&receiver, (message *)&msg, (receive_t)rec_fn };
    362         send( receiver, req );                                                          $\C{// queue message for execution}\CRT$
     316        request req{ &receiver, (actor *)&receiver, &msg, (message *)&msg, (receive_t)rec_fn };
     317        send( receiver, req );                                                          $\C{// queue message for execution}\CRT$
    363318        return receiver;
    364319}
     
    368323\end{figure}
    369324
     325\subsection{Actor Termination}\label{s:ActorTerm}
     326During a message send, the receiving actor and message being sent are stored via pointers in the envelope.
     327These pointers have the base actor and message types, so type information of the actor and message is lost and then recovered later when the typed receive routine is called.
     328After the receive routine is done, the executor must clean up the actor and message according to their allocation status.
     329If the allocation status is @Delete@ or @Destroy@, the appropriate destructor must be called by the executor.
     330This poses a problem; the correct type of the actor or message is not available to the executor, but it needs to call the right destructor!
     331This requires downcasting from the base type to derived type, which requires a virtual system.
     332Thus, a rudimentary destructor-only virtual system was added to \CFA as part of this work.
     333This virtual system is used via Plan-9 inheritance of the @virtual_dtor@ type, shown in Figure~\ref{f:VirtDtor}.
     334The @virtual_dtor@ type maintains a pointer to the start of the object, and a pointer to the correct destructor.
     335When a type inherits the @virtual_dtor@ type, the compiler adds code to its destructor to make sure that any destructor calls along this segment of the inheritance tree is called are intercepted, and restart at the appropriate destructor for that object.
     336
     337\begin{figure}
     338\begin{cfa}
     339struct base_type { inline virtual_dtor; };
     340struct intermediate_type { inline base_type; };
     341struct derived_type { inline intermediate_type; };
     342
     343int main() {
     344    derived_type d1, d2, d3;
     345    intermediate_type & i = d2;
     346    base_type & b = d3;
     347
     348    // these will call the destructors in the correct order
     349    ^d1{}; ^i{}; ^b{};
     350}
     351
     352\end{cfa}
     353\caption{\CFA Virtual Destructor}
     354\label{f:VirtDtor}
     355\end{figure}
     356
     357This virtual destructor system was built for this work, but is general and can be used in any type in \CFA.
     358Actors and messages opt into this system by inheriting the @virtual_dtor@ type, which allows the executor to call the right destructor without knowing the derived actor or message type.
     359
    370360Figure~\ref{f:ConvenienceMessages} shows three builtin convenience messages and receive routines used to terminate actors, depending on how an actor is allocated: @Delete@, @Destroy@ or @Finished@.
    371361For example, in Figure~\ref{f:CFAActor}, the builtin @finished_msg@ message and receive are used to terminate the actor because the actor is allocated on the stack, so no deallocation actions are performed by the executor.
    372 Note, assignment is used to initialize these messages rather than constructors because the constructor changes the allocation to @Nodelete@ for error checking
    373362
    374363\begin{figure}
    375364\begin{cfa}
    376 message __base_msg_finished $@$= { .allocation_ : Finished };
    377 struct delete_msg_t { inline message; } delete_msg = __base_msg_finished;
    378 struct destroy_msg_t { inline message; } destroy_msg = __base_msg_finished;
    379 struct finished_msg_t { inline message; } finished_msg = __base_msg_finished;
    380 
    381 allocation receive( actor & this, delete_msg_t & msg ) { return Delete; }
    382 allocation receive( actor & this, destroy_msg_t & msg ) { return Destroy; }
    383 allocation receive( actor & this, finished_msg_t & msg ) { return Finished; }
     365message __base_msg_finished $@$= { .allocation_ : Finished }; // no auto-gen constructors
     366struct __delete_msg_t { inline message; } delete_msg = __base_msg_finished;
     367struct __destroy_msg_t { inline message; } destroy_msg = __base_msg_finished;
     368struct __finished_msg_t { inline message; } finished_msg = __base_msg_finished;
     369
     370allocation receive( actor & this, __delete_msg_t & msg ) { return Delete; }
     371allocation receive( actor & this, __destroy_msg_t & msg ) { return Destroy; }
     372allocation receive( actor & this, __finished_msg_t & msg ) { return Finished; }
    384373\end{cfa}
    385374\caption{Builtin Convenience Messages}
    386375\label{f:ConvenienceMessages}
    387376\end{figure}
    388 
    389 \subsection{Actor Termination}\label{s:ActorTerm}
    390 During a message send, the receiving actor and message being sent are stored via pointers in the envelope.
    391 These pointers are the base actor and message types, so type information of the derived actor and message is lost and must be recovered later when the typed receive routine is called.
    392 After the receive routine is done, the executor must clean up the actor and message according to their allocation status.
    393 If the allocation status is @Delete@ or @Destroy@, the appropriate destructor must be called by the executor.
    394 This poses a problem;
    395 the derived type of the actor or message is not available to the executor, but it needs to call the derived destructor.!
    396 This requires downcasting from the base type to the derived type, which requires a virtual system.
    397 To accomplish the dowcast, I implemented a rudimentary destructor-only virtual system in \CFA.
    398 This virtual system is used via Plan-9 inheritance of the @virtual_dtor@ type, shown in Figure~\ref{f:VirtDtor}.
    399 The @virtual_dtor@ type maintains a pointer to the start of the object, and a pointer to the correct destructor.
    400 When a type inherits @virtual_dtor@, the compiler adds code to its destructor to intercepted any destructor calls along this segment of the inheritance tree and restart at the appropriate destructor for that object.
    401 
    402 \begin{figure}
    403 \centering
    404 
    405 \begin{lrbox}{\myboxA}
    406 \begin{cfa}
    407 struct base { inline virtual_dtor; };
    408 void ^?{}( base & ) { sout | "^base"; }
    409 struct intermediate { inline base; };
    410 void ^?{}( intermediate & ) { sout | "^intermediate"; }
    411 struct derived { inline intermediate; };
    412 void ^?{}( derived & ) { sout | "^derived"; }
    413 
    414 int main() {
    415         base & b;
    416         intermediate i;
    417         derived d1, d2, d3;
    418         intermediate & ri = d2;
    419         base & rb = d3;
    420         // explicit destructor calls
    421         ^d1{};  sout | nl;
    422         ^ri{};  sout | nl;
    423         ^rb{};  sout | nl;
    424 } // ^i, ^b
    425 \end{cfa}
    426 \end{lrbox}
    427 
    428 \begin{lrbox}{\myboxB}
    429 \begin{cfa}
    430 ^derived
    431 ^intermediate
    432 ^base
    433 
    434 ^derived
    435 ^intermediate
    436 ^base
    437 
    438 ^derived
    439 ^intermediate
    440 ^base
    441 
    442 ^intermediate
    443 ^base
    444 
    445 
    446 
    447 
    448 \end{cfa}
    449 
    450 \end{lrbox}
    451 \subfloat[Destructor calls]{\label{l:destructor_calls}\usebox\myboxA}
    452 \hspace*{10pt}
    453 \vrule
    454 \hspace*{10pt}
    455 \subfloat[Output]{\usebox\myboxB}
    456 
    457 \caption{\CFA Virtual Destructor}
    458 \label{f:VirtDtor}
    459 \end{figure}
    460 
    461 While this virtual destructor system was built for this work, it is general and can be used in any type in \CFA.
    462 Actors and messages opt into this system by inheriting the @virtual_dtor@ type, which allows the executor to call the right destructor without knowing the derived actor or message type.
    463377
    464378\section{\CFA Executor}\label{s:executor}
     
    466380An executor of an actor system is the scheduler that organizes where actor behaviours are run and how messages are sent and delivered.
    467381In \CFA, the executor is message-centric \see{Figure~\ref{f:inverted_actor}}, but extended by over sharding of a message queue \see{left side of Figure~\ref{f:gulp}}, \ie there are $M$ message queues where $M$ is greater than the number of executor threads $N$ (usually a multiple of $N$).
    468 This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor \gls{fifo} message-delivery semantics.
     382This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor FIFO message-delivery semantics.
    469383The only extra overhead is each executor cycling (usually round-robin) through its $M$/$N$ queues.
    470384The goal is to achieve better performance and scalability for certain kinds of actor applications by reducing executor locking.
     
    530444\section{Work Stealing}\label{s:steal}
    531445Work stealing is a scheduling strategy to provide \Newterm{load balance}.
    532 The goal is to increase resource utilization by having an idle thread steal work from a working thread.
    533 While there are multiple parts in a work-stealing scheduler, the two important components are the stealing mechanism and victim selection.
     446The goal is to increase resource utilization by having idle threads steal work from working threads.
     447While there are multiple parts in work-stealing scheduler, the two important components are victim selection and the stealing mechanism.
    534448
    535449\subsection{Stealing Mechanism}
    536 In work stealing, the stealing worker is called the \Newterm{thief} and the stolen worker is called the \Newterm{victim}.
    537 % Workers consume actors from their ready queue and execute their behaviours.
    538 % Through these behaviours, a worker inserts messages onto its own and other worker ready-queues.
    539 To steal, a thieve takes one or more actors from a victim's ready queue, so work stealing always results in a potential increase in contention on ready queues between the victim gulping from a queue and the thief stealing the queue.
     450In work stealing, the stealing worker is called the \Newterm{thief} and the stolen-from worker is called the \Newterm{victim}.
     451The stealing mechanism presented here differs from existing work-stealing actor-systems because of the message-centric (inverted) actor-system.
     452Other actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF}, have work stealing, but use an actor-centric system where stealing is dequeuing from a non-empty ready-queue to an empty ready-queue.
     453As an example, in CAF, the sharded actor queue is a set of double-ended queues (dequeues).
     454When an actor has messages, it is inserted into a worker's dequeue (ready queue).
     455Workers then consume actors from the dequeue and execute their behaviours.
     456To steal work, thieves take one or more actors from a victim's dequeue.
     457By the pigeon hole principle, there are three dequeue operations (push/victim pop/thief pop) that can occur concurrently and only two ends to a dequeue, so work stealing in a dequeue-based system always results in a potential increase in contention on the dequeues.
    540458This contention can slows down the victim's throughput.
    541 Note, the data structure used for the ready queue is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations.
     459Note, which end of the dequeue is used for stealing, consuming, and inserting is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations.
     460
     461Work steal now becomes queue stealing, where an entire actor/message queue is stolen, which trivially preserves message ordering in a queue \see{Section~\ref{s:steal}}.
    542462
    543463% C_TODO: maybe insert stealing diagram
    544464
    545 The stealing mechanism in this work differs from most work-stealing actor-systems because of the message-centric (inverted) actor-system.
    546 Actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF} using actor-centric systems, steal by dequeuing from a non-empty actor ready-queue and enqueue\-ing to an empty ready-queue.
    547 % As an example, in CAF, the sharded actor ready queue is a set of double-ended queues (dequeues).
    548465In \CFA, the actor work-stealing implementation is unique because of the message-centric system.
    549 With this approach, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue.
    550 To ensure sequential actor execution and \gls{fifo} message delivery, actor stealing requires finding and removing \emph{all} of an actor's messages, and inserting them consecutively in another message queue.
     466In this system, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue.
     467To ensure sequential actor execution and FIFO message delivery, actor stealing requires finding and removing all of an actor's messages, and inserting them consecutively in another message queue.
    551468This operation is $O(N)$ with a non-trivial constant.
    552 The only way for work stealing to become practical is to shard each worker's message queue, which also reduces contention, and steal queues to eliminate queue searching.
    553 
    554 Given queue stealing, my goal is to have an essentially zero-contention-cost stealing mechanism.
    555 This goal means work stealing has minimal affect on the performance of the victim.
     469The only way for work stealing to become practical is to shard the message queue, which also reduces contention, and steal queues to eliminate queue searching.
     470
     471Given queue stealing, the goal is to have a zero-victim-cost stealing mechanism, which does not mean stealing has no cost.
     472It means work stealing does not affect the performance of the victim worker.
    556473The implication is that thieves cannot contend with a victim, and that a victim should perform no stealing related work unless it becomes a thief.
    557 In theory, this goal is not achievable, but practical results show the goal is virtually achieved.
    558 
    559 One important lesson I learned working on \uC actors~\cite{} and talking with fellow student Thierry Delisle, who examined work-stealing for user-threads in his Ph.D., is \emph{not} to aggressively steal.
    560 With reasonable workloads, being a thief should be a temporary state, \ie eventually work appears on the thief's ready-queues and it returns to normal operation.
    561 Furthermore, the act of \emph{looking} to find work is invasive (Heisenberg uncertainty principle), possibly disrupting multiple victims.
    562 Therefore, stealing should be done lazily in case work appears for the thief and to minimize disruption of victims.
    563 Note, the cost of stealing is not crucial for the thief because it does not have anything else to do except poll or block.
    564 
    565 My lazy-stealing approach is to select a victim, scan its queues once, and return immediately if a queue is stolen.
    566 Then perform a regular scan looking for work, where stolen work is placed at the end of the scan.
    567 Hence, only one victim is effected and there is a reasonable delay between stealing events by scanning all the thief's ready queue looking for its own work.
    568 If no work is found in the thief's queues because a steal is unsuccessful, it performs another steal from a different victim.
    569 This lazy examination by the thief has a low perturbation cost for victims, while still finding work in a moderately loaded system.
    570 In all work-stealing algorithms, there is the pathological case where there is too little work and too many workers;
    571 this scenario can only be handled by putting workers to sleep or deleting them.
    572 This case is not examined in this work.
    573 
    574 In detail, the \CFA work-stealing algorithm begins by iterating over its message queues twice without finding any work before it tries to steal a queue from another worker.
    575 Stealing a queue is done wait-free (\ie no busy waiting) with a few atomic instructions that only create contention with other stealing workers, not the victim.
    576 The complexity in the implementation is that victim gulping does not take the mailbox queue;
    577 rather it atomically transfers the mailbox queue to another list leaving the mailbox empty.
    578 Hence, the original list is available for new message deliveries.
    579 However, this transfer logically subdivides the mailbox, and during this period, the mailbox cannot be stolen;
    580 otherwise there are two threads simultaneously running messages on actors in the two parts of the mailbox queue.
    581 To solve this problem, an atomic gulp also marks the mailbox queue as subdivided, making it ineligible for stealing.
    582 Hence, a thief checks for ineligible and non-empty before attempting an atomic steal of a queue,
    583 
    584 
    585 
    586 To steal a queue, a thief does the following:
     474In theory, this goal is not achievable, but results show the goal is achieved in practice.
     475
     476In \CFA's actor system, workers own a set of sharded queues, which they iterate over and gulp.
     477If a worker has iterated over its message queues twice without finding any work, it tries to steal a queue from another worker.
     478Stealing a queue is done wait-free with a few atomic instructions that can only create contention with other stealing workers, not the victim.
     479To steal a queue, a worker does the following:
    587480\begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]
    588481\item
    589 chooses a victim, which is trivial because there is an array of $N$ pointers to the mailboxes (copy queues) that is subdivided into $M$/$N$ worker ranges \see{Section~\ref{s:executor}}.
    590 
    591 \item
    592 scan the victim's mailbox-range and test each mail queue in that range for a non-empty queue by performing a test-for-non-empty and swap.
    593 The conditional check reduces the number of atomic operations.
    594 The swap races to interchange the non-empty mail-queue pointer from the victim's range with an empty mail-queue pointer in the thief's range.
    595 Regardless, either a thief swaps or the victim gulps the mail queue;
    596 correctness is guaranteed, no matter which thread wins the race, because both threads are cycling through appropriate lists.
    597 Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers.
    598 
    599 \item
    600 stops searching after a non-empty queue steal or all queues in the random worker's range are examined.
    601 The thief then returns to its scheduler and iterates over its messages queues, because new messages may have arrived during stealing, including a stolen queue.
     482The thief chooses a victim, which is trivial because all workers are stored in a shared array.
     483
     484\item
     485The thief starts at a random index in the array of the victim's queues and searches for a candidate queue.
     486A candidate queue is any non-empty queue not being processed by the victim and not being stolen by another thief.
     487These rules are not strictly enforced.
     488A candidate is identified non-atomically, and as such, queues that do not satisfy these rules may be stolen.
     489However, steals not meeting the rules do not affect correctness and do not constitute failed steals as the queue is always swapped.
     490
     491\item
     492Once a candidate queue is chosen, the thief attempts a wait-free swap of a victim's queue to a random empty thief queue.
     493If the swap successes, the steal is completed.
     494If the swap fails, the victim may have been gulping that message queue or another thief must have attempted to steal the victim's queue.
     495In either case, that message queue is highly likely to be empty.
     496
     497\item
     498Once a thief fails or succeeds in stealing a queue, it iterates over its messages queues again because new messages may have arrived during stealing.
    602499Stealing is only repeated after two consecutive iterations over its owned queues without finding work.
    603500\end{enumerate}
    604 This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue.
    605 As well, pushes to the queues by other workers can happen concurrently during the swap since pushes happen via the actor queue references.
    606 
    607 
    608 
    609 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]
    610 \item
    611 selects a random worker's mailbox-range and tests each mail queue in that range for a non-empty queue.
    612 
    613 A queue can be stolen and gulped at the same time since they are independent operations.
    614 However, two workers cannot gulp from the same queue at the same time since that would violate the ordering and mutual exclusion guarantees of actors, so an avail flag is used on queues to stop this from happening.
    615 If a worker tries to gulp a queue and sees an avail flag up, the gulp fails, and it will move on to other queues and try to gulp it again later.
    616 stops searching after a non-empty queue steal or all queues in the random worker's range are examined.
    617 It stops searching after any queue steal (even an empty one) or all queues are examined.
    618 
    619 Stealing is only repeated after two consecutive iterations over its owned queues without finding work.
    620 Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers.
    621 \end{enumerate}
    622 This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue.
    623 The first and last sentence here are correct. I'm not sure I understand the second sentence.
    624 
    625 
    626 
    627 As an aside I think I need a diagram here since it is clear that the algorithm is not clear through my writing.
    628 I made a very quick one that I'll use here to explain, but I will work on one to add to the chapter.
    629 
    630 
    631 To give some accompanying code there are two arrays, the array of copy queues, and the array of pointers to the copy queues:
    632 \begin{cfa}
    633 work_queue * request_queues;                   // master array of work request queues
    634 work_queue ** worker_req_queues;                // secondary array of work queues to allow for swapping
    635 \end{cfa}
    636 \includegraphics[width=4in]{../figures/steal_diagram.eps}
    637 When a send happens, it inserts a request directly into one of the queues in @request_queues@, when a steal happens two pointers in @worker_req_queues@ are swapped.
    638 Each worker owns a section of @worker_req_queues@.
    639 When a worker gulps, it dereferences one of the pointers in its section of @worker_req_queues@ and then gulps from the queue it points at.
    640 As such, operations can happen on the queues directly independent of stealing, which avoids almost all contention between stealing threads and busy threads.
    641 
    642 If you want to poke some of the code, in @actor.hfa@ @steal_work()@ selects the victim, @choose_queue()@ iterates over the queues looking for non-empty queues and @try_swap_queues()@ performs the thread safe pointer swap.
    643 
    644 % The first key to this is that actors and workers maintain two distinct arrays of references to queues.
    645 % Actors will always receive messages via the same queues.
    646 % Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own.
    647 % Swapping queues is a matter of atomically swapping two pointers in the worker array.
    648 % As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references.
    649 
    650 % Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes.
    651 % When a worker is not stealing it iterates across its own range of queues and gulps them one by one.
    652 
    653 Correctness of gulping is slightly more complex with stealing.
    654 When a worker operates on a queue, it first copies the queue pointer from the mailbox array to a local variable.
     501
     502The key to the stealing mechanism is that the queues can still be operated on while they are being swapped.
     503This functionality eliminates any contention among thieves and victims.
     504
     505The first key to this is that actors and workers maintain two distinct arrays of references to queues.
     506Actors will always receive messages via the same queues.
     507Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own.
     508Swapping queues is a matter of atomically swapping two pointers in the worker array.
     509As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references.
     510
     511Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes.
     512When a worker is not stealing it iterates across its own range of queues and gulps them one by one.
     513When a worker operates on a queue it first copies the current pointer from the worker array of references to a local variable.
    655514It then uses that local variable for all queue operations until it moves to the next index of its range of the queue array.
    656515This ensures that any swaps do not interrupt gulping operations, however this introduces a correctness issue.
     
    791650In the failed case of step 3 the program state is safely restored to its state it had prior to the @0p@ write in step 2, thanks to the invariant that makes the write back to the @0p@ pointer safe.
    792651
    793 \begin{comment}
    794652\subsection{Stealing Guarantees}
    795653Given that the stealing operation can potentially fail, it is important to discuss the guarantees provided by the stealing implementation.
     
    896754Thus, by contrapositive, if the graph contains a cycle then there exists a situation where no swaps occur.
    897755Hence, at least one swap is guaranteed to succeed if and only if the graph does not contain a cycle.
    898 \end{comment}
    899756
    900757% C_TODO: go through and use \paragraph to format to make it look nicer
     
    975832
    976833Another productivity feature that is included is a group of poison-pill messages.
    977 Poison-pill messages are common across actor systems, including Akka and ProtoActor \cite{Akka,ProtoActor}.
     834Poison-pill messages are common across actor systems, and are used in actor libraries Akka and ProtoActor~\cite{Akka,ProtoActor}.
    978835Poison-pill messages inform an actor to terminate.
    979836In \CFA, due to the allocation of actors and lack of garbage collection, there needs to be a suite of poison-pills.
     
    1019876        & \multicolumn{1}{c|}{\CFA (100M)} & \multicolumn{1}{c|}{CAF (10M)} & \multicolumn{1}{c|}{Akka (100M)} & \multicolumn{1}{c|}{\uC (100M)} & \multicolumn{1}{c@{}}{ProtoActor (100M)} \\
    1020877        \hline
    1021         AMD             & \input{data/pykeSendStatic} \\
     878        AMD             & \input{data/nasusSendStatic} \\
    1022879        \hline
    1023         Intel   & \input{data/nasusSendStatic}
     880        Intel   & \input{data/pykeSendStatic}
    1024881\end{tabular}
    1025882
     
    1032889        & \multicolumn{1}{c|}{\CFA (20M)} & \multicolumn{1}{c|}{CAF (2M)} & \multicolumn{1}{c|}{Akka (2M)} & \multicolumn{1}{c|}{\uC (20M)} & \multicolumn{1}{c@{}}{ProtoActor (2M)} \\
    1033890        \hline
    1034         AMD             & \input{data/pykeSendDynamic} \\
     891        AMD             & \input{data/nasusSendDynamic} \\
    1035892        \hline
    1036         Intel   & \input{data/nasusSendDynamic}
     893        Intel   & \input{data/pykeSendDynamic}
    1037894\end{tabular}
    1038895\end{table}
     
    1054911The results from the static/dynamic send benchmarks are shown in Figures~\ref{t:StaticActorMessagePerformance} and \ref{t:DynamicActorMessagePerformance} respectively.
    1055912\CFA leads the charts in both benchmarks, largely due to the copy queue removing the majority of the envelope allocations.
    1056 Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incurs a compile-time cost.
     913Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incurrs a compile-time cost.
    1057914All the other systems use their virtual system to find the correct behaviour at message send.
    1058915This requires two virtual dispatch operations, which is an additional runtime send cost that \CFA does not have.
     
    11991056
    12001057This result is shown in Figure~\ref{f:cfaRepeatAMD} and \ref{f:cfaRepeatIntel} where the no-stealing version of \CFA performs better than both stealing variations.
    1201 In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and \CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing).
     1058In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing).
    12021059The shift for CAF is particularly large, which further supports the hypothesis that CAF's work stealing is particularly eager.
    12031060In both the executor and the repeat benchmark CAF performs poorly.
     
    12261083
    12271084Figure~\ref{t:ExecutorMemory} shows the high memory watermark of the actor systems when running the executor benchmark on 48 cores.
    1228 \CFA has a high watermark relative to the other non-garbage collected systems \uC, and CAF.
     1085\CFA has a high watermark relative to the other non-garbage-collected systems \uC, and CAF.
    12291086This is a result of the copy queue data structure, as it will over-allocate storage and not clean up eagerly, whereas the per envelope allocations will always allocate exactly the amount of storage needed.
    12301087Despite having a higher watermark, the \CFA memory usage remains comparable to other non-garbage-collected systems.
Note: See TracChangeset for help on using the changeset viewer.