Changeset 8909a2d


Ignore:
Timestamp:
Jul 3, 2023, 1:43:48 PM (10 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
96ea77a
Parents:
1ae3ac46
Message:

more proofreading of actor chapter

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/colby_parsons_MMAth/text/actors.tex

    r1ae3ac46 r8909a2d  
    1111
    1212\section{Actor Model}
    13 The actor model is a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}.
     13The \Newterm{actor model} is a concurrent paradigm where computation is broken into units of work called actors, and the data for computation is distributed to actors in the form of messages~\cite{Hewitt73}.
    1414An actor is composed of a \Newterm{mailbox} (message queue) and a set of \Newterm{behaviours} that receive from the mailbox to perform work.
    1515Actors execute asynchronously upon receiving a message and can modify their own state, make decisions, spawn more actors, and send messages to other actors.
     
    2020An actor is executed by an underlying \Newterm{executor} (kernel thread-pool) that fairly invokes each actor, where an actor invocation processes one or more messages from its mailbox.
    2121The default number of executor threads is often proportional to the number of computer cores to achieve good performance.
    22 An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s:CFAActor}}.
     22An executor is often tunable with respect to the number of kernel threads and its scheduling algorithm, which optimize for specific actor applications and workloads \see{end of Section~\ref{s:ActorSystem}}.
    2323
    2424\subsection{Classic Actor System}
    25 An implementation of the actor model with a community of actors is called an actor system.
     25An implementation of the actor model with a community of actors is called an \Newterm{actor system}.
    2626Actor systems largely follow the actor model, but can differ in some ways.
    2727While the semantics of message \emph{send} is asynchronous, the implementation may be synchronous or a combination.
    28 The default semantics for message \emph{receive} is FIFO, so an actor receives messages from its mailbox in temporal (arrival) order;
     28The default semantics for message \emph{receive} is \gls{fifo}, so an actor receives messages from its mailbox in temporal (arrival) order;
    2929however, messages sent among actors arrive in any order.
    3030Some actor systems provide priority-based mailboxes and/or priority-based message-selection within a mailbox, where custom message dispatchers search among or within a mailbox(es) with a predicate for specific kinds of actors and/or messages.
    3131Some actor systems provide a shared mailbox where multiple actors receive from a common mailbox~\cite{Akka}, which is contrary to the no-sharing design of the basic actor-model (and requires additional locking).
    32 For non-FIFO service, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received.
    33 Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism (see Section~\ref{s:SafetyProductivity}).
     32For non-\gls{fifo} service, some notion of fairness (eventual progress) must exist, otherwise messages have a high latency or starve, \ie never received.
     33Finally, some actor systems provide multiple typed-mailboxes, which then lose the actor-\lstinline{become} mechanism \see{Section~\ref{s:SafetyProductivity}}).
    3434%While the definition of the actor model provides no restrictions on message ordering, actor systems tend to guarantee that messages sent from a given actor $i$ to actor $j$ will arrive at actor $j$ in the order they were sent.
    3535Another way an actor system varies from the model is allowing access to shared global-state.
     
    4949The more common design is to \Newterm{shard} the single queue among the executor threads, where actors are permanently assigned or can float among the queues.
    5050Sharding significantly decreases contention among executor threads adding and removing actors to/from a queue.
    51 Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors can attach messages.
    52 When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current work unit on the message(s).
     51Finally, each actor has a receive queue of messages (mailbox), which is a single consumer, multi-producer queue, \ie only the actor removes from the mailbox but multiple actors add messages.
     52When an actor receives a message in its mailbox, the actor is marked ready and scheduled by a thread to run the actor's current behaviour on the message(s).
    5353
    5454% cite parallel theatre and our paper
    5555Figure \ref{f:inverted_actor} shows an actor system designed as \Newterm{message-centric}, where a set of messages are scheduled and run on underlying executor threads~\cite{uC++,Nigro21}.
     56This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor.
     57Now a message send must queries the actor to know which message queue to post the message.
    5658Again, the simplest design has a single global queue of messages accessed by the executor threads, but this approach has the same contention problem by the executor threads.
    5759Therefore, the messages (mailboxes) are sharded and executor threads schedule each message, which points to its corresponding actor.
    58 Here, an actor's messages are permanently assigned to one queue to ensure FIFO receiving and/or reduce searching for specific actor/messages.
    59 Since multiple actors belong to each message queue, actor messages are interleaved on a queue.
    60 This design is \Newterm{inverted} because actors belong to a message queue, whereas in the classic approach a message queue belongs to each actor.
     60Here, an actor's messages are permanently assigned to one queue to ensure \gls{fifo} receiving and/or reduce searching for specific actor/messages.
     61Since multiple actors belong to each message queue, actor messages are interleaved on a queue, but individually in FIFO order.
    6162% In this inverted actor system instead of each executor threads owning a queue of actors, they each own a queue of messages.
    6263% In this scheme work is consumed from their queue and executed by underlying threads.
     
    6566% The arrows from the message queues to the actors in the diagram indicate interleaved messages addressed to each actor.
    6667
    67 The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{}, and adds the following contributions related to \CFA:
     68The actor system in \CFA uses a message-centric design, adopts several features from my prior actor work in \uC~\cite{Buhr22} but implemented in \CFA, and adds the following new \CFA contributions:
    6869\begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]
    6970\item
    70 Provide insight into the impact of envelope allocation in actor systems.
     71Provide insight into the impact of envelope allocation in actor systems \see{Section~\ref{s:envelope}}.
    7172In all actor systems, dynamic allocation is needed to ensure the lifetime of a unit of work persists from its creation until the unit of work is executed.
    7273This allocation is often called an \Newterm{envelope} as it ``packages'' the information needed to run the unit of work, alongside any other information needed to send the unit of work, such as an actor's address or link fields.
     
    126127\end{cfa}
    127128\end{lrbox}
     129
    128130\subfloat[dynamic typing]{\label{l:dynamic_style}\usebox\myboxA}
    129131\hspace*{10pt}
     
    177179\end{figure}
    178180
    179 Figure~\ref{f:CFAActor} shows a complete \CFA actor example starting with the actor type @my_actor@ created by defining a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword.
     181Figure~\ref{f:CFAActor} shows a complete \CFA actor example, which is discussed in detail.
     182The actor type @my_actor@ is a @struct@ that inherits from the base @actor@ @struct@ via the @inline@ keyword.
    180183This inheritance style is the Plan-9 C-style inheritance discussed in Section~\ref{s:Inheritance}.
    181 Similarly, the message types @str_msg@ and @int_msg@ are created by defining a @struct@ that inherits from the base @message@ @struct@ via the @inline@ keyword.
     184Similarly, the message types @str_msg@ and @int_msg@ are @struct@s that inherits from the base @message@ @struct@ via the @inline@ keyword.
    182185Only @str_msg@ needs a constructor to copy the C string;
    183186@int_msg@ is initialized using its \CFA auto-generated constructors.
     
    187190The program main begins by creating two messages on the stack.
    188191Then the executor system is started by calling @start_actor_system@.
    189 Now an actor is created on the stack and four messages are sent it using operator @?|?@.
    190 The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to removes the actor from the actor system \see{Section~\ref{s:ActorBehaviours}}.
     192Now an actor is created on the stack and four messages are sent to it using operator @?|?@.
     193The last message is the builtin @finish_msg@, which returns @Finished@ to an executor thread, causing it to remove the actor from the actor system \see{Section~\ref{s:ActorBehaviours}}.
    191194The call to @stop_actor_system@ blocks the program main until all actors are finished and removed from the actor system.
    192195The program main ends by deleting the actor and two messages from the stack.
     
    199202
    200203\subsection{Actor Behaviours}\label{s:ActorBehaviours}
    201 In general, a behaviour for some derived actor and derived message type is defined with following signature:
     204In general, a behaviour for some derived actor and derived message type is defined with the following signature:
    202205\begin{cfa}
    203206allocation receive( my_actor & receiver, my_msg & msg )
     
    213216Message state is updated via a call to:
    214217\begin{cfa}
    215 void set_allocation( message & this, allocation state )
     218void set_allocation( message & this, allocation state );
    216219\end{cfa}
    217220
     
    220223\noindent@Nodelete@
    221224tells the executor that no action is to be taken with regard to an actor or message.
    222 This status is used when an actor continues receiving messages or a message may be reused.
     225This status is used when an actor continues receiving messages or a message is reused.
    223226
    224227\noindent@Delete@
     
    233236tells the executor to mark the respective actor as finished executing, but not call the object's destructor nor deallocate the object.
    234237This status is used when actors or messages are global or stack allocated, or a programmer wants to manage deallocation themselves.
     238Note, for messages, there is no difference between allocations @Nodelete@ and @Finished@ because both tell the executor to do nothing to the message.
     239Hence, @Finished@ is implicitly changed to @Nodelete@ in a message constructor, and @Nodelete@ is used internally for message error-checking \see{Section~\ref{s:SafetyProductivity}}.
     240Therefore, reading a message's allocation status after setting to @Finished@ may be either @Nodelete@ (after construction) or @Finished@ (after explicitly setting using @set_allocation@).
    235241
    236242For the actor system to terminate, all actors must have returned a status other than @Nodelete@.
    237243After an actor is terminated, it is erroneous to send messages to it.
    238244Similarly,  after a message is terminated, it cannot be sent to an actor.
    239 Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action after a behaviour returns.
     245Note, it is safe to construct an actor or message with a status other than @Nodelete@, since the executor only examines the allocation action \emph{after} a behaviour returns.
    240246
    241247\subsection{Actor Envelopes}\label{s:envelope}
    242248As stated, each message, regardless of where it is allocated, can be sent to an arbitrary number of actors, and hence, appear on an arbitrary number of message queues.
    243 Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies.
    244 Therefore, it up to the actor program to manage message life-time across receives.
     249Because a C program manages message lifetime, messages cannot be copied for each send, otherwise who manages the copies?
     250Therefore, it is up to the actor program to manage message life-time across receives.
    245251However, for a message to appear on multiple message queues, it needs an arbitrary number of associated destination behaviours.
    246252Hence, there is the concept of an envelop, which is dynamically allocated on each send, that wraps a message with any extra implementation fields needed to persist between send and receive.
    247253Managing the envelop is straightforward because it is created at the send and deleted after the receive, \ie there is 1:1 relationship for an envelop and a many to one relationship for a message.
     254
     255\PAB{Do you need to say more here?}
    248256
    249257% In actor systems, messages are sent and received by actors.
     
    275283\noindent@void start_actor_system( executor & this )@
    276284allows the programmer to explicitly create and configure an executor for use by the actor system.
    277 Executor configuration options include are discussed in Section~\ref{s:executor}.
     285Executor configuration options are discussed in Section~\ref{s:executor}.
    278286
    279287\noindent
     
    282290\subsection{Actor Send}\label{s:ActorSend}
    283291All message sends are done using the vertical-bar (bit-or) operator, @?|?@, similar to the syntax of the \CFA stream I/O.
    284 Hence, programmers must write a matching @?|?@ routine for each @receive@ routine, which is awkward and generates a maintenance problem that must be solved.
    285 \CFA's current best approach for creating a generic @?|?@ routine requires users to create routines for their actor and message types that access the base type.
    286 Since these routines are not complex, they can be generated using macros that are used to annotate the user-defined actor and message types.
    287 This approach is used in \CFA's intrusive list data structure.
    288 This is not much better than asking users to write the @?|?@ routine themselves in terms of maintenance, since the user needs to remember the boilerplate macro annotation.
    289 
    290 As stated, \CFA does not have named inheritance with RTTI.
    291 \CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work.
    292 Virtuals would provide a mechanism to write a single generic @?|?@ routine taking a base actor and message type, that would dynamically select the @receive@ routine from the actor argument.
    293 Note, virtuals are not needed for the send; Plan-9 inheritance is sufficient because only the inherited fields are needed during the message send (only upcasting is needed).
    294 
    295 Therefore, a template-like approach was chosen, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature.
    296 This approach requires no annotation or additional code to be written by users, thus it resolves the maintenance problem.
    297 (When the \CFA virtual routines mature, it should be possible to seamlessly transition to it from the template approach.)
     292One way to provide this operator is through the \CFA type system:
     293\begin{cfa}
     294actor & ?|?( actor &, message & ) { // base actor and message types
     295        // boilerplate to send message to executor mail queue
     296}
     297actor | str_msg | int_msg;   // rewritten: ?|?( ?|?( actor, int_msg ), str_msg )
     298\end{cfa}
     299In the \CFA type system, calls to this routine work for any pair of parameters that inherit from the @actor@ and @message@ types via Plan-9 inheritance.
     300However, within the body the routine, all type information about the derived actor and message is lost (type erasure), so this approach is unable to find the right @receive@ routine to put in the envelope.
     301
     302If \CFA had a fully-fledged virtual system, the generic @?|?@ routine would work, since the virtual system could dynamically select the derived @receive@ routine via virtual dispatch.
     303\CFA does have a preliminary form of virtual routines, but it is not mature enough for use in this work, so a different approach is needed.
     304
     305Without virtuals, the idiomatic \CFA way to create the generic @?|?@ routine is using @forall@:
     306\begin{cfa}
     307// forall types A, M that have a receive that returns allocation
     308forall( A &, M & | { allocation receive( A &, M & ); } )
     309A & ?|?( A &, M & ) { // actor and message types
     310        // boilerplate to send message to executor mail queue
     311}
     312\end{cfa}
     313This approach should work.
     314However, the \CFA type system is still a work in progress, and there is a nontrivial bug where inherited routines are not recognized by @forall@.
     315For example, Figure~\ref{f:type_problem} shows type @B@ has an inherited @foo@ routine through type @A@ and should find the @bar@ routine defined via the @forall@, but does not due the type-system bug.
     316
     317\begin{figure}
     318\begin{cfa}
     319struct A {};
     320struct B { inline A; }
     321void foo( A & a ) { ... }
     322
     323// for all types that have a foo routine here is a bar routine
     324forall( T & | { void foo( T & ); } )
     325void bar( T & t ) { ... }
     326
     327int main() {
     328        B b;
     329        foo( b ); // B has a foo so it should find a bar via the forall
     330        bar( b ); // compilation error, no bar found for type B
     331}
     332\end{cfa}
     333\caption{\CFA Type-System Problem}
     334\label{f:type_problem}
     335\end{figure}
     336
     337Users could be expected to write the @?|?@ routines, but this approach is error prone and creates maintenance issues.
     338Until the \CFA type-system matures, I created a workaround using a template-like approach, where the compiler generates a matching @?|?@ routine for each @receive@ routine it finds with the correct actor/message type-signature.
     339This workaround is outside of the type system, but performing a type-system like action.
     340The workaround requires no annotation or additional code to be written by users;
     341thus, it resolves the maintenance and error problems.
     342It should be possible to seamlessly transition the workaround into any updated version of the \CFA type-system.
    298343
    299344Figure~\ref{f:send_gen} shows the generated send routine for the @int_msg@ receive in Figure~\ref{f:CFAActor}.
    300345Operator @?|?@ has the same parameter signature as the corresponding @receive@ routine and returns an @actor@ so the operator can be cascaded.
    301346The routine sets @rec_fn@ to the matching @receive@ routine using the left-hand type to perform the selection.
    302 Then the routine packages the base and derived actor and message and actor, along with the receive routine into an \hyperref[s:envelope]{envelope}.
     347Then the routine packages the actor and message, along with the receive routine into an envelope.
    303348Finally, the envelop is added to the executor queue designated by the actor using the executor routine @send@.
    304349
     
    306351\begin{cfa}
    307352$\LstCommentStyle{// from Figure~\ref{f:CFAActor}}$
    308 struct my_actor { inline actor; };                                              $\C[3.75in]{// actor}$
    309 struct int_msg { inline message; int i; };                              $\C{// message}$
     353struct my_actor { inline actor; };                                              $\C[3.75in]{// actor}$
     354struct int_msg { inline message; int i; };                              $\C{// message}$
    310355allocation receive( @my_actor &, int_msg & msg@ ) {...} $\C{// receiver}$
    311356
     
    314359actor & ?|?( @my_actor & receiver, int_msg & msg@ ) {
    315360        allocation (*rec_fn)( my_actor &, int_msg & ) = @receive@; // deduce receive routine
    316         request req{ &receiver, (actor *)&receiver, &msg, (message *)&msg, (receive_t)rec_fn };
    317         send( receiver, req );                                                          $\C{// queue message for execution}\CRT$
     361        request req{ (actor *)&receiver, (message *)&msg, (receive_t)rec_fn };
     362        send( receiver, req );                                                          $\C{// queue message for execution}\CRT$
    318363        return receiver;
    319364}
     
    323368\end{figure}
    324369
     370Figure~\ref{f:ConvenienceMessages} shows three builtin convenience messages and receive routines used to terminate actors, depending on how an actor is allocated: @Delete@, @Destroy@ or @Finished@.
     371For example, in Figure~\ref{f:CFAActor}, the builtin @finished_msg@ message and receive are used to terminate the actor because the actor is allocated on the stack, so no deallocation actions are performed by the executor.
     372Note, assignment is used to initialize these messages rather than constructors because the constructor changes the allocation to @Nodelete@ for error checking
     373
     374\begin{figure}
     375\begin{cfa}
     376message __base_msg_finished $@$= { .allocation_ : Finished };
     377struct delete_msg_t { inline message; } delete_msg = __base_msg_finished;
     378struct destroy_msg_t { inline message; } destroy_msg = __base_msg_finished;
     379struct finished_msg_t { inline message; } finished_msg = __base_msg_finished;
     380
     381allocation receive( actor & this, delete_msg_t & msg ) { return Delete; }
     382allocation receive( actor & this, destroy_msg_t & msg ) { return Destroy; }
     383allocation receive( actor & this, finished_msg_t & msg ) { return Finished; }
     384\end{cfa}
     385\caption{Builtin Convenience Messages}
     386\label{f:ConvenienceMessages}
     387\end{figure}
     388
    325389\subsection{Actor Termination}\label{s:ActorTerm}
    326390During a message send, the receiving actor and message being sent are stored via pointers in the envelope.
    327 These pointers have the base actor and message types, so type information of the actor and message is lost and then recovered later when the typed receive routine is called.
     391These pointers are the base actor and message types, so type information of the derived actor and message is lost and must be recovered later when the typed receive routine is called.
    328392After the receive routine is done, the executor must clean up the actor and message according to their allocation status.
    329393If the allocation status is @Delete@ or @Destroy@, the appropriate destructor must be called by the executor.
    330 This poses a problem; the correct type of the actor or message is not available to the executor, but it needs to call the right destructor!
    331 This requires downcasting from the base type to derived type, which requires a virtual system.
    332 Thus, a rudimentary destructor-only virtual system was added to \CFA as part of this work.
     394This poses a problem;
     395the derived type of the actor or message is not available to the executor, but it needs to call the derived destructor.!
     396This requires downcasting from the base type to the derived type, which requires a virtual system.
     397To accomplish the dowcast, I implemented a rudimentary destructor-only virtual system in \CFA.
    333398This virtual system is used via Plan-9 inheritance of the @virtual_dtor@ type, shown in Figure~\ref{f:VirtDtor}.
    334399The @virtual_dtor@ type maintains a pointer to the start of the object, and a pointer to the correct destructor.
    335 When a type inherits the @virtual_dtor@ type, the compiler adds code to its destructor to make sure that any destructor calls along this segment of the inheritance tree is called are intercepted, and restart at the appropriate destructor for that object.
    336 
    337 \begin{figure}
    338 \begin{cfa}
    339 struct base_type { inline virtual_dtor; };
    340 struct intermediate_type { inline base_type; };
    341 struct derived_type { inline intermediate_type; };
     400When a type inherits @virtual_dtor@, the compiler adds code to its destructor to intercepted any destructor calls along this segment of the inheritance tree and restart at the appropriate destructor for that object.
     401
     402\begin{figure}
     403\centering
     404
     405\begin{lrbox}{\myboxA}
     406\begin{cfa}
     407struct base { inline virtual_dtor; };
     408void ^?{}( base & ) { sout | "^base"; }
     409struct intermediate { inline base; };
     410void ^?{}( intermediate & ) { sout | "^intermediate"; }
     411struct derived { inline intermediate; };
     412void ^?{}( derived & ) { sout | "^derived"; }
    342413
    343414int main() {
    344     derived_type d1, d2, d3;
    345     intermediate_type & i = d2;
    346     base_type & b = d3;
    347 
    348     // these will call the destructors in the correct order
    349     ^d1{}; ^i{}; ^b{};
    350 }
    351 
    352 \end{cfa}
     415        base & b;
     416        intermediate i;
     417        derived d1, d2, d3;
     418        intermediate & ri = d2;
     419        base & rb = d3;
     420        // explicit destructor calls
     421        ^d1{};  sout | nl;
     422        ^ri{};  sout | nl;
     423        ^rb{};  sout | nl;
     424} // ^i, ^b
     425\end{cfa}
     426\end{lrbox}
     427
     428\begin{lrbox}{\myboxB}
     429\begin{cfa}
     430^derived
     431^intermediate
     432^base
     433
     434^derived
     435^intermediate
     436^base
     437
     438^derived
     439^intermediate
     440^base
     441
     442^intermediate
     443^base
     444
     445
     446
     447
     448\end{cfa}
     449
     450\end{lrbox}
     451\subfloat[Destructor calls]{\label{l:destructor_calls}\usebox\myboxA}
     452\hspace*{10pt}
     453\vrule
     454\hspace*{10pt}
     455\subfloat[Output]{\usebox\myboxB}
     456
    353457\caption{\CFA Virtual Destructor}
    354458\label{f:VirtDtor}
    355459\end{figure}
    356460
    357 This virtual destructor system was built for this work, but is general and can be used in any type in \CFA.
     461While this virtual destructor system was built for this work, it is general and can be used in any type in \CFA.
    358462Actors and messages opt into this system by inheriting the @virtual_dtor@ type, which allows the executor to call the right destructor without knowing the derived actor or message type.
    359 
    360 Figure~\ref{f:ConvenienceMessages} shows three builtin convenience messages and receive routines used to terminate actors, depending on how an actor is allocated: @Delete@, @Destroy@ or @Finished@.
    361 For example, in Figure~\ref{f:CFAActor}, the builtin @finished_msg@ message and receive are used to terminate the actor because the actor is allocated on the stack, so no deallocation actions are performed by the executor.
    362 
    363 \begin{figure}
    364 \begin{cfa}
    365 message __base_msg_finished $@$= { .allocation_ : Finished }; // no auto-gen constructors
    366 struct __delete_msg_t { inline message; } delete_msg = __base_msg_finished;
    367 struct __destroy_msg_t { inline message; } destroy_msg = __base_msg_finished;
    368 struct __finished_msg_t { inline message; } finished_msg = __base_msg_finished;
    369 
    370 allocation receive( actor & this, __delete_msg_t & msg ) { return Delete; }
    371 allocation receive( actor & this, __destroy_msg_t & msg ) { return Destroy; }
    372 allocation receive( actor & this, __finished_msg_t & msg ) { return Finished; }
    373 \end{cfa}
    374 \caption{Builtin Convenience Messages}
    375 \label{f:ConvenienceMessages}
    376 \end{figure}
    377463
    378464\section{\CFA Executor}\label{s:executor}
     
    380466An executor of an actor system is the scheduler that organizes where actor behaviours are run and how messages are sent and delivered.
    381467In \CFA, the executor is message-centric \see{Figure~\ref{f:inverted_actor}}, but extended by over sharding of a message queue \see{left side of Figure~\ref{f:gulp}}, \ie there are $M$ message queues where $M$ is greater than the number of executor threads $N$ (usually a multiple of $N$).
    382 This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor FIFO message-delivery semantics.
     468This approach reduces contention by spreading message delivery among the $M$ queues rather than $N$, while still maintaining actor \gls{fifo} message-delivery semantics.
    383469The only extra overhead is each executor cycling (usually round-robin) through its $M$/$N$ queues.
    384470The goal is to achieve better performance and scalability for certain kinds of actor applications by reducing executor locking.
     
    444530\section{Work Stealing}\label{s:steal}
    445531Work stealing is a scheduling strategy to provide \Newterm{load balance}.
    446 The goal is to increase resource utilization by having idle threads steal work from working threads.
    447 While there are multiple parts in work-stealing scheduler, the two important components are victim selection and the stealing mechanism.
     532The goal is to increase resource utilization by having an idle thread steal work from a working thread.
     533While there are multiple parts in a work-stealing scheduler, the two important components are the stealing mechanism and victim selection.
    448534
    449535\subsection{Stealing Mechanism}
    450 In work stealing, the stealing worker is called the \Newterm{thief} and the stolen-from worker is called the \Newterm{victim}.
    451 The stealing mechanism presented here differs from existing work-stealing actor-systems because of the message-centric (inverted) actor-system.
    452 Other actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF}, have work stealing, but use an actor-centric system where stealing is dequeuing from a non-empty ready-queue to an empty ready-queue.
    453 As an example, in CAF, the sharded actor queue is a set of double-ended queues (dequeues).
    454 When an actor has messages, it is inserted into a worker's dequeue (ready queue).
    455 Workers then consume actors from the dequeue and execute their behaviours.
    456 To steal work, thieves take one or more actors from a victim's dequeue.
    457 By the pigeon hole principle, there are three dequeue operations (push/victim pop/thief pop) that can occur concurrently and only two ends to a dequeue, so work stealing in a dequeue-based system always results in a potential increase in contention on the dequeues.
     536In work stealing, the stealing worker is called the \Newterm{thief} and the stolen worker is called the \Newterm{victim}.
     537% Workers consume actors from their ready queue and execute their behaviours.
     538% Through these behaviours, a worker inserts messages onto its own and other worker ready-queues.
     539To steal, a thieve takes one or more actors from a victim's ready queue, so work stealing always results in a potential increase in contention on ready queues between the victim gulping from a queue and the thief stealing the queue.
    458540This contention can slows down the victim's throughput.
    459 Note, which end of the dequeue is used for stealing, consuming, and inserting is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations.
    460 
    461 Work steal now becomes queue stealing, where an entire actor/message queue is stolen, which trivially preserves message ordering in a queue \see{Section~\ref{s:steal}}.
     541Note, the data structure used for the ready queue is not discussed since the largest cost is the mutual exclusion and its duration for safely performing the queue operations.
    462542
    463543% C_TODO: maybe insert stealing diagram
    464544
     545The stealing mechanism in this work differs from most work-stealing actor-systems because of the message-centric (inverted) actor-system.
     546Actor systems, such as Akka~\cite{Akka} and CAF~\cite{CAF} using actor-centric systems, steal by dequeuing from a non-empty actor ready-queue and enqueue\-ing to an empty ready-queue.
     547% As an example, in CAF, the sharded actor ready queue is a set of double-ended queues (dequeues).
    465548In \CFA, the actor work-stealing implementation is unique because of the message-centric system.
    466 In this system, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue.
    467 To ensure sequential actor execution and FIFO message delivery, actor stealing requires finding and removing all of an actor's messages, and inserting them consecutively in another message queue.
     549With this approach, it is impractical to steal actors because an actor's messages are distributed in temporal order along the message queue.
     550To ensure sequential actor execution and \gls{fifo} message delivery, actor stealing requires finding and removing \emph{all} of an actor's messages, and inserting them consecutively in another message queue.
    468551This operation is $O(N)$ with a non-trivial constant.
    469 The only way for work stealing to become practical is to shard the message queue, which also reduces contention, and steal queues to eliminate queue searching.
    470 
    471 Given queue stealing, the goal is to have a zero-victim-cost stealing mechanism, which does not mean stealing has no cost.
    472 It means work stealing does not affect the performance of the victim worker.
     552The only way for work stealing to become practical is to shard each worker's message queue, which also reduces contention, and steal queues to eliminate queue searching.
     553
     554Given queue stealing, my goal is to have an essentially zero-contention-cost stealing mechanism.
     555This goal means work stealing has minimal affect on the performance of the victim.
    473556The implication is that thieves cannot contend with a victim, and that a victim should perform no stealing related work unless it becomes a thief.
    474 In theory, this goal is not achievable, but results show the goal is achieved in practice.
    475 
    476 In \CFA's actor system, workers own a set of sharded queues, which they iterate over and gulp.
    477 If a worker has iterated over its message queues twice without finding any work, it tries to steal a queue from another worker.
    478 Stealing a queue is done wait-free with a few atomic instructions that can only create contention with other stealing workers, not the victim.
    479 To steal a queue, a worker does the following:
     557In theory, this goal is not achievable, but practical results show the goal is virtually achieved.
     558
     559One important lesson I learned working on \uC actors~\cite{} and talking with fellow student Thierry Delisle, who examined work-stealing for user-threads in his Ph.D., is \emph{not} to aggressively steal.
     560With reasonable workloads, being a thief should be a temporary state, \ie eventually work appears on the thief's ready-queues and it returns to normal operation.
     561Furthermore, the act of \emph{looking} to find work is invasive (Heisenberg uncertainty principle), possibly disrupting multiple victims.
     562Therefore, stealing should be done lazily in case work appears for the thief and to minimize disruption of victims.
     563Note, the cost of stealing is not crucial for the thief because it does not have anything else to do except poll or block.
     564
     565My lazy-stealing approach is to select a victim, scan its queues once, and return immediately if a queue is stolen.
     566Then perform a regular scan looking for work, where stolen work is placed at the end of the scan.
     567Hence, only one victim is effected and there is a reasonable delay between stealing events by scanning all the thief's ready queue looking for its own work.
     568If no work is found in the thief's queues because a steal is unsuccessful, it performs another steal from a different victim.
     569This lazy examination by the thief has a low perturbation cost for victims, while still finding work in a moderately loaded system.
     570In all work-stealing algorithms, there is the pathological case where there is too little work and too many workers;
     571this scenario can only be handled by putting workers to sleep or deleting them.
     572This case is not examined in this work.
     573
     574In detail, the \CFA work-stealing algorithm begins by iterating over its message queues twice without finding any work before it tries to steal a queue from another worker.
     575Stealing a queue is done wait-free (\ie no busy waiting) with a few atomic instructions that only create contention with other stealing workers, not the victim.
     576The complexity in the implementation is that victim gulping does not take the mailbox queue;
     577rather it atomically transfers the mailbox queue to another list leaving the mailbox empty.
     578Hence, the original list is available for new message deliveries.
     579However, this transfer logically subdivides the mailbox, and during this period, the mailbox cannot be stolen;
     580otherwise there are two threads simultaneously running messages on actors in the two parts of the mailbox queue.
     581To solve this problem, an atomic gulp also marks the mailbox queue as subdivided, making it ineligible for stealing.
     582Hence, a thief checks for ineligible and non-empty before attempting an atomic steal of a queue,
     583
     584
     585
     586To steal a queue, a thief does the following:
    480587\begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]
    481588\item
    482 The thief chooses a victim, which is trivial because all workers are stored in a shared array.
    483 
    484 \item
    485 The thief starts at a random index in the array of the victim's queues and searches for a candidate queue.
    486 A candidate queue is any non-empty queue not being processed by the victim and not being stolen by another thief.
    487 These rules are not strictly enforced.
    488 A candidate is identified non-atomically, and as such, queues that do not satisfy these rules may be stolen.
    489 However, steals not meeting the rules do not affect correctness and do not constitute failed steals as the queue is always swapped.
    490 
    491 \item
    492 Once a candidate queue is chosen, the thief attempts a wait-free swap of a victim's queue to a random empty thief queue.
    493 If the swap successes, the steal is completed.
    494 If the swap fails, the victim may have been gulping that message queue or another thief must have attempted to steal the victim's queue.
    495 In either case, that message queue is highly likely to be empty.
    496 
    497 \item
    498 Once a thief fails or succeeds in stealing a queue, it iterates over its messages queues again because new messages may have arrived during stealing.
     589chooses a victim, which is trivial because there is an array of $N$ pointers to the mailboxes (copy queues) that is subdivided into $M$/$N$ worker ranges \see{Section~\ref{s:executor}}.
     590
     591\item
     592scan the victim's mailbox-range and test each mail queue in that range for a non-empty queue by performing a test-for-non-empty and swap.
     593The conditional check reduces the number of atomic operations.
     594The swap races to interchange the non-empty mail-queue pointer from the victim's range with an empty mail-queue pointer in the thief's range.
     595Regardless, either a thief swaps or the victim gulps the mail queue;
     596correctness is guaranteed, no matter which thread wins the race, because both threads are cycling through appropriate lists.
     597Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers.
     598
     599\item
     600stops searching after a non-empty queue steal or all queues in the random worker's range are examined.
     601The thief then returns to its scheduler and iterates over its messages queues, because new messages may have arrived during stealing, including a stolen queue.
    499602Stealing is only repeated after two consecutive iterations over its owned queues without finding work.
    500603\end{enumerate}
    501 
    502 The key to the stealing mechanism is that the queues can still be operated on while they are being swapped.
    503 This functionality eliminates any contention among thieves and victims.
    504 
    505 The first key to this is that actors and workers maintain two distinct arrays of references to queues.
    506 Actors will always receive messages via the same queues.
    507 Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own.
    508 Swapping queues is a matter of atomically swapping two pointers in the worker array.
    509 As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references.
    510 
    511 Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes.
    512 When a worker is not stealing it iterates across its own range of queues and gulps them one by one.
    513 When a worker operates on a queue it first copies the current pointer from the worker array of references to a local variable.
     604This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue.
     605As well, pushes to the queues by other workers can happen concurrently during the swap since pushes happen via the actor queue references.
     606
     607
     608
     609\begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]
     610\item
     611selects a random worker's mailbox-range and tests each mail queue in that range for a non-empty queue.
     612
     613A queue can be stolen and gulped at the same time since they are independent operations.
     614However, two workers cannot gulp from the same queue at the same time since that would violate the ordering and mutual exclusion guarantees of actors, so an avail flag is used on queues to stop this from happening.
     615If a worker tries to gulp a queue and sees an avail flag up, the gulp fails, and it will move on to other queues and try to gulp it again later.
     616stops searching after a non-empty queue steal or all queues in the random worker's range are examined.
     617It stops searching after any queue steal (even an empty one) or all queues are examined.
     618
     619Stealing is only repeated after two consecutive iterations over its owned queues without finding work.
     620Note, a thief can never exceeds its $M$/$N$ worker range because it is always exchanging queues with other workers.
     621\end{enumerate}
     622This approach largely eliminates contention among thieves and victims except for the rare moment when a thief and victim attempt to steal or gulp the same queue.
     623The first and last sentence here are correct. I'm not sure I understand the second sentence.
     624
     625
     626
     627As an aside I think I need a diagram here since it is clear that the algorithm is not clear through my writing.
     628I made a very quick one that I'll use here to explain, but I will work on one to add to the chapter.
     629
     630
     631To give some accompanying code there are two arrays, the array of copy queues, and the array of pointers to the copy queues:
     632\begin{cfa}
     633work_queue * request_queues;                   // master array of work request queues
     634work_queue ** worker_req_queues;                // secondary array of work queues to allow for swapping
     635\end{cfa}
     636\includegraphics[width=4in]{../figures/steal_diagram.eps}
     637When a send happens, it inserts a request directly into one of the queues in @request_queues@, when a steal happens two pointers in @worker_req_queues@ are swapped.
     638Each worker owns a section of @worker_req_queues@.
     639When a worker gulps, it dereferences one of the pointers in its section of @worker_req_queues@ and then gulps from the queue it points at.
     640As such, operations can happen on the queues directly independent of stealing, which avoids almost all contention between stealing threads and busy threads.
     641
     642If you want to poke some of the code, in @actor.hfa@ @steal_work()@ selects the victim, @choose_queue()@ iterates over the queues looking for non-empty queues and @try_swap_queues()@ performs the thread safe pointer swap.
     643
     644% The first key to this is that actors and workers maintain two distinct arrays of references to queues.
     645% Actors will always receive messages via the same queues.
     646% Workers, on the other hand will swap the pointers to queues in their shared array and operate on queues in the range of that array that they own.
     647% Swapping queues is a matter of atomically swapping two pointers in the worker array.
     648% As such pushes to the queues can happen concurrently during the swap since pushes happen via the actor queue references.
     649
     650% Gulping can also occur during queue swapping, but the implementation requires more nuance than the pushes.
     651% When a worker is not stealing it iterates across its own range of queues and gulps them one by one.
     652
     653Correctness of gulping is slightly more complex with stealing.
     654When a worker operates on a queue, it first copies the queue pointer from the mailbox array to a local variable.
    514655It then uses that local variable for all queue operations until it moves to the next index of its range of the queue array.
    515656This ensures that any swaps do not interrupt gulping operations, however this introduces a correctness issue.
     
    650791In the failed case of step 3 the program state is safely restored to its state it had prior to the @0p@ write in step 2, thanks to the invariant that makes the write back to the @0p@ pointer safe.
    651792
     793\begin{comment}
    652794\subsection{Stealing Guarantees}
    653795Given that the stealing operation can potentially fail, it is important to discuss the guarantees provided by the stealing implementation.
     
    754896Thus, by contrapositive, if the graph contains a cycle then there exists a situation where no swaps occur.
    755897Hence, at least one swap is guaranteed to succeed if and only if the graph does not contain a cycle.
     898\end{comment}
    756899
    757900% C_TODO: go through and use \paragraph to format to make it look nicer
     
    832975
    833976Another productivity feature that is included is a group of poison-pill messages.
    834 Poison-pill messages are common across actor systems, and are used in actor libraries Akka and ProtoActor~\cite{Akka,ProtoActor}.
     977Poison-pill messages are common across actor systems, including Akka and ProtoActor \cite{Akka,ProtoActor}.
    835978Poison-pill messages inform an actor to terminate.
    836979In \CFA, due to the allocation of actors and lack of garbage collection, there needs to be a suite of poison-pills.
     
    8761019        & \multicolumn{1}{c|}{\CFA (100M)} & \multicolumn{1}{c|}{CAF (10M)} & \multicolumn{1}{c|}{Akka (100M)} & \multicolumn{1}{c|}{\uC (100M)} & \multicolumn{1}{c@{}}{ProtoActor (100M)} \\
    8771020        \hline
    878         AMD             & \input{data/nasusSendStatic} \\
     1021        AMD             & \input{data/pykeSendStatic} \\
    8791022        \hline
    880         Intel   & \input{data/pykeSendStatic}
     1023        Intel   & \input{data/nasusSendStatic}
    8811024\end{tabular}
    8821025
     
    8891032        & \multicolumn{1}{c|}{\CFA (20M)} & \multicolumn{1}{c|}{CAF (2M)} & \multicolumn{1}{c|}{Akka (2M)} & \multicolumn{1}{c|}{\uC (20M)} & \multicolumn{1}{c@{}}{ProtoActor (2M)} \\
    8901033        \hline
    891         AMD             & \input{data/nasusSendDynamic} \\
     1034        AMD             & \input{data/pykeSendDynamic} \\
    8921035        \hline
    893         Intel   & \input{data/pykeSendDynamic}
     1036        Intel   & \input{data/nasusSendDynamic}
    8941037\end{tabular}
    8951038\end{table}
     
    9111054The results from the static/dynamic send benchmarks are shown in Figures~\ref{t:StaticActorMessagePerformance} and \ref{t:DynamicActorMessagePerformance} respectively.
    9121055\CFA leads the charts in both benchmarks, largely due to the copy queue removing the majority of the envelope allocations.
    913 Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incurrs a compile-time cost.
     1056Additionally, the receive of all messages sent in \CFA is statically known and is determined via a function pointer cast, which incurs a compile-time cost.
    9141057All the other systems use their virtual system to find the correct behaviour at message send.
    9151058This requires two virtual dispatch operations, which is an additional runtime send cost that \CFA does not have.
     
    10561199
    10571200This result is shown in Figure~\ref{f:cfaRepeatAMD} and \ref{f:cfaRepeatIntel} where the no-stealing version of \CFA performs better than both stealing variations.
    1058 In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing).
     1201In particular on the Intel machine in Figure~\ref{f:cfaRepeatIntel}, the cost of stealing is higher, which can be seen in the vertical shift of Akka, CAF and \CFA results in Figure~\ref{f:RepeatIntel} (\uC and ProtoActor do not have work stealing).
    10591202The shift for CAF is particularly large, which further supports the hypothesis that CAF's work stealing is particularly eager.
    10601203In both the executor and the repeat benchmark CAF performs poorly.
     
    10831226
    10841227Figure~\ref{t:ExecutorMemory} shows the high memory watermark of the actor systems when running the executor benchmark on 48 cores.
    1085 \CFA has a high watermark relative to the other non-garbage-collected systems \uC, and CAF.
     1228\CFA has a high watermark relative to the other non-garbage collected systems \uC, and CAF.
    10861229This is a result of the copy queue data structure, as it will over-allocate storage and not clean up eagerly, whereas the per envelope allocations will always allocate exactly the amount of storage needed.
    10871230Despite having a higher watermark, the \CFA memory usage remains comparable to other non-garbage-collected systems.
Note: See TracChangeset for help on using the changeset viewer.