Timestamp:
Oct 20, 2016, 12:06:38 PM
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
d1fbaa78
Parents:
c2183a3
Message:

Finished reviewing concurrency section v0.4

Location:
doc/proposals/concurrency
Files:
1 added
3 edited

  • doc/proposals/concurrency/concurrency.tex

    rc2183a3 rb1bdc7d6  
    6262\newcommand{\cit}{\textsuperscript{[Citation Needed]}\xspace}
    6363\newcommand{\code}[1]{\lstinline{#1}}
     64\newcommand{\pseudo}[1]{\lstinline[language=Pseudo]{#1}}
    6465
    6566\input{glossary}
     
    510511\begin{center}
    511512\begin{tabular}{l}
    512 \begin{lstlisting}
    513         ¶if¶ critical section is free :
     513\begin{lstlisting}[language=Pseudo]
     514        if monitor is free :
    514515                enter
    515         elif critical section accepts me :
     516        elif monitor accepts me :
    516517                enter
    517         ¶else¶ :
     518        else :
    518519                block
    519520\end{lstlisting}
     
    521522\end{center}
    522523
    523 For the \code{critical section is free} condition it is easy to implement a check that can evaluate the condition in a few instruction. However, a fast check for \code{critical section accepts me} is much harder to implement depending on the constraints put on the monitors. Indeed, monitors are often expressed as an entry queue and some acceptor queue as in the following figure :
    524 
    525 \begin{center}
    526 {\resizebox{0.5\textwidth}{!}{\input{monitor}}}
    527 \end{center}
    528 
    529 There are other alternatives to these pictures but in the case of this picture implementing a fast accept check is relatively easy. Indeed simply updating a bitmask when the acceptor queue changes is enough to have a check that executes in a single instruction, even with a fairly large number of acceptor. However, this requires all the acceptable routines to be declared with the monitor declaration. For OO languages this doesn't compromise much since monitors already have an exhaustive list of member routines. However, for \CFA this isn't the case, routines can be added to a type anywhere after its declaration. A more flexible
    530 
    531 
    532 At this point we must make a decision between flexibility and performance. Many design decisions in \CFA achieve both flexibility and performance, for example polymorphic routines add significant flexibility but inlining them means the optimizer can easily remove any runtime cost.
    533 
    534 This approach leads to the \uC example being translated to :
    535 \begin{lstlisting}
    536         accept( void g(mutex struct A & mutex a) )
    537         mutex struct A {};
    538 
    539         void f(A & mutex a) { accept(g); }
    540         void g(A & mutex a);
    541 \end{lstlisting}
    542 
    543 This syntax is the most consistent with the language since it somewhat mimics the \code{forall} declarations. However, the fact that it comes before the struct declaration does means the type needs to be forward declared (done inline in the example). Here are a few alternatives to this syntax : \\
    544 \begin{tabular}[t]{l l}
     524For the \pseudo{monitor is free} condition, it is easy to implement a check that evaluates the condition in a few instructions. However, a fast check for \pseudo{monitor accepts me} is much harder to implement, depending on the constraints put on the monitors. Indeed, monitors are often expressed as an entry queue and some acceptor queue, as in the following figure :
     525
     526\begin{center}
     527{\resizebox{0.4\textwidth}{!}{\input{monitor}}}
     528\end{center}
     529
     530There are other alternatives to this design, but for the one pictured, implementing a fast accept check is relatively easy. Indeed, simply updating a bitmask when the acceptor queue changes is enough to have a check that executes in a single instruction, even with a fairly large number of acceptors. However, this relies on the fact that all the acceptable routines are declared with the monitor type. For OO languages this is not much of a compromise since monitors already have an exhaustive list of member routines. However, for \CFA this is not the case; routines can be added to a type anywhere after its declaration. It is important to note that the bitmask approach does not actually require an exhaustive list of routines, but it does require a dense unique ordering of the routines with an upper bound, and that ordering must be consistent across translation units.
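As a rough C-style sketch of the bitmask design (hypothetical names, not the actual \CFA runtime), assuming each acceptable routine is assigned a dense index at declaration time, the fast-path check could look like :
\begin{lstlisting}
struct monitor {
	unsigned long accept_mask;	// bit i set  =>  routine with index i is currently accepted
	/* entry queue, owner, etc. */
};

// executed by the caller of a mutex routine that was assigned index `idx`
static inline bool accepts_me( struct monitor * this, unsigned int idx ) {
	return ( this->accept_mask >> idx ) & 1ul;	// single mask test
}
\end{lstlisting}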
     531The alternative would be a design more like the one in the following figure :
     532
     533\begin{center}
     534{\resizebox{0.4\textwidth}{!}{\input{ext_monitor}}}
     535\end{center}
     536
     537Not storing the queues inside the monitor means that the storage can vary between routines, allowing for more flexibility and extensions. Storing an array of function pointers solves the issue of uniquely identifying acceptable routines. However, the single-instruction bitmask compare has been replaced by dereferencing a pointer followed by a linear search. Furthermore, supporting nested external scheduling may now require additional searches on calls to accept, to check whether a routine is already queued.
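A minimal sketch of this more flexible variant (again with hypothetical names) replaces the mask with a routine-pointer array whose storage is supplied by the acceptor :
\begin{lstlisting}
struct monitor {
	void (**acceptables)();		// routines currently accepted, storage supplied by the acceptor
	unsigned short acceptable_count;
	/* entry queue, owner, etc. */
};

static bool accepts_me( struct monitor * this, void (*routine)() ) {
	for ( unsigned short i = 0; i < this->acceptable_count; i += 1 ) {	// linear search
		if ( this->acceptables[i] == routine ) return true;
	}
	return false;
}
\end{lstlisting}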
     538
     539At this point we must make a decision between flexibility and performance. Many design decisions in \CFA achieve both flexibility and performance, for example polymorphic routines add significant flexibility but inlining them means the optimizer can easily remove any runtime cost. Here however, the cost of flexibility cannot be trivially removed.
     540
     541In either case, here are a few alternative syntaxes for declaring acceptable routines : \\
     542\begin{center}
     543{\renewcommand{\arraystretch}{1.5}
     544\begin{tabular}[t]{l @{\hskip 0.35in} l}
     545\hline
     546\multicolumn{2}{ c }{\code{accept} on type}\\
     547\hline
    545548Alternative 1 & Alternative 2 \\
    546549\begin{lstlisting}
    547550mutex struct A
    548 accept( void g(A & mutex a) )
     551accept( void f(A & mutex a) )
    549552{};
    550553\end{lstlisting} &\begin{lstlisting}
    551554mutex struct A {}
    552 accept( void g(A & mutex a) );
     555accept( void f(A & mutex a) );
    553556
    554557\end{lstlisting} \\
     
    556559\begin{lstlisting}
    557560mutex struct A {
    558         accept( void g(A & mutex a) )
     561        accept( void f(A & mutex a) )
    559562};
    560563
     
    562565mutex struct A {
    563566        accept :
    564                 void g(A & mutex a) );
     567                void f(A & mutex a);
    565568};
    566 \end{lstlisting}
     569\end{lstlisting}\\
     570\hline
     571\multicolumn{2}{ c }{\code{accept} on routine}\\
     572\hline
     573\begin{lstlisting}
     574mutex struct A {};
     575
     576void f(A & mutex a);
     577
     578accept( void f(A & mutex a) )
     579void g(A & mutex a) {
     580        /*...*/
     581}
     582\end{lstlisting}&\\
    567583\end{tabular}
    568 
     584}
     585\end{center}
    569586
    570587Another aspect to consider is what happens if multiple overloads of the same routine are used. For the time being, it is assumed that accepting a routine schedules all of its overloads, regardless of which overload is actually called, as sketched below. However, this could easily be extended in the future.
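For example, using an \code{accept} statement inside a routine body as in the examples above (purely illustrative, since the exact declaration syntax is still being weighed), both overloads of \code{g} would satisfy the same accept :
\begin{lstlisting}
mutex struct A {};

void g(A & mutex a);			// overload 1
void g(A & mutex a, int i);	// overload 2

void f(A & mutex a) {
	accept(g);	// under the current assumption, a call to either overload of g is accepted
}
\end{lstlisting}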
     
    597614\end{lstlisting}
    598615
    599 This is unambiguous. The both locks will be acquired and kept, when routine \code{f} is called the lock for monitor \code{a} will be temporarily transferred from \code{g} to \code{f} (while \code{g} still holds lock \code{b}). This behavior can be extended to multi-monitor accept statment as follows.
     616This is unambiguous. Both locks will be acquired and kept; when routine \code{f} is called, the lock for monitor \code{a} will be temporarily transferred from \code{g} to \code{f} (while \code{g} still holds lock \code{b}). This behavior can be extended to multi-monitor accept statements as follows.
    600617
    601618\begin{lstlisting}
     
    613630
    614631\subsubsection{Implementation Details: External scheduling queues}
    615 To support multi-monitor external scheduling means that some kind of entry-queues must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues which means they most be stored inside at least one of the monitors that will be acquired. This in turn adds the requirement a systematic algorithm of disambiguating which queue is relavant regardless of user ordering. The proposed algorithm is to fall back on monitors lock ordering and specify that the monitor that is acquired first is the lock with the relevant entry queue. This assumes that the lock acquiring order is static for the lifetime of all concerned objects gut that is a reasonnable contraint. This algorithm choice has two consequences, the ofthe highest priority monitor is no longer a true FIFO queue and the queue of the lowest priority monitor is both required and probably unused. The queue can no longer be a FIFO queue because instead of simply containing the waiting threads in order arrival, they also contain the second mutex. Therefore, another thread with the same highest priority monitor but a different lowest priority monitor may arrive first but enter the critical section after a thread with the correct pairing. Secondly, since it may not be known at compile time which monitor will be the lowest priority monitor, every monitor needs to have the correct queues even though it is probably that half the multi-monitor queues will go unused for the entire duration of the program.
     632Supporting multi-monitor external scheduling means that some kind of entry queue must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues, which means they must be stored inside at least one of the monitors that will be acquired. This in turn adds the requirement of a systematic algorithm for disambiguating which queue is relevant, regardless of user ordering. The proposed algorithm is to fall back on the monitor lock ordering and specify that the monitor that is acquired first holds the relevant entry queue. This assumes that the lock acquisition order is static for the lifetime of all concerned objects, but that is a reasonable constraint. This algorithm choice has two consequences: the entry queue of the highest-priority monitor is no longer a true FIFO queue, and the queue of the lowest-priority monitor is both required and probably unused. The queue can no longer be a FIFO queue because, instead of simply containing the waiting threads in order of arrival, the entries also record the second mutex. Therefore, another thread with the same highest-priority monitor but a different lowest-priority monitor may arrive first but enter the critical section after a thread with the correct pairing. Secondly, since it may not be known at compile time which monitor will be the lowest-priority monitor, every monitor needs to have the correct queues, even though it is probable that half the multi-monitor queues will go unused for the entire duration of the program.
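A minimal sketch of such an entry-queue node, under the assumption that the static lock ordering is by monitor address (any other consistent ordering works equally well), could be :
\begin{lstlisting}
// hypothetical node for the entry queue stored in the first-acquired monitor;
// it also records the second monitor so the pairing can be checked on wake-up
struct accept_node {
	struct thread  * waiter;		// blocked thread
	struct monitor * second;		// lowest-priority monitor of this pairing
	struct accept_node * next;
};

// the monitor holding the relevant entry queue is the one acquired first
static struct monitor * queue_owner( struct monitor * a, struct monitor * b ) {
	return a < b ? a : b;		// assumed ordering by address
}
\end{lstlisting}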
    616633
    617634\subsection{Other concurrency tools}
     635TO BE CONTINUED...
    618636
    619637\section{Parallelism}
  • doc/proposals/concurrency/ext_monitor.fig

    rc2183a3 rb1bdc7d6  
    (xfig source for the external-scheduling monitor figure updated: geometry revised; legend label "duplicate" replaced by "routine ptrs", and label "Queues Ptr" replaced by "Acceptables")
  • doc/proposals/concurrency/version

    rc2183a3 rb1bdc7d6  
    1 0.4.61
     10.4.88