Index: doc/proposals/concurrency/concurrency.tex
===================================================================
--- doc/proposals/concurrency/concurrency.tex	(revision 599511651bb05bbe1e34d9f88513826e4eb437ff)
+++ doc/proposals/concurrency/concurrency.tex	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
@@ -61,5 +61,5 @@
 \newcommand{\uC}{$\mu$\CC}
 \newcommand{\cit}{\textsuperscript{[Citation Needed]}\xspace}
-\newcommand{\code}[1]{\lstinline{#1}}
+\newcommand{\code}[1]{\lstinline[language=CFA]{#1}}
 \newcommand{\pseudo}[1]{\lstinline[language=Pseudo]{#1}}
 
@@ -160,5 +160,5 @@
 Here, the constructor(\code{?\{\}}) uses the \code{nomutex} keyword to signify that it does not acquire the monitor mutual exclusion when constructing. This semantics is because an object not yet constructed should never be shared and therefore does not require mutual exclusion. The prefix increment operator uses \code{mutex} to protect the incrementing process from race conditions. Finally, there is a conversion operator from \code{counter_t} to \code{size_t}. This conversion may or may not require the \code{mutex} key word depending on whether or not reading an \code{size_t} is an atomic operation or not.
 
-Having both \code{mutex} and \code{nomutex} keywords could be argued to be redundant based on the meaning of a routine having neither of these keywords. For example, given a routine without wualifiers \code{void foo(counter_t & this)} then one could argue that it should default to the safest option \code{mutex}. On the other hand, the option of having routine \code{void foo(counter_t & this)} mean \code{nomutex} is unsafe by default and may easily cause subtle errors. It can be argued that \code{nomutex} is the more "normal" behaviour, the \code{nomutex} keyword effectively stating explicitly that "this routine has nothing special". Another alternative is to make having exactly one of these keywords mandatory, which would provide the same semantics but without the ambiguity of supporting routine \code{void foo(counter_t & this)}. Mandatory keywords would also have the added benefice of being self-documented but at the cost of extra typing. In the end, which solution should be picked is still up for debate. For the reminder of this proposal, the explicit approach is used for clarity.
+Having both \code{mutex} and \code{nomutex} keywords could be argued to be redundant based on the meaning of a routine having neither of these keywords. For example, given a routine without qualifiers, \code{void foo(counter_t & this)}, one could argue that it should default to the safest option \code{mutex}. On the other hand, the option of having routine \code{void foo(counter_t & this)} mean \code{nomutex} is unsafe by default and may easily cause subtle errors. It can be argued that \code{nomutex} is the more "normal" behaviour, the \code{nomutex} keyword effectively stating explicitly that "this routine has nothing special". Another alternative is to make having exactly one of these keywords mandatory, which would provide the same semantics but without the ambiguity of supporting routine \code{void foo(counter_t & this)}. Mandatory keywords would also have the added benefit of being self-documenting but at the cost of extra typing. In the end, which solution should be picked is still up for debate. For the remainder of this proposal, the explicit approach is used for clarity.
 
 The next semantic decision is to establish when mutex/nomutex may be used as a type qualifier. Consider the following declarations:
@@ -368,11 +368,13 @@
 \end{lstlisting}
 
-Note that in \CFA, \code{condition} have no particular need to be stored inside a monitor, beyond any software engineering reasons. Here routine \code{foo} waits for the \code{signal} from \code{bar} before making further progress, effectively ensuring a basic ordering. This semantic can easily be extended to multi-monitor calls by offering the same guarantee.
+Note that in \CFA, a \code{condition} has no particular need to be stored inside a monitor, beyond any software engineering reasons. Here routine \code{foo} waits for the \code{signal} from \code{bar} before making further progress, effectively ensuring a basic ordering.
+
+As with simple mutual exclusion, these semantics must also be extended to include \gls{group-acquire} :
 \begin{center}
 \begin{tabular}{ c @{\hskip 0.65in} c }
 Thread 1 & Thread 2 \\
 \begin{lstlisting}
-void foo(monitor & mutex a,
-           monitor & mutex b) {
+void foo(A & mutex a,
+           A & mutex b) {
 	//...
 	wait(a.e);
@@ -382,6 +384,6 @@
 foo(a, b);
 \end{lstlisting} &\begin{lstlisting}
-void bar(monitor & mutex a,
-           monitor & mutex b) {
+void bar(A & mutex a,
+           A & mutex b) {
 	signal(a.e);
 }
@@ -393,168 +395,368 @@
 \end{tabular}
 \end{center}
-A direct extension of the single monitor semantics is to release all locks when waiting and transferring ownership of all locks when signalling. However, for the purpose of synchronization it may be usefull to only release some of the locks but keep others. It is possible to support internal scheduling and \gls{group-acquire} without any extra syntax by relying on order of acquisition. Here is an example of the different contexts in which internal scheduling can be used. (Note that here the use of helper routines is irrelevant, only routines acquire mutual exclusion have an impact on internal scheduling):
-
-\begin{center}
-\begin{tabular}{|c|c|c|}
-Context 1 & Context 2 & Context 3 \\
-\hline
-\begin{lstlisting}
-condition e;
-
-//acquire a & b
-void foo(monitor & mutex a,
-           monitor & mutex b) {
-
-	wait(e); //release a & b
-}
-
-
-
-
-
-
-foo(a,b);
-\end{lstlisting} &\begin{lstlisting}
-condition e;
-
-//acquire a
-void bar(monitor & mutex a,
-           monitor & nomutex b) {
-	foo(a,b);
-}
-
-//acquire a & b
-void foo(monitor & mutex a,
-           monitor & mutex b) {
-	wait(e);  //release a & b
-}
-
-bar(a, b);
-\end{lstlisting} &\begin{lstlisting}
-condition e;
-
-//acquire a
-void bar(monitor & mutex a,
-           monitor & nomutex b) {
-	baz(a,b);
-}
-
-//acquire b
-void baz(monitor & nomutex a,
-           monitor & mutex b) {
-	wait(e);  //release b
-}
-
-bar(a, b);
+
+To define the semantics of internal scheduling, it is important to look at nesting and \gls{group-acquire}. Indeed, beyond concerns about lock ordering, without scheduling the following two pieces of pseudo-code are mostly equivalent. In fact, if we assume monitors are ordered alphabetically, they would probably lead to exactly the same implementation :
+
+\begin{table}[h!]
+\centering
+\begin{tabular}{c c}
+\begin{lstlisting}[language=pseudo]
+monitor A, B, C
+
+acquire A
+	acquire B & C
+
+			//Do stuff
+
+	release B & C
+release A
+\end{lstlisting} &\begin{lstlisting}[language=pseudo]
+monitor A, B, C
+
+acquire A
+	acquire B
+		acquire C
+			//Do stuff
+		release C
+	release B
+release A
 \end{lstlisting}
 \end{tabular}
-\end{center}
-
-Context 1 is the simplest way of acquiring more than one monitor (\gls{group-acquire}), using a routine with multiple parameters having the \code{mutex} keyword. Context 2 also uses \gls{group-acquire} as well in routine \code{foo}. However, the routine is called by routine \code{bar}, which only acquires monitor \code{a}. Since monitors can be acquired multiple times this does not cause a deadlock by itself but it does force the acquiring order to \code{a} then \code{b}. Context 3 also forces the acquiring order to be \code{a} then \code{b} but does not use \gls{group-acquire}. The previous example tries to illustrate the semantics that must be established to support releasing monitors in a \code{wait} statement. In all cases, the behavior of the wait statment is to release all the locks that were acquired my the inner-most monitor call. That is \code{a & b} in context 1 and 2 and \code{b} only in context 3. Here are a few other examples of this behavior.
-
-
-\begin{center}
-\begin{tabular}{|c|c|c|}
-\begin{lstlisting}
-condition e;
-
-//acquire b
-void foo(monitor & nomutex a,
-           monitor & mutex b) {
-	bar(a,b);
-}
-
-//acquire a
-void bar(monitor & mutex a,
-           monitor & nomutex b) {
-
-	wait(e); //release a
-	          //keep b
-}
-
-foo(a, b);
-\end{lstlisting} &\begin{lstlisting}
-condition e;
-
-//acquire a & b
-void foo(monitor & mutex a,
-           monitor & mutex b) {
-	bar(a,b);
-}
-
-//acquire b
-void bar(monitor & mutex a,
-           monitor & nomutex b) {
-
-	wait(e); //release b
-	          //keep a
-}
-
-foo(a, b);
-\end{lstlisting} &\begin{lstlisting}
-condition e;
-
-//acquire a & b
-void foo(monitor & mutex a,
-           monitor & mutex b) {
-	bar(a,b);
-}
-
-//acquire none
-void bar(monitor & nomutex a,
-           monitor & nomutex b) {
-
-	wait(e); //release a & b
-	          //keep none
-}
-
-foo(a, b);
+\end{table}
+
+Once internal scheduling is introduced however, the semantics of \gls{group-acquire} become relevant. For example, let us look into the semantics of the following pseudo-code :
+
+\begin{lstlisting}[language=Pseudo]
+1: monitor A, B, C
+2: condition c1
+3: 
+4: acquire A
+5: 		acquire A & B & C
+6: 				signal c1
+7: 		release A & B & C
+8: release A
+\end{lstlisting}
+
+Without \gls{group-acquire}, signal simply baton-passes the monitor lock on the next release. In the case above, we therefore need to identify the next release. If line 8 is picked as the release point, then the signal will attempt to pass A \& B \& C, without having ownership of B \& C. Since this violates mutual exclusion, we conclude that line 7 is the only valid location where signalling can occur. The traditional meaning of signalling is to transfer ownership of the monitor(s) and immediately schedule the longest waiting task. However, in the discussed case, the signalling thread expects to maintain ownership of monitor A. This can be expressed in two different ways : 1) the thread transfers ownership of all locks and reacquires A when it gets scheduled again or 2) it transfers ownership of all three monitors and then expects the ownership of A to be transferred back.
+
+However, the question is whether this behavior motivates supporting the acquiring of non-disjoint sets of monitors. Indeed, if the previous example was modified to only acquire B \& C at line 5 (and release them accordingly) then, with respect to scheduling, we could add the simplifying constraint that all monitors in a bulk behave the same way, reducing the problem back to a single-monitor problem, which has already been solved. For this constraint to be acceptable however, we need to demonstrate that it does not prevent any meaningful possibilities. And, indeed, we can look at the two previous interpretations of the above pseudo-code and conclude that supporting the acquiring of non-disjoint sets of monitors does not add any expressiveness to the language.
+
+Option 1 reacquires the lock after the signal statement; this can be rewritten as follows without the need for non-disjoint sets :
+\begin{lstlisting}[language=Pseudo]
+monitor A, B, C
+condition c1
+
+acquire A & B & C
+	signal c1
+release A & B & C
+acquire A
+
+release A
+\end{lstlisting}
+
+This pseudo-code has almost exactly the same semantics as the code acquiring intersecting sets of monitors.
+
+Option 2 uses two-way lock ownership transfer instead of reacquiring monitor A. Two-way monitor ownership transfer is normally done using signalBlock semantics, which immediately transfers ownership of a monitor before getting the ownership back when the other thread no longer needs the monitor. While the example pseudo-code for Option 2 seems to transfer ownership of A, B and C and only get A back, this is not a requirement. Getting back all 3 monitors and releasing B and C differs only in performance. For this reason, the second option could arguably be rewritten as :
+
+\begin{lstlisting}[language=Pseudo]
+monitor A, B, C
+condition c1
+
+acquire A
+	acquire B & C
+		signalBlock c1
+	release B & C
+release A
+\end{lstlisting}
+
+Obviously, the difference between these two snippets of pseudo-code is that the first one transfers ownership of A, B and C while the second one only transfers ownership of B and C. However, this limitation can be removed by allowing users to release extra monitors when using internal scheduling, referred to as extended internal scheduling (patent pending) from this point on. Extended internal scheduling means the following two pieces of pseudo-code are functionally equivalent :
+\begin{table}[h!]
+\centering
+\begin{tabular}{c @{\hskip 0.65in} c}
+\begin{lstlisting}[language=pseudo]
+monitor A, B, C
+condition c1
+
+acquire A
+	acquire B & C
+		signalBlock c1 with A
+	release B & C
+release A
+\end{lstlisting} &\begin{lstlisting}[language=pseudo]
+monitor A, B, C
+condition c1
+
+acquire A
+	acquire A & B & C
+		signal c1
+	release A & B & C
+release A
 \end{lstlisting}
 \end{tabular}
-\end{center}
-Note the right-most example is actually a trick pulled on the reader. Monitor state information is stored in thread local storage rather then in the routine context, which means that helper routines and other \code{nomutex} routines are invisible to the runtime system in regards to concurrency. This means that in the right-most example, the routine parameters are completly unnecessary. However, calling this routine from outside a valid monitor context is undefined.
-
-These semantics imply that in order to release of subset of the monitors currently held, users must write (and name) a routine that only acquires the desired subset and simply calls wait. While users can use this method, \CFA offers the \code{wait_release}\footnote{Not sure if an overload of \code{wait} would work...} which will release only the specified monitors. In the center previous examples, the code in the center uses the \code{bar} routine to only release monitor \code{b}. Using the \code{wait_release} helper, this can be rewritten without having the name two routines :
-\begin{center}
-\begin{tabular}{ c c c }
-\begin{lstlisting}
-	condition e;
-
-	//acquire a & b
-	void foo(monitor & mutex a,
-	           monitor & mutex b) {
-		bar(a,b);
-	}
-
-	//acquire b
-	void bar(monitor & mutex a,
-	           monitor & nomutex b) {
-
-		wait(e); //release b
-		          //keep a
-	}
-
-	foo(a, b);
-\end{lstlisting} &\begin{lstlisting}
-	=>
-\end{lstlisting} &\begin{lstlisting}
-	condition e;
-
-	//acquire a & b
-	void foo(monitor & mutex a,
-	           monitor & mutex b) {
-		wait_release(e,b); //release b
-			                 //keep a
-	}
-
-	foo(a, b);
-\end{lstlisting}
-\end{tabular}
-\end{center}
-
-Regardless of the context in which the \code{wait} statement is used, \code{signal} must be called holding the same set of monitors. In all cases, signal only needs a single parameter, the condition variable that needs to be signalled. But \code{signal} needs to be called from the same monitor(s) that call to \code{wait}. Otherwise, mutual exclusion cannot be properly transferred back to the waiting monitor.
-
-Finally, an additional semantic which can be very usefull is the \code{signal_block} routine. This routine behaves like signal for all of the semantics discussed above, but with the subtelty that mutual exclusion is transferred to the waiting task immediately rather than wating for the end of the critical section.
-\\
+\end{table}
+
+It must be stated that extended internal scheduling only makes sense when using wait and signalBlock, since they need to prevent barging, which cannot be done in the context of signal since the ownership transfer is strictly one-directional.
+
+One criticism that could arise is that extended internal scheduling is not composable since signalBlock must be explicitly aware of which context it is in. However, this argument is not relevant since acquiring A, B and C in a context where a subset of them is already acquired cannot be achieved without spuriously releasing some locks or having an oracle aware of all monitors. Therefore, composability of internal scheduling is no more an issue than composability of monitors in general.
+
+The main benefit of using extended internal scheduling is that it offers the same expressiveness as intersecting monitor set acquiring but greatly simplifies the selection of a leader (or representative) for a group of monitors. Indeed, when using intersecting sets, it is not obvious which set intersects with other sets, which means finding a leader representing only the smallest scope is a hard problem. Whereas when using disjoint sets, any monitor that would be intersecting must be specified in the extended set, so the leader can be chosen as any monitor in the primary set.
+
+% We need to make sure the semantics for internally scheduling N monitors are a natural extension of the single monitor semantics. For this reason, we introduce the concept of \gls{mon-ctx}. In terms of context, internal scheduling means "releasing a \gls{mon-ctx} and waiting for another thread to acquire the same \gls{mon-ctx} and baton-pass it back to the initial thread". This definition requires looking into what a \gls{mon-ctx} is and what the semantics of waiting and baton-passing are.
+
+% \subsubsection{Internal scheduling: Context} \label{insched-context}
+% Monitor scheduling operations are defined in terms of the context they are in. In languages that only support operations on a single monitor at once, the context is completely defined by the most recently acquired monitor. Indeed, acquiring several monitors will form a stack of monitors which will be released in FILO order. In \CFA, a \gls{mon-ctx} cannot be simply defined by the last monitor that was acquired since \gls{group-acquire} means multiple monitors can be "the last monitor acquired". The \gls{mon-ctx} is therefore defined as the last set of monitors to have been acquired. This means that when any new monitor is acquired, the group it belongs to is the new \gls{mon-ctx}. Correspondingly, if any monitor is released, the \gls{mon-ctx} reverts back to the context that was used prior to the monitor being acquired. In the most common case, \gls{group-acquire} means every monitor of a group will be acquired and released at the same time. However, since every monitor has its own recursion level, \gls{group-acquire} does not prevent users from reacquiring certain monitors while acquiring new monitors in the same operation. For example :
+
+% \begin{lstlisting}
+% //Forward declarations
+% monitor a, b, c
+% void foo( monitor & mutex a, 
+%             monitor & mutex b);
+% void bar( monitor & mutex a, 
+%             monitor & mutex b);
+% void baz( monitor & mutex a, 
+%             monitor & mutex b, 
+%             monitor & mutex c);
+
+% //Routines defined inline to illustrate context changes compared to the stack
+
+% //main thread
+% foo(a, b) {
+% 	//thread calls foo
+% 	//acquiring context a & b
+
+% 	bar(a, b) {
+% 		//thread calls bar
+% 		//no context change
+
+% 		baz(a, b, c) {
+% 			//thread calls baz
+% 			//acquiring context a & b & c
+
+% 			//Do stuff
+
+% 			return;
+% 			//call to baz returns
+% 		}
+% 		//context back to a & b
+
+% 		return;
+% 		//call to bar returns
+% 	}
+% 	//no context change
+
+% 	return;
+% 	//call to foo returns
+% }
+% //context back to initial state
+
+% \end{lstlisting}
+
+% As illustrated by the previous example, context changes can be caused by only one of the monitors coming into context or going out of context.
+
+% \subsubsection{Internal scheduling: Waiting} \label{insched-wait}
+
+% \subsubsection{Internal scheduling: Baton Passing} \label{insched-signal}
+% Baton passing in internal scheduling is done in terms of \code{signal} and \code{signalBlock}\footnote{Arguably, \code{signal_now} is a more evocative name and \code{signal} could be changed appropriately. }. While \code{signalBlock} is the more straightforward way of baton passing, transferring ownership immediately, it must rely on \code{signal}, which is why it is discussed first.
+% \code{signal} has the effect of transferring the current context to another thread when the context would otherwise be released. This means that instead of releasing the concerned monitors, the first thread on the condition ready-queue is scheduled to run. The monitors are not released and when the signalled thread runs, it assumes it regained ownership of all the monitors it had in its context.
+
+% \subsubsection{Internal scheduling: Implementation} \label{insched-impl}
+% To implement internal scheduling, three things are needed : a data structure for waiting tasks, a data structure for signalled tasks and a leaving procedure to run the signalled task. In the case of both data structures, it is desirable to use intrusive data structures in order to prevent the need for any dynamic allocation. However, in both cases being able to queue several items in the same position in a queue is non-trivial, even more so in the presence of concurrency. However, within a given \gls{mon-ctx}, all monitors have exactly the same behavior in regards to scheduling. Therefore, the problem of queuing multiple monitors at once can be ignored by choosing one monitor to represent every monitor in a context. While this could prove difficult in other situations, \gls{group-acquire} requires that the monitors be sorted according to some stable predicate. Since monitors are sorted in all contexts, the representative can simply be the first in the list. Choosing a representative means a simple intrusive queue inside the condition is sufficient to implement the data structure for both waiting and signalled monitors.
+
+% Since \CFA monitors don't have a complete image of the \gls{mon-ctx}, choosing the representative and maintaining the current context information cannot easily be done by any single monitor. However, as discussed in section [Missing section here], monitor mutual exclusion is implemented using a RAII object which is already in charge of sorting monitors. This object has a complete picture of the \gls{mon-ctx} which means it is well suited to choose the representative and detect context changes.
+
+% \newpage
+% \begin{lstlisting}
+% void ctor( monitor ** _monitors, int _count ) {
+% 	bool ctx_changed = false;
+% 	for( mon in _monitors ) {
+% 		ctx_changed = acquire( mon ) || ctx_changed;
+% 	}
+
+% 	if( ctx_changed ) {
+% 		set_representative();
+% 		set_context();
+% 	}
+% }
+
+% void dtor( monitor ** _monitors, int _count ) {
+% 	if( context_will_exit( _monitors, _count ) ) {
+% 		baton_pass();
+% 		return;
+% 	}
+
+% 	for( mon in _monitors ) {
+% 		release( mon );
+% 	}
+% }
+
+% \end{lstlisting}
+
+
+
+% A direct extension of the single monitor semantics is to release all locks when waiting and transferring ownership of all locks when signalling. However, for the purpose of synchronization it may be useful to only release some of the locks but keep others. It is possible to support internal scheduling and \gls{group-acquire} without any extra syntax by relying on order of acquisition. Here is an example of the different contexts in which internal scheduling can be used. (Note that here the use of helper routines is irrelevant, only routines that acquire mutual exclusion have an impact on internal scheduling):
+
+% \begin{table}[h!]
+% \centering
+% \begin{tabular}{|c|c|c|}
+% Context 1 & Context 2 & Context 3 \\
+% \hline
+% \begin{lstlisting}
+% condition e;
+
+% //acquire a & b
+% void foo(monitor & mutex a,
+%            monitor & mutex b) {
+
+% 	wait(e); //release a & b
+% }
+
+
+
+
+
+
+% foo(a,b);
+% \end{lstlisting} &\begin{lstlisting}
+% condition e;
+
+% //acquire a
+% void bar(monitor & mutex a,
+%            monitor & nomutex b) {
+% 	foo(a,b);
+% }
+
+% //acquire a & b
+% void foo(monitor & mutex a,
+%            monitor & mutex b) {
+% 	wait(e);  //release a & b
+% }
+
+% bar(a, b);
+% \end{lstlisting} &\begin{lstlisting}
+% condition e;
+
+% //acquire a
+% void bar(monitor & mutex a,
+%            monitor & nomutex b) {
+% 	baz(a,b);
+% }
+
+% //acquire b
+% void baz(monitor & nomutex a,
+%            monitor & mutex b) {
+% 	wait(e);  //release b
+% }
+
+% bar(a, b);
+% \end{lstlisting}
+% \end{tabular}
+% \end{table}
+
+% Context 1 is the simplest way of acquiring more than one monitor (\gls{group-acquire}), using a routine with multiple parameters having the \code{mutex} keyword. Context 2 also uses \gls{group-acquire} in routine \code{foo}. However, the routine is called by routine \code{bar}, which only acquires monitor \code{a}. Since monitors can be acquired multiple times this does not cause a deadlock by itself but it does force the acquiring order to \code{a} then \code{b}. Context 3 also forces the acquiring order to be \code{a} then \code{b} but does not use \gls{group-acquire}. The previous example tries to illustrate the semantics that must be established to support releasing monitors in a \code{wait} statement. In all cases, the behavior of the wait statement is to release all the locks that were acquired by the inner-most monitor call. That is \code{a & b} in context 1 and 2 and \code{b} only in context 3. Here are a few other examples of this behavior.
+
+
+% \begin{center}
+% \begin{tabular}{|c|c|c|}
+% \begin{lstlisting}
+% condition e;
+
+% //acquire b
+% void foo(monitor & nomutex a,
+%            monitor & mutex b) {
+% 	bar(a,b);
+% }
+
+% //acquire a
+% void bar(monitor & mutex a,
+%            monitor & nomutex b) {
+
+% 	wait(e); //release a
+% 	          //keep b
+% }
+
+% foo(a, b);
+% \end{lstlisting} &\begin{lstlisting}
+% condition e;
+
+% //acquire a & b
+% void foo(monitor & mutex a,
+%            monitor & mutex b) {
+% 	bar(a,b);
+% }
+
+% //acquire b
+% void bar(monitor & nomutex a,
+%            monitor & mutex b) {
+
+% 	wait(e); //release b
+% 	          //keep a
+% }
+
+% foo(a, b);
+% \end{lstlisting} &\begin{lstlisting}
+% condition e;
+
+% //acquire a & b
+% void foo(monitor & mutex a,
+%            monitor & mutex b) {
+% 	bar(a,b);
+% }
+
+% //acquire none
+% void bar(monitor & nomutex a,
+%            monitor & nomutex b) {
+
+% 	wait(e); //release a & b
+% 	          //keep none
+% }
+
+% foo(a, b);
+% \end{lstlisting}
+% \end{tabular}
+% \end{center}
+% Note the right-most example is actually a trick pulled on the reader. Monitor state information is stored in thread local storage rather than in the routine context, which means that helper routines and other \code{nomutex} routines are invisible to the runtime system in regards to concurrency. This means that in the right-most example, the routine parameters are completely unnecessary. However, calling this routine from outside a valid monitor context is undefined.
+
+% These semantics imply that in order to release a subset of the monitors currently held, users must write (and name) a routine that only acquires the desired subset and simply calls wait. While users can use this method, \CFA offers the \code{wait_release}\footnote{Not sure if an overload of \code{wait} would work...} which will release only the specified monitors. In the previous examples, the code in the center uses the \code{bar} routine to only release monitor \code{b}. Using the \code{wait_release} helper, this can be rewritten without having to name two routines :
+% \begin{center}
+% \begin{tabular}{ c c c }
+% \begin{lstlisting}
+% 	condition e;
+
+% 	//acquire a & b
+% 	void foo(monitor & mutex a,
+% 	           monitor & mutex b) {
+% 		bar(a,b);
+% 	}
+
+% 	//acquire b
+% 	void bar(monitor & nomutex a,
+% 	           monitor & mutex b) {
+
+% 		wait(e); //release b
+% 		          //keep a
+% 	}
+
+% 	foo(a, b);
+% \end{lstlisting} &\begin{lstlisting}
+% 	=>
+% \end{lstlisting} &\begin{lstlisting}
+% 	condition e;
+
+% 	//acquire a & b
+% 	void foo(monitor & mutex a,
+% 	           monitor & mutex b) {
+% 		wait_release(e,b); //release b
+% 			                 //keep a
+% 	}
+
+% 	foo(a, b);
+% \end{lstlisting}
+% \end{tabular}
+% \end{center}
+
+% Regardless of the context in which the \code{wait} statement is used, \code{signal} must be called holding the same set of monitors. In all cases, signal only needs a single parameter, the condition variable that needs to be signalled. But \code{signal} needs to be called from the same monitor(s) as the call to \code{wait}. Otherwise, mutual exclusion cannot be properly transferred back to the waiting monitor.
+
+% Finally, an additional semantic which can be very useful is the \code{signal_block} routine. This routine behaves like signal for all of the semantics discussed above, but with the subtlety that mutual exclusion is transferred to the waiting task immediately rather than waiting for the end of the critical section.
+% \\
 
 % ####### #     # #######         #####   #####  #     # ####### ######
Index: doc/proposals/concurrency/glossary.tex
===================================================================
--- doc/proposals/concurrency/glossary.tex	(revision 599511651bb05bbe1e34d9f88513826e4eb437ff)
+++ doc/proposals/concurrency/glossary.tex	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
@@ -14,7 +14,13 @@
 
 \longnewglossaryentry{group-acquire}
-{name={bulked acquiring}}
+{name={bulk acquiring}}
 {
 Implicitly acquiring several monitors when entering a monitor.
+}
+
+\longnewglossaryentry{mon-ctx}
+{name={monitor context}}
+{
+The state of the current thread regarding which monitors are owned.
 }
 
Index: doc/proposals/concurrency/lit-review.md
===================================================================
--- doc/proposals/concurrency/lit-review.md	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
+++ doc/proposals/concurrency/lit-review.md	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
@@ -0,0 +1,25 @@
+lit review :
+
+Lister77 : nested monitor calls
+	- explains the problem
+	- no solution
+	- Lister : An implementation of monitors.
+	- Lister : Hierarchical monitors.
+
+Haddon77 : Nested monitor calls
+	- monitors should be released before acquiring a new one.
+
+Horst Wettstein : The problem of nested monitor calls revisited
+	- Solves nested monitor by allowing barging
+
+David L. Parnas : The non problem of nested monitor calls
+	- not an actual problem in real life
+
+M. Joseph and V. R. Prasad : More on nested monitor call
+	- WTF... don't use monitors, use pure classes instead, whatever that is
+
+Joseph et al, 1978). 
+
+Toby Bloom : Evaluating Synchronization Mechanisms
+	- Methods to evaluate concurrency
+
Index: doc/proposals/concurrency/notes.md
===================================================================
--- doc/proposals/concurrency/notes.md	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
+++ doc/proposals/concurrency/notes.md	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
@@ -0,0 +1,14 @@
+Internal scheduling notes.
+
+Internal scheduling requires a stack or queue to make sense.
+We also need a stack of "monitor contexts" to be able to restore stuff.
+
+Multi scheduling try 1 
+ - adding threads to many monitors and synching the monitors
+ - Too hard
+
+Multi scheduling try 2
+ - using a leader when in a group
+ - it's hard but doable to manage who is the leader and keep the current context
+ - basically __monitor_guard_t always saves and restores the leader and current context
+ 
Index: doc/proposals/concurrency/style.tex
===================================================================
--- doc/proposals/concurrency/style.tex	(revision 599511651bb05bbe1e34d9f88513826e4eb437ff)
+++ doc/proposals/concurrency/style.tex	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
@@ -1,3 +1,5 @@
 \input{common}                                          % bespoke macros used in the document
+
+\CFADefaultStyle
 
 \lstset{
Index: doc/proposals/concurrency/version
===================================================================
--- doc/proposals/concurrency/version	(revision 599511651bb05bbe1e34d9f88513826e4eb437ff)
+++ doc/proposals/concurrency/version	(revision 03bb816f896612125c8f1acd78ef46603e99f236)
@@ -1,1 +1,1 @@
-0.7.61
+0.7.134