Index: doc/proposals/concurrency/concurrency.tex
===================================================================
--- doc/proposals/concurrency/concurrency.tex	(revision 694ee7d76476b574061b8fcd6fa52e8f67657568)
+++ doc/proposals/concurrency/concurrency.tex	(revision 7b6917426c5c0615b667c4f0eee3ebb4dfcaea00)
@@ -23,4 +23,5 @@
 \usepackage{xspace}
 \usepackage{graphicx}
+\usepackage{tabularx}
 \usepackage{varioref}									% extended references
 \usepackage{listings}									% format program code
@@ -195,5 +196,5 @@
 
 \subsubsection{Internal scheduling}
-Monitors should also be able to do some sort of synchronization to be able to somewhat schedule what threads access it. Internal scheduling is one of the simple examples of such a feature. It allows users to declare condition variables and wait for them to be signaled. Here is a simple example of such a technique :
+Monitors should also be able to schedule what threads access it as a mean of synchronization. Internal scheduling is one of the simple examples of such a feature. It allows users to declare condition variables and wait for them to be signaled. Here is a simple example of such a technique :
 
 \begin{lstlisting}
@@ -213,147 +214,205 @@
 \end{lstlisting}
 
-Here routine \texttt{foo} will wait on the \texttt{signal} from \texttt{bar} before making further progress, effectively ensuring a basic ordering. However, nothing prevents users from miss-using this syntax and therefore some additionnal protection would be useful. For example, if \texttt{bar} was rewritten as follows:
+Here routine \texttt{foo} waits on the \texttt{signal} from \texttt{bar} before making further progress, effectively ensuring a basic ordering. This can easily be extended to multi-monitor calls by offering the same guarantee.
 
 \begin{tabular}{ c c }
 Thread 1 & Thread 2 \\
 \begin{lstlisting}
-	void foo(monitor& mutex m) {
-		//...
-		wait(m.e);
-		//...
-	}
-
-	foo(a);
-\end{lstlisting}&\begin{lstlisting}
-	void bar(monitor& mutex b, condition& e) {
-		signal(e);
-	}
-
-
-
-	bar(b, a.e);
+void foo(monitor& mutex a, monitor& mutex b) {
+	//...
+	wait(a.e);
+	//...
+}
+
+foo(a, b);
+\end{lstlisting}&\begin{lstlisting}
+void bar(monitor& mutex a, monitor& mutex b) {
+	signal(a.e);
+}
+
+
+
+bar(a, b);
 \end{lstlisting}
 \end{tabular}
 \\
 
-In this example, thread 2 tries to \texttt{signal} a condition variable for which it did not acquire the lock. There are at least two solutions to this problem. Either the wait routine tries to reacquire every needed lock upon exit or the signaller must implicitly transfer lock ownership to the signalled task. The first case can be easily implemented by hand and does not prevent any barging and therefore the second approach is preferred. This effectively means that condition variables need to be both aware of the locks used by the waiting task and the signaller. However, before we look at what this lock awareness means we need to look at another example to properly grasp the problem.
-
-\begin{tabular}{ c c }
-Thread 1 & Thread 2 \\
-\begin{lstlisting}
+A direct extension of the single monitor semantics would be to release all locks when waiting and transferring ownership of all locks when signalling. However, for the purpose of synchronization it may be usefull to only release some of the locks but keep others. On the technical side, partially releasing lock is feasible but from the user perspective a choice must be made for the syntax of this feature. It is possible to do without any extra syntax by relying on order of acquisition :
+
+\begin{tabular}{|c|c|c|}
+Context 1 & Context 2 & Context 3 \\
+\hline
+\begin{lstlisting}
+void foo(monitor& mutex a,
+         monitor& mutex b) {
+	wait(a.e);
+}
+
+
+
+
+
+
+foo(a,b);
+\end{lstlisting}&\begin{lstlisting}
+void bar(monitor& mutex a,
+         monitor& nomutex b) {
+	foo(a,b);
+}
+
+void foo(monitor& mutex a,
+         monitor& mutex b) {
+	wait(a.e);
+}
+
+bar(a, b);
+\end{lstlisting}&\begin{lstlisting}
+void bar(monitor& mutex a,
+         monitor& nomutex b) {
+	foo(a,b);
+}
+
+void baz(monitor& nomutex a,
+         monitor& mutex b) {
+	wait(a.e);
+}
+
+bar(a, b);
+\end{lstlisting}
+\end{tabular}
+\\
+
+This can be interpreted in two different ways.
+\begin{enumerate}
+	\item \texttt{wait} atomically releases the monitors \underline{theoretically} acquired by the inner-most mutex routine.
+	\item \texttt{wait} atomically releases the monitors \underline{actually} acquired by the inner-most mutex routine.
+\end{enumerate}
+While the difference between these two is subtle, it has a significant impact. In the first case it means that the calls to \texttt{foo} would behave the same in Context 1 and 2. This semantic would also mean that the call to \texttt{wait} in routine \texttt{baz} would only release \texttt{monitor b}. While this may seem intuitive with these examples, it does have one significant implication, it creates a strong distinction between acquiring multiple monitors in sequence and acquiring the same monitors simulatenously.
+
+\begin{center}
+\begin{tabular}{c c c}
+\begin{lstlisting}
+enterMonitor(a);
+enterMonitor(b);
+// do stuff
+leaveMonitor(b);
+leaveMonitor(a);
+\end{lstlisting}& != &\begin{lstlisting}
+enterMonitor(a);
+enterMonitor(a, b);
+// do stuff
+leaveMonitor(a, b);
+leaveMonitor(a);
+\end{lstlisting}
+\end{tabular}
+\end{center}
+
+This is not intuitive because even if both methods will display the same monitors state both inside and outside the critical section respectively, the behavior is different. Furthermore, the actual acquiring order will be exaclty the same since acquiring a monitor from inside its mutual exclusion is a no-op. This means that even if the data and the actual control flow are the same using both methods, the behavior of the \texttt{wait} will be different. The alternative is option 2, that is releasing \underline{actually} acquired monitors. This solves the issue of having the two acquiring method differ at the cost of making routine \texttt{foo} behave differently depending on from which context it is called (Context 1 or 2). Indeed in Context 2, routine \texttt{foo} will actually behave like routine \texttt{baz} rather than having the same behavior than in context 1. The fact that both implicit approaches can be unintuitive depending on the perspective may be a sign that the explicit approach is superior.
+\\
+
+The following examples shows three alternatives of explicit wait semantics :
+\\
+
+\begin{tabular}{|c|c|c|}
+Case 1 & Case 2 & Case 3 \\
+Branding on construction & Explicit release list & Explicit ignore list \\
+\hline
+\begin{lstlisting}
+void foo(monitor& mutex a,
+         monitor& mutex b,
+	   condition& c)
+{
+	// Releases monitors
+	// branded on construction
+	wait(c);
+}
+
+monitor a;
+monitor b;
+condition1 c1 = {a};
+condition2 c2 = {a, b};
+
+//Will release only a
+foo(a,b,c1);
+
+//Will release a and b
+foo(a,b,c2);
+\end{lstlisting}&\begin{lstlisting}
+void foo(monitor& mutex a,
+         monitor& mutex b,
+	   condition& c)
+{
+	// Releases monitor a
+	// Holds monitor b
+	waitRelease(c, [a]);
+}
+
+monitor a;
+monitor b;
+condition c;
+
+
+
+foo(a,b,c);
+
+
+
+\end{lstlisting}&\begin{lstlisting}
+void foo(monitor& mutex a,
+         monitor& mutex b,
+	   condition& c)
+{
+	// Releases monitor a
+	// Holds monitor b
+	waitHold(c, [b]);
+}
+
+monitor a;
+monitor b;
+condition c;
+
+
+
+foo(a,b,c);
+
+
+
+\end{lstlisting}
+\end{tabular}
+
+(Note : Case 2 and 3 use tuple semantics to pass a variable length list of elements.)
+\\
+
+All these cases have there pros and cons. Case 1 is more distinct because it means programmers need to be carefull about where the condition was initialized as well as where it is used. On the other hand, it is very clear and explicit which monitor will be released and which monitor will stay acquired. This is similar to Case 2, which releases only the monitors explictly listed. However, in Case 2, calling the \texttt{wait} routine instead of the \texttt{waitRelease} routine will release all the acquired monitor. The Case 3 is an improvement on that since it releases all the monitors except those specified. The result is that the \texttt{wait} routine can be written as follows :
+\begin{lstlisting}
+void wait(condition& cond) {
+	waitHold(cond, []);
+}
+\end{lstlisting}
+This alternative offers nice and consistent behavior between \texttt{wait} and \texttt{waitHold}. However, one large pitfall is that mutual exclusion can now be violated by calls to library code. Indeed, even if the following example seems benign there is one significant problem :
+\begin{lstlisting}
+extern void doStuff();
+
 void foo(monitor& mutex m) {
 	//...
-	wait(m.e);
+	doStuff(); //warning can release monitor m
 	//...
 }
-
-foo(a);
-\end{lstlisting}&\begin{lstlisting}
-void bar(monitor& mutex a, monitor& mutex b) {
-	signal(a.e);
-}
-
-
-
-bar(a, b);
-\end{lstlisting}
-\end{tabular}
-\\
-
-Here, the issue is that even if thread 2 does hold the proper lock, it also holds an extra lock that must be delt with. The proposed solution is to make 2 changes to the condition variable declaration. First, the condition variable should be constructed with a reference to the monitor it syncrhonizes :
-
-\begin{lstlisting}
-	mutex struct A {
-		condition e;
-	}
-
-	void ?{}(A& this) {
-		&e{this};
-	}
-
-	void foo(A& mutex a) {
-		//...
-		wait(a.e);
-		//...
-	}
-
-	void bar(A& mutex a) {
-		signal(a.e);
-	}
-\end{lstlisting}
-
-By explicitly tying a condition variable to a particular monitor it is possible for the run-time to know which monitor needs to be signaled. This also enables run-time check to make sure that the proper context is acquired before trying to \texttt{signal} a condition variable. In this case, run time checks are probably sufficient since \texttt{signal} should be used inside a critical section and even though multi-threading applications are often non-deterministic, the inside of critical sections should be relatively reliable. This implementation of the condition variable object also means that the context of the dual monitor routine, the routine will hold-on to the monitor that is not referenced by the condition variable, i.e. :
-
-\begin{tabular}{ c c }
-Thread 1 & Thread 2 \\
-\begin{lstlisting}
-void foo(monitor& mutex a,
-	 monitor& mutex b) {
-	//...
-	wait(a.e); //releases a, holds b
-	//...
-}
-
-foo(a, b);
-\end{lstlisting}&\begin{lstlisting}
-void bar(monitor& mutex a) {
-	signal(a.e);
-}
-
-
-
-
-bar(a);
-\end{lstlisting}
-\end{tabular}
-\\
-
-The second aspect to this solution is the support for multi-monitor condition variables :
-\begin{lstlisting}
-monitor m1;
-monitor m2;
-condition2 e = {m1, m2};
-\end{lstlisting}
-\begin{tabular}{ c c }
-Thread 1 & Thread 2 \\
-\begin{lstlisting}
-void foo(monitor& mutex a, monitor& mutex b) {
-	//...
-	wait(e); //releases a & b
-	//...
-}
-
-foo(a, b);
-\end{lstlisting}&\begin{lstlisting}
-void bar(monitor& mutex a, monitor& mutex b) {
-	signal(e);
-}
-
-
-
-bar(a, b);
-\end{lstlisting}
-\end{tabular}
-\\
-
-Notice here that the type used for the condition variable (\texttt{condition2}) explicitly states the number of monitors that will be synchronized at compile time. The risk with this condition variable semantics is that the user must be in a context where all monitors were properly acquired before waiting/signalling. This can be enforced by run-time checks but would be very difficult to statically enforce. An option that can be used to alleviate this risk is to have the signal routine acquire the monitors that were used to brand the condition variable. This guarantees that the proper locks will be transferred to the signaled but does inherit the risks the come with acquiring multiple locks some of the locks were already acquired.
-This would lead to an API similar to this :
-\begin{lstlisting}
-	//default ctor which brands the condition variable on construction
-	void ?{}(condition* this, monitor& brand);
-	void ^?{}(condition* this);
-
-	//copying an condition variable is forbidden
-	void ?{}(condition* this, const condition& other) = delete;
-	void ?=?(condition* this, const condition& other) = delete;
-
-	//releases branded locks and waits for signal
-	void wait(condition* this);
-
-	//acquires branded locks and transfers them to the signalled task
-	//(upon exit for signal and dirrectly for signalBlock)
-	void signal(condition* this);
-	void signalBlock(condition* this);
-\end{lstlisting}
+\end{lstlisting}
+Indeed, if Case 2 or 3 are chosen it any code can violate the mutual exclusion of calling code by issuing calls to \texttt{wait} or \texttt{waitHold} in a nested monitor context. Case 2 can be salvaged by removing the \texttt{wait} routine from the API but Case 3 cannot prevent users from calling \texttt{waitHold(someCondition, [])}. For this reason the syntax proposed in Case 3 is rejected. Note that syntaxes proposed in case 1 and 2 are not exclusive. Indeed, by supporting two types of condition as follows both cases can be supported :
+\begin{lstlisting}
+struct condition { /*...*/ };
+
+void wait(condition& cond, [...] monitorsToRelease); // Second argument is a variable length tuple.
+void signal(condition& cond);
+
+struct conditionN { /*...*/ };
+
+void ?{}(conditionN* this, /*list of N monitors to release*/);
+void wait(conditionN& cond);
+void signal(conditionN& cond);
+\end{lstlisting}
+
+Regardless of the option chosen for wait semantics, signal must be symmetrical. In all cases, signal only needs a single parameter, the condition variable that needs to be signalled. But \texttt{signal} needs to be called from the same monitor(s) than the call to \texttt{wait}. Otherwise, mutual exclusion cannot be properly transferred back to the waiting monitor.
 
 \subsection{External scheduling}