Index: doc/papers/concurrency/Paper.tex
===================================================================
--- doc/papers/concurrency/Paper.tex	(revision 332d3c2b316eac5592b52393f9afbc00ef2439c5)
+++ doc/papers/concurrency/Paper.tex	(revision e04aec4e705020aa2c15b21c11c741bda20cc7f3)
@@ -271,5 +271,5 @@
 Hence, there are two problems to be solved: concurrency and parallelism.
 While these two concepts are often combined, they are distinct, requiring different tools~\cite[\S~2]{Buhr05a}.
-Concurrency tools handle synchronization and mutual exclusion, while parallelism tools handle performance, cost and resource utilization.
+Concurrency tools handle mutual exclusion and synchronization, while parallelism tools handle performance, cost, and resource utilization.
 
 The proposed concurrency API is implemented in a dialect of C, called \CFA.
@@ -282,7 +282,7 @@
 Extended versions and explanation of the following code examples are available at the \CFA website~\cite{Cforall} or in Moss~\etal~\cite{Moss18}.
 
-\CFA is an extension of ISO-C, and hence, supports all C paradigms.
+\CFA is a non-object-oriented extension of ISO-C, and hence, supports all C paradigms.
 %It is a non-object-oriented system-language, meaning most of the major abstractions have either no runtime overhead or can be opted out easily.
-Like C, the basics of \CFA revolve around structures and routines.
+Like C, the building blocks of \CFA are structures and routines.
 Virtually all of the code generated by the \CFA translator respects C memory layouts and calling conventions.
 While \CFA is not an object-oriented language, lacking the concept of a receiver (\eg @this@) and nominal inheritance-relationships, C does have a notion of objects: ``region of data storage in the execution environment, the contents of which can represent values''~\cite[3.15]{C11}.
@@ -296,5 +296,5 @@
 int x = 1, y = 2, z = 3;
 int * p1 = &x, ** p2 = &p1,  *** p3 = &p2,	$\C{// pointers to x}$
-	`&` r1 = x,  `&&` r2 = r1,  `&&&` r3 = r2;	$\C{// references to x}$
+    `&` r1 = x,   `&&` r2 = r1,   `&&&` r3 = r2;	$\C{// references to x}$
 int * p4 = &z, `&` r4 = z;
 
@@ -411,8 +411,7 @@
 \end{cquote}
 Overloading is important for \CFA concurrency since the runtime system relies on creating different types to represent concurrency objects.
-Therefore, overloading is necessary to prevent the need for long prefixes and other naming conventions to prevent name clashes.
+Therefore, overloading eliminates the long prefixes and other naming conventions otherwise needed to prevent name clashes.
 As seen in Section~\ref{basics}, routine @main@ is heavily overloaded.
-
-Variable overloading is useful in the parallel semantics of the @with@ statement for fields with the same name:
+For example, variable overloading is useful in the parallel semantics of the @with@ statement for fields with the same name:
 \begin{cfa}
 struct S { int `i`; int j; double m; } s;
@@ -428,5 +427,5 @@
 }
 \end{cfa}
-For parallel semantics, both @s.i@ and @t.i@ are visible the same type, so only @i@ is ambiguous without qualification.
+For parallel semantics, both @s.i@ and @t.i@ are visible with the same type, so only @i@ is ambiguous without qualification.
 
 
@@ -468,48 +467,4 @@
 \end{cquote}
 While concurrency does not use operator overloading directly, it provides an introduction for the syntax of constructors.
-
-
-\subsection{Parametric Polymorphism}
-\label{s:ParametricPolymorphism}
-
-The signature feature of \CFA is parametric-polymorphic routines~\cite{} with routines generalized using a @forall@ clause (giving the language its name), which allow separately compiled routines to support generic usage over multiple types.
-For example, the following sum routine works for any type that supports construction from 0 and addition:
-\begin{cfa}
-forall( otype T | { void `?{}`( T *, zero_t ); T `?+?`( T, T ); } ) // constraint type, 0 and +
-T sum( T a[$\,$], size_t size ) {
-	`T` total = { `0` };					$\C{// initialize by 0 constructor}$
-	for ( size_t i = 0; i < size; i += 1 )
-		total = total `+` a[i];				$\C{// select appropriate +}$
-	return total;
-}
-S sa[5];
-int i = sum( sa, 5 );						$\C{// use S's 0 construction and +}$
-\end{cfa}
-
-\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each routine declaration:
-\begin{cfa}
-trait `sumable`( otype T ) {
-	void `?{}`( T &, zero_t );				$\C{// 0 literal constructor}$
-	T `?+?`( T, T );						$\C{// assortment of additions}$
-	T ?+=?( T &, T );
-	T ++?( T & );
-	T ?++( T & );
-};
-forall( otype T `| sumable( T )` )			$\C{// use trait}$
-T sum( T a[$\,$], size_t size );
-\end{cfa}
-
-Assertions can be @otype@ or @dtype@.
-@otype@ refers to a ``complete'' object, \ie an object has a size, default constructor, copy constructor, destructor and an assignment operator.
-@dtype@ only guarantees an object has a size and alignment.
-
-Using the return type for discrimination, it is possible to write a type-safe @alloc@ based on the C @malloc@:
-\begin{cfa}
-forall( dtype T | sized(T) ) T * alloc( void ) { return (T *)malloc( sizeof(T) ); }
-int * ip = alloc();							$\C{// select type and size from left-hand side}$
-double * dp = alloc();
-struct S {...} * sp = alloc();
-\end{cfa}
-where the return type supplies the type/size of the allocation, which is impossible in most type systems.
 
 
@@ -540,6 +495,6 @@
 \CFA also provides @new@ and @delete@, which behave like @malloc@ and @free@, in addition to constructing and destructing objects:
 \begin{cfa}
-{	struct S s = {10};						$\C{// allocation, call constructor}$
-	...
+{
+	... struct S s = {10}; ...				$\C{// allocation, call constructor}$
 }											$\C{// deallocation, call destructor}$
 struct S * s = new();						$\C{// allocation, call constructor}$
@@ -547,5 +502,50 @@
 delete( s );								$\C{// deallocation, call destructor}$
 \end{cfa}
-\CFA concurrency uses object lifetime as a means of synchronization and/or mutual exclusion.
+\CFA concurrency uses object lifetime as a means of mutual exclusion and/or synchronization.
+
+
+\subsection{Parametric Polymorphism}
+\label{s:ParametricPolymorphism}
+
+The signature feature of \CFA is parametric-polymorphic routines~\cite{} with routines generalized using a @forall@ clause (giving the language its name), which allow separately compiled routines to support generic usage over multiple types.
+For example, the following sum routine works for any type that supports construction from 0 and addition:
+\begin{cfa}
+forall( otype T | { void `?{}`( T *, zero_t ); T `?+?`( T, T ); } ) // constraint type, 0 and +
+T sum( T a[$\,$], size_t size ) {
+	`T` total = { `0` };					$\C{// initialize by 0 constructor}$
+	for ( size_t i = 0; i < size; i += 1 )
+		total = total `+` a[i];				$\C{// select appropriate +}$
+	return total;
+}
+S sa[5];
+int i = sum( sa, 5 );						$\C{// use S's 0 construction and +}$
+\end{cfa}
+The builtin types @zero_t@ and @one_t@ overload the constants 0 and 1 for new types, where both 0 and 1 have special meaning in C.
+
+\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each routine declaration:
+\begin{cfa}
+trait `sumable`( otype T ) {
+	void `?{}`( T &, zero_t );				$\C{// 0 literal constructor}$
+	T `?+?`( T, T );						$\C{// assortment of additions}$
+	T ?+=?( T &, T );
+	T ++?( T & );
+	T ?++( T & );
+};
+forall( otype T `| sumable( T )` )			$\C{// use trait}$
+T sum( T a[$\,$], size_t size );
+\end{cfa}
+
+Assertions can be @otype@ or @dtype@.
+@otype@ refers to a ``complete'' object, \ie an object has a size, default constructor, copy constructor, destructor and an assignment operator.
+@dtype@ only guarantees an object has a size and alignment.
+
+Using the return type for discrimination, it is possible to write a type-safe @alloc@ based on the C @malloc@:
+\begin{cfa}
+forall( dtype T | sized(T) ) T * alloc( void ) { return (T *)malloc( sizeof(T) ); }
+int * ip = alloc();							$\C{// select type and size from left-hand side}$
+double * dp = alloc();
+struct S {...} * sp = alloc();
+\end{cfa}
+where the return type supplies the type/size of the allocation, which is impossible in most type systems.
 
 
@@ -727,11 +727,7 @@
 
 Using a coroutine, it is possible to express the Fibonacci formula directly without any of the C problems.
-Figure~\ref{f:Coroutine3States} creates a @coroutine@ type:
-\begin{cfa}
-`coroutine` Fib { int fn; };
-\end{cfa}
-which provides communication, @fn@, for the \newterm{coroutine main}, @main@, which runs on the coroutine stack, and possibly multiple interface routines @next@.
+Figure~\ref{f:Coroutine3States} creates a @coroutine@ type, @`coroutine` Fib { int fn; }@, which provides communication, @fn@, for the \newterm{coroutine main}, @main@, which runs on the coroutine stack, and possibly multiple interface routines, \eg @next@.
 Like the structure in Figure~\ref{f:ExternalState}, the coroutine type allows multiple instances, where instances of this type are passed to the (overloaded) coroutine main.
-The coroutine main's stack holds the state for the next generation, @f1@ and @f2@, and the code has the three suspend points, representing the three states in the Fibonacci formula, to context switch back to the caller's resume.
+The coroutine main's stack holds the state for the next generation, @f1@ and @f2@, and the code has three suspend points, representing the three states in the Fibonacci formula, to context switch back to the caller's @resume@.
 The interface routine @next@, takes a Fibonacci instance and context switches to it using @resume@;
 on restart, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned.
@@ -843,6 +839,6 @@
 \end{figure}
 
-The previous examples are \newterm{asymmetric (semi) coroutine}s because one coroutine always calls a resuming routine for another coroutine, and the resumed coroutine always suspends back to its last resumer, similar to call/return for normal routines
-However, there is no stack growth because @resume@/@suspend@ context switch to existing stack-frames rather than create new ones.
+The previous examples are \newterm{asymmetric (semi) coroutine}s because one coroutine always calls a resuming routine for another coroutine, and the resumed coroutine always suspends back to its last resumer, similar to call/return for normal routines.
+However, @resume@/@suspend@ context switch to existing stack-frames rather than create new ones, so there is no stack growth.
 \newterm{Symmetric (full) coroutine}s have a coroutine call a resuming routine for another coroutine, which eventually forms a resuming-call cycle.
 (The trivial cycle is a coroutine resuming itself.)
@@ -933,5 +929,5 @@
 The producer call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status.
 For the first resume, @cons@'s stack is initialized, creating local variables retained between subsequent activations of the coroutine.
-The consumer iterates until the @done@ flag is set, prints, increments status, and calls back to the producer via @payment@, and on return from @payment@, prints the receipt from the producer and increments @money@ (inflation).
+The consumer iterates until the @done@ flag is set, prints the values delivered by the producer, increments status, and calls back to the producer via @payment@, and on return from @payment@, prints the receipt from the producer and increments @money@ (inflation).
 The call from the consumer to the @payment@ introduces the cycle between producer and consumer.
 When @payment@ is called, the consumer copies values into the producer's communication variable and a resume is executed.
@@ -963,6 +959,6 @@
 \end{cfa}
 and the programming language (and possibly its tool set, \eg debugger) may need to understand @baseCoroutine@ because of the stack.
-Furthermore, the execution of constructs/destructors is in the wrong order for certain operations, \eg for threads;
-\eg, if the thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived.
+Furthermore, the execution of constructors/destructors is in the wrong order for certain operations.
+For example, if a thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived.
 
 An alternatively is composition:
@@ -984,5 +980,5 @@
 symmetric_coroutine<>::yield_type
 \end{cfa}
-Similarly, the canonical threading paradigm is often based on routine pointers, \eg @pthread@~\cite{pthreads}, \Csharp~\cite{Csharp}, Go~\cite{Go}, and Scala~\cite{Scala}.
+Similarly, the canonical threading paradigm is often based on routine pointers, \eg @pthreads@~\cite{pthreads}, \Csharp~\cite{Csharp}, Go~\cite{Go}, and Scala~\cite{Scala}.
 However, the generic thread-handle (identifier) is limited (few operations), unless it is wrapped in a custom type.
 \begin{cfa}
@@ -1001,5 +997,5 @@
 Note, the type @coroutine_t@ must be an abstract handle to the coroutine, because the coroutine descriptor and its stack are non-copyable.
 Copying the coroutine descriptor results in copies being out of date with the current state of the stack.
-Correspondingly, copying the stack results is copies being out of date with coroutine descriptor, and pointers in the stack being out of date to data on the stack.
+Correspondingly, copying the stack results in copies being out of date with the coroutine descriptor, and pointers in the stack being out of date to data on the stack.
 (There is no mechanism in C to find all stack-specific pointers and update them as part of a copy.)
 
@@ -1015,5 +1011,5 @@
 Furthermore, implementing coroutines without language supports also displays the power of a programming language.
 While this is ultimately the option used for idiomatic \CFA code, coroutines and threads can still be constructed without using the language support.
-The reserved keyword eases use for the common cases.
+The reserved keyword simply eases use for the common cases.
 
 Part of the mechanism to generalize coroutines is using a \CFA trait, which defines a coroutine as anything satisfying the trait @is_coroutine@, and this trait is used to restrict coroutine-manipulation routines:
@@ -1030,5 +1026,5 @@
 The @main@ routine has no return value or additional parameters because the coroutine type allows an arbitrary number of interface routines with corresponding arbitrary typed input/output values versus fixed ones.
 The generic routines @suspend@ and @resume@ can be redefined, but any object passed to them is a coroutine since it must satisfy the @is_coroutine@ trait to compile.
-The advantage of this approach is that users can easily create different types of coroutines, for example, changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ routine, and possibly redefining @suspend@ and @resume@.
+The advantage of this approach is that users can easily create different types of coroutines, \eg changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ routine, and possibly redefining @suspend@ and @resume@.
 The \CFA keyword @coroutine@ implicitly implements the getter and forward declarations required for implementing the coroutine main:
 \begin{cquote}
@@ -1098,5 +1094,5 @@
 The difference is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates an instance of @main@;
 whereas, a user thread receives its own thread from the runtime system, which starts in @main@ as some point after the thread constructor is run.\footnote{
-The \lstinline@main@ routine is already a special routine in C (where the program begins), so it is a natural extension of the semantics to use overloading to declare mains for different coroutines/threads (the normal main being the main of the initial thread).}
+The \lstinline@main@ routine is already a special routine in C, \ie where the program's initial thread begins, so it is a natural extension of this semantics to use overloading to declare \lstinline@main@s for user coroutines and threads.}
 No return value or additional parameters are necessary for this routine because the task type allows an arbitrary number of interface routines with corresponding arbitrary typed input/output values.
 
@@ -1189,7 +1185,5 @@
 void main( Adder & adder ) with( adder ) {
     subtotal = 0;
-    for ( int c = 0; c < cols; c += 1 ) {
-		subtotal += row[c];
-    }
+    for ( int c = 0; c < cols; c += 1 ) { subtotal += row[c]; }
 }
 int main() {
@@ -1216,8 +1210,8 @@
 
 Uncontrolled non-deterministic execution is meaningless.
-To reestablish meaningful execution requires mechanisms to reintroduce determinism (\ie restrict non-determinism), called mutual exclusion and synchronization, where mutual exclusion is an access-control mechanism on data shared by threads, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}.
+To reestablish meaningful execution requires mechanisms to reintroduce determinism, \ie restrict non-determinism, called mutual exclusion and synchronization, where mutual exclusion is an access-control mechanism on data shared by threads, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}.
 Since many deterministic challenges appear with the use of mutable shared state, some languages/libraries disallow it, \eg Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka~\cite{Akka} (Scala).
-In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely relate to networking concepts (\eg channels~\cite{CSP,Go}).
-However, in call/return-based languages, these approaches force a clear distinction (\ie introduce a new programming paradigm) between regular and concurrent computation (\ie routine call versus message passing).
+In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely related to networking concepts, \eg channels~\cite{CSP,Go}.
+However, in call/return-based languages, these approaches force a clear distinction, \ie introduce a new programming paradigm, between regular and concurrent computation, \eg routine call versus message passing.
 Hence, a programmer must learn and manipulate two sets of design patterns.
 While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
@@ -1244,6 +1238,6 @@
 However, many solutions exist for mutual exclusion, which vary in terms of performance, flexibility and ease of use.
 Methods range from low-level locks, which are fast and flexible but require significant attention for correctness, to higher-level concurrency techniques, which sacrifice some performance to improve ease of use.
-Ease of use comes by either guaranteeing some problems cannot occur (\eg deadlock free), or by offering a more explicit coupling between shared data and critical section.
-For example, the \CC @std::atomic<T>@ offers an easy way to express mutual-exclusion on a restricted set of operations (\eg reading/writing) for numerical types.
+Ease of use comes by either guaranteeing some problems cannot occur, \eg deadlock free, or by offering a more explicit coupling between shared data and critical section.
+For example, the \CC @std::atomic<T>@ offers an easy way to express mutual-exclusion on a restricted set of operations, \eg reading/writing, for numerical types.
 However, a significant challenge with locks is composability because it takes careful organization for multiple locks to be used while preventing deadlock.
 Easing composability is another feature higher-level mutual-exclusion mechanisms can offer.
@@ -1254,5 +1248,5 @@
 Synchronization enforces relative ordering of execution, and synchronization tools provide numerous mechanisms to establish these timing relationships.
 Low-level synchronization primitives offer good performance and flexibility at the cost of ease of use;
-higher-level mechanisms often simplify usage by adding better coupling between synchronization and data (\eg message passing), or offering a simpler solution to otherwise involved challenges, \eg barrier lock.
+higher-level mechanisms often simplify usage by adding better coupling between synchronization and data, \eg message passing, or offering a simpler solution to otherwise involved challenges, \eg barrier lock.
 Often synchronization is used to order access to a critical section, \eg ensuring a reader thread is the next kind of thread to enter a critical section.
 If a writer thread is scheduled for next access, but another reader thread acquires the critical section first, that reader has \newterm{barged}.
@@ -1272,5 +1266,5 @@
 The strong association with the call/return paradigm eases programmability, readability and maintainability, at a slight cost in flexibility and efficiency.
 
-Note, like coroutines/threads, both locks and monitors require an abstract handle to reference them, because at their core, both mechanisms are manipulating non-copyable shared state.
+Note, like coroutines/threads, both locks and monitors require an abstract handle to reference them, because at their core, both mechanisms are manipulating non-copyable shared-state.
 Copying a lock is insecure because it is possible to copy an open lock and then use the open copy when the original lock is closed to simultaneously access the shared data.
 Copying a monitor is secure because both the lock and shared data are copies, but copying the shared data is meaningless because it no longer represents a unique entity.
@@ -1375,9 +1369,9 @@
 \end{cfa}
 (While object-oriented monitors can be extended with a mutex qualifier for multiple-monitor members, no prior example of this feature could be found.)
-In practice, writing multi-locking routines that do not deadlocks is tricky.
+In practice, writing multi-locking routines that do not deadlock is tricky.
 Having language support for such a feature is therefore a significant asset for \CFA.
 
 The capability to acquire multiple locks before entering a critical section is called \newterm{bulk acquire}.
-In previous example, \CFA guarantees the order of acquisition is consistent across calls to different routines using the same monitors as arguments.
+In the previous example, \CFA guarantees the order of acquisition is consistent across calls to different routines using the same monitors as arguments.
 This consistent ordering means acquiring multiple monitors is safe from deadlock.
 However, users can force the acquiring order.
@@ -1395,5 +1389,5 @@
 In the calls to @bar@ and @baz@, the monitors are acquired in opposite order.
 
-However, such use leads to lock acquiring order problems resulting in deadlock~\cite{Lister77}, where detecting it requires dynamically tracking of monitor calls, and dealing with it requires implement rollback semantics~\cite{Dice10}.
+However, such use leads to lock acquiring order problems resulting in deadlock~\cite{Lister77}, where detecting it requires dynamic tracking of monitor calls, and dealing with it requires rollback semantics~\cite{Dice10}.
 In \CFA, safety is guaranteed by using bulk acquire of all monitors to shared objects, whereas other monitor systems provide no aid.
 While \CFA provides only a partial solution, the \CFA partial solution handles many useful cases.
@@ -1440,6 +1434,6 @@
 
 
-\section{Internal Scheduling}
-\label{s:InternalScheduling}
+\section{Scheduling}
+\label{s:Scheduling}
 
 While monitor mutual-exclusion provides safe access to shared data, the monitor data may indicate that a thread accessing it cannot proceed.
@@ -1454,4 +1448,5 @@
 The appropriate condition lock is signalled to unblock an opposite kind of thread after an element is inserted/removed from the buffer.
 Signalling is unconditional, because signalling an empty condition lock does nothing.
+
 Signalling semantics cannot have the signaller and signalled thread in the monitor simultaneously, which means:
 \begin{enumerate}
@@ -1463,5 +1458,5 @@
 The signalling thread blocks but is marked for urgrent unblocking at the next scheduling point and the signalled thread continues.
 \end{enumerate}
-The first approach is too restrictive, as it precludes solving a reasonable class of problems (\eg dating service).
+The first approach is too restrictive, as it precludes solving a reasonable class of problems, \eg dating service.
 \CFA supports the next two semantics as both are useful.
 Finally, while it is common to store a @condition@ as a field of the monitor, in \CFA, a @condition@ variable can be created/stored independently.
@@ -1539,25 +1534,28 @@
 If the buffer is full, only calls to @remove@ can acquire the buffer, and if the buffer is empty, only calls to @insert@ can acquire the buffer.
 Threads making calls to routines that are currently excluded block outside (external) of the monitor on a calling queue, versus blocking on condition queues inside (internal) of the monitor.
+% External scheduling is more constrained and explicit, which helps programmers reduce the non-deterministic nature of concurrency.
+External scheduling allows users to wait for events from other threads without concern of unrelated events occurring.
+The mechanism can be expressed in terms of control flow, \eg Ada @accept@ or \uC @_Accept@, or in terms of data, \eg Go channels.
+Of course, both of these paradigms have their own strengths and weaknesses, but for this project, control-flow semantics was chosen to stay consistent with the rest of the language's semantics.
+Two challenges specific to \CFA arise when adding external scheduling: loose object definitions and multiple-monitor routines.
+The previous example shows a simple use of @_Accept@ versus @wait@/@signal@ and its advantages.
+Note that while other languages often use @accept@/@select@ as the core external scheduling keyword, \CFA uses @waitfor@ to prevent name collisions with existing socket APIs.
 
 For internal scheduling, non-blocking signalling (as in the producer/consumer example) is used when the signaller is providing the cooperation for a waiting thread;
 the signaller enters the monitor and changes state, detects a waiting threads that can use the state, performs a non-blocking signal on the condition queue for the waiting thread, and exits the monitor to run concurrently.
-The waiter unblocks next, takes the state, and exits the monitor.
+The waiter unblocks next, uses/takes the state, and exits the monitor.
 Blocking signalling is the reverse, where the waiter is providing the cooperation for the signalling thread;
 the signaller enters the monitor, detects a waiting thread providing the necessary state, performs a blocking signal to place it on the urgent queue and unblock the waiter.
-The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to take the state.
+The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to use/take the state.
 
 Figure~\ref{f:DatingService} shows a dating service demonstrating the two forms of signalling: non-blocking and blocking.
 The dating service matches girl and boy threads with matching compatibility codes so they can exchange phone numbers.
 A thread blocks until an appropriate partner arrives.
-The complexity is exchanging phone number in the monitor, 
-While the non-barging monitor prevents a caller from stealing a phone number, the monitor mutual-exclusion property 
-
-The dating service is an example of a monitor that cannot be written using external scheduling because:
-
-The example in table \ref{tbl:datingservice} highlights the difference in behaviour.
-As mentioned, @signal@ only transfers ownership once the current critical section exits; this behaviour requires additional synchronization when a two-way handshake is needed.
-To avoid this explicit synchronization, the @condition@ type offers the @signal_block@ routine, which handles the two-way handshake as shown in the example.
-This feature removes the need for a second condition variables and simplifies programming.
-Like every other monitor semantic, @signal_block@ uses barging prevention, which means mutual-exclusion is baton-passed both on the front end and the back end of the call to @signal_block@, meaning no other thread can acquire the monitor either before or after the call.
+The complexity is exchanging phone numbers in the monitor because the monitor mutual-exclusion property prevents exchanging numbers.
+For internal scheduling, the @exchange@ condition is necessary to block the thread finding the match, while the matcher unblocks to take the opposite number, post its phone number, and unblock the partner.
+For external scheduling, the implicit urgent-condition replaces the explicit @exchange@-condition and @signal_block@ puts the finding thread on the urgent condition and unblocks the matcher.
+
+The dating service is an example of a monitor that cannot be written using external scheduling because it requires knowledge of calling parameters to make scheduling decisions, and parameters of waiting threads are unavailable;
+as well, an arriving thread may not find a partner and must wait, which requires a condition variable, and condition variables imply internal scheduling.
 
 \begin{figure}
@@ -1655,6 +1653,6 @@
 }
 \end{cfa}
-must have acquired monitor locks that are greater than or equal to the number of locks for the waiting thread signalled from the front of the condition queue.
-In general, the signaller does not know the order of waiting threads, so in general, it must acquire the maximum number of mutex locks for the worst-case waiting thread.
+must have acquired monitor locks that are greater than or equal to the number of locks for the waiting thread signalled from the condition queue.
+{\color{red}In general, the signaller does not know the order of waiting threads, so it must acquire the maximum number of mutex locks for the worst-case waiting thread.}
 
 Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex types in the parameter list, \ie @waitfor( rtn, m1, m2 )@.
@@ -1667,8 +1665,8 @@
 void foo( M & mutex m1, M & mutex m2 ) {
 	... wait( `e, m1` ); ...				$\C{// release m1, keeping m2 acquired )}$
-void baz( M & mutex m1, M & mutex m2 ) {	$\C{// must acquire m1 and m2 )}$
+void bar( M & mutex m1, M & mutex m2 ) {	$\C{// must acquire m1 and m2 )}$
 	... signal( `e` ); ...
 \end{cfa}
-The @wait@ only releases @m1@ so the signalling thread cannot acquire both @m1@ and @m2@ to  enter @baz@ to get to the @signal@.
+The @wait@ only releases @m1@ so the signalling thread cannot acquire both @m1@ and @m2@ to enter @bar@ to get to the @signal@.
 While deadlock issues can occur with multiple/nesting acquisition, this issue results from the fact that locks, and by extension monitors, are not perfectly composable.
 
@@ -1755,5 +1753,5 @@
 However, Figure~\ref{f:OtherWaitingThread} shows this solution is complex depending on other waiters, resulting is choices when the signaller finishes the inner mutex-statement.
 The singaller can retain @m2@ until completion of the outer mutex statement and pass the locks to waiter W1, or it can pass @m2@ to waiter W2 after completing the inner mutex-statement, while continuing to hold @m1@.
-In the latter case, waiter W2 must eventually pass @m2@ to waiter W1, which is complex because W2 may have waited before W1 so it is unaware of W1.
+In the latter case, waiter W2 must eventually pass @m2@ to waiter W1, which is complex because W1 may have waited before W2, so W2 is unaware of it.
 Furthermore, there is an execution sequence where the signaller always finds waiter W2, and hence, waiter W1 starves.
 
@@ -1861,9 +1859,7 @@
 
 
+\begin{comment}
 \section{External scheduling} \label{extsched}
 
-An alternative to internal scheduling is external scheduling (see Table~\ref{tbl:sched}).
-
-\begin{comment}
 \begin{table}
 \begin{tabular}{|c|c|c|}
@@ -1929,22 +1925,12 @@
 \label{tbl:sched}
 \end{table}
-\end{comment}
-
-This method is more constrained and explicit, which helps users reduce the non-deterministic nature of concurrency.
-Indeed, as the following examples demonstrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occurring.
-External scheduling can generally be done either in terms of control flow (\eg Ada with @accept@, \uC with @_Accept@) or in terms of data (\eg Go with channels).
-Of course, both of these paradigms have their own strengths and weaknesses, but for this project, control-flow semantics was chosen to stay consistent with the rest of the languages semantics.
-Two challenges specific to \CFA arise when trying to add external scheduling with loose object definitions and multiple-monitor routines.
-The previous example shows a simple use @_Accept@ versus @wait@/@signal@ and its advantages.
-Note that while other languages often use @accept@/@select@ as the core external scheduling keyword, \CFA uses @waitfor@ to prevent name collisions with existing socket \textbf{api}s.
 
 For the @P@ member above using internal scheduling, the call to @wait@ only guarantees that @V@ is the last routine to access the monitor, allowing a third routine, say @isInUse()@, acquire mutual exclusion several times while routine @P@ is waiting.
 On the other hand, external scheduling guarantees that while routine @P@ is waiting, no other routine than @V@ can acquire the monitor.
-
-% ======================================================================
-% ======================================================================
+\end{comment}
+
+
 \subsection{Loose Object Definitions}
-% ======================================================================
-% ======================================================================
+
 In \uC, a monitor class declaration includes an exhaustive list of monitor operations.
 Since \CFA is not object oriented, monitors become both more difficult to implement and less clear for a user:
