Index: doc/papers/concurrency/Paper.tex
===================================================================
--- doc/papers/concurrency/Paper.tex	(revision 397edf7a638bd273ce80fb1fb107f1eaabf93eec)
+++ doc/papers/concurrency/Paper.tex	(revision e98c7ab8e4ef76d28a872e441e8ec07ed2328e28)
@@ -280,5 +280,5 @@
 The runtime also ensures multiple monitors can be safely acquired \emph{simultaneously} (deadlock free), and this feature is fully integrated with all monitor synchronization mechanisms.
 All control-flow features integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers.
-Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
+Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming languages.
 }%
 
@@ -301,5 +301,5 @@
 In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a \emph{research vehicle} for new typing and control-flow capabilities on top of a highly popular programming language, allowing immediate dissemination.
 Within the \CFA framework, new control-flow features are created from scratch because ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
-However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
+However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error-prone;
 no high-level language concurrency features are defined.
 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-9 nor msvc-19 (most recent versions) supports the \Celeven include @threads.h@, indicating little interest in the \Celeven concurrency approach.
@@ -312,23 +312,23 @@
 As a result, languages like Java, Scala, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopt the 1:1 kernel-threading model, with a variety of presentation mechanisms.
 From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,Marcel}, including putting green threads back into Java~\cite{Quasar}.
-The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work-units to facilitate load balancing by the runtime~\cite{Verch12}.
+The main argument for user-level threading is that user threads are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium-sized work units to facilitate load balancing by the runtime~\cite{Verch12}.
 As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{Adya02,vonBehren03}.
 Finally, performant user-threading implementations (both time and space) meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
 
-A further effort over the past two decades is the development of language memory-models to deal with the conflict between language features and compiler/hardware optimizations, i.e., some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.
+A further effort over the past two decades is the development of language memory models to deal with the conflict between language features and compiler/hardware optimizations, \ie, some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.
 The consequence is that a language must provide sufficient tools to program around safety issues, as inline and library code is all sequential to the compiler.
-One solution is low-level qualifiers and functions (e.g., @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs.
+One solution is low-level qualifiers and functions (\eg, @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs.
 A safer solution is high-level language constructs so the \emph{compiler} knows the optimization boundaries, and hence, provides implicit safety.
-This problem is best know with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{
+This problem is best known with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{
 \CFA exception handling will be presented in a separate paper.
-The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
+The key feature that dovetails with this paper is nonlocal exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}.
 } and coroutines.
-Finally, language solutions allow matching constructs with language paradigm, i.e., imperative and functional languages often have different presentations of the same concept to fit their programming model.
+Finally, language solutions allow matching constructs with the language paradigm, \ie, imperative and functional languages often have different presentations of the same concept to fit their programming model.
 
 Equally important, a language should provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary.
-Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, i.e., once there is spurious wakeup, signals-as-hints follows.
+Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, \ie, once there is spurious wakeup, signals-as-hints follows.
 However, spurious wakeup is \emph{not} a foundational concurrency property~\cite[\S~8]{Buhr05a}; it is a performance design choice.
-Similarly, signals-as-hints is often a performance decision.
-We argue removing spurious wakeup and signals-as-hints makes concurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation.
+Similarly, signals-as-hints is often a performance decision.
+We argue removing spurious wakeup and signals-as-hints makes concurrent programming significantly safer because it removes local non-determinism and matches programmer expectations.
 (Author experience teaching concurrency is that students are highly confused by these semantics.)
 Clawing back performance, when local non-determinism is unimportant, should be an option, not the default.
@@ -337,5 +337,5 @@
 Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory.
 As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten.
-While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (e.g., objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
+While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing set of complex and interdependent features (\eg, objects, inheritance, templates) means idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
 Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base.
 \CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems.
@@ -343,5 +343,5 @@
 
 \CFA embraces user-level threading, language extensions for advanced control-flow, and safety as the default.
-We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming-languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
+We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
 The main contributions of this work are:
 \begin{itemize}
@@ -349,5 +349,5 @@
 language-level generators, coroutines and user-level threading, which respect the expectations of C programmers.
 \item
-monitor synchronization without barging, and the ability to safely acquiring multiple monitors \emph{simultaneously} (deadlock free), while seamlessly integrating these capability with all monitor synchronization mechanisms.
+monitor synchronization without barging, and the ability to safely acquire multiple monitors \emph{simultaneously} (deadlock free), while seamlessly integrating these capabilities with all monitor synchronization mechanisms.
 \item
 providing statically type-safe interfaces that integrate with the \CFA polymorphic type-system and other language features.
@@ -367,14 +367,14 @@
 \section{Stateful Function}
 
-The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, e.g., plugin, device driver, finite-state machine.
+The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, \eg, plugin, device driver, finite-state machine.
 Hence, a stateful function may not end when it returns to its caller, allowing it to be restarted with the data and execution location present at the point of suspension.
 This capability is accomplished by retaining a data/execution \emph{closure} between invocations.
-If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, e.g., suspending outside the generator is prohibited.
-If the closure is variable sized, we call it a \emph{coroutine} (or \emph{stackful}), and as the names implies, often implemented with a separate stack with no programming restrictions.
+If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, \eg, suspending outside the generator is prohibited.
+If the closure is variably sized, we call it a \emph{coroutine} (or \emph{stackful}), and as the name implies, it is often implemented with a separate stack with no programming restrictions.
 Hence, refactoring a stackless coroutine may require changing it to stackful.
-A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, i.e., resume/suspend operations are remembered through the closure not the stack.
+A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, \ie, resume/suspend operations are remembered through the closure, not the stack.
 As well, activating a stateful function is \emph{asymmetric} or \emph{symmetric}, identified by resume/suspend (no cycles) and resume/resume (cycles).
 A fixed closure activated by modified call/return is faster than a variable closure activated by context switching.
-Additionally, any storage management for the closure (especially in unmanaged languages, i.e., no garbage collection) must also be factored into design and performance.
+Additionally, any storage management for the closure (especially in unmanaged languages, \ie, no garbage collection) must also be factored into design and performance.
 Therefore, selecting between stackless and stackful semantics is a tradeoff between programming requirements and performance, where stackless is faster and stackful is more general.
 Note, creation cost is amortized across usage, so activation cost is usually the dominant factor.
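+
+To make the closure distinction concrete, the following C sketch (illustrative only, not the \CFA implementation) contrasts the two storage shapes.
+\begin{cfa}
+struct Generator {	// stackless: fixed-size closure, layout known at compile time
+	int state;	// saved execution location for suspend/resume
+	int x, y;	// retained locals
+};
+struct Coroutine {	// stackful: variable-size closure
+	void * stack;	// separate dynamically allocated stack
+	void * context;	// saved registers/stack pointer for context switch
+};
+\end{cfa}
+Creating a @Generator@ is a cheap fixed-size allocation, while creating a @Coroutine@ additionally pays for stack allocation, matching the cost ordering above.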
@@ -603,5 +603,5 @@
 the top initialization state appears at the start and the middle execution state is denoted by statement @suspend@.
 Any local variables in @main@ \emph{are not retained} between calls;
-hence local variable are only for temporary computations \emph{between} suspends.
+hence local variables are only for temporary computations \emph{between} suspends.
 All retained state \emph{must} appear in the generator's type.
 As well, generator code containing a @suspend@ cannot be refactored into a helper function called by the generator, because @suspend@ is implemented via @return@, so a return from the helper function goes back to the current generator, not the resumer.
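+
+A C sketch of the implementation consequence (assumed names, in the style of the paper's simulation figures): @suspend@ compiles to saving the execution location and returning, so a @suspend@ inside a helper would merely return to the generator, not the resumer.
+\begin{cfa}
+typedef struct { int state, fn, fn1; } Fib;	// all retained state lives in the type; initialize with state = 1
+int next( Fib * f ) {
+	switch ( f->state ) { case 1: goto s1; case 2: goto s2; }	// dispatch to saved location
+  s1:	f->fn = 0;  f->fn1 = 1;
+	f->state = 2;  return f->fn;	// suspend: remember location, return to resumer
+  s2:	{ int fn = f->fn + f->fn1;  f->fn1 = f->fn;  f->fn = fn; }
+	f->state = 2;  return f->fn;	// suspend again
+}
+\end{cfa}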
@@ -618,5 +618,5 @@
 sout | (int)f1() | (double)f1() | f2( 2 ); // alternative interface, cast selects call based on return type, step 2 values
 \end{cfa}
-Now, the generator can be a separately-compiled opaque-type only accessed through its interface functions.
+Now, the generator can be a separately compiled opaque type accessed only through its interface functions.
 For contrast, Figure~\ref{f:PythonFibonacci} shows the equivalent Python Fibonacci generator, which does not use a generator type, and hence only has a single interface, but an implicit closure.
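+
+Returning to the opaque-type pattern, in plain C the analogous approach hides the layout behind an opaque handle (a sketch with hypothetical file and function names):
+\begin{cfa}
+// fib.h: clients see only the handle and interface functions
+typedef struct Fib Fib;	// opaque type, fields hidden
+Fib * new_fib( void );	// heap allocation, since the size is unknown to clients
+int next_fib( Fib * f );
+void delete_fib( Fib * f );
+// fib.c: separately compiled, defines struct Fib and the interface functions
+\end{cfa}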
 
@@ -624,5 +624,5 @@
 (This restriction is removed by the coroutine in Section~\ref{s:Coroutine}.)
 This requirement follows from the generality of variable-size local-state, \eg local state with a variable-length array requires dynamic allocation because the array size is unknown at compile time.
-However, dynamic allocation significantly increases the cost of generator creation/destruction and is a show-stopper for embedded real-time programming.
+However, dynamic allocation significantly increases the cost of generator creation/destruction and is a showstopper for embedded real-time programming.
 But more importantly, the size of the generator type is tied to the local state in the generator main, which precludes separate compilation of the generator main, \ie a generator must be inlined or local state must be dynamically allocated.
 With respect to safety, we believe static analysis can discriminate local state from temporary variables in a generator, \ie variable usage spanning @suspend@, and generate a compile-time error.
@@ -648,5 +648,5 @@
 \end{center}
 The example takes advantage of resuming a generator in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops.
-The destructor provides a newline, if formatted text ends with a full line.
+The destructor provides a newline if formatted text ends with a full line.
 Figure~\ref{f:CFormatSim} shows the C implementation of the \CFA input generator with one additional field and the computed @goto@.
 For contrast, Figure~\ref{f:PythonFormatter} shows the equivalent Python format generator with the same properties as the Fibonacci generator.
@@ -669,5 +669,5 @@
 In contrast, the execution state is large, with one @resume@ and seven @suspend@s.
 Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states are transcribed directly into the programming language rather than using a table-driven approach.
-Because FSMs can be complex and occur frequently in important domains, direct support of the generator is crucial in a systems programming-language.
+Because FSMs can be complex and occur frequently in important domains, direct support of the generator is crucial in a systems programming language.
 
 \begin{figure}
@@ -782,5 +782,5 @@
 The steps for symmetric control-flow are creating, executing, and terminating the cycle.
 Constructing the cycle must deal with definition-before-use to close the cycle, \ie, the first generator must know about the last generator, which is not within scope.
-(This issues occurs for any cyclic data-structure.)
+(This issue occurs for any cyclic data structure.)
 % The example creates all the generators and then assigns the partners that form the cycle.
 % Alternatively, the constructor can assign the partners as they are declared, except the first, and the first-generator partner is set after the last generator declaration to close the cycle.
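+
+A sketch of cycle construction (hypothetical @PingPong@ generator with a @partner@ field):
+\begin{cfa}
+PingPong ping, pong;	// create all generators first ...
+ping.partner = &pong;	// ... then assign partners to close the cycle,
+pong.partner = &ping;	// resolving definition-before-use
+resume( ping );	// start the cycle
+\end{cfa}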
@@ -792,5 +792,5 @@
 
 Figure~\ref{f:CPingPongSim} shows the implementation of the symmetric generator, where the complexity is the @resume@, which needs an extension to the calling convention to perform a forward rather than backward jump.
-This jump starts at the top of the next generator main to re-execute the normal calling convention to make space on the stack for its local variables.
+This jump starts at the top of the next generator main to re-execute the normal calling convention to make space on the stack for its local variables.
 However, before the jump, the caller must reset its stack (and any registers) equivalent to a @return@, but subsequently jump forward.
 This semantics is basically a tail-call optimization, which compilers already perform.
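+
+A rough C analogue of the mechanism (a sketch relying on tail-call optimization, \eg gcc/clang at -O2, not the actual \CFA code generation):
+\begin{cfa}
+void pong_main( Gen * g );	// hypothetical generator mains
+void ping_main( Gen * g ) {
+	// ... one step of work ...
+	pong_main( g->partner );	// tail call: this frame is reset and control jumps
+}	// forward to the top of the next generator main
+\end{cfa}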
@@ -862,6 +862,6 @@
 
 Finally, part of this generator work was inspired by the recent \CCtwenty generator proposal~\cite{C++20Coroutine19} (which they call coroutines).
-Our work provides the same high-performance asymmetric-generators as \CCtwenty, and extends their work with symmetric generators.
-An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound-statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax:
+Our work provides the same high-performance asymmetric generators as \CCtwenty, and extends their work with symmetric generators.
+An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax:
 \begin{cfa}
 ... suspend`{ ... }`;
@@ -879,5 +879,5 @@
 A coroutine is specified by replacing @generator@ with @coroutine@ for the type.
 Coroutine generality results in higher cost for creation (dynamic stack allocation), execution (context switching among stacks), and termination (possible stack unwinding and dynamic stack deallocation).
-A series of different kinds of coroutines and their implementation demonstrate how coroutines extend generators.
+A series of different kinds of coroutines and their implementations demonstrate how coroutines extend generators.
 
 First, the previous generator examples are converted to their coroutine counterparts, allowing local-state variables to be moved from the generator type into the coroutine main.
@@ -1164,5 +1164,5 @@
 \end{cfa}
 \end{tabular}
-\caption{Producer / consumer: resume-resume cycle, bi-directional communication}
+\caption{Producer / consumer: resume-resume cycle, bidirectional communication}
 \label{f:ProdCons}
 \end{figure}
@@ -1208,5 +1208,5 @@
 Furthermore, each deallocated coroutine must guarantee all destructors are run for objects allocated in the coroutine type \emph{and} allocated on the coroutine's stack at the point of suspension, which can be arbitrarily deep.
 When a coroutine's main ends, its stack is already unwound, so any stack-allocated objects with destructors have been finalized.
-The na\"{i}ve semantics for coroutine-cycle termination is context switch to the last resumer, like executing a @suspend@/@return@ in a generator.
+The na\"{i}ve semantics for coroutine-cycle termination is to context switch to the last resumer, like executing a @suspend@/@return@ in a generator.
 However, for coroutines, the last resumer is \emph{not} implicitly below the current stack frame, as for generators, because each coroutine's stack is independent.
 Unfortunately, it is impossible to determine statically if a coroutine is in a cycle and unrealistic to check dynamically (graph-cycle problem).
@@ -1214,8 +1214,8 @@
 
 Our solution is to context switch back to the first resumer (starter) once the coroutine ends.
-This semantics works well for the most common asymmetric and symmetric coroutine usage-patterns.
+This semantics works well for the most common asymmetric and symmetric coroutine usage patterns.
 For asymmetric coroutines, it is common for the first resumer (starter) coroutine to be the only resumer.
 All previous generators converted to coroutines have this property.
-For symmetric coroutines, it is common for the cycle creator to persist for the life-time of the cycle.
+For symmetric coroutines, it is common for the cycle creator to persist for the lifetime of the cycle.
 Hence, the starter coroutine is remembered on the first resume and ending the coroutine resumes the starter.
 Figure~\ref{f:ProdConsRuntimeStacks} shows this semantics by the dashed lines from the end of the coroutine mains: @prod@ starts @cons@ so @cons@ resumes @prod@ at the end, and the program main starts @prod@ so @prod@ resumes the program main at the end.
@@ -1285,10 +1285,10 @@
 \end{cfa}
 Note, copying generators/coroutines/threads is not meaningful.
-For example, both the resumer and suspender descriptors can have bi-directional pointers;
+For example, both the resumer and suspender descriptors can have bidirectional pointers;
 copying these coroutines does not update the internal pointers, so the behaviour of both copies would be difficult to understand.
 Furthermore, two coroutines cannot logically execute on the same stack.
 A deep coroutine copy, which copies the stack, is also meaningless in an unmanaged language (no garbage collection), like C, because the stack may contain pointers to objects within it that require updating for the copy.
 The \CFA @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference (pointer).
-The function definitions ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the coroutine descriptor from its handle.
+The function definitions ensure there is a statically typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the coroutine descriptor from its handle.
 The @main@ function has no return value or additional parameters because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrarily typed input/output values versus fixed ones.
 The advantage of this approach is that users can easily create different types of coroutines, \eg changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ function, and possibly redefining \textsf{suspend} and @resume@.
@@ -1342,5 +1342,5 @@
 For a VLS, stack allocation/deallocation is an inexpensive adjustment of the stack pointer, modulo any stack constructor costs (\eg initial frame setup).
 For heap stack allocation, allocation/deallocation is an expensive heap allocation (where the heap can be a shared resource), modulo any stack constructor costs.
-With heap stack allocation, it is also possible to use a split (segmented) stack calling-convention, available with gcc and clang, so the stack is variable sized.
+With heap stack allocation, it is also possible to use a split (segmented) stack calling convention, available with gcc and clang, so the stack is variable sized.
 Currently, \CFA supports stack/heap allocated descriptors but only fixed-sized heap-allocated stacks.
 In \CFA debug-mode, the fixed-sized stack is terminated with a write-only page, which catches most stack overflows.
@@ -1359,5 +1359,5 @@
 \label{s:Concurrency}
 
-Concurrency is nondeterministic scheduling of independent sequential execution-paths (threads), where each thread has its own stack.
+Concurrency is nondeterministic scheduling of independent sequential execution paths (threads), where each thread has its own stack.
 A single thread with multiple call stacks, \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}.
 In coroutining, coroutines self-schedule the thread across stacks so execution is deterministic.
@@ -1367,5 +1367,5 @@
 The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}.
 Therefore, a minimal concurrency system requires coroutines \emph{in conjunction with a nondeterministic scheduler}.
-The resulting execution system now follows a cooperative threading-model~\cite{Adya02,libdill}, called \newterm{non-preemptive scheduling}.
+The resulting execution system now follows a cooperative threading model~\cite{Adya02,libdill}, called \newterm{non-preemptive scheduling}.
 Adding \newterm{preemption} introduces non-cooperative scheduling, where context switching occurs randomly between any two instructions, often based on a timer interrupt, called \newterm{preemptive scheduling}.
 While a scheduler introduces uncertain execution among explicit context switches, preemption introduces uncertainty by introducing implicit context switches.
@@ -1497,5 +1497,5 @@
 \end{cquote}
 Like coroutines, the @dtype@ property prevents \emph{implicit} copy operations and the @is_thread@ trait provides no \emph{explicit} copy operations, so threads must be passed by reference (pointer).
-Similarly, the function definitions ensures there is a statically-typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the thread descriptor from its handle, and a special destructor to prevent deallocation while the thread is executing.
+Similarly, the function definitions ensure there is a statically typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the thread descriptor from its handle, and a special destructor to prevent deallocation while the thread is executing.
 (The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitor}.)
 The difference between the coroutine and thread is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates the coroutine's stack and starts running the coroutine main on the stack;
@@ -1512,5 +1512,5 @@
 Hence, a programmer must learn and manipulate two sets of design/programming patterns.
 While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
-In contrast, approaches based on stateful models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm.
+In contrast, approaches based on stateful models more closely resemble the standard call/return programming model, resulting in a single programming paradigm.
 
 At the lowest level, concurrent control is implemented by atomic operations, upon which different kinds of locking mechanisms are constructed, \eg semaphores~\cite{Dijkstra68b}, barriers, and path expressions~\cite{Campbell74}.
@@ -1520,7 +1520,7 @@
 
 One of the most natural, elegant, and efficient mechanisms for mutual exclusion and synchronization for shared-memory systems is the \emph{monitor}.
-First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}, many concurrent programming-languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}.
+First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}, many concurrent programming languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}.
 In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as mutex locks or semaphores to simulate monitors.
-For these reasons, \CFA selected monitors as the core high-level concurrency-construct, upon which higher-level approaches can be easily constructed.
+For these reasons, \CFA selected monitors as the core high-level concurrency construct, upon which higher-level approaches can be easily constructed.
 
 
@@ -1571,5 +1571,5 @@
 % (While a constructor may publish its address into a global variable, doing so generates a race-condition.)
 The prefix increment operation, @++?@, is normally @mutex@, indicating mutual exclusion is necessary during function execution, to protect the incrementing from race conditions, unless there is an atomic increment instruction for the implementation type.
-The assignment operators provide bi-directional conversion between an atomic and normal integer without accessing field @cnt@;
+The assignment operators provide bidirectional conversion between an atomic and normal integer without accessing field @cnt@;
 these operations only need @mutex@ if reading/writing the implementation type is not atomic.
 The atomic counter is used without any explicit mutual-exclusion and provides thread-safe semantics, which is similar to the \CC template @std::atomic@.
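+
+As a condensed sketch of the interface just described (paraphrased; the full example also defines both assignment directions):
+\begin{cfa}
+monitor Aint { int cnt; };	// atomic-counter monitor
+int ++?( Aint & mutex this ) { return ++this.cnt; }	// call acquires/releases the monitor lock
+\end{cfa}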
@@ -1593,5 +1593,5 @@
 Similar safety is offered by \emph{explicit} mechanisms like \CC RAII;
 monitor \emph{implicit} safety ensures no programmer usage errors.
-Furthermore, RAII mechansims cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread;
+Furthermore, RAII mechanisms cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread;
 RAII is purely a mutual-exclusion mechanism (see Section~\ref{s:Scheduling}).
 
@@ -1632,5 +1632,5 @@
 
 The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameters is redundant.
-Instead, the semantics have one qualifier as the default and the other required.
+Instead, the semantics has one qualifier as the default and the other required.
 For example, make the safe @mutex@ qualifier the default because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors.
 Alternatively, make the unsafe \lstinline[morekeywords=nomutex]@nomutex@ qualifier the default because it is the \emph{normal} parameter semantics while @mutex@ parameters are rare.
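+
+Concretely, the same declarations read differently under the two candidate defaults (sketch):
+\begin{cfa}
+monitor M { ... };
+void f( M & mutex m );	// explicit: mutual exclusion during the call
+void g( M & m );	// default applies: is this mutex (safe) or nomutex (fast)?
+\end{cfa}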
@@ -1809,5 +1809,5 @@
 Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion, so either the signaller or signallee can proceed.
 For internal scheduling, threads are unblocked from condition queues using @signal@, where the signallee is moved to urgent and the signaller continues (solid line).
-Multiple signals move multiple signallees to urgent, until the condition is empty.
+Multiple signals move multiple signallees to urgent until the condition is empty.
 When the signaller exits or waits, a thread blocked on urgent is processed before calling threads to prevent barging.
 (Java conceptually moves the signalled thread to the calling queue, and hence, allows barging.)
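+
+A minimal internal-scheduling sketch (single condition, buffer management elided):
+\begin{cfa}
+monitor Buffer { condition nonempty; int elem; };
+void insert( Buffer & mutex buf, int v ) {
+	buf.elem = v;
+	signal( buf.nonempty );	// signallee moves to urgent; signaller continues
+}
+int remove( Buffer & mutex buf ) {
+	wait( buf.nonempty );	// block on condition queue, releasing the monitor lock
+	return buf.elem;	// no barging: runs once the signaller exits or waits
+}
+\end{cfa}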
@@ -1819,5 +1819,5 @@
 The @waitfor@ has the same semantics as @signal_block@, where the signalled thread executes before the signaller, which waits on urgent.
 Executing multiple @waitfor@s from different signalled functions causes the calling threads to move to urgent.
-External scheduling requires urgent to be a stack, because the signaller excepts to execute immediately after the specified monitor call has exited or waited.
+External scheduling requires urgent to be a stack, because the signaller expects to execute immediately after the specified monitor call has exited or waited.
 Internal scheduling behaves the same for an urgent stack or queue, except for multiple signalling, where the threads unblock from urgent in reverse order from signalling.
 If the restart order is important, multiple signalling by the signalling thread can be transformed into daisy-chain signalling among threads, where each thread signals the next thread.
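+
+For example (sketch, assuming a monitor @M@ with a condition @chain@), rather than one thread signalling $N$ times:
+\begin{cfa}
+void pass( M & mutex m ) {
+	wait( m.chain );	// block in arrival order on a single condition
+	signal( m.chain );	// on wake-up, immediately signal the next waiter
+}
+\end{cfa}
+The signaller wakes only the first waiter and each unblocked thread forwards the signal, so threads restart in waiting order whether urgent is a stack or a queue.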
@@ -2143,5 +2143,5 @@
 \end{figure}
 
-Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.:
+Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, \eg:
 \begin{cfa}
 if ( C1 ) waitfor( mem1 );			 when ( C1 ) waitfor( mem1 );
@@ -2256,5 +2256,5 @@
 \label{s:LooseObjectDefinitions}
 
-In an object-oriented programming-language, a class includes an exhaustive list of operations.
+In an object-oriented programming language, a class includes an exhaustive list of operations.
 A new class can add members via static inheritance but the subclass still has an exhaustive list of operations.
 (Dynamic member adding, \eg JavaScript~\cite{JavaScript}, is not considered.)
@@ -2671,9 +2671,9 @@
 \subsection{Preemption}
 
-Nondeterministic preemption provides fairness from long running threads, and forces concurrent programmers to write more robust programs, rather than relying on code between cooperative scheduling to be atomic.
+Nondeterministic preemption provides fairness in the presence of long-running threads, and forces concurrent programmers to write more robust programs rather than rely on code between cooperative scheduling points being atomic.
 This atomic reliance can fail on multi-core machines, because execution across cores is nondeterministic.
 A different reason for not supporting preemption is that it significantly complicates the runtime system, \eg the Microsoft runtime does not support interrupts, and on Linux systems interrupts are complex (see below).
-Preemption is normally handled by setting a count-down timer on each virtual processor.
-When the timer expires, an interrupt is delivered, and the interrupt handler resets the count-down timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime-kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code.
+Preemption is normally handled by setting a countdown timer on each virtual processor.
+When the timer expires, an interrupt is delivered and the interrupt handler resets the countdown timer.
+If the virtual processor is executing in user code, the signal handler performs a user-level context switch; if executing in the language runtime kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code.
 Multiple signal handlers may be pending.
 When control eventually switches back to the signal handler, it returns normally, and execution continues in the interrupted user thread, even though the return from the signal handler may be on a different kernel thread than the one where the signal was delivered.
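+
+For concreteness, a user-level runtime can arm such a timer with POSIX @setitimer@; a minimal C sketch (not the \CFA runtime code):
+\begin{cfa}
+#include <signal.h>
+#include <sys/time.h>
+static void preempt( int sig ) {
+	// in user code: perform a user-level context switch;
+	// in the runtime kernel: ignore or roll the preemption forward
+}
+static void start_preemption( long usec ) {
+	struct sigaction sa = { 0 };
+	sa.sa_handler = preempt;
+	sigaction( SIGALRM, &sa, NULL );
+	struct itimerval it = { { 0, usec }, { 0, usec } };	// it_interval reloads the countdown, it_value starts it
+	setitimer( ITIMER_REAL, &it, NULL );	// expiry delivers SIGALRM
+}
+\end{cfa}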
@@ -2685,5 +2685,5 @@
 \begin{cquote}
 A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked.
-If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which to deliver the signal.
+If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which to deliver the signal.
 SIGNAL(7) - Linux Programmer's Manual
 \end{cquote}
@@ -2691,5 +2691,5 @@
 To ensure each virtual processor receives a preemption signal, a discrete-event simulation is run on a special virtual processor, and only it sets and receives timer events.
 Virtual processors register an expiration time with the discrete-event simulator, which is inserted in sorted order.
-The simulation sets the count-down timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed.
+The simulation sets the countdown timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed.
 Processing a preemption event sends an \emph{internal} @SIGUSR1@ signal to the registered virtual processor, which is always delivered to that processor.
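+
+Directed delivery is possible because @pthread_kill@ targets a specific kernel thread, unlike a process-directed signal; a one-line sketch (assumed field names):
+\begin{cfa}
+#include <pthread.h>
+// vproc is a hypothetical virtual-processor descriptor recording its kernel thread
+pthread_kill( vproc->kernel_thread, SIGUSR1 );	// signal delivered only to that thread
+\end{cfa}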
 
@@ -2913,5 +2913,5 @@
 \paragraph{Mutual-Exclusion}
 
-Uncontented mutual exclusion, which occurs frequently, is measured by entering/leaving a critical section.
+Uncontended mutual exclusion, which occurs frequently, is measured by entering/leaving a critical section.
 For monitors, entering and leaving a monitor function is measured.
 To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a @pthread_mutex@ lock is also measured.
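+
+The pthreads baseline is essentially a tight uncontended lock/unlock loop (sketch; @N@ and the timing harness elided):
+\begin{cfa}
+#include <pthread.h>
+pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
+for ( int i = 0; i < N; i += 1 ) {	// N: iteration count from the harness
+	pthread_mutex_lock( &m );	// enter critical section, uncontended
+	pthread_mutex_unlock( &m );	// leave critical section
+}
+\end{cfa}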
