Index: doc/papers/concurrency/Paper.tex
===================================================================
--- doc/papers/concurrency/Paper.tex	(revision b54118ad23651a1d9e49d473d666db9fc0d8cd48)
+++ doc/papers/concurrency/Paper.tex	(revision bdce852941b067571ca3876cdcd3c249e5a8aae1)
@@ -292,9 +292,10 @@
 
 \CFA~\cite{Moss18,Cforall} is a modern, polymorphic, non-object-oriented\footnote{
-\CFA has object-oriented features, such as constructors, destructors, virtuals and simple trait/interface inheritance.
+\CFA has object-oriented features, such as constructors, destructors, and simple trait/interface inheritance.
 % Go interfaces, Rust traits, Swift Protocols, Haskell Type Classes and Java Interfaces.
 % "Trait inheritance" works for me. "Interface inheritance" might also be a good choice, and distinguish clearly from implementation inheritance.
-% You'll want to be a little bit careful with terms like "structural" and "nominal" inheritance as well. CFA has structural inheritance (I think Go as well) -- it's inferred based on the structure of the code. Java, Rust, and Haskell (not sure about Swift) have nominal inheritance, where there needs to be a specific statement that "this type inherits from this type".
-However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions implemented by an implicit \lstinline@this@ (receiver) parameter.},
+% You'll want to be a little bit careful with terms like "structural" and "nominal" inheritance as well. CFA has structural inheritance (I think Go as well) -- it's inferred based on the structure of the code.
+% Java, Rust, and Haskell (not sure about Swift) have nominal inheritance, where there needs to be a specific statement that "this type inherits from this type".
+However, functions \emph{cannot} be nested in structures and there is no mechanism to designate a function parameter as a receiver (\lstinline@this@) parameter.},
 backwards-compatible extension of the C programming language.
 In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a vehicle for new typing and control-flow capabilities on top of a highly popular programming language\footnote{
@@ -317,5 +318,5 @@
 Coroutines are only a stepping stone towards concurrency where the commonality is that coroutines and threads retain state between calls.
 
-\Celeven/\CCeleven define concurrency~\cite[\S~7.26]{C11}, but it is largely wrappers for a subset of the pthreads library~\cite{Pthreads}.\footnote{Pthreads concurrency is based on simple thread fork and join in a function and mutex or condition locks, which is low-level and error-prone}
+\Celeven and \CCeleven define concurrency~\cite[\S~7.26]{C11}, but these are largely wrappers for a subset of the pthreads library~\cite{Pthreads}.\footnote{Pthreads concurrency is based on simple thread fork and join in a function and mutex or condition locks, which is low-level and error-prone}
 Interestingly, almost a decade after the \Celeven standard, the most recent versions of gcc, clang, and msvc do not support the \Celeven include @threads.h@, indicating no interest in the C11 concurrency approach (possibly because of the recent effort to add concurrency to \CC).
 While the \Celeven standard does not state a threading model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}, as for \CC.
@@ -392,16 +393,17 @@
 \label{s:FundamentalExecutionProperties}
 
-The features in a programming language should be composed from a set of fundamental properties rather than an ad hoc collection chosen by the designers.
+The features in a programming language should be composed of a set of fundamental properties rather than an ad hoc collection chosen by the designers.
 To this end, the control-flow features created for \CFA are based on the fundamental properties of any language with function-stack control-flow (see also \uC~\cite[pp.~140-142]{uC++}).
-The fundamental properties are execution state, thread, and mutual-exclusion/synchronization (MES).
+The fundamental properties are execution state, thread, and mutual-exclusion/synchronization.
 These independent properties can be used to compose different language features, forming a compositional hierarchy, where the combination of all three is the most advanced feature, called a thread.
 While it is possible for a language to only provide threads for composing programs~\cite{Hermes90}, this unnecessarily complicates solutions to certain classes of problems and makes them inefficient.
 As is shown, each of the non-rejected composed language features solves a particular set of problems, and hence, has a defensible position in a programming language.
-If a compositional feature is missing, a programmer has too few fundamental properties resulting in a complex and/or is inefficient solution.
+If a compositional feature is missing, a programmer has too few fundamental properties resulting in a complex and/or inefficient solution.
 
 In detail, the fundamental properties are:
 \begin{description}[leftmargin=\parindent,topsep=3pt,parsep=0pt]
 \item[\newterm{execution state}:]
-is the state information needed by a control-flow feature to initialize, manage compute data and execution location(s), and de-initialize, \eg calling a function initializes a stack frame including contained objects with constructors, manages local data in blocks and return locations during calls, and de-initializes the frame by running any object destructors and management operations.
+is the state information needed by a control-flow feature to initialize and manage both compute data and execution location(s), and de-initialize.
+For example, calling a function initializes a stack frame including contained objects with constructors, manages local data in blocks and return locations during calls, and de-initializes the frame by running any object destructors and management operations.
 State is retained in fixed-sized aggregate structures (objects) and dynamic-sized stack(s), often allocated in the heap(s) managed by the runtime system.
 The lifetime of state varies with the control-flow feature, where longer life-time and dynamic size provide greater power but also increase usage complexity and cost.
@@ -414,12 +416,12 @@
 Multiple threads provide \emph{concurrent execution};
 concurrent execution becomes parallel when run on multiple processing units, \eg hyper-threading, cores, or sockets.
-There must be language mechanisms to create, block and unblock, and join with a thread, even if the mechanism is indirect.
-
-\item[\newterm{MES}:]
-is the concurrency mechanisms to perform an action without interruption and establish timing relationships among multiple threads.
+A programmer needs mechanisms to create, block and unblock, and join with a thread, even if these basic mechanisms are supplied indirectly through high-level features.
+
+\item[\newterm{mutual-exclusion / synchronization (MES)}:]
+is the concurrency mechanism to perform an action without interruption and establish timing relationships among multiple threads.
 We contend these two properties are independent, \ie mutual exclusion cannot provide synchronization and vice versa without introducing additional threads~\cite[\S~4]{Buhr05a}.
-Limiting MES, \eg no access to shared data, results in contrived solutions and inefficiency on multi-core von Neumann computers where shared memory is a foundational aspect of its design.
+Limiting MES functionality results in contrived solutions and inefficiency on multi-core von Neumann computers where shared memory is a foundational aspect of its design.
 \end{description}
-These properties are fundamental because they cannot be built from existing language features, \eg a basic programming language like C99~\cite{C99} cannot create new control-flow features, concurrency, or provide MES without atomic hardware mechanisms.
+These properties are fundamental as they cannot be built from existing language features, \eg a basic programming language like C99~\cite{C99} cannot create new control-flow features, concurrency, or provide MES without (atomic) hardware mechanisms.
 
 
@@ -443,4 +445,5 @@
 \renewcommand{\arraystretch}{1.25}
 %\setlength{\tabcolsep}{5pt}
+\vspace*{-5pt}
 \begin{tabular}{c|c||l|l}
 \multicolumn{2}{c||}{execution properties} & \multicolumn{2}{c}{mutual exclusion / synchronization} \\
@@ -461,4 +464,5 @@
 Yes (stackful)		& Yes		& \textbf{11}\ \ \ @thread@				& \textbf{12}\ \ @mutex@ @thread@		\\
 \end{tabular}
+\vspace*{-8pt}
 \end{table}
 
@@ -468,5 +472,5 @@
 A @mutex@ structure, often called a \newterm{monitor}, provides a high-level interface for race-free access of shared data in concurrent programming-languages.
 Case 3 is case 1 where the structure can implicitly retain execution state and access functions use this execution state to resume/suspend across \emph{callers}, but resume/suspend does not retain a function's local state.
-A stackless structure, often called a \newterm{generator} or \emph{iterator}, is \newterm{stackless} because it still borrow the caller's stack and thread, but the stack is used only to preserve state across its callees not callers.
+A stackless structure, often called a \newterm{generator} or \emph{iterator}, is \newterm{stackless} because it still borrows the caller's stack and thread, but the stack is used only to preserve state across its callees, not callers.
 Generators provide the first step toward directly solving problems like finite-state machines that retain data and execution state between calls, whereas normal functions restart on each call.
 Case 4 is cases 2 and 3 with thread safety during execution of the generator's access functions.
@@ -475,7 +479,7 @@
 A stackful generator, often called a \newterm{coroutine}, is \newterm{stackful} because resume/suspend now context switch to/from the caller's and coroutine's stack.
 A coroutine extends the state retained between calls beyond the generator's structure to arbitrary call depth in the access functions.
-Cases 7 and 8 are rejected because a new thread must have its own stack, where the thread begins and stack frames are stored for calls, \ie it is unrealistic for a thread to borrow a stack.
-Cases 9 and 10 are rejected because a thread needs a growable stack to accept calls, make calls, block, or be preempted, all of which compound to require an unknown amount of execution state.
-If this kind of thread exists, it must execute to completion, \ie computation only, which severely restricts runtime management.
+Cases 7, 8, 9 and 10 are rejected because a new thread must have its own stack, where the thread begins and stack frames are stored for calls, \ie it is unrealistic for a thread to borrow a stack.
+For cases 9 and 10, the stackless frame is not growable, precluding accepting nested calls, making calls, blocking (which requires calls), or preemption (which requires pushing an interrupt frame), all of which compound to require an unknown amount of execution state.
+Hence, if this kind of uninterruptible thread exists, it must execute to completion, \ie computation only, which severely restricts runtime management.
 Cases 11 and 12 are a stackful thread with and without safe access to shared state.
 A thread is the language mechanism to start another thread of control in a program with growable execution state for call/return execution.
@@ -1396,9 +1400,9 @@
 The call to @start@ is the first @resume@ of @prod@, which remembers the program main as the starter and creates @prod@'s stack with a frame for @prod@'s coroutine main at the top, and context switches to it.
 @prod@'s coroutine main starts, creates local-state variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random values, calling the consumer's @deliver@ function to transfer the values, and printing the status returned from the consumer.
-The producer call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status.
+The producer's call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status.
 Similarly on the first resume, @cons@'s stack is created and initialized, holding local-state variables retained between subsequent activations of the coroutine.
 The symmetric coroutine cycle forms when the consumer calls the producer's @payment@ function, which resumes the producer in the consumer's delivery function.
 When the producer calls @delivery@ again, it resumes the consumer in the @payment@ function.
-Both interface function than return to the their corresponding coroutine-main functions for the next cycle.
+Both interface functions then return to their corresponding coroutine-main functions for the next cycle.
 Figure~\ref{f:ProdConsRuntimeStacks} shows the runtime stacks of the program main, and the coroutine mains for @prod@ and @cons@ during the cycling.
 As a consequence of a coroutine retaining its last resumer for suspending back, these reverse pointers allow @suspend@ to cycle \emph{backwards} around a symmetric coroutine cycle.
@@ -1414,5 +1418,5 @@
 
 Terminating a coroutine cycle is more complex than a generator cycle, because it requires context switching to the program main's \emph{stack} to shutdown the program, whereas generators started by the program main run on its stack.
-Furthermore, each deallocated coroutine must execute all destructors for object allocated in the coroutine type \emph{and} allocated on the coroutine's stack at the point of suspension, which can be arbitrarily deep.
+Furthermore, each deallocated coroutine must execute all destructors for objects allocated in the coroutine type \emph{and} allocated on the coroutine's stack at the point of suspension, which can be arbitrarily deep.
 In the example, termination begins with the producer's loop stopping after N iterations and calling the consumer's @stop@ function, which sets the @done@ flag, resumes the consumer in function @payment@, terminating both the call and the consumer's loop in its coroutine main.
 % (Not shown is having @prod@ raise a nonlocal @stop@ exception at @cons@ after it finishes generating values and suspend back to @cons@, which catches the @stop@ exception to terminate its loop.)
@@ -1438,6 +1442,6 @@
 if @ping@ ends first, it resumes its starter, the program main, on return.
 Regardless of the cycle complexity, the starter structure always leads back to the program main, but the path can be entered at an arbitrary point.
-Once back at the program main (creator), coroutines @ping@ and @pong@ are deallocated, runnning any destructors for objects within the coroutine and possibly deallocating any coroutine stacks for non-terminated coroutines, where stack deallocation implies stack unwinding to find destructors for allocated objects on the stack.
-Hence, the \CFA termination semantics for the generator and coroutine ensure correct deallocation semnatics, regardless of the coroutine's state (terminated or active), like any other aggregate object.
+Once back at the program main (creator), coroutines @ping@ and @pong@ are deallocated, running any destructors for objects within the coroutine and possibly deallocating any coroutine stacks for non-terminated coroutines, where stack deallocation implies stack unwinding to find destructors for allocated objects on the stack.
+Hence, the \CFA termination semantics for the generator and coroutine ensure correct deallocation semantics, regardless of the coroutine's state (terminated or active), like any other aggregate object.
 
 
@@ -1445,5 +1449,5 @@
 
 A significant implementation challenge for generators and coroutines (and threads in Section~\ref{s:threads}) is adding extra fields to the custom types and related functions, \eg inserting code after/before the coroutine constructor/destructor and @main@ to create/initialize/de-initialize/destroy any extra fields, \eg the coroutine stack.
-There are several solutions to these problem, which follow from the object-oriented flavour of adopting custom types.
+There are several solutions to this problem, which follow from the object-oriented flavour of adopting custom types.
 
 For object-oriented languages, inheritance is used to provide extra fields and code via explicit inheritance:
@@ -1480,5 +1484,5 @@
 forall( `dtype` T | is_coroutine(T) ) void $suspend$( T & ), resume( T & );
 \end{cfa}
-Note, copying generators, coroutines, and threads is undefined because muliple objects cannot execute on a shared stack and stack copying does not work in unmanaged languages (no garbage collection), like C, because the stack may contain pointers to objects within it that require updating for the copy.
+Note, copying generators, coroutines, and threads is undefined because multiple objects cannot execute on a shared stack and stack copying does not work in unmanaged languages (no garbage collection), like C, because the stack may contain pointers to objects within it that require updating for the copy.
 The \CFA @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference or pointer.
 The function definitions ensure there is a statically typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to read the coroutine descriptor from its handle.
@@ -1625,5 +1629,5 @@
 	MyThread * team = factory( 10 );
 	// concurrency
-	`delete( team );` $\C{// deallocate heap-based threads, implicit joins before destruction}\CRT$
+	`adelete( team );` $\C{// deallocate heap-based threads, implicit joins before destruction}\CRT$
 }
 \end{cfa}
@@ -1702,5 +1706,5 @@
 Unrestricted nondeterminism is meaningless as there is no way to know when a result is completed and safe to access.
 To produce meaningful execution requires clawing back some determinism using mutual exclusion and synchronization, where mutual exclusion provides access control for threads using shared data, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}.
-The shared data protected by mutual exlusion is called a \newterm{critical section}~\cite{Dijkstra65}, and the protection can be simple, only 1 thread, or complex, only N kinds of threads, \eg group~\cite{Joung00} or readers/writer~\cite{Courtois71} problems.
+The shared data protected by mutual exclusion is called a \newterm{critical section}~\cite{Dijkstra65}, and the protection can be simple, only 1 thread, or complex, only N kinds of threads, \eg group~\cite{Joung00} or readers/writer~\cite{Courtois71} problems.
 Without synchronization control in a critical section, an arriving thread can barge ahead of preexisting waiter threads resulting in short/long-term starvation, staleness and freshness problems, and incorrect transfer of data.
 Preventing or detecting barging is a challenge with low-level locks, but made easier through higher-level constructs.
@@ -1826,5 +1830,5 @@
 \end{cquote}
 The @dtype@ property prevents \emph{implicit} copy operations and the @is_monitor@ trait provides no \emph{explicit} copy operations, so monitors must be passed by reference or pointer.
-Similarly, the function definitions ensures there is a mechanism to read the monitor descriptor from its handle, and a special destructor to prevent deallocation if a thread is using the shared data.
+Similarly, the function definitions ensure there is a mechanism to read the monitor descriptor from its handle, and a special destructor to prevent deallocation if a thread is using the shared data.
 The custom monitor type also inserts any locks needed to implement the mutual exclusion semantics.
 \CFA relies heavily on traits as an abstraction mechanism, so the @mutex@ qualifier prevents coincidentally matching of a monitor trait with a type that is not a monitor, similar to coincidental inheritance where a shape and playing card can both be drawable.
@@ -2479,10 +2483,10 @@
 
 One scheduling solution is for the signaller S to keep ownership of all locks until the last lock is ready to be transferred, because this semantics fits most closely to the behaviour of single-monitor scheduling.
-However, this solution is inefficient if W2 waited first and can be immediate passed @m2@ when released, while S retains @m1@ until completion of the outer mutex statement.
+However, this solution is inefficient if W2 waited first and can be immediately passed @m2@ when released, while S retains @m1@ until completion of the outer mutex statement.
 If W1 waited first, the signaller must retain @m1@ and @m2@ until completion of the outer mutex statement and then pass both to W1.
 % Furthermore, there is an execution sequence where the signaller always finds waiter W2, and hence, waiter W1 starves.
-To support this efficient semantics and prevent barging, the implementation maintains a list of monitors acquired for each blocked thread.
+To support these efficient semantics and prevent barging, the implementation maintains a list of monitors acquired for each blocked thread.
 When a signaller exits or waits in a mutex function or statement, the front waiter on urgent is unblocked if all its monitors are released.
-Implementing a fast subset check for the necessary released monitors is important and discussed in the following sections.
+Implementing a fast subset check for the necessarily released monitors is important and discussed in the following sections.
 % The benefit is encapsulating complexity into only two actions: passing monitors to the next owner when they should be released and conditionally waking threads if all conditions are met.
 
@@ -2543,5 +2547,5 @@
 Hence, function pointers are used to identify the functions listed in the @waitfor@ statement, stored in a variable-sized array.
 Then, the same implementation approach used for the urgent stack (see Section~\ref{s:Scheduling}) is used for the calling queue.
-Each caller has a list of monitors acquired, and the @waitfor@ statement performs a short linear search matching functions in the @waitfor@ list with called functions, and then verifying the associated mutex locks can be transfers.
+Each caller has a list of monitors acquired, and the @waitfor@ statement performs a short linear search matching functions in the @waitfor@ list with called functions, and then verifying the associated mutex locks can be transferred.
 
 
@@ -2778,9 +2782,9 @@
 The \CFA program @main@ uses the call/return paradigm to directly communicate with the @GoRtn main@, whereas Go switches to the unbuffered channel paradigm to indirectly communicate with the goroutine.
 Communication by multiple threads is safe for the @gortn@ thread via mutex calls in \CFA or channel assignment in Go.
-The different between call and channel send occurs for buffered channels making the send asynchronous.
-In \CFA, asynchronous call and multiple buffers is provided using an administrator and worker threads~\cite{Gentleman81} and/or futures (not discussed).
+The difference between call and channel send occurs for buffered channels, making the send asynchronous.
+In \CFA, asynchronous call and multiple buffers are provided using an administrator and worker threads~\cite{Gentleman81} and/or futures (not discussed).
 
 Figure~\ref{f:DirectCommunicationDatingService} shows the dating-service problem in Figure~\ref{f:DatingServiceMonitor} extended from indirect monitor communication to direct thread communication.
-When converting a monitor to a thread (server), the coding pattern is to move as much code as possible from the accepted functions into the thread main so it does an much work as possible.
+When converting a monitor to a thread (server), the coding pattern is to move as much code as possible from the accepted functions into the thread main so it does as much work as possible.
 Notice, the dating server is postponing requests for an unspecified time while continuing to accept new requests.
 For complex servers, \eg web-servers, there can be hundreds of lines of code in the thread main and safe interaction with clients can be complex.
@@ -2790,5 +2794,5 @@
 
 For completeness and efficiency, \CFA provides a standard set of low-level locks: recursive mutex, condition, semaphore, barrier, \etc, and atomic instructions: @fetchAssign@, @fetchAdd@, @testSet@, @compareSet@, \etc.
-Some of these low-level mechanism are used to build the \CFA runtime, but we always advocate using high-level mechanisms whenever possible.
+Some of these low-level mechanisms are used to build the \CFA runtime, but we always advocate using high-level mechanisms whenever possible.
 
 
@@ -2980,5 +2984,5 @@
 
 To test the performance of the \CFA runtime, a series of microbenchmarks are used to compare \CFA with pthreads, Java 11.0.6, Go 1.12.6, Rust 1.37.0, Python 3.7.6, Node.js 12.14.1, and \uC 7.0.0.
-For comparison, the package must be multi-processor (M:N), which excludes libdil and /libmil~\cite{libdill} (M:1)), and use a shared-memory programming model, \eg not message passing.
+For comparison, the package must be multi-processor (M:N), which excludes libdill and libmill~\cite{libdill} (M:1), and use a shared-memory programming model, \eg not message passing.
 The benchmark computer is an AMD Opteron\texttrademark\ 6380 NUMA 64-core, 8 socket, 2.5 GHz processor, running Ubuntu 16.04.6 LTS, and pthreads/\CFA/\uC are compiled with gcc 9.2.1.
 
@@ -3049,5 +3053,5 @@
 Figure~\ref{f:schedint} shows the code for \CFA, with results in Table~\ref{t:schedint}.
 Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
-Java scheduling is significantly greater because the benchmark explicitly creates multiple thread in order to prevent the JIT from making the program sequential, \ie removing all locking.
+Java scheduling is significantly greater because the benchmark explicitly creates multiple threads in order to prevent the JIT from making the program sequential, \ie removing all locking.
 
 \begin{multicols}{2}
@@ -3308,5 +3312,5 @@
 This type of concurrency can be achieved both at the language level and at the library level.
 The canonical example of implicit concurrency is concurrent nested @for@ loops, which are amenable to divide and conquer algorithms~\cite{uC++book}.
-The \CFA language features should make it possible to develop a reasonable number of implicit concurrency mechanism to solve basic HPC data-concurrency problems.
+The \CFA language features should make it possible to develop a reasonable number of implicit concurrency mechanisms to solve basic HPC data-concurrency problems.
 However, implicit concurrency is a restrictive solution with significant limitations, so it can never replace explicit concurrent programming.
 
Index: doc/papers/concurrency/figures/RunTimeStructure.fig
===================================================================
--- doc/papers/concurrency/figures/RunTimeStructure.fig	(revision b54118ad23651a1d9e49d473d666db9fc0d8cd48)
+++ doc/papers/concurrency/figures/RunTimeStructure.fig	(revision bdce852941b067571ca3876cdcd3c249e5a8aae1)
@@ -8,164 +8,164 @@
 -2
 1200 2
-6 3855 2775 4155 2925
-1 3 0 1 0 0 0 0 0 0.000 1 0.0000 3930 2850 30 30 3930 2850 3960 2880
-1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4035 2850 30 30 4035 2850 4065 2880
+6 3255 2475 3555 2625
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 3330 2550 30 30 3330 2550 3360 2580
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 3435 2550 30 30 3435 2550 3465 2580
 -6
-6 4755 3525 5055 3675
-1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4830 3600 30 30 4830 3600 4860 3630
-1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4935 3600 30 30 4935 3600 4965 3630
+6 4155 3225 4455 3375
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4230 3300 30 30 4230 3300 4260 3330
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4335 3300 30 30 4335 3300 4365 3330
 -6
-6 4650 2775 4950 2925
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4725 2850 15 15 4725 2850 4740 2865
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4800 2850 15 15 4800 2850 4815 2865
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4875 2850 15 15 4875 2850 4890 2865
+6 4050 2475 4350 2625
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4125 2550 15 15 4125 2550 4140 2565
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4200 2550 15 15 4200 2550 4215 2565
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4275 2550 15 15 4275 2550 4290 2565
 -6
-6 3225 2400 3525 2550
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3300 2475 15 15 3300 2475 3315 2490
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3375 2475 15 15 3375 2475 3390 2490
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3450 2475 15 15 3450 2475 3465 2490
+6 2625 2100 2925 2250
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 2700 2175 15 15 2700 2175 2715 2190
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 2775 2175 15 15 2775 2175 2790 2190
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 2850 2175 15 15 2850 2175 2865 2190
 -6
-6 5475 3450 5625 3750
-1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 5550 3525 15 15 5550 3525 5535 3540
-1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 5550 3600 15 15 5550 3600 5535 3615
-1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 5550 3675 15 15 5550 3675 5535 3690
+6 4875 3150 5025 3450
+1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 4950 3225 15 15 4950 3225 4935 3240
+1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 4950 3300 15 15 4950 3300 4935 3315
+1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 4950 3375 15 15 4950 3375 4935 3390
 -6
-6 4275 3525 4575 3675
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4350 3600 15 15 4350 3600 4365 3615
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4425 3600 15 15 4425 3600 4440 3615
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4500 3600 15 15 4500 3600 4515 3615
+6 3675 3225 3975 3375
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3750 3300 15 15 3750 3300 3765 3315
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3825 3300 15 15 3825 3300 3840 3315
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3900 3300 15 15 3900 3300 3915 3315
 -6
-6 3225 4125 4650 4425
-6 4350 4200 4650 4350
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4425 4275 15 15 4425 4275 4440 4290
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4500 4275 15 15 4500 4275 4515 4290
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4575 4275 15 15 4575 4275 4590 4290
+6 2625 3825 4050 4125
+6 3750 3900 4050 4050
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3825 3975 15 15 3825 3975 3840 3990
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3900 3975 15 15 3900 3975 3915 3990
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3975 3975 15 15 3975 3975 3990 3990
 -6
-1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3450 4275 225 150 3450 4275 3675 4425
-1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4050 4275 225 150 4050 4275 4275 4425
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 2850 3975 225 150 2850 3975 3075 4125
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3450 3975 225 150 3450 3975 3675 4125
 -6
-6 6675 4125 7500 4425
-6 7200 4200 7500 4350
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 7275 4275 15 15 7275 4275 7290 4290
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 7350 4275 15 15 7350 4275 7365 4290
-1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 7425 4275 15 15 7425 4275 7440 4290
+6 6075 3825 6900 4125
+6 6600 3900 6900 4050
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 6675 3975 15 15 6675 3975 6690 3990
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 6750 3975 15 15 6750 3975 6765 3990
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 6825 3975 15 15 6825 3975 6840 3990
 -6
-1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 6900 4275 225 150 6900 4275 7125 4425
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 6300 3975 225 150 6300 3975 6525 4125
 -6
-6 6675 3525 8025 3975
+6 6075 3225 7425 3675
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 6675 3750 6975 3750
+	 6075 3450 6375 3450
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 7125 3750 7350 3750
+	 6525 3450 6750 3450
 2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
-	 7800 3975 7800 3525 7350 3525 7350 3975 7800 3975
+	 7200 3675 7200 3225 6750 3225 6750 3675 7200 3675
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 7800 3750 8025 3750
+	 7200 3450 7425 3450
 -6
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5550 2625 150 150 5550 2625 5700 2625
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5550 3225 150 150 5550 3225 5700 3225
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5550 3975 150 150 5550 3975 5700 3975
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3525 2850 150 150 3525 2850 3675 2850
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4200 2475 150 150 4200 2475 4350 2475
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4425 2850 150 150 4425 2850 4575 2850
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4650 2475 150 150 4650 2475 4800 2475
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3975 3600 150 150 3975 3600 4125 3600
-1 3 0 1 0 0 0 0 0 0.000 1 0.0000 3525 3600 30 30 3525 3600 3555 3630
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3750 2475 150 150 3750 2475 3900 2625
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4875 3600 150 150 4875 3600 5025 3750
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3975 2850 150 150 3975 2850 4125 2850
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 7200 2775 150 150 7200 2775 7350 2775
-1 3 0 1 0 0 0 0 0 0.000 1 0.0000 2250 4830 30 30 2250 4830 2280 4860
-1 3 0 1 0 0 0 0 0 0.000 1 0.0000 7200 2775 30 30 7200 2775 7230 2805
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3525 3600 150 150 3525 3600 3675 3600
-1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3875 4800 100 100 3875 4800 3975 4800
-1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4650 4800 150 75 4650 4800 4800 4875
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4950 2325 150 150 4950 2325 5100 2325
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4950 2925 150 150 4950 2925 5100 2925
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4950 3675 150 150 4950 3675 5100 3675
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 2925 2550 150 150 2925 2550 3075 2550
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3600 2175 150 150 3600 2175 3750 2175
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3825 2550 150 150 3825 2550 3975 2550
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4050 2175 150 150 4050 2175 4200 2175
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3375 3300 150 150 3375 3300 3525 3300
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 2925 3300 30 30 2925 3300 2955 3330
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3150 2175 150 150 3150 2175 3300 2325
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4275 3300 150 150 4275 3300 4425 3450
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3375 2550 150 150 3375 2550 3525 2550
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 6600 2475 150 150 6600 2475 6750 2475
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 1650 4530 30 30 1650 4530 1680 4560
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 6600 2475 30 30 6600 2475 6630 2505
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 2925 3300 150 150 2925 3300 3075 3300
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3275 4500 100 100 3275 4500 3375 4500
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4050 4500 150 75 4050 4500 4200 4575
 2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
-	 2400 4200 2400 3750 1950 3750 1950 4200 2400 4200
+	 1800 3900 1800 3450 1350 3450 1350 3900 1800 3900
 2 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5
-	 6300 4500 6300 1800 3000 1800 3000 4500 6300 4500
+	 5700 4200 5700 1500 2400 1500 2400 4200 5700 4200
 2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
-	 5775 2850 5775 2400 5325 2400 5325 2850 5775 2850
+	 5175 2550 5175 2100 4725 2100 4725 2550 5175 2550
 2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
-	 5775 4200 5775 3750 5325 3750 5325 4200 5775 4200
+	 5175 3900 5175 3450 4725 3450 4725 3900 5175 3900
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5175 3975 5325 3975
+	 4575 3675 4725 3675
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5175 3225 5325 3225
+	 4575 2925 4725 2925
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5175 2625 5325 2625
+	 4575 2325 4725 2325
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5775 3975 5925 3975
+	 5175 3675 5325 3675
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5775 3225 5925 3225
+	 5175 2925 5325 2925
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5775 2625 5925 2625
+	 5175 2325 5325 2325
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
-	 5175 3975 5175 2625
+	 4575 3675 4575 2325
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5925 3975 5925 2025
+	 5325 3675 5325 1725
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5925 3750 6225 3750
+	 5325 3450 5625 3450
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 3450 2625 3225 2625
+	 2850 2325 2625 2325
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 3
 	1 1 1.00 45.00 90.00
-	 5925 2025 4200 2025 4200 2250
+	 5325 1725 3600 1725 3600 1950
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
-	 3225 2625 3225 3600
+	 2625 2325 2625 3300
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 3075 3600 3375 3600
+	 2475 3300 2775 3300
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 3675 3600 3825 3600
+	 3075 3300 3225 3300
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 4125 3600 4275 3600
+	 3525 3300 3675 3300
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 4575 3600 4725 3600
+	 3975 3300 4125 3300
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 5025 3600 5175 3600
+	 4425 3300 4575 3300
 2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
-	 5775 3450 5775 3000 5325 3000 5325 3450 5775 3450
+	 5175 3150 5175 2700 4725 2700 4725 3150 5175 3150
 2 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5
-	 8100 4500 8100 1800 6600 1800 6600 4500 8100 4500
+	 7500 4200 7500 1500 6000 1500 6000 4200 7500 4200
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
 	1 1 1.00 45.00 90.00
-	 7050 2775 6825 2775
+	 6450 2475 6225 2475
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
-	 6825 2775 6825 3750
+	 6225 2475 6225 3450
 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 4
 	1 1 1.00 45.00 90.00
-	 7875 3750 7875 2325 7200 2325 7200 2550
+	 7275 3450 7275 2025 6600 2025 6600 2250
 2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
-	 5850 4950 5850 4725 5625 4725 5625 4950 5850 4950
+	 5250 4650 5250 4425 5025 4425 5025 4650 5250 4650
 2 2 1 1 -1 -1 0 0 -1 3.000 0 0 0 0 0 5
-	 6975 4950 6750 4950 6750 4725 6975 4725 6975 4950
-4 1 -1 0 0 0 10 0.0000 2 105 720 5550 4425 Processors\001
-4 1 -1 0 0 0 10 0.0000 2 120 1005 4200 3225 Blocked Tasks\001
-4 1 -1 0 0 0 10 0.0000 2 150 870 4200 3975 Ready Tasks\001
-4 1 -1 0 0 0 10 0.0000 2 135 1095 7350 1725 Other Cluster(s)\001
-4 1 -1 0 0 0 10 0.0000 2 105 840 4650 1725 User Cluster\001
-4 1 -1 0 0 0 10 0.0000 2 150 615 2175 3675 Manager\001
-4 1 -1 0 0 0 10 0.0000 2 105 990 2175 3525 Discrete-event\001
-4 1 -1 0 0 0 10 0.0000 2 135 795 2175 4350 preemption\001
-4 0 -1 0 0 0 10 0.0000 2 150 1290 2325 4875 genrator/coroutine\001
-4 0 -1 0 0 0 10 0.0000 2 120 270 4050 4875 task\001
-4 0 -1 0 0 0 10 0.0000 2 105 450 7050 4875 cluster\001
-4 0 -1 0 0 0 10 0.0000 2 105 660 5925 4875 processor\001
-4 0 -1 0 0 0 10 0.0000 2 105 555 4875 4875 monitor\001
+	 6375 4650 6150 4650 6150 4425 6375 4425 6375 4650
+4 1 -1 0 0 0 10 0.0000 2 105 720 4950 4125 Processors\001
+4 1 -1 0 0 0 10 0.0000 2 120 1005 3600 2925 Blocked Tasks\001
+4 1 -1 0 0 0 10 0.0000 2 150 870 3600 3675 Ready Tasks\001
+4 1 -1 0 0 0 10 0.0000 2 135 1095 6750 1425 Other Cluster(s)\001
+4 1 -1 0 0 0 10 0.0000 2 105 840 4050 1425 User Cluster\001
+4 1 -1 0 0 0 10 0.0000 2 150 615 1575 3375 Manager\001
+4 1 -1 0 0 0 10 0.0000 2 105 990 1575 3225 Discrete-event\001
+4 1 -1 0 0 0 10 0.0000 2 135 795 1575 4050 preemption\001
+4 0 -1 0 0 0 10 0.0000 2 150 1365 1725 4575 generator/coroutine\001
+4 0 -1 0 0 0 10 0.0000 2 120 270 3450 4575 task\001
+4 0 -1 0 0 0 10 0.0000 2 105 450 6450 4575 cluster\001
+4 0 -1 0 0 0 10 0.0000 2 105 660 5325 4575 processor\001
+4 0 -1 0 0 0 10 0.0000 2 105 555 4275 4575 monitor\001
Index: doc/papers/concurrency/mail2
===================================================================
--- doc/papers/concurrency/mail2	(revision b54118ad23651a1d9e49d473d666db9fc0d8cd48)
+++ doc/papers/concurrency/mail2	(revision bdce852941b067571ca3876cdcd3c249e5a8aae1)
@@ -934,2 +934,27 @@
 Page 18, line 17: is using
 
+
+
+Date: Tue, 16 Jun 2020 13:45:03 +0000
+From: Aaron Thomas <onbehalfof@manuscriptcentral.com>
+Reply-To: speoffice@wiley.com
+To: tdelisle@uwaterloo.ca, pabuhr@uwaterloo.ca
+Subject: SPE-19-0219.R2 successfully submitted
+
+16-Jun-2020
+
+Dear Dr Buhr,
+
+Your manuscript entitled "Advanced Control-flow and Concurrency in Cforall" has been successfully submitted online and is presently being given full consideration for publication in Software: Practice and Experience.
+
+Your manuscript number is SPE-19-0219.R2.  Please mention this number in all future correspondence regarding this submission.
+
+You can view the status of your manuscript at any time by checking your Author Center after logging into https://mc.manuscriptcentral.com/spe.  If you have difficulty using this site, please click the 'Get Help Now' link at the top right corner of the site.
+
+
+Thank you for submitting your manuscript to Software: Practice and Experience.
+
+Sincerely,
+
+Software: Practice and Experience Editorial Office
+
Index: doc/papers/concurrency/response2
===================================================================
--- doc/papers/concurrency/response2	(revision b54118ad23651a1d9e49d473d666db9fc0d8cd48)
+++ doc/papers/concurrency/response2	(revision bdce852941b067571ca3876cdcd3c249e5a8aae1)
@@ -27,24 +27,9 @@
       thread creation and destruction?
 
-The best description of Smalltalk concurrency I can find is in J. Hunt,
-Smalltalk and Object Orientation, Springer-Verlag London Limited, 1997, Chapter
-31 Concurrency in Smalltalk. It states on page 332:
-
-  For a process to be spawned from the current process there must be some way
-  of creating a new process. This is done using one of four messages to a
-  block. These messages are:
-
-    aBlock fork: This creates and schedules a process which will execute the
-    block. The priority of this process is inherited from the parent process.
-    ...
-
-  The Semaphore class provides facilities for achieving simple synchronization,
-  it is simple because it only allows for two forms of communication signal and
-  wait.
-
-Hence, "aBlock fork" creates, "Semaphore" blocks/unblocks (as does message send
-to an aBlock object), and garbage collection of an aBlock joins with its
-thread. The fact that a programmer *implicitly* does "fork", "block"/"unblock",
-and "join", does not change their fundamental requirement.
+Fixed, changed sentence to:
+
+ A programmer needs mechanisms to create, block and unblock, and join with a
+ thread, even if these basic mechanisms are supplied indirectly through
+ high-level features.
 
 
@@ -103,5 +88,5 @@
 storage, too, which is a single instance across all generator instances of that
 type, as for static storage in an object type. All the kinds of storage are
-at play with semantics that is virtually the same as in other languages.
+at play with semantics that is the same as in other languages.
 
 
@@ -118,11 +103,9 @@
 Just-in-Time Compiler. We modified our test programs based on his advice, and
 he validated our programs as correctly measuring the specified language
-feature. Hence, we have taken into account all issues related to performing
-benchmarks in JITTED languages.  Dave's help is recognized in the
-Acknowledgment section. Also, all the benchmark programs are publicly available
-for independent verification.
-
-Similarly, we verified our Node.js programs with Gregor Richards, an expert in
-just-in-time compilation for dynamic typing.
+feature. Dave's help is recognized in the Acknowledgment section.  Similarly,
+we verified our Node.js programs with Gregor Richards, an expert in
+just-in-time compilation for dynamic typing.  Hence, we have taken into account
+all issues related to performing benchmarks in JITTED languages.  Also, all the
+benchmark programs are publicly available for independent verification.
 
 
@@ -155,32 +138,9 @@
 Since many aspects of Cforall are not OO, the rest of the paper *does* depend
 on Cforall being identified as non-OO, otherwise readers would have
-significantly different expectations for the design. We believe your definition
-of OO is too broad, such as including C. Just because a programming language
-can support aspects of the OO programming style, does not make it OO. (Just
-because a duck can swim does not make it a fish.)
-
-Our definition of non-OO follows directly from the Wikipedia entry:
-
-  Object-oriented programming (OOP) is a programming paradigm based on the
-  concept of "objects", which can contain data, in the form of fields (often
-  known as attributes or properties), and code, in the form of procedures
-  (often known as methods). A feature of objects is an object's procedures that
-  can access and often modify the data fields of the object with which they are
-  associated (objects have a notion of "this" or "self").
-  https://en.wikipedia.org/wiki/Object-oriented_programming
-
-Cforall fails this definition as code cannot appear in an "object" and there is
-no implicit receiver. As well, Cforall, Go, and Rust do not have nominal
-inheritance and they not considered OO languages, e.g.:
-
- "**Is Go an object-oriented language?** Yes and no. Although Go has types and
- methods and allows an object-oriented style of programming, there is no type
- hierarchy. The concept of "interface" in Go provides a different approach
- that we believe is easy to use and in some ways more general. There are also
- ways to embed types in other types to provide something analogous-but not
- identical-to subclassing. Moreover, methods in Go are more general than in
- C++ or Java: they can be defined for any sort of data, even built-in types
- such as plain, "unboxed" integers. They are not restricted to structs (classes).
- https://golang.org/doc/faq#Is_Go_an_object-oriented_language
+significantly different expectations for the design. We did not mean to suggest
+that a language supporting function pointers within structures thereby supports
+an OO style. We revised the footnote to avoid this interpretation. Finally, Go
+does not identify itself as an OO language.
+https://golang.org/doc/faq#Is_Go_an_object-oriented_language
 
 
@@ -219,5 +179,5 @@
 Whereas, the coroutine only needs the array allocated when needed. Now a
 coroutine has a stack which occupies storage, but the maximum stack size only
-needs to be the call chain allocating the most storage, where as the generator
+needs to be the call chain allocating the most storage, whereas the generator
 has a maximum size of all variables that could be created.
 
@@ -314,8 +274,8 @@
 
  \item[\newterm{execution state}:] is the state information needed by a
- control-flow feature to initialize, manage compute data and execution
- location(s), and de-initialize, \eg calling a function initializes a stack
- frame including contained objects with constructors, manages local data in
- blocks and return locations during calls, and de-initializes the frame by
+ control-flow feature to initialize and manage both compute data and execution
+ location(s), and de-initialize. For example, calling a function initializes a
+ stack frame including contained objects with constructors, manages local data
+ in blocks and return locations during calls, and de-initializes the frame by
  running any object destructors and management operations.
 
@@ -330,4 +290,5 @@
 appropriate word?
 
+
     "computation only" as opposed to what?
 
@@ -335,4 +296,5 @@
 i.e., the operation starts with everything it needs to compute its result and
 runs to completion, blocking only when it is done and returns its result.
+Computation only occurs in "embarrassingly parallel" problems.
 
 
@@ -596,5 +558,5 @@
     * coroutines/generators/threads: here there is some discussion, but it can
       be improved.
-    * interal/external scheduling: I didn't find any direct comparison between
+    * internal/external scheduling: I didn't find any direct comparison between
       these features, except by way of example.
 
@@ -672,5 +634,6 @@
   }
 
-Additonal text has been added to the start of Section 3.2 address this comment.
+Additional text has been added to the start of Section 3.2 addressing this
+comment.
 
 
@@ -787,8 +750,8 @@
 and its shorthand form (not shown in the paper)
 
-  waitfor( remove, remove2 : t );
-
-A call to one these remove functions satisfies the waitfor (exact selection
-details are discussed in Section 6.4).
+  waitfor( remove, remove2 : buffer );
+
+A call to either of these remove functions satisfies the waitfor (exact
+selection details are discussed in Section 6.4).
 
 
@@ -835,4 +798,5 @@
 signal or signal_block.
 
+
     I believe that one difference between the Go program and the Cforall
     equivalent is that the Goroutine has an associated queue, so that
@@ -842,4 +806,5 @@
 Actually, the buffer length is 0 for the Cforall call and the Go unbuffered
 send so both are synchronous communication.
+
 
     I think this should be stated explicitly. (Presumably, one could modify the
@@ -985,4 +950,5 @@
       sout | "join";
   }
+
   int main() {
       T t[3]; // threads start and delay
