Index: doc/papers/concurrency/Paper.tex
===================================================================
--- doc/papers/concurrency/Paper.tex	(revision d078afd321a2bcee7aa03836fda5824f0ec823b5)
+++ doc/papers/concurrency/Paper.tex	(revision 1ecee819d36fefa357d2da5d98142d3c3727350e)
@@ -228,5 +228,5 @@
 }
 
-\title{\texorpdfstring{Advanced Control-flow in \protect\CFA}{Advanced Control-flow in Cforall}}
+\title{\texorpdfstring{Advanced Control-flow and Concurrency in \protect\CFA}{Advanced Control-flow in Cforall}}
 
 \author[1]{Thierry Delisle}
@@ -242,9 +242,9 @@
 \abstract[Summary]{
 \CFA is a modern, polymorphic, non-object-oriented, backwards-compatible extension of the C programming language.
-This paper discusses the advanced control-flow features in \CFA, which include concurrency and parallelism, and its supporting runtime system.
-These features are created from scratch as ISO C's concurrency is low-level and unimplemented, so C programmers continue to rely on the C pthreads library.
-\CFA provides high-level control-flow mechanisms, like coroutines and user-level threads, and monitors for mutual exclusion and synchronization.
-A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with all monitor synchronization mechanisms.
-All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features.
+This paper discusses some advanced control-flow and concurrency/parallelism features in \CFA, along with the supporting runtime.
+These features are created from scratch because they do not exist in ISO C, or are low-level and/or unimplemented, so C programmers continue to rely on library features, like C pthreads.
+\CFA introduces language-level control-flow mechanisms, like coroutines, user-level threading, and monitors for mutual exclusion and synchronization.
+A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with monitor synchronization mechanisms.
+These features also integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers.
 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
 }%
@@ -261,15 +261,17 @@
 \section{Introduction}
 
-This paper discusses the design of advanced, high-level control-flow extensions (especially concurrency and parallelism) in \CFA and its runtime.
+This paper discusses the design of language-level control-flow and concurrency/parallelism extensions in \CFA and its runtime.
 \CFA is a modern, polymorphic, non-object-oriented\footnote{
 \CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance.
 However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.},
 backwards-compatible extension of the C programming language~\cite{Moss18}.
-Within the \CFA framework, new control-flow features were created from scratch.
-ISO \Celeven defines only a subset of the \CFA extensions, and with respect to concurrency~\cite[\S~7.26]{C11}, the features are largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
+Within the \CFA framework, new control-flow features are created from scratch.
+ISO \Celeven defines only a subset of the \CFA extensions.
+The overlapping features are concurrency~\cite[\S~7.26]{C11};
+however, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
 Furthermore, \Celeven and pthreads concurrency is basic, based on thread fork/join in a function and a few locks, which is low-level and error prone;
-no high-level language concurrency features exist.
-Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C concurrency approach.
-Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests the threading model is kernel-level threading (1:1)~\cite{ThreadModel}.
+no high-level language concurrency features are defined.
+Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach.
+Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}.
 
 In contrast, there has been a renewed interest during the past decade in user-level (M:N, green) threading in old and new programming languages.
@@ -284,13 +286,13 @@
 
 A further effort over the past decade is the development of language memory-models to deal with the conflict between certain language features and compiler/hardware optimizations.
-This issue can be rephrased as some features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.
+This issue can be rephrased as: some language features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.
 The consequence is that a language must be cognizant of these features and provide sufficient tools to program around any safety issues.
 For example, C created the @volatile@ qualifier to provide correct execution for @setjmp@/@logjmp@ (concurrency came later).
-The simplest solution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}.
+The common solution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}.
 
 While having a sufficient memory-model allows sound libraries to be constructed, writing these libraries can quickly become awkward and error prone, and using these low-level libraries has the same issues.
 Essentially, using low-level explicit locks is the concurrent equivalent of assembler programming.
 Just as most assembler programming is replaced with programming in a high-level language, explicit locks can be replaced with high-level concurrency constructs in a programming language.
-The goal is to get the compiler to check for correct usage and follow any complex coding conventions implicitly.
+Then the goal is for the compiler to check for correct usage and follow any complex coding conventions implicitly.
 The drawback is that language constructs may preclude certain specialized techniques, therefore introducing inefficiency or inhibiting concurrency.
 For most concurrent programs, these drawbacks are insignificant in comparison to the speed of composition, and subsequent reliability and maintainability of the high-level concurrent program.
@@ -299,5 +301,5 @@
 As stated, this observation applies to non-concurrent forms of complex control-flow, like exception handling and coroutines.
 
-Adapting the programming language allows matching the control-flow model with the programming-language style, versus adapting to one general (sound) library/paradigm.
+Adapting the programming language to these features also allows matching the control-flow model with the programming-language style, versus adopting one general (sound) library/paradigm.
 For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}.
 It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing.
@@ -308,6 +310,9 @@
 Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
 
-\CFA embraces language extensions and user-level threading to provide advanced control-flow and concurrency.
-We attempt to show the \CFA extensions and runtime are demonstrably better than those proposed for \CC and other concurrent, imperative programming languages.
+\CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{
+\CFA exception handling will be presented in a separate paper.
+The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
+} and coroutines) and concurrency.
+We show the \CFA language extensions are demonstrably better than those proposed for \Celeven, \CC and other concurrent, imperative programming languages, and that the \CFA runtime is competitive with other similar extensions.
 The contributions of this work are:
 \begin{itemize}
