Index: doc/papers/concurrency/Paper.tex
===================================================================
--- doc/papers/concurrency/Paper.tex	(revision 200fcb3c496b08f843f691550554b21d786aad38)
+++ doc/papers/concurrency/Paper.tex	(revision 43a56038cae1657543d903e445c36fbbd2fb698f)
@@ -241,9 +241,8 @@
 
 \abstract[Summary]{
-\CFA is a modern, polymorphic, \emph{non-object-oriented} extension of the C programming language.
-This paper discusses the design of the concurrency and parallelism features in \CFA, and its concurrent runtime-system.
-These features are created from scratch as ISO C lacks concurrency, relying largely on the pthreads library for concurrency.
-Coroutines and lightweight (user) threads are introduced into \CFA;
-as well, monitors are added as a high-level mechanism for mutual exclusion and synchronization.
+\CFA is a modern, polymorphic, \emph{non-object-oriented}, backwards-compatible extension of the C programming language.
+This paper discusses the concurrency and parallelism features in \CFA, and its runtime system.
+These features are created from scratch because ISO C's concurrency is low-level and largely unimplemented, so C programmers continue to rely on the C pthreads library.
+\CFA provides high-level control-flow mechanisms, such as coroutines and lightweight (user) threads, as well as monitors for mutual exclusion and synchronization.
 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously}.
 All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features.
@@ -251,5 +250,5 @@
 }%
 
-\keywords{concurrency, parallelism, coroutines, threads, monitors, runtime, C, Cforall}
+\keywords{concurrency, parallelism, coroutines, threads, monitors, runtime, C, \CFA (Cforall)}
 
 
@@ -262,4 +261,36 @@
 \section{Introduction}
 
+This paper discusses the design of the high-level concurrency and parallelism features in \CFA, and its runtime.
+\CFA is a modern, polymorphic, \emph{non-object-oriented}, backwards-compatible extension of the C programming language~\cite{Moss18}.
+Within the \CFA framework, new concurrency features were created from scratch.
+While ISO \Celeven defines concurrency~\cite[\S~7.26]{C11}, it is largely a wrapper for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
+Furthermore, \Celeven and pthreads concurrency is simple: creating/joining threads in a function and a few locks, which is low-level and error-prone;
+no high-level language concurrency features exist.
+Interestingly, 8 years after publication of the \Celeven standard, neither gcc-8 nor clang-8 (the most recent versions) supports \Celeven @threads.h@, indicating little interest in the C concurrency approach.
+Finally, while the \Celeven standard does not specify a concurrent threading-model, the strong association with pthreads suggests the threading model is kernel-level threading (1:1)~\cite{ThreadModel}.
+
+There has been renewed interest during the past decade in user-level (M:N, green) threading in new and old programming languages, and in providing high-level constructs such as coroutines, monitors, tasks, and actors for presenting advanced control-flow.
+As multi-core hardware became available in the 1980s and 1990s, both user and kernel threading were examined.
+Kernel threading was chosen, largely because of its simplicity and fit with the simpler operating systems and hardware architectures at the time, which gave it a performance advantage~\cite{Drepper03}.
+Libraries like pthreads were developed for C, and the Solaris operating system switched from user (JDK 1.1~\cite{JDK1.1}) to kernel threads.
+As a result, languages like Java, Scala~\cite{Scala}, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopted the 1:1 kernel-threading model, with a variety of presentation mechanisms.
+From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,BoostThreads}, including putting green threads back into Java~\cite{Quasar}.
+Because advanced control-flow (including exception handling) is pervasive in a programming language and its runtime, these features must be understood by the language (i.e., not added via a library) to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.
+
+The main argument for user-level threading is matching the concurrency model with the programming-language style, rather than adapting language concurrency to one general approach.
+For example, it is possible to provide coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}.
+The user-threading approach facilitates simpler concurrency construction using thread objects and sequential patterns rather than callbacks and events~\cite{vonBehren03}.
+As well, user-level threads are lighter weight than kernel threads, so there are fewer restrictions on programming styles that encourage large numbers of threads performing smaller work-units to facilitate load balancing by the runtime~\cite{Verch12}.
+User threading is also able to layer multiple concurrency models into a single language (locks, monitors, tasks, actors, futures), so programmers can choose the model that best fits an application.
+Finally, it is possible to discreetly fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
+User-threading implementations that are performant in both time and space are appearing, competitive with direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
+
+Adding advanced control-flow to \CFA is similar to current and future extensions in \CCeleven through to \CCtwenty.
+However, we contend the \CFA extensions are demonstrably better than those proposed for \CC.
+For example, a unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with all monitor synchronization mechanisms.
+As well, all control-flow features respect the expectations of C programmers, with statically type-safe interfaces that integrate with the \CFA polymorphic type-system and other language features.
+Experimental results show comparable performance between the new features and similar mechanisms in other concurrent programming languages.
+
+\begin{comment}
 This paper provides a minimal concurrency \newterm{Application Program Interface} (API) that is simple, efficient and can be used to build other concurrency features.
 While the simplest concurrency system is a thread and a lock, this low-level approach is hard to master.
@@ -281,6 +312,8 @@
 The proposed concurrency API is implemented in a dialect of C, called \CFA (pronounced C-for-all).
 The paper discusses how the language features are added to the \CFA translator with respect to parsing, semantics, and type checking, and the corresponding high-performance runtime-library to implement the concurrent features.
-
-
+\end{comment}
+
+
+\begin{comment}
 \section{\CFA Overview}
 
@@ -551,4 +584,5 @@
 \end{cfa}
 where the return type supplies the type/size of the allocation, which is impossible in most type systems.
+\end{comment}
 
 
