Index: doc/papers/concurrency/Paper.tex
===================================================================
--- doc/papers/concurrency/Paper.tex	(revision 3f8ec70a8c7b25f83becf23d1f9eb88ff70e0aba)
+++ doc/papers/concurrency/Paper.tex	(revision a9276626ec4caba86cf7ab2282d94188f1f65e6d)
@@ -215,5 +215,8 @@
 {}
 \lstnewenvironment{Go}[1][]
-{\lstset{#1}}
+{\lstset{language=go,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}}
+{}
+\lstnewenvironment{python}[1][]
+{\lstset{language=python,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}}
 {}
 
@@ -267,8 +270,7 @@
 backwards-compatible extension of the C programming language~\cite{Moss18}.
 Within the \CFA framework, new control-flow features are created from scratch.
-ISO \Celeven defines only a subset of the \CFA extensions.
-The overlapping features are concurrency~\cite[\S~7.26]{C11};
-however, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
-Furthermore, \Celeven and pthreads concurrency is basic, based on thread fork/join in a function and a few locks, which is low-level and error prone;
+ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
+However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
+Furthermore, \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
 no high-level language concurrency features are defined.
 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach.
@@ -293,5 +295,5 @@
 While having a sufficient memory-model allows sound libraries to be constructed, writing these libraries can quickly become awkward and error prone, and using these low-level libraries has the same issues.
 Essentially, using low-level explicit locks is the concurrent equivalent of assembler programming.
-Just as most assembler programming is replaced with programming in a high-level language, explicit locks can be replaced with high-level concurrency constructs in a programming language.
+Just as most assembler programming is replaced with high-level programming, explicit locks can be replaced with high-level concurrency in a programming language.
 Then the goal is for the compiler to check for correct usage and follow any complex coding conventions implicitly.
 The drawback is that language constructs may preclude certain specialized techniques, therefore introducing inefficiency or inhibiting concurrency.
@@ -309,11 +311,17 @@
 however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking thread objects.}
 Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
-
 \CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{
 \CFA exception handling will be presented in a separate paper.
 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
 } and coroutines) and concurrency.
-We show the \CFA language extensions are demonstrably better than those proposed for \Celeven, \CC and other concurrent, imperative programming languages, and that the \CFA runtime is competitive with other similar extensions.
-The contributions of this work are:
+
+Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory.
+As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten.
+While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (e.g., objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
+Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base.
+\CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems.
+
+We present comparative examples so the reader can judge if the \CFA control-flow extensions are equivalent or better than those in or proposed for \Celeven, \CC and other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
+The detailed contributions of this work are:
 \begin{itemize}
 \item
@@ -620,7 +628,7 @@
 
 
-\section{Coroutines: A Stepping Stone}\label{coroutine}
-
-Advanced controlWhile the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves).
+\section{Coroutines: Stepping Stone}
+\label{coroutine}
+
 Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed.
 Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension.
@@ -648,11 +656,11 @@
 \begin{lrbox}{\myboxA}
 \begin{cfa}[aboveskip=0pt,belowskip=0pt]
-`int f1, f2, state = 1;`   // single global variables
+`int fn1, fn2, state = 1;`   // single global variables
 int fib() {
 	int fn;
 	`switch ( state )` {  // explicit execution state
-	  case 1: fn = 0;  f1 = fn;  state = 2;  break;
-	  case 2: fn = 1;  f2 = f1;  f1 = fn;  state = 3;  break;
-	  case 3: fn = f1 + f2;  f2 = f1;  f1 = fn;  break;
+	  case 1: fn = 0;  fn1 = fn;  state = 2;  break;
+	  case 2: fn = 1;  fn2 = fn1;  fn1 = fn;  state = 3;  break;
+	  case 3: fn = fn1 + fn2;  fn2 = fn1;  fn1 = fn;  break;
 	}
 	return fn;
@@ -671,10 +679,10 @@
 \begin{cfa}[aboveskip=0pt,belowskip=0pt]
 #define FIB_INIT `{ 0, 1 }`
-typedef struct { int f2, f1; } Fib;
+typedef struct { int fn2, fn1; } Fib;
 int fib( Fib * f ) {
 
-	int ret = f->f2;
-	int fn = f->f1 + f->f2;
-	f->f2 = f->f1; f->f1 = fn;
+	int ret = f->fn2;
+	int fn = f->fn1 + f->fn2;
+	f->fn2 = f->fn1; f->fn1 = fn;
 
 	return ret;
@@ -683,5 +691,5 @@
 	Fib f1 = FIB_INIT, f2 = FIB_INIT;
 	for ( int i = 0; i < 10; i += 1 ) {
-		printf( "%d %d\n", fib( &f1 ), fib( &f2 ) );
+		printf( "%d %d\n", fib( &fn1 ), fib( &f2 ) );
 	}
 }
@@ -689,8 +697,8 @@
 \end{lrbox}
 
-\subfloat[3 States: global variables]{\label{f:GlobalVariables}\usebox\myboxA}
+\subfloat[3 states: global variables]{\label{f:GlobalVariables}\usebox\myboxA}
 \qquad
-\subfloat[1 State: external variables]{\label{f:ExternalState}\usebox\myboxB}
-\caption{C Fibonacci Implementations}
+\subfloat[1 state: encapsulated variables]{\label{f:ExternalState}\usebox\myboxB}
+\caption{C fibonacci fsm}
 \label{f:C-fibonacci}
 
@@ -702,20 +710,14 @@
 `coroutine` Fib { int fn; };
 void main( Fib & fib ) with( fib ) {
-	int f1, f2;
-	fn = 0;  f1 = fn;  `suspend()`;
-	fn = 1;  f2 = f1;  f1 = fn;  `suspend()`;
-	for ( ;; ) {
-		fn = f1 + f2;  f2 = f1;  f1 = fn;  `suspend()`;
-	}
-}
-int next( Fib & fib ) with( fib ) {
-	`resume( fib );`
-	return fn;
-}
+	fn = 0;  int fn1 = fn; `suspend()`;
+	fn = 1;  int fn2 = fn1;  fn1 = fn; `suspend()`;
+	for () {
+		fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `suspend()`; }
+}
+int next( Fib & fib ) with( fib ) { `resume( fib );` return fn; }
 int main() {
 	Fib f1, f2;
-	for ( int i = 1; i <= 10; i += 1 ) {
+	for ( 10 )
 		sout | next( f1 ) | next( f2 );
-	}
 }
 \end{cfa}
@@ -723,30 +725,24 @@
 \newbox\myboxB
 \begin{lrbox}{\myboxB}
-\begin{cfa}[aboveskip=0pt,belowskip=0pt]
-`coroutine` Fib { int ret; };
-void main( Fib & f ) with( fib ) {
-	int fn, f1 = 1, f2 = 0;
-	for ( ;; ) {
-		ret = f2;
-
-		fn = f1 + f2;  f2 = f1;  f1 = fn; `suspend();`
-	}
-}
-int next( Fib & fib ) with( fib ) {
-	`resume( fib );`
-	return ret;
-}
-
-
-
-
-
-
-\end{cfa}
+\begin{python}[aboveskip=0pt,belowskip=0pt]
+
+def Fibonacci():
+	fn = 0;	fn1 = fn; `yield fn`  # suspend
+	fn = 1; fn2 = fn1; fn1 = fn; `yield fn`
+	while True:
+		fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `yield fn`
+
+
+f1 = Fibonacci()
+f2 = Fibonacci()
+for i in range( 10 ):
+	print( `next( f1 )`, `next( f2 )` ) # resume
+
+\end{python}
 \end{lrbox}
-\subfloat[3 States, internal variables]{\label{f:Coroutine3States}\usebox\myboxA}
-\qquad\qquad
-\subfloat[1 State, internal variables]{\label{f:Coroutine1State}\usebox\myboxB}
-\caption{\CFA Coroutine Fibonacci Implementations}
+\subfloat[\CFA]{\label{f:Coroutine3States}\usebox\myboxA}
+\qquad
+\subfloat[Python]{\label{f:Coroutine1State}\usebox\myboxB}
+\caption{Fibonacci input coroutine, 3 states, internal variables}
 \label{f:cfa-fibonacci}
 \end{figure}
@@ -789,34 +785,27 @@
 \begin{lrbox}{\myboxA}
 \begin{cfa}[aboveskip=0pt,belowskip=0pt]
-`coroutine` Format {
-	char ch;   // used for communication
-	int g, b;  // global because used in destructor
+`coroutine` Fmt {
+	char ch;   // communication variables
+	int g, b;   // needed in destructor
 };
-void main( Format & fmt ) with( fmt ) {
-	for ( ;; ) {
-		for ( g = 0; g < 5; g += 1 ) {      // group
-			for ( b = 0; b < 4; b += 1 ) { // block
+void main( Fmt & fmt ) with( fmt ) {
+	for () {
+		for ( g = 0; g < 5; g += 1 ) { // groups
+			for ( b = 0; b < 4; b += 1 ) { // blocks
 				`suspend();`
-				sout | ch;              // separator
-			}
-			sout | "  ";               // separator
-		}
-		sout | nl;
-	}
-}
-void ?{}( Format & fmt ) { `resume( fmt );` }
-void ^?{}( Format & fmt ) with( fmt ) {
-	if ( g != 0 || b != 0 ) sout | nl;
-}
-void format( Format & fmt ) {
-	`resume( fmt );`
-}
+				sout | ch; } // print character
+			sout | "  "; } // block separator
+		sout | nl; }  // group separator
+}
+void ?{}( Fmt & fmt ) { `resume( fmt );` } // prime
+void ^?{}( Fmt & fmt ) with( fmt ) { // destructor
+	if ( g != 0 || b != 0 )	// special case
+		sout | nl; }
+void send( Fmt & fmt, char c ) { fmt.ch = c; `resume( fmt )`; }
 int main() {
-	Format fmt;
-	eof: for ( ;; ) {
-		sin | fmt.ch;
-	  if ( eof( sin ) ) break eof;
-		format( fmt );
-	}
+	Fmt fmt;
+ 	sout | nlOff;	// turn off auto newline
+	for ( 41 )
+		send( fmt, 'a' );
 }
 \end{cfa}
@@ -825,42 +814,35 @@
 \newbox\myboxB
 \begin{lrbox}{\myboxB}
-\begin{cfa}[aboveskip=0pt,belowskip=0pt]
-struct Format {
-	char ch;
-	int g, b;
-};
-void format( struct Format * fmt ) {
-	if ( fmt->ch != -1 ) {      // not EOF ?
-		printf( "%c", fmt->ch );
-		fmt->b += 1;
-		if ( fmt->b == 4 ) {  // block
-			printf( "  " );      // separator
-			fmt->b = 0;
-			fmt->g += 1;
-		}
-		if ( fmt->g == 5 ) {  // group
-			printf( "\n" );     // separator
-			fmt->g = 0;
-		}
-	} else {
-		if ( fmt->g != 0 || fmt->b != 0 ) printf( "\n" );
-	}
-}
-int main() {
-	struct Format fmt = { 0, 0, 0 };
-	for ( ;; ) {
-		scanf( "%c", &fmt.ch );
-	  if ( feof( stdin ) ) break;
-		format( &fmt );
-	}
-	fmt.ch = -1;
-	format( &fmt );
-}
-\end{cfa}
+\begin{python}[aboveskip=0pt,belowskip=0pt]
+
+
+
+def Fmt():
+	try:
+		while True:
+			for g in range( 5 ):
+				for b in range( 4 ):
+
+					print( `(yield)`, end='' )
+				print( '  ', end='' )
+			print()
+
+
+	except GeneratorExit:
+		if g != 0 | b != 0:
+			print()
+
+
+fmt = Fmt()
+`next( fmt )`			 # prime
+for i in range( 41 ):
+	`fmt.send( 'a' );`	# send to yield
+
+\end{python}
 \end{lrbox}
-\subfloat[\CFA Coroutine]{\label{f:CFAFmt}\usebox\myboxA}
+\subfloat[\CFA]{\label{f:CFAFmt}\usebox\myboxA}
 \qquad
-\subfloat[C Linearized]{\label{f:CFmt}\usebox\myboxB}
-\caption{Formatting text into lines of 5 blocks of 4 characters.}
+\subfloat[Python]{\label{f:CFmt}\usebox\myboxB}
+\caption{Output formatting text}
 \label{f:fmt-line}
 \end{figure}
@@ -883,5 +865,5 @@
 void main( Prod & prod ) with( prod ) {
 	// 1st resume starts here
-	for ( int i = 0; i < N; i += 1 ) {
+	for ( i; N ) {
 		int p1 = random( 100 ), p2 = random( 100 );
 		sout | p1 | " " | p2;
@@ -899,5 +881,5 @@
 }
 void start( Prod & prod, int N, Cons &c ) {
-	&prod.c = &c;
+	&prod.c = &c; // reassignable reference
 	prod.[N, receipt] = [N, 0];
 	`resume( prod );`
@@ -914,8 +896,8 @@
 	Prod & p;
 	int p1, p2, status;
-	_Bool done;
+	bool done;
 };
 void ?{}( Cons & cons, Prod & p ) {
-	&cons.p = &p;
+	&cons.p = &p; // reassignable reference
 	cons.[status, done ] = [0, false];
 }
@@ -974,4 +956,59 @@
 The program main restarts after the resume in @start@.
 @start@ returns and the program main terminates.
+
+One \emph{killer} application for a coroutine is device drivers, which at one time caused 70\%-85\% of failures in Windows/Linux~\cite{Swift05}.
+Many device drivers are a finite state-machine parsing a protocol, e.g.:
+\begin{tabbing}
+\ldots STX \= \ldots message \ldots \= ESC \= ETX \= \ldots message \ldots  \= ETX \= 2-byte crc \= \ldots	\kill
+\ldots STX \> \ldots message \ldots \> ESC \> ETX \> \ldots message \ldots  \> ETX \> 2-byte crc \> \ldots
+\end{tabbing}
+where a network message begins with the control character STX and ends with an ETX, followed by a 2-byte cyclic-redundancy check.
+Control characters may appear in a message if preceded by an ESC.
+Because FSMs can be complex and occur frequently in important domains, direct support of the coroutine is crucial in a systems programminglanguage.
+
+\begin{figure}
+\begin{cfa}
+enum Status { CONT, MSG, ESTX, ELNTH, ECRC };
+`coroutine` Driver {
+	Status status;
+	char * msg, byte;
+};
+void ?{}( Driver & d, char * m ) { d.msg = m; }		$\C[3.0in]{// constructor}$
+Status next( Driver & d, char b ) with( d ) {		$\C{// 'with' opens scope}$
+	byte = b; `resume( d );` return status;
+}
+void main( Driver & d ) with( d ) {
+	enum { STX = '\002', ESC = '\033', ETX = '\003', MaxMsg = 64 };
+	unsigned short int crc;							$\C{// error checking}$
+  msg: for () {										$\C{// parse message}$
+		status = CONT;
+		unsigned int lnth = 0, sum = 0;
+		while ( byte != STX ) `suspend();`
+	  emsg: for () {
+			`suspend();`							$\C{// process byte}$
+			choose ( byte ) {						$\C{// switch with default break}$
+			  case STX:
+				status = ESTX; `suspend();` continue msg;
+			  case ETX:
+				break emsg;
+			  case ESC:
+				suspend();
+			} // choose
+			if ( lnth >= MaxMsg ) {					$\C{// buffer full ?}$
+				status = ELNTH; `suspend();` continue msg; }
+			msg[lnth++] = byte;
+			sum += byte;
+		} // for
+		msg[lnth] = '\0';							$\C{// terminate string}\CRT$
+		`suspend();`
+		crc = (unsigned char)byte << 8;	// prevent sign extension for signed char
+		`suspend();`
+		status = (crc | (unsigned char)byte) == sum ? MSG : ECRC;
+		`suspend();`
+	} // for
+}
+\end{cfa}
+\caption{Device driver for simple communication protocol}
+\end{figure}
 
 
