Index: doc/proposals/closure.md
===================================================================
--- doc/proposals/closure.md	(revision fef8293d2a86e9538cbd5646cd7a188ea65a33c6)
+++ doc/proposals/closure.md	(revision fef8293d2a86e9538cbd5646cd7a188ea65a33c6)
@@ -0,0 +1,150 @@
+## Light-weight Closures for Cforall ##
+
+A core capability of the Cforall type system is the ability to use 
+monomorphic specializations of polymorphic functions seamlessly and 
+invisibly to the user programmer, primarily in type assertions:
+
+    forall(otype T | { T ?+?(T, T); })
+    T double( T x ) { return x + x; }
+
+    forall(otype R | { R double(R); })
+    R quadruple( R y ) { return double( double( y ) ); }
+
+    void fred() {
+        float magic = quadruple( 10.5f );
+    }
+
+In the example above, `R` and `T` are both bound to `float`, and the 
+`double` type assertion on `quadruple` is satisfied by monomorphizing the 
+polymorphic `double` function to `float` (using the builtin `float` addition 
+operator). The existing implementation uses GCC nested functions to 
+implement this monomorphization, as in the following (much simplified) 
+generated code:
+
+    void double(size_t _sizeof_T,  
+                void (*_assign_T)(void*, void*),
+                void (*_add_T)(void*, void*, void*),
+                void *_rtn, 
+                void *x ) {
+        _add_T( _rtn, x, x );
+    }
+
+    void quadruple(size_t _sizeof_R,  
+                   void (*_assign_R)(void*, void*),
+                   void (*_double_R)(void*, void*), 
+                   void *_rtn,
+                   void *y ) {
+        void *_tmp0 = alloca(_sizeof_R);
+        _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) );
+     }
+
+     void fred() {
+         // nested thunk to adapt double() to _double_R()
+         void _thunk0( void *_rtn, void *x ) {
+             double( sizeof(float), _builtin_assign_float, 
+                     _builtin_add_float,
+                     _rtn, x );
+         }
+
+         float magic;
+         float _tmp1 = 10.5f;
+         quadruple( sizeof(float), _builtin_assign_float, _thunk0,
+                    &magic, &_tmp1 );
+     }
+
+Now, in the example above, `_thunk0` could be hoisted to static scope, as 
+`sizeof(float)`, `_builtin_assign_float`, and `builtin_add_float` all exist 
+in static scope. In general, however, these parameters which are used to 
+monomorphize polymorphic functions could be local to the calling scope (e.g. 
+if `fred()` was a polymorphic function itself, or had a local overload of 
+`float` addition).
+
+The crux of the issue is that these monomorphization thunks need a lexical 
+link to their creation context, but C's standard calling convention provides 
+no way to include such a lexical link. GCC fixes this for nested functions 
+by placing an executable trampoline on the stack to modify the calling 
+convention; this trampoline has the standard calling convention, and calls 
+the nested function after setting up the lexical link. This prevents the 
+stack from being marked as non-executable, opening a variety of potential 
+security vulnerabilities. More to the point of this proposal, it also means 
+that the thunk exists on the stack, and may go out of scope before it is 
+used if it is copied somewhere else (for instance, to the root of a new 
+stack in a coroutine).
+
+The standard solution to this sort of problem is a *closure*; this proposal 
+describes how to integrate a restricted sort of closure into Cforall that 
+would be sufficiently powerful to monomorphize polymorphic functions. 
+
+Monomorphization parameters in the current implementation fall into four 
+categories:
+  1. Size/alignment of types; a single unsigned integer
+  2. Field offsets for generic types; a fixed length array of unsigned 
+  3. Functions used to satisfy type assertions: a single function pointer
+  4. Variables used to satisfy type assertions: a single void pointer
+
+The gist of this proposal is to develop a copyable closure object (similar  
+in concept to `std::function` in C++) that can encapsulate a function 
+pointer and an arbitrary list of these monomorphization parameters and 
+provide a function call operator that passes the appropriate parameters to 
+the underlying function. In (very-)pseudo-Cforall, it might look something 
+like the following:
+
+    forall(ttype Rtns, ttype Args) struct Fn {
+        [assertion...] closed;
+        forall(closed) Rtns (*f)(Args...);
+    };
+
+    forall(ttype Rtns, ttype Args, [assertion...] Closed)
+    void ?{}( Fn(Rtns, Args) *this, 
+              forall(Closed) Rtns (*f)(Args), Closed closed ) {
+        this->closed = closed;
+        this->f = f;
+    }
+    // ^ function pointers convert implicitly to Fn, as they have an empty 
+    // assertion list here
+
+    forall(ttype Rtns, ttype Args)
+    Rtns ?() ( Fn(Rtns, Args) fn, Args args ) {
+        return fn.f( fn.closed, args );
+    }
+
+Using this `Fn` closure internally (even if it was never exposed to user 
+programmers), the top example would codegen something like this, with `Fn` 
+substituted for the implicit function pointers:
+
+    void double(size_t _sizeof_T,  
+                Fn(void, [void*, void*]) _assign_T,
+                Fn(void, [void*, void*, void*]) _add_T,
+                void *_rtn, 
+                void *x ) {
+        _add_T( _rtn, x, x );
+    }
+
+    void quadruple(size_t _sizeof_R,  
+                   Fn(void, [void*, void*]) _assign_R,
+                   Fn(void, [void*, void*]) _double_R, 
+                   void *_rtn,
+                   void *y ) {
+        void *_tmp0 = alloca(_sizeof_R);
+        _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) );
+     }
+
+     void fred() {
+         // closure wrapper for static function
+         Fn(void, [void*, void*]) _thunk0 = { _builtin_assign_float };
+         // nested closure to adapt double() to _double_R()
+         Fn(void, [void*, void*]) _thunk1 = { double, 
+             [sizeof(float), _builtin_assign_float, _builtin_add_float] };
+            
+         float magic;
+         float _tmp1 = 10.5f;
+         quadruple( sizeof(float), _thunk0, _thunk1,
+                    &magic, &_tmp1 );
+     }
+
+The main challenge with this approach is that the `Fn` closure is 
+(necessarily) variable in size, as it can close over an arbitrary (but fixed 
+at construction time) number of parameters. This will make memory management 
+for it somewhat challenging, and writing the code in the translator to 
+implement the function call operator passing a variable number of type 
+assertions should also be non-trivial, but tractable.
