## Light-weight Closures for Cforall ## A core capability of the Cforall type system is the ability to use monomorphic specializations of polymorphic functions seamlessly and invisibly to the user programmer, primarily in type assertions: forall(otype T | { T ?+?(T, T); }) T double( T x ) { return x + x; } forall(otype R | { R double(R); }) R quadruple( R y ) { return double( double( y ) ); } void fred() { float magic = quadruple( 10.5f ); } In the example above, `R` and `T` are both bound to `float`, and the `double` type assertion on `quadruple` is satisfied by monomorphizing the polymorphic `double` function to `float` (using the builtin `float` addition operator). The existing implementation uses GCC nested functions to implement this monomorphization, as in the following (much simplified) generated code: void double(size_t _sizeof_T, void (*_assign_T)(void*, void*), void (*_add_T)(void*, void*, void*), void *_rtn, void *x ) { _add_T( _rtn, x, x ); } void quadruple(size_t _sizeof_R, void (*_assign_R)(void*, void*), void (*_double_R)(void*, void*), void *_rtn, void *y ) { void *_tmp0 = alloca(_sizeof_R); _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) ); } void fred() { // nested thunk to adapt double() to _double_R() void _thunk0( void *_rtn, void *x ) { double( sizeof(float), _builtin_assign_float, _builtin_add_float, _rtn, x ); } float magic; float _tmp1 = 10.5f; quadruple( sizeof(float), _builtin_assign_float, _thunk0, &magic, &_tmp1 ); } Now, in the example above, `_thunk0` could be hoisted to static scope, as `sizeof(float)`, `_builtin_assign_float`, and `builtin_add_float` all exist in static scope. In general, however, these parameters which are used to monomorphize polymorphic functions could be local to the calling scope (e.g. if `fred()` was a polymorphic function itself, or had a local overload of `float` addition). The crux of the issue is that these monomorphization thunks need a lexical link to their creation context, but C's standard calling convention provides no way to include such a lexical link. GCC fixes this for nested functions by placing an executable trampoline on the stack to modify the calling convention; this trampoline has the standard calling convention, and calls the nested function after setting up the lexical link. This prevents the stack from being marked as non-executable, opening a variety of potential security vulnerabilities. More to the point of this proposal, it also means that the thunk exists on the stack, and may go out of scope before it is used if it is copied somewhere else (for instance, to the root of a new stack in a coroutine). The standard solution to this sort of problem is a *closure*; this proposal describes how to integrate a restricted sort of closure into Cforall that would be sufficiently powerful to monomorphize polymorphic functions. Monomorphization parameters in the current implementation fall into four categories: 1. Size/alignment of types; a single unsigned integer 2. Field offsets for generic types; a fixed length array of unsigned 3. Functions used to satisfy type assertions: a single function pointer 4. Variables used to satisfy type assertions: a single void pointer The gist of this proposal is to develop a copyable closure object (similar in concept to `std::function` in C++) that can encapsulate a function pointer and an arbitrary list of these monomorphization parameters and provide a function call operator that passes the appropriate parameters to the underlying function. In (very-)pseudo-Cforall, it might look something like the following: forall(ttype Rtns, ttype Args) struct Fn { [assertion...] closed; forall(closed) Rtns (*f)(Args...); }; forall(ttype Rtns, ttype Args, [assertion...] Closed) void ?{}( Fn(Rtns, Args) *this, forall(Closed) Rtns (*f)(Args), Closed closed ) { this->closed = closed; this->f = f; } // ^ function pointers convert implicitly to Fn, as they have an empty // assertion list here forall(ttype Rtns, ttype Args) Rtns ?() ( Fn(Rtns, Args) fn, Args args ) { return fn.f( fn.closed, args ); } Using this `Fn` closure internally (even if it was never exposed to user programmers), the top example would codegen something like this, with `Fn` substituted for the implicit function pointers: void double(size_t _sizeof_T, Fn(void, [void*, void*]) _assign_T, Fn(void, [void*, void*, void*]) _add_T, void *_rtn, void *x ) { _add_T( _rtn, x, x ); } void quadruple(size_t _sizeof_R, Fn(void, [void*, void*]) _assign_R, Fn(void, [void*, void*]) _double_R, void *_rtn, void *y ) { void *_tmp0 = alloca(_sizeof_R); _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) ); } void fred() { // closure wrapper for static function Fn(void, [void*, void*]) _thunk0 = { _builtin_assign_float }; // nested closure to adapt double() to _double_R() Fn(void, [void*, void*]) _thunk1 = { double, [sizeof(float), _builtin_assign_float, _builtin_add_float] }; float magic; float _tmp1 = 10.5f; quadruple( sizeof(float), _thunk0, _thunk1, &magic, &_tmp1 ); } The main challenge with this approach is that the `Fn` closure is (necessarily) variable in size, as it can close over an arbitrary (but fixed at construction time) number of parameters. This will make memory management for it somewhat challenging, and writing the code in the translator to implement the function call operator passing a variable number of type assertions should also be non-trivial, but tractable.