| 1 | ## Light-weight Closures for Cforall ##
 | 
|---|
| 2 | 
 | 
|---|
| 3 | A core capability of the Cforall type system is the ability to use 
 | 
|---|
| 4 | monomorphic specializations of polymorphic functions seamlessly and 
 | 
|---|
| 5 | invisibly to the user programmer, primarily in type assertions:
 | 
|---|
| 6 | 
 | 
|---|
| 7 |     forall(otype T | { T ?+?(T, T); })
 | 
|---|
| 8 |     T double( T x ) { return x + x; }
 | 
|---|
| 9 | 
 | 
|---|
| 10 |     forall(otype R | { R double(R); })
 | 
|---|
| 11 |     R quadruple( R y ) { return double( double( y ) ); }
 | 
|---|
| 12 | 
 | 
|---|
| 13 |     void fred() {
 | 
|---|
| 14 |         float magic = quadruple( 10.5f );
 | 
|---|
| 15 |     }
 | 
|---|
| 16 | 
 | 
|---|
| 17 | In the example above, `R` and `T` are both bound to `float`, and the 
 | 
|---|
| 18 | `double` type assertion on `quadruple` is satisfied by monomorphizing the 
 | 
|---|
| 19 | polymorphic `double` function to `float` (using the builtin `float` addition 
 | 
|---|
| 20 | operator). The existing implementation uses GCC nested functions to 
 | 
|---|
| 21 | implement this monomorphization, as in the following (much simplified) 
 | 
|---|
| 22 | generated code:
 | 
|---|
| 23 | 
 | 
|---|
| 24 |     void double(size_t _sizeof_T,  
 | 
|---|
| 25 |                 void (*_assign_T)(void*, void*),
 | 
|---|
| 26 |                 void (*_add_T)(void*, void*, void*),
 | 
|---|
| 27 |                 void *_rtn, 
 | 
|---|
| 28 |                 void *x ) {
 | 
|---|
| 29 |         _add_T( _rtn, x, x );
 | 
|---|
| 30 |     }
 | 
|---|
| 31 | 
 | 
|---|
| 32 |     void quadruple(size_t _sizeof_R,  
 | 
|---|
| 33 |                    void (*_assign_R)(void*, void*),
 | 
|---|
| 34 |                    void (*_double_R)(void*, void*), 
 | 
|---|
| 35 |                    void *_rtn,
 | 
|---|
| 36 |                    void *y ) {
 | 
|---|
| 37 |         void *_tmp0 = alloca(_sizeof_R);
 | 
|---|
| 38 |         _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) );
 | 
|---|
| 39 |      }
 | 
|---|
| 40 | 
 | 
|---|
| 41 |      void fred() {
 | 
|---|
| 42 |          // nested thunk to adapt double() to _double_R()
 | 
|---|
| 43 |          void _thunk0( void *_rtn, void *x ) {
 | 
|---|
| 44 |              double( sizeof(float), _builtin_assign_float, 
 | 
|---|
| 45 |                      _builtin_add_float,
 | 
|---|
| 46 |                      _rtn, x );
 | 
|---|
| 47 |          }
 | 
|---|
| 48 | 
 | 
|---|
| 49 |          float magic;
 | 
|---|
| 50 |          float _tmp1 = 10.5f;
 | 
|---|
| 51 |          quadruple( sizeof(float), _builtin_assign_float, _thunk0,
 | 
|---|
| 52 |                     &magic, &_tmp1 );
 | 
|---|
| 53 |      }
 | 
|---|
| 54 | 
 | 
|---|
| 55 | Now, in the example above, `_thunk0` could be hoisted to static scope, as 
 | 
|---|
| 56 | `sizeof(float)`, `_builtin_assign_float`, and `builtin_add_float` all exist 
 | 
|---|
| 57 | in static scope. In general, however, these parameters which are used to 
 | 
|---|
| 58 | monomorphize polymorphic functions could be local to the calling scope (e.g. 
 | 
|---|
| 59 | if `fred()` was a polymorphic function itself, or had a local overload of 
 | 
|---|
| 60 | `float` addition).
 | 
|---|
| 61 | 
 | 
|---|
| 62 | The crux of the issue is that these monomorphization thunks need a lexical 
 | 
|---|
| 63 | link to their creation context, but C's standard calling convention provides 
 | 
|---|
| 64 | no way to include such a lexical link. GCC fixes this for nested functions 
 | 
|---|
| 65 | by placing an executable trampoline on the stack to modify the calling 
 | 
|---|
| 66 | convention; this trampoline has the standard calling convention, and calls 
 | 
|---|
| 67 | the nested function after setting up the lexical link. This prevents the 
 | 
|---|
| 68 | stack from being marked as non-executable, opening a variety of potential 
 | 
|---|
| 69 | security vulnerabilities. More to the point of this proposal, it also means 
 | 
|---|
| 70 | that the thunk exists on the stack, and may go out of scope before it is 
 | 
|---|
| 71 | used if it is copied somewhere else (for instance, to the root of a new 
 | 
|---|
| 72 | stack in a coroutine).
 | 
|---|
| 73 | 
 | 
|---|
| 74 | The standard solution to this sort of problem is a *closure*; this proposal 
 | 
|---|
| 75 | describes how to integrate a restricted sort of closure into Cforall that 
 | 
|---|
| 76 | would be sufficiently powerful to monomorphize polymorphic functions. 
 | 
|---|
| 77 | 
 | 
|---|
| 78 | Monomorphization parameters in the current implementation fall into four 
 | 
|---|
| 79 | categories:
 | 
|---|
| 80 |   1. Size/alignment of types; a single unsigned integer
 | 
|---|
| 81 |   2. Field offsets for generic types; a fixed length array of unsigned 
 | 
|---|
| 82 |   3. Functions used to satisfy type assertions: a single function pointer
 | 
|---|
| 83 |   4. Variables used to satisfy type assertions: a single void pointer
 | 
|---|
| 84 | 
 | 
|---|
| 85 | The gist of this proposal is to develop a copyable closure object (similar  
 | 
|---|
| 86 | in concept to `std::function` in C++) that can encapsulate a function 
 | 
|---|
| 87 | pointer and an arbitrary list of these monomorphization parameters and 
 | 
|---|
| 88 | provide a function call operator that passes the appropriate parameters to 
 | 
|---|
| 89 | the underlying function. In (very-)pseudo-Cforall, it might look something 
 | 
|---|
| 90 | like the following:
 | 
|---|
| 91 | 
 | 
|---|
| 92 |     forall(ttype Rtns, ttype Args) struct Fn {
 | 
|---|
| 93 |         [assertion...] closed;
 | 
|---|
| 94 |         forall(closed) Rtns (*f)(Args...);
 | 
|---|
| 95 |     };
 | 
|---|
| 96 | 
 | 
|---|
| 97 |     forall(ttype Rtns, ttype Args, [assertion...] Closed)
 | 
|---|
| 98 |     void ?{}( Fn(Rtns, Args) *this, 
 | 
|---|
| 99 |               forall(Closed) Rtns (*f)(Args), Closed closed ) {
 | 
|---|
| 100 |         this->closed = closed;
 | 
|---|
| 101 |         this->f = f;
 | 
|---|
| 102 |     }
 | 
|---|
| 103 |     // ^ function pointers convert implicitly to Fn, as they have an empty 
 | 
|---|
| 104 |     // assertion list here
 | 
|---|
| 105 | 
 | 
|---|
| 106 |     forall(ttype Rtns, ttype Args)
 | 
|---|
| 107 |     Rtns ?() ( Fn(Rtns, Args) fn, Args args ) {
 | 
|---|
| 108 |         return fn.f( fn.closed, args );
 | 
|---|
| 109 |     }
 | 
|---|
| 110 | 
 | 
|---|
| 111 | Using this `Fn` closure internally (even if it was never exposed to user 
 | 
|---|
| 112 | programmers), the top example would codegen something like this, with `Fn` 
 | 
|---|
| 113 | substituted for the implicit function pointers:
 | 
|---|
| 114 | 
 | 
|---|
| 115 |     void double(size_t _sizeof_T,  
 | 
|---|
| 116 |                 Fn(void, [void*, void*]) _assign_T,
 | 
|---|
| 117 |                 Fn(void, [void*, void*, void*]) _add_T,
 | 
|---|
| 118 |                 void *_rtn, 
 | 
|---|
| 119 |                 void *x ) {
 | 
|---|
| 120 |         _add_T( _rtn, x, x );
 | 
|---|
| 121 |     }
 | 
|---|
| 122 | 
 | 
|---|
| 123 |     void quadruple(size_t _sizeof_R,  
 | 
|---|
| 124 |                    Fn(void, [void*, void*]) _assign_R,
 | 
|---|
| 125 |                    Fn(void, [void*, void*]) _double_R, 
 | 
|---|
| 126 |                    void *_rtn,
 | 
|---|
| 127 |                    void *y ) {
 | 
|---|
| 128 |         void *_tmp0 = alloca(_sizeof_R);
 | 
|---|
| 129 |         _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) );
 | 
|---|
| 130 |      }
 | 
|---|
| 131 | 
 | 
|---|
| 132 |      void fred() {
 | 
|---|
| 133 |          // closure wrapper for static function
 | 
|---|
| 134 |          Fn(void, [void*, void*]) _thunk0 = { _builtin_assign_float };
 | 
|---|
| 135 |          // nested closure to adapt double() to _double_R()
 | 
|---|
| 136 |          Fn(void, [void*, void*]) _thunk1 = { double, 
 | 
|---|
| 137 |              [sizeof(float), _builtin_assign_float, _builtin_add_float] };
 | 
|---|
| 138 |             
 | 
|---|
| 139 |          float magic;
 | 
|---|
| 140 |          float _tmp1 = 10.5f;
 | 
|---|
| 141 |          quadruple( sizeof(float), _thunk0, _thunk1,
 | 
|---|
| 142 |                     &magic, &_tmp1 );
 | 
|---|
| 143 |      }
 | 
|---|
| 144 | 
 | 
|---|
| 145 | The main challenge with this approach is that the `Fn` closure is 
 | 
|---|
| 146 | (necessarily) variable in size, as it can close over an arbitrary (but fixed 
 | 
|---|
| 147 | at construction time) number of parameters. This will make memory management 
 | 
|---|
| 148 | for it somewhat challenging, and writing the code in the translator to 
 | 
|---|
| 149 | implement the function call operator passing a variable number of type 
 | 
|---|
| 150 | assertions should also be non-trivial, but tractable.
 | 
|---|