| [fef8293] | 1 | ## Light-weight Closures for Cforall ##
 | 
|---|
 | 2 | 
 | 
|---|
 | 3 | A core capability of the Cforall type system is the ability to use 
 | 
|---|
 | 4 | monomorphic specializations of polymorphic functions seamlessly and 
 | 
|---|
 | 5 | invisibly to the user programmer, primarily in type assertions:
 | 
|---|
 | 6 | 
 | 
|---|
 | 7 |     forall(otype T | { T ?+?(T, T); })
 | 
|---|
 | 8 |     T double( T x ) { return x + x; }
 | 
|---|
 | 9 | 
 | 
|---|
 | 10 |     forall(otype R | { R double(R); })
 | 
|---|
 | 11 |     R quadruple( R y ) { return double( double( y ) ); }
 | 
|---|
 | 12 | 
 | 
|---|
 | 13 |     void fred() {
 | 
|---|
 | 14 |         float magic = quadruple( 10.5f );
 | 
|---|
 | 15 |     }
 | 
|---|
 | 16 | 
 | 
|---|
 | 17 | In the example above, `R` and `T` are both bound to `float`, and the 
 | 
|---|
 | 18 | `double` type assertion on `quadruple` is satisfied by monomorphizing the 
 | 
|---|
 | 19 | polymorphic `double` function to `float` (using the builtin `float` addition 
 | 
|---|
 | 20 | operator). The existing implementation uses GCC nested functions to 
 | 
|---|
 | 21 | implement this monomorphization, as in the following (much simplified) 
 | 
|---|
 | 22 | generated code:
 | 
|---|
 | 23 | 
 | 
|---|
 | 24 |     void double(size_t _sizeof_T,  
 | 
|---|
 | 25 |                 void (*_assign_T)(void*, void*),
 | 
|---|
 | 26 |                 void (*_add_T)(void*, void*, void*),
 | 
|---|
 | 27 |                 void *_rtn, 
 | 
|---|
 | 28 |                 void *x ) {
 | 
|---|
 | 29 |         _add_T( _rtn, x, x );
 | 
|---|
 | 30 |     }
 | 
|---|
 | 31 | 
 | 
|---|
 | 32 |     void quadruple(size_t _sizeof_R,  
 | 
|---|
 | 33 |                    void (*_assign_R)(void*, void*),
 | 
|---|
 | 34 |                    void (*_double_R)(void*, void*), 
 | 
|---|
 | 35 |                    void *_rtn,
 | 
|---|
 | 36 |                    void *y ) {
 | 
|---|
 | 37 |         void *_tmp0 = alloca(_sizeof_R);
 | 
|---|
 | 38 |         _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) );
 | 
|---|
 | 39 |      }
 | 
|---|
 | 40 | 
 | 
|---|
 | 41 |      void fred() {
 | 
|---|
 | 42 |          // nested thunk to adapt double() to _double_R()
 | 
|---|
 | 43 |          void _thunk0( void *_rtn, void *x ) {
 | 
|---|
 | 44 |              double( sizeof(float), _builtin_assign_float, 
 | 
|---|
 | 45 |                      _builtin_add_float,
 | 
|---|
 | 46 |                      _rtn, x );
 | 
|---|
 | 47 |          }
 | 
|---|
 | 48 | 
 | 
|---|
 | 49 |          float magic;
 | 
|---|
 | 50 |          float _tmp1 = 10.5f;
 | 
|---|
 | 51 |          quadruple( sizeof(float), _builtin_assign_float, _thunk0,
 | 
|---|
 | 52 |                     &magic, &_tmp1 );
 | 
|---|
 | 53 |      }
 | 
|---|
 | 54 | 
 | 
|---|
 | 55 | Now, in the example above, `_thunk0` could be hoisted to static scope, as 
 | 
|---|
 | 56 | `sizeof(float)`, `_builtin_assign_float`, and `builtin_add_float` all exist 
 | 
|---|
 | 57 | in static scope. In general, however, these parameters which are used to 
 | 
|---|
 | 58 | monomorphize polymorphic functions could be local to the calling scope (e.g. 
 | 
|---|
 | 59 | if `fred()` was a polymorphic function itself, or had a local overload of 
 | 
|---|
 | 60 | `float` addition).
 | 
|---|
 | 61 | 
 | 
|---|
 | 62 | The crux of the issue is that these monomorphization thunks need a lexical 
 | 
|---|
 | 63 | link to their creation context, but C's standard calling convention provides 
 | 
|---|
 | 64 | no way to include such a lexical link. GCC fixes this for nested functions 
 | 
|---|
 | 65 | by placing an executable trampoline on the stack to modify the calling 
 | 
|---|
 | 66 | convention; this trampoline has the standard calling convention, and calls 
 | 
|---|
 | 67 | the nested function after setting up the lexical link. This prevents the 
 | 
|---|
 | 68 | stack from being marked as non-executable, opening a variety of potential 
 | 
|---|
 | 69 | security vulnerabilities. More to the point of this proposal, it also means 
 | 
|---|
 | 70 | that the thunk exists on the stack, and may go out of scope before it is 
 | 
|---|
 | 71 | used if it is copied somewhere else (for instance, to the root of a new 
 | 
|---|
 | 72 | stack in a coroutine).
 | 
|---|
 | 73 | 
 | 
|---|
 | 74 | The standard solution to this sort of problem is a *closure*; this proposal 
 | 
|---|
 | 75 | describes how to integrate a restricted sort of closure into Cforall that 
 | 
|---|
 | 76 | would be sufficiently powerful to monomorphize polymorphic functions. 
 | 
|---|
 | 77 | 
 | 
|---|
 | 78 | Monomorphization parameters in the current implementation fall into four 
 | 
|---|
 | 79 | categories:
 | 
|---|
 | 80 |   1. Size/alignment of types; a single unsigned integer
 | 
|---|
 | 81 |   2. Field offsets for generic types; a fixed length array of unsigned 
 | 
|---|
 | 82 |   3. Functions used to satisfy type assertions: a single function pointer
 | 
|---|
 | 83 |   4. Variables used to satisfy type assertions: a single void pointer
 | 
|---|
 | 84 | 
 | 
|---|
 | 85 | The gist of this proposal is to develop a copyable closure object (similar  
 | 
|---|
 | 86 | in concept to `std::function` in C++) that can encapsulate a function 
 | 
|---|
 | 87 | pointer and an arbitrary list of these monomorphization parameters and 
 | 
|---|
 | 88 | provide a function call operator that passes the appropriate parameters to 
 | 
|---|
 | 89 | the underlying function. In (very-)pseudo-Cforall, it might look something 
 | 
|---|
 | 90 | like the following:
 | 
|---|
 | 91 | 
 | 
|---|
 | 92 |     forall(ttype Rtns, ttype Args) struct Fn {
 | 
|---|
 | 93 |         [assertion...] closed;
 | 
|---|
 | 94 |         forall(closed) Rtns (*f)(Args...);
 | 
|---|
 | 95 |     };
 | 
|---|
 | 96 | 
 | 
|---|
 | 97 |     forall(ttype Rtns, ttype Args, [assertion...] Closed)
 | 
|---|
 | 98 |     void ?{}( Fn(Rtns, Args) *this, 
 | 
|---|
 | 99 |               forall(Closed) Rtns (*f)(Args), Closed closed ) {
 | 
|---|
 | 100 |         this->closed = closed;
 | 
|---|
 | 101 |         this->f = f;
 | 
|---|
 | 102 |     }
 | 
|---|
 | 103 |     // ^ function pointers convert implicitly to Fn, as they have an empty 
 | 
|---|
 | 104 |     // assertion list here
 | 
|---|
 | 105 | 
 | 
|---|
 | 106 |     forall(ttype Rtns, ttype Args)
 | 
|---|
 | 107 |     Rtns ?() ( Fn(Rtns, Args) fn, Args args ) {
 | 
|---|
 | 108 |         return fn.f( fn.closed, args );
 | 
|---|
 | 109 |     }
 | 
|---|
 | 110 | 
 | 
|---|
 | 111 | Using this `Fn` closure internally (even if it was never exposed to user 
 | 
|---|
 | 112 | programmers), the top example would codegen something like this, with `Fn` 
 | 
|---|
 | 113 | substituted for the implicit function pointers:
 | 
|---|
 | 114 | 
 | 
|---|
 | 115 |     void double(size_t _sizeof_T,  
 | 
|---|
 | 116 |                 Fn(void, [void*, void*]) _assign_T,
 | 
|---|
 | 117 |                 Fn(void, [void*, void*, void*]) _add_T,
 | 
|---|
 | 118 |                 void *_rtn, 
 | 
|---|
 | 119 |                 void *x ) {
 | 
|---|
 | 120 |         _add_T( _rtn, x, x );
 | 
|---|
 | 121 |     }
 | 
|---|
 | 122 | 
 | 
|---|
 | 123 |     void quadruple(size_t _sizeof_R,  
 | 
|---|
 | 124 |                    Fn(void, [void*, void*]) _assign_R,
 | 
|---|
 | 125 |                    Fn(void, [void*, void*]) _double_R, 
 | 
|---|
 | 126 |                    void *_rtn,
 | 
|---|
 | 127 |                    void *y ) {
 | 
|---|
 | 128 |         void *_tmp0 = alloca(_sizeof_R);
 | 
|---|
 | 129 |         _double_R( _rtn, (_double_R( _tmp0, y ), _tmp0) );
 | 
|---|
 | 130 |      }
 | 
|---|
 | 131 | 
 | 
|---|
 | 132 |      void fred() {
 | 
|---|
 | 133 |          // closure wrapper for static function
 | 
|---|
 | 134 |          Fn(void, [void*, void*]) _thunk0 = { _builtin_assign_float };
 | 
|---|
 | 135 |          // nested closure to adapt double() to _double_R()
 | 
|---|
 | 136 |          Fn(void, [void*, void*]) _thunk1 = { double, 
 | 
|---|
 | 137 |              [sizeof(float), _builtin_assign_float, _builtin_add_float] };
 | 
|---|
 | 138 |             
 | 
|---|
 | 139 |          float magic;
 | 
|---|
 | 140 |          float _tmp1 = 10.5f;
 | 
|---|
 | 141 |          quadruple( sizeof(float), _thunk0, _thunk1,
 | 
|---|
 | 142 |                     &magic, &_tmp1 );
 | 
|---|
 | 143 |      }
 | 
|---|
 | 144 | 
 | 
|---|
 | 145 | The main challenge with this approach is that the `Fn` closure is 
 | 
|---|
 | 146 | (necessarily) variable in size, as it can close over an arbitrary (but fixed 
 | 
|---|
 | 147 | at construction time) number of parameters. This will make memory management 
 | 
|---|
 | 148 | for it somewhat challenging, and writing the code in the translator to 
 | 
|---|
 | 149 | implement the function call operator passing a variable number of type 
 | 
|---|
 | 150 | assertions should also be non-trivial, but tractable.
 | 
|---|