﻿id	summary	reporter	owner	description	type	status	priority	component	version	resolution	keywords	cc
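278	Performance issue on ARM64 due to __aarch64_sync_cache_range calls	caparson		"On ARM64, calls to __aarch64_sync_cache_range are generated whenever a pointer to a nested function is stored. This is likely because the compiler builds a trampoline for the nested function at run time and treats that as self-modifying code, so it must synchronize the instruction cache afterwards.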

The following minimal repro generates a call to __aarch64_sync_cache_range:
{{{
#!div style=""font-size: 80%""
  {{{#!C
  int main() {
      inline void foo() {}
      void (*f)(void) = foo;
  }
  }}}
}}}

Calls to __aarch64_sync_cache_range are very expensive since they involve an ISB (instruction synchronization barrier) instruction, which is a pipeline flush rather than an ordinary memory barrier.
 ISB: used whenever instruction fetches need to explicitly take place after a certain point in the program, for example after memory map updates or after writing code to be executed. (In practice, this means ""throw away any prefetched instructions at this point"".)

Currently we store pointers to nested functions in two cases: polymorphic adapters and destructors of polymorphic fields.

We believe that the adapter issue can be fixed by hoisting adapters to global scope since they do not need a closure.
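
As a rough illustration of the hoisting idea, the sketch below stores the address of a file-scope adapter instead of a nested one. The names adapter_fn, adapter_int, twice and call_through_adapter are hypothetical and do not reflect the code cfa-cc actually emits; the only point is that a function with no enclosing frame can have its address taken without any run-time code generation.

```c
#include <assert.h>

/* Hypothetical adapter shape, not the signature cfa-cc actually emits. */
typedef int (*adapter_fn)(int (*callee)(int), int arg);

/* File-scope adapter: it captures nothing, so it needs no static chain
   and taking its address never builds a stack trampoline. */
static int adapter_int(int (*callee)(int), int arg) {
    assert(callee);
    return callee(arg);
}

/* Stand-in for the function being adapted (hypothetical). */
static int twice(int x) {
    return 2 * x;
}

int call_through_adapter(int arg) {
    adapter_fn f = adapter_int; /* plain address of a global symbol */
    return f(twice, arg);
}
```

Whether every generated adapter can be hoisted this way depends on it truly not referencing anything from the enclosing scope, which is the assumption stated above.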

The dtor issue can probably be fixed by changing the calling convention of destructors so that no function pointer is stored:

{{{
#!div style=""font-size: 80%""
  {{{#!C
    __attribute__ ((cleanup(__destroy_Destructor))) struct __Destructor __memberDtor0 = { ((void *)_X1tS1A_Y1T__1), ((void (*)(void *__param_0))__cleanup_dtor20) }; // old way
    __attribute__ ((cleanup(__cleanup_dtor20))) void * nd = ((void *)_X1tS1A_Y1T__1); // fix
  }}}
}}}
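
The proposed convention can be tried in isolation with a small sketch like the one below. It assumes GCC or Clang, and the names cleanup_mark and run_scope are hypothetical. The handler here lives at file scope to keep the example portable, whereas __cleanup_dtor20 in the generated code is presumably a nested function; what the sketch shows is only that the cleanup attribute names its handler directly and receives a pointer to the annotated variable, so no function pointer has to be stored in a helper struct.

```c
#include <assert.h>

/* Hypothetical handler. The compiler passes a pointer to the annotated
   variable, so a void * variable yields a void ** argument. */
static void cleanup_mark(void **objp) {
    assert(objp && *objp);
    *(int *)*objp = 1; /* stand-in for running the real destructor */
}

int run_scope(void) {
    int destroyed = 0;
    {
        /* The handler is named in the attribute itself; no function
           pointer is stored anywhere. */
        __attribute__((cleanup(cleanup_mark))) void *nd = &destroyed;
        (void)nd;
    } /* cleanup_mark(&nd) runs here, when nd goes out of scope */
    return destroyed;
}
```

Note the extra level of indirection: the destructor wrapper has to dereference its argument once to recover the object pointer, which differs from the old __Destructor struct layout.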

Attached is a program that generates output with both the issues. Also attached is a trimmed down version of the output C code with the fixes implemented inline."	defect	new	major	cfa-cc	1.0			
