Opened 10 months ago

#278 new defect

Performance issue on ARM64 due to __aarch64_sync_cache_range calls

Reported by: caparson Owned by:
Priority: major Component: cfa-cc
Version: 1.0 Keywords:
Cc:

Description

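On ARM64, calls to __aarch64_sync_cache_range occur when a pointer to a nested function is stored. This is likely because GCC implements such a pointer as a trampoline written to the stack at run time, which is effectively self-modifying code and so requires the instruction cache to be synchronized.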

The following minimal repro generates a call to __aarch64_sync_cache_range:

int main() {
    inline void foo() {}
    void (*f)(void) = foo;
}

Calls to __aarch64_sync_cache_range are very expensive since they involve an ISB (instruction synchronization barrier) instruction.

ISB - whenever instruction fetches need to explicitly take place after a certain point in the program, for example after memory map updates or after writing code to be executed. (In practice, this means "throw away any prefetched instructions at this point".)

Currently we store pointers to nested functions in two cases: polymorphic adapters and destructors of polymorphic fields.

We believe that the adapter issue can be fixed by hoisting adapters to global scope since they do not need a closure.
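
A rough sketch of the hoisting idea in plain C, not the actual cfa-cc output. All names here (generic_assign_t, assign_int, adapter_assign_int, poly_copy, copy_int_via_adapter) are hypothetical. The point is that an adapter which uses only its own parameters can live at file scope, and taking the address of a file-scope function should not require a stack trampoline.

```c
#include <stddef.h>

/* Hypothetical generic signature a polymorphic function might expect:
 * it operates on untyped storage only. */
typedef void (*generic_assign_t)(void *dst, const void *src);

/* Concrete, typed function that the adapter forwards to. */
static int assign_int(int *dst, int src) {
    *dst = src;
    return src;
}

/* Adapter hoisted to file scope. It reads nothing from an enclosing
 * function's frame, so it needs no closure and its address is an
 * ordinary code address rather than a trampoline on the stack. */
static void adapter_assign_int(void *dst, const void *src) {
    (void)assign_int((int *)dst, *(const int *)src);
}

/* Stand-in for a polymorphic function that calls through the adapter. */
static void poly_copy(void *dst, const void *src, generic_assign_t assign) {
    assign(dst, src);
}

/* Small driver: copies an int through the hoisted adapter. */
int copy_int_via_adapter(int value) {
    int out = 0;
    poly_copy(&out, &value, adapter_assign_int);
    return out;
}
```

If the adapter were instead declared as a nested function inside copy_int_via_adapter, passing it to poly_copy would store a pointer to a nested function, which is the situation this ticket describes.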

The dtor issue could potentially be fixed by changing the calling convention of destructors so that no function pointer is stored:

__attribute__ ((cleanup(__destroy_Destructor))) struct __Destructor __memberDtor0 = { ((void *)_X1tS1A_Y1T__1), ((void (*)(void *__param_0))__cleanup_dtor20) }; // old way
__attribute__ ((cleanup(__cleanup_dtor20))) void * nd = ((void *)_X1tS1A_Y1T__1); // fix
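
A minimal sketch of the proposed convention, assuming GNU C's cleanup attribute (supported by GCC and Clang). The names S, destroy_S, cleanup_dtor and run_scope are hypothetical, and a file-scope function stands in for the generated __cleanup_dtor20. The handler is named in the attribute itself, so the compiler can call it directly at scope exit and no function pointer is stored at run time.

```c
/* Hypothetical object type; the counter records destructor calls. */
struct S {
    int *destroyed;
};

/* Hypothetical destructor for S. */
static void destroy_S(struct S *s) {
    ++*s->destroyed;
}

/* A cleanup handler receives a pointer to the annotated variable.
 * The variable below has type void *, so the handler takes void **. */
static void cleanup_dtor(void **objp) {
    destroy_S((struct S *)*objp);
}

/* Returns how many times the destructor ran for the inner scope. */
int run_scope(void) {
    int destroyed = 0;
    struct S s = { &destroyed };
    {
        /* Only the object pointer is stored; the handler is fixed
         * at compile time by the attribute. */
        __attribute__((cleanup(cleanup_dtor))) void *nd = &s;
        (void)nd;
    } /* cleanup_dtor(&nd) runs here, as the inner scope ends */
    return destroyed;
}
```

By contrast, the old form keeps both the object and the destructor's address in a struct __Destructor, so when the destructor is a nested function its address has to be materialized.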

Attached is a program that generates output exhibiting both issues. Also attached is a trimmed-down version of the output C code with the fixes implemented inline.

Attachments (2)

poly_dtor.cfa (402 bytes) - added by caparson 10 months ago.
Minimal reproduction of the issue
poly_dtor_fixed.c (26.1 KB) - added by caparson 10 months ago.
Output C from the repro that has been modified with a potential fix for both issues


Change History (2)

Changed 10 months ago by caparson

Attachment: poly_dtor.cfa added

Minimal reproduction of the issue

Changed 10 months ago by caparson

Attachment: poly_dtor_fixed.c added

Output C from the repro that has been modified with a potential fix for both issues
