Changeset eb0d9b7 for libcfa


Ignore:
Timestamp:
Dec 20, 2025, 4:52:54 AM (2 days ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master
Children:
80e83b6c
Parents:
0210a543
Message:

Improve libcfa-array's bound-check removal and write that thesis section.

The libcfa change adds a more performant alternative for a subset of multidimensional indexing cases that were already functionally correct.
That the new alternative is more performant is not shown in the test suite.
There is an associated new high-performance option for passing an array-or-slice to a function.
The added test cases cover those options.

The added in-thesis demos rely on the new more-performant alternative for multidimensional indexing.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • libcfa/src/collections/array.hfa

    r0210a543 reb0d9b7  
    242242}
    243243
     244// Further form of -[-,-,-] that avoids using the trait system.
     245// Above overloads work for any type with (recursively valid) subscript operator,
     246// provided said subscript is passed as an assertion.
     247// Below works only on arpk variations but never passes its subscript though an assertion.
     248//
     249// When arpk implements the trait used above,
     250// the critical assertion is backed by a nontrivial thunk.
     251// There is no "thunk problem" (lifetime) issue, when used as shown in the test suite.
     252// But the optimizer has shown difficulty removing these thunks in cases where "it should,"
     253// i.e. when all user code is in one compilation unit.
     254// Not that every attempt at removing such a thunk fails; cases have been found going both ways.
     255// Cases have been found with unnecessary bound-checks removed successfully,
     256// on user code written against the overloads below,
     257// but where these bound checks (which occur within `call`ed thunks) are not removed,
     258// on user code written against the overloads above.
     259//
     260// The overloads below provide specializations of the above
     261// that are a little harder to use than the ones above,
     262// but where array API erasure has been seen to be more effective.
     263// Note that the style below does not appeal to a case where thunk inlining is more effective;
     264// rather, it simply does not rely on thunks in the first place.
     265//
     266// Both usage styles are shown in test array-md-sbscr-cases#numSubscrTypeCompatibility,
     267// with the more general one above being "high abstraction,"
     268// and the more performant one below being "mid abstraction" and "low abstraction."
     269//
     270// A breadth of index types is not given here (providing -[size_t,size_t,...] only)
     271// because these declarations are not feeding a trait, so safe implicit arithmetic conversion kiks in.
     272// Even so, there may still be an un-met need for accepting
     273// either ptrdiff_t or size_t (signed or unsigned)
     274// because Mike has seen the optimizer resist removing bound checks when sign-conversion is in play.
     275// "Only size_t" is meeting today's need
     276// and no solution is known that avoids 2^D overloads for D dimensions
     277// while offering multiple subscript types and staying assertion-free.
     278//
     279// This approach, of avoiding traits entirely, is likely incompatible with the original desire
     280// to have one recursive multidimensional subscript operator (TRY_BROKEN_DESIRED_MD_SUBSCRIPT).
     281// To make a single declaration work,
     282// we would probably have to get better at coaxing the optimizer into inlining thunks.
     283
     284forall( [N2], S2*, [N1], S1*, Timmed1, Tbase )
     285static inline Timmed1 & ?[?]( arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ) & this, size_t ix2, size_t ix1 ) {
     286        return this[ix2][ix1];
     287}
     288
     289forall( [N3], S3*, [N2], S2*, [N1], S1*, Timmed1, Tbase )
     290static inline Timmed1 & ?[?]( arpk( N3, S3, arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ), Tbase ) & this, size_t ix3, size_t ix2, size_t ix1 ) {
     291        return this[ix3][ix2][ix1];
     292}
     293
     294forall( [N4], S4*, [N3], S3*, [N2], S2*, [N1], S1*, Timmed1, Tbase )
     295static inline Timmed1 & ?[?]( arpk( N4, S4, arpk( N3, S3, arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ), Tbase ), Tbase ) & this, size_t ix4, size_t ix3, size_t ix2, size_t ix1 ) {
     296        return this[ix4][ix3][ix2][ix1];
     297}
     298
     299
     300
    244301#endif
    245302
Note: See TracChangeset for help on using the changeset viewer.