Context Navigation

← Previous Changeset
Next Changeset →

Changeset eb0d9b7

Timestamp:

Dec 20, 2025, 4:52:54 AM (3 months ago)

Author:

Michael Brooks <mlbrooks@…>

Branches:

master, stuck-waitfor-destruct

Children:

80e83b6c

Parents:

0210a543

Message:

Improve libcfa-array's bound-check removal and write that thesis section.

The libcfa change adds a more performant alternative for a subset of multidimensional indexing cases that were already functionally correct.
That the new alternative is more performant is not shown in the test suite.
There is an associated new high-performance option for passing an array-or-slice to a function.
The added test cases cover those options.

The added in-thesis demos rely on the new more-performant alternative for multidimensional indexing.

Files:

: 9 added
: 3 deleted
: 4 edited

doc/theses/mike_brooks_MMath/Makefile (modified) (3 diffs)
doc/theses/mike_brooks_MMath/array.tex (modified) (17 diffs)
doc/theses/mike_brooks_MMath/pictures/ar-bchk.pdf (added)
doc/theses/mike_brooks_MMath/pictures/ar-bchk.xlsx (added)
doc/theses/mike_brooks_MMath/programs/ar-bchk/Makefile (added)
doc/theses/mike_brooks_MMath/programs/ar-bchk/control.c (added)
doc/theses/mike_brooks_MMath/programs/ar-bchk/control.cc (added)
doc/theses/mike_brooks_MMath/programs/ar-bchk/control.cfa (added)
doc/theses/mike_brooks_MMath/programs/ar-bchk/treatment.c (added)
doc/theses/mike_brooks_MMath/programs/ar-bchk/treatment.cc (added)
doc/theses/mike_brooks_MMath/programs/ar-bchk/treatment.cfa (added)
doc/theses/mike_brooks_MMath/programs/array-boundcheck-removal-matmul.cfa (deleted)
doc/theses/mike_brooks_MMath/programs/array-boundcheck-removal-stdvec.cpp (deleted)
doc/theses/mike_brooks_MMath/programs/array-boundcheck-removal.cfa (deleted)
libcfa/src/collections/array.hfa (modified) (1 diff)
tests/array-collections/array-md-sbscr-cases.cfa (modified) (2 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/theses/mike_brooks_MMath/Makefile

-              r0210a543
+              reb0d9b7
 BibRep = ../../bibliography
+strip-top-dir = $(foreach p,$1,$(patsubst %/,%,$(subst $(firstword $(subst /, ,$(p)))/,, $(p))))
+.SUFFIXES:  # disable make built-in rules
 TeXSRC = ${wildcard *.tex}
 PicSRC = ${notdir ${wildcard ${Pictures}/*.png}} ${notdir ${wildcard ${Pictures}/*.fig}}
 …
 PgmSRC = ${notdir ${wildcard ${Programs}/*}}
 RunPgmSRC = ${notdir ${wildcard ${Programs}/*.run.*}}
+NondemoPgmSRC = ${call strip-top-dir,${wildcard ${Programs}/*/*.c*}}
 BibSRC = ${wildcard *.bib}
 …
 # File Dependencies
 ${DOCUMENT}: ${TeXSRC} $(RunPgmOut) ${DemoPgmOut} ${PlotSRC} ${PicSRC} ${BibSRC} ${BibRep}/pl.bib ${LaTMac}/common.tex Makefile | ${Build}
+${DOCUMENT}: ${TeXSRC} $(RunPgmOut) ${DemoPgmOut} ${NondemoPgmSRC} ${PlotSRC} ${PicSRC} ${BibSRC} ${BibRep}/pl.bib ${LaTMac}/common.tex Makefile | ${Build}
         echo ${PicSRC}
         echo ${GraphSRC_OLD}

doc/theses/mike_brooks_MMath/array.tex

-              r0210a543
+              reb0d9b7
 is a sufficient postcondition.
 In an imperative language like C and \CFA, it is also necessary to discuss side effects, for which an even heavier formalism, like separation logic, is required.
 Secondly, TODO: bash Rust.
 % TODO: cite the crap out of these claims.
+Secondly: ... bash Rust.
+... cite the crap out of these claims.
 \end{comment}
 …
 An expression-compatibility question is a new addition to the \CFA compiler, and occurs in the context of dimension expressions, and possibly enumerations assigns, which must be unique.
 % TODO: ensure these compiler implementation matters are treated under \CFA compiler background: type unification, cost calculation, GenPoly.
+% ... ensure these compiler implementation matters are treated under \CFA compiler background: type unification, cost calculation, GenPoly.
 The relevant technical component of the \CFA compiler is the standard type-unification within the type resolver.
 …
 \end{cfa}
 when resolving a function call to @g@, the first unification stage compares the type @T@ of the parameter with @array( float, n + 1 )@, of the argument.
-\PAB{TODO: finish.}
 The actual rules for comparing two dimension expressions are conservative.
 …
 is to imply, ``all else being equal, allow an array with length calculated by $e_1$
 to be passed to a function expecting a length-$e_2$ array.''\footnote{
         TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
+        ... Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
         Should it be an earlier scoping principle?  Feels like it should matter in more places than here.}
 So, a ``yes'' answer must represent a guarantee that both expressions evaluate the
 …
 So, a variable and a literal can never match.
 TODO: Discuss the interaction of this dimension hoisting with the challenge of extra unification for cost calculation
+... Discuss the interaction of this dimension hoisting with the challenge of extra unification for cost calculation
 \end{comment}
 …
+\section{Bound Checks, Added and Removed}
+\section{Zero Overhead}
+At runtime, the \CFA array is exactly a C array.
 \CFA array subscripting is protected with runtime bound checks.
+The array dependent-typing provides information to the C optimizer allowing it remove many of the bound checks.
+This section provides a demonstration of the effect.
+The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the \CC pattern where restricted vector usage models a checked array.
+The essential feature of this padded-room system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
+The experiment compares with the \CC version to keep access to generated assembly code simple.
+The array dependent-typing provides information to the C optimizer, allowing it remove many of the bound checks.
+This section provides a demonstration of these effects.
+The experiment compares the \CFA array system with the simple-safety system most typically exemplified by Java arrays (\VRef[Section]{JavaCompare}), but also reflected in the \CC pattern where restricted vector usage models a checked array (\VRef[Section]{CppCompare}).
+The essential feature of this simple system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
+The experiment uses the \CC version to simplify access to generated assembly code.
+While ``\CC'' labels a participant, it is really the simple-safety system (of @vector@ with @.at@) whose limitaitons are being explained, and not limitations of \CC optimization.
 As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions.
 When the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code.
+Firstly, when the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code.
 But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch an occurrence the mistake.
+TODO: paste source and assembly codes
+Incorporating reuse among dimension sizes is seen to give \CFA an advantage at being optimized.
+The case is naive matrix multiplication over a row-major encoding.
+TODO: paste source codes
+\begin{figure}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{llll@{}}
+\multicolumn{1}{c}{C} &
+\multicolumn{1}{c}{\CFA} &
+\multicolumn{1}{c}{\CC (nested vector)}
+\\[1em]
+\lstinput{20-28}{ar-bchk/control.c} &
+\lstinput{20-28}{ar-bchk/control.cfa} &
+\lstinput{20-28}{ar-bchk/control.cc}
+\\
+\multicolumn{1}{c}{(a)} &
+\multicolumn{1}{c}{(b)} &
+\multicolumn{1}{c}{(c)}
+\\[1em]
+\multicolumn{4}{l}{
+        \includegraphics[page=1,trim=0 9.125in 2in 0,clip]{ar-bchk}
+}
+\\
+\multicolumn{1}{c}{(d)} &
+\multicolumn{1}{c}{(e)} &
+\multicolumn{1}{c}{(f)}
+\\[1em]
+\multicolumn{4}{l}{
+        \includegraphics[page=2,trim=0 8.75in 2in 0,clip]{ar-bchk}
+}
+\\
+\multicolumn{1}{c}{(g)} &
+\multicolumn{1}{c}{(h)} &
+\multicolumn{1}{c}{(i)}
+\end{tabular}
+\caption{Overhead comparison, control case.
+All three programs/compilers generate equally performant code when the programmer ``self-checks'' bounds via a loop's condition.
+Yet both \CFA and \CC versions generate code that is slower and safer than C's when the programmer might overrun the bounds.}
+\label{f:ovhd-ctl}
+\end{figure}
+\VRef[Figure]{f:ovhd-ctl} gives a control-case example, of summing values in an array.
+Each column shows a this program in a different language: C (with no bound checking ever, a,~d,~g), \CFA (b,~e,~h), and \CC (c,~f,~i).
+The source-code functions in (a, b, c) can be compiled to have either correct or incorrect uses of bounds.
+When compiling for correct bound use, the @BND@ macro passes its argument through, so the loops cover exactly the passed array sizes.
+When compiling for incorrect bound use (a programming error), the @BND@ macro hardcodes the loops for 100 iterations, which implies out-of-bound access attempts when the passed array has fewer than 100 elements.
+The assembly code excerpts show optimized translations, for correct-bound mode in (d, e, f) and incorrect-bound mode in (g, h, i).
+Because incorrect-bound mode hardcodes 100 iterations, the loop always executes a first time, so this mode does not have the @n == 0@ branch seen in correct-bound mode.
+For C, this difference is the only (d--g) difference.
+For correct bounds, the \CFA translation equals the C translation, \ie~there is no (d--e) difference.
+It is practically so for \CC too, modulo the additional indirection of dereferencing into the vector's auxiliary allocation.
+Differences arise when the bound-incorrect programs are passed an array shorter than 100.
+The C version, (g), proceeds with undefined behaviour, reading past the end of the passed array.
+The \CFA and \CC versions, (h, i), flag the mistake with the runtime check, the @i >= n@ branch.
+This check is provided implicitly by their library types, @array@ and @vector@ respectively.
+The significant result here is these runtime checks being \emph{absent} from the bound-correct translations, (e, f).
+The code optimizer has removed them because, at the point where they would occur (immediately past @.L28@/@.L3@), it knows from the surrounding control flow either @i == 0 && 0 < n@ or @i < n@ (directly), \i.e. @i < n@ either way.
+So a repeat of this check, the branch and its consequent code (raise error) are all removable.
+Progressing from the control case, a deliberately bound-incorrect mode is no longer informative.
+Rather, given a (well-typed) program that does good work on good data, the issue is whether it is (determinably) bound-correct on all data.
+When dimension sizes get reused, \CFA has an advantage over \CC-vector at getting simply written programs well optimized.
+\begin{figure}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{llll@{}}
+\multicolumn{1}{c}{C} &
+\multicolumn{1}{c}{\CFA} &
+\multicolumn{1}{c}{\CC (nested vector)}
+\\[1em]
+\lstinput{20-37}{ar-bchk/treatment.c} &
+\lstinput{20-37}{ar-bchk/treatment.cfa} &
+\lstinput{20-37}{ar-bchk/treatment.cc}
+\end{tabular}
+\caption{Overhead comparison, differentiation case, source.
+}
+\label{f:ovhd-treat-src}
+\end{figure}
+\begin{figure}
+\newcolumntype{C}[1]{>{\centering\arraybackslash}p{#1}}
+\begin{tabular}{C{1.5in}C{1.5in}C{1.5in}p{1in}}
+C &
+\CFA &
+\CC (nested vector)
+\\[1em]
+\multicolumn{4}{l}{
+        \includegraphics[page=3,trim=0 4in 2in 0,clip]{ar-bchk}
+}
+\end{tabular}
+\caption{Overhead comparison, differentiation case, assembly.
+}
+\label{f:ovhd-treat-asm}
+\end{figure}
+The example case of \VRef[Figures]{f:ovhd-treat-src} and \VRef{f:ovhd-treat-asm} is simple matrix multiplication over a row-major encoding.
+Simple means coding directly to the intuition of the mathematical definition, without trying to optimize for memory layout.
+In the assembly code of \VRef[Figures]{f:ovhd-treat-asm}, the looping pattern of \VRef[Figure]{f:ovhd-ctl} (d, e, f), ``Skip aheas on zero; loop back for next,'' occurs with three nesting levels.
+Simultaneously, the dynamic bound-check pattern of \VRef[Figure]{f:ovhd-ctl} (h,~i), ``Get out on invalid,'' occurs, targeting @.L7@, @.L9@ and @.L8@.
+Here, \VRef[Figures]{f:ovhd-treat-asm} shows the \CFA solution optimizing into practically the C solution, while the \CC solution shows added runtime bound checks.
+Like in \VRef[Figure]{f:ovhd-ctl} (e), the significance is the \emph{absence} of library-imposed runtime checks, even though the source code is working through the the \CFA @array@ library.
+The optimizer removed the library-imposed checks because the data strructure @array@-of-@array@ is constained by its type to be shaped correctly for the intuitive looping.
+In \CC, the same constraint does not apply to @vector@-of-@vector@.
+Because every individual @vector@ carries its own size, two types of bound mistakes are possible.
+Firstly, the three structures received may not be matrices at all, per the obvious/dense/total interpretation; rather, any one might be ragged-right in its rows.
+The \CFA solution guards against this possibility by encoding the minor length (number of columns) in the major element (row) type.
+In @res@, for example, each of the @M@ rows is @array(float, N)@, guaranteeing @N@ cells within it.
+Or more technically, guaranteeing @N@ as the basis for the imposed bound check \emph{of every row.}
+The second type of \CC bound mistake is that its types do not impose the mathematically familiar constraint of $(M \times P) (P \times N) \rightarrow (M \times N)$.
+Even assuming away the first issue, \eg that in @lhs@, all minor/cell counts agree, the data format allows the @rhs@ major/row count to disagree.
+Or, the possibility that the row count @res.size()@ disagrees with the row count @lhs.size()@ illustrates this bound-mistake type in isolation.
+The \CFA solution guards against this possibility by keeping length information separate from the array data, and therefore eligible for sharing.
+This capability lets \CFA escape the one-to-one correspondence between array instances and symbolic bounds, where this correspondence leaves a \CC-vector programmer stuck with a matrix representation that repeats itself.
+It is important to clarify that the \CFA solution does not become unsafe (like C) in losing its dynamic checks, even though it does become fast (as C) in losing them.
+The dynamic chekcs were dismissed as unnecessary \emph{because} the program was safe to begin with.
+To regain performance, a \CC programmer is left needing to state appropriate assertions or assumptions, to convince the optimizer to dismiss the runtime checks.
+Especially considering that two of them are in the inner-most loop.
+The solution is nontrivial.
+It requires doing the work of the inner-loop checks as a preflight step.
+But this work still takes looping; doing it upfront gives too much separation for the optimizer to see ``has been checked already'' in the deep loop.
+So, the programmer must restate the preflight observation within the deep loop, but this time as an unchecked assumption.
+Such assumptions are risky because they introduce further undefined behaviour when misused.
+Only the programmer's discipline remains to ensure this work is done without error.
+The \CFA solution lets a simply stated program have dynamic guards that catch bugs, while letting a simply stated bug-free program run as fast as the unguarded C equivalent.
+\begin{comment}
+The ragged-right issue brings with it a source-of-truth difficulty: Where, in the \CC version, is one to find the value of $N$?  $M$ can come from either @res@'s or @lhs@'s major/row count, and checking these for equality is straightforward.  $P$ can come from @rhs@'s major/row count.  But $N$ is only available from columns, \ie minor/cell counts, which are ragged.  So any choice of initial source of truth, \eg
+\end{comment}
 \section{Array Lifecycle}
 …
 \section{Array Comparison}
 \subsection{Rust}
 …
 \subsection{Java}
+\label{JavaCompare}
 Java arrays are references, so multi-dimension arrays are arrays-of-arrays \see{\VRef{toc:mdimpl}}.
 …
 \subsection{\CC}
+\label{CppCompare}
 Because C arrays are difficult and dangerous, the mantra for \CC programmers is to use @std::vector@ in place of the C array.
+While the vector size can grow and shrink dynamically, \vs a fixed-size dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration (one dynamic call) to avoid using @push_back@.
+While the vector size can grow and shrink dynamically, \vs an unchanging dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration.
+So, it costs one upfront dynamic allocation and avoids growing the arry through pushing.
 \begin{c++}
 vector< vector< int > > m( 5, vector<int>(8) ); // initialize size of 5 x 8 with 6 dynamic allocations
 \end{c++}
 Multidimensional arrays are arrays-of-arrays with associated costs.
+Each @vector@ array has an array descriptor contain the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@.
+Used with these restrictions, out-of-bound accesses are caught, and in-bound accesses never exercise the vector's ability to grow, preventing costly reallocate and copy, and never invalidate references to contained values.
+This scheme matches Java's safety and expressiveness exactly, but with the inherent costs.
+Each @vector@ array has an array descriptor containing the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@.
+Used with these restrictions, out-of-bound accesses are caught, and in-bound accesses never exercise the vector's ability to grow, preventing costly reallocate-and-copy, and never invalidating references to contained values.
+This scheme matches a Java array's safety and expressiveness exactly, with the same inherent costs.
+Notably, a \CC vector of vectors does not provide contiguous element storage (even when upfront allocation is done carefully) because a vector puts its elements in an auxiliary allocation.
+So, in spite of @vector< vector< int > >@ appearing to be ``everything by value,'' it is still a first array whose elements include pointers to further arrays.
+\CC has options for working with memory-adjacent data in desirable ways, particularly in recent revisions.
+But none makes an allocation with a dynamically given fixed size less awkward than the vector arrangement just described.
+\CC~26 adds @std::inplace_vector@, which provides an interesting vector--array hybrid,\footnote{
+  Like a vector, it lets a user construct elements in a loop, rather than imposing uniform construction.
+  Yet it preserves \lstinline{std::array}'s ability to be entirely stack-allocated, by avoiding an auxiliary elements' allocation.
+} but does not change the fundamental limit of \lstinline{std::array}, that the length, being a template parameter, must be a static value.
+\CC~20's @std::span@ is a view that unifies true arrays, vectors, static sizes and dynamic sizes, under a common API that adds bound checking.
+When wrapping a vector, bound checking occurs on regular subscripting; one needn't remember to use @.at@.
+When wrapping a locally declared fixed-size array, bound communication is implicit.
+But it has a soundness gap by offering construction from pointer and user-given length.
+\CC~23's @std::mdspan@ adds multidimensional indexing and reshaping capabilities analogous to those built into \CFA's @array@.
+Like @span@, it works over multiple underlying container types.
+But neither @span@ nor @mdspan@ augments the available allocation options.
+Thus, these options do not offer an allocation with a dynamically given fixed size.
+And furthermore, they do not provide any improvement to the C flexible array member pattern, for making a dynamic amount of storage contiguous with its header, as do \CFA's accordions.
 \subsection{Levels of Dependently Typed Arrays}
+TODO: fix the position; checked c does not do linear types
 \CFA's array is the first lightweight application of dependently-typed bound tracking to an extension of C.
 Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly-typed ownership-system, which further helps guarantee statically the validity of every pointer deference.
 …
 it can also do these other cool checks, but watch how I can mess with its conservativeness and termination
 Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer novel contributions concerning similar, restricted dependent types for tracking array length.
+Two contemporary array-centric languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, contribute similar, restricted dependent types for tracking array length.
 Unlike \CFA, both are garbage-collected functional languages.
 Because they are garbage-collected, referential integrity is built-in, meaning that the heavyweight analysis, that \CFA aims to avoid, is unnecessary.
 …
 There is a particular emphasis on an existential type, enabling callee-determined return shapes.
+Dex uses a novel conception of size, embedding its quantitative information completely into an ordinary type.
+Futhark and full-strength dependently typed languages treat array sizes are ordinary values.
+Dex uses an Ada-style conception of size, embedding quantitative information completely into an ordinary type.
+Futhark and full-strength dependently typed languages treat array sizes as ordinary values.
 Futhark restricts these expressions syntactically to variables and constants, while a full-strength dependent system does not.
 \CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet has no instances.
+\CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet no objects of type @[N]@ occur.
 Belonging to the type system means it is inferred at a call site and communicated implicitly, like in Dex and unlike in Futhark.
 Having no instances means there is no type for a variable @i@ that constrains @i@ to be in the range for @N@, unlike Dex, [TODO: verify], but like Futhark.
 …
 In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, \CC, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.
+The generality has obvious aesthetic benefits for programmers working on scheduling resources to weekdays, and for programmers who prefer to count from an initial number of their own choosing.
+The Ada--Dex generality has aesthetic benefits for certain programmers.  For those working on scheduling resources to weekdays:
+For those who prefer to count from an initial number of their own choosing:
 This change of perspective also lets us remove ubiquitous dynamic bound checks.
 …
 (unsafe_from_ordinal a (idiv o bs), unsafe_from_ordinal b (rem o bs))
 \end{cfa}
+% fix mike's syntax highlighter
 and by a user-provided adapter expression at the call site that shows how to indexing with a tuple is backed by indexing each dimension at a time
 \begin{cfa}
 …
 Using a compiler-produced value eliminates an opportunity for user error.
 TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
+...someday... fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
 Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
 …
 (*ar2)[5];
 \end{cfa}
 Using ``reference to array'' works at resolving this issue.  TODO: discuss connection with Doug-Lea \CC proposal.
+Using ``reference to array'' works at resolving this issue.  ...someday... discuss connection with Doug-Lea \CC proposal.
 \begin{cfa}
 tm (&ar3)[10] = *alloc();
 …
 Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
 TODO xref C standard does not claim that @ar1@ may be subscripted,
+...someday...  xref C standard does not claim that @ar1@ may be subscripted,
 because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
 But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
 …
 The ``reference to array'' type has its sore spots too.
 TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
 TODO: I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?
+...someday...  see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
+...someday... I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?
 \end{comment}

libcfa/src/collections/array.hfa

-              r0210a543
+              reb0d9b7
+}
+// Further form of -[-,-,-] that avoids using the trait system.
+// Above overloads work for any type with (recursively valid) subscript operator,
+// provided said subscript is passed as an assertion.
+// Below works only on arpk variations but never passes its subscript though an assertion.
+//
+// When arpk implements the trait used above,
+// the critical assertion is backed by a nontrivial thunk.
+// There is no "thunk problem" (lifetime) issue, when used as shown in the test suite.
+// But the optimizer has shown difficulty removing these thunks in cases where "it should,"
+// i.e. when all user code is in one compilation unit.
+// Not that every attempt at removing such a thunk fails; cases have been found going both ways.
+// Cases have been found with unnecessary bound-checks removed successfully,
+// on user code written against the overloads below,
+// but where these bound checks (which occur within `call`ed thunks) are not removed,
+// on user code written against the overloads above.
+//
+// The overloads below provide specializations of the above
+// that are a little harder to use than the ones above,
+// but where array API erasure has been seen to be more effective.
+// Note that the style below does not appeal to a case where thunk inlining is more effective;
+// rather, it simply does not rely on thunks in the first place.
+//
+// Both usage styles are shown in test array-md-sbscr-cases#numSubscrTypeCompatibility,
+// with the more general one above being "high abstraction,"
+// and the more performant one below being "mid abstraction" and "low abstraction."
+//
+// A breadth of index types is not given here (providing -[size_t,size_t,...] only)
+// because these declarations are not feeding a trait, so safe implicit arithmetic conversion kiks in.
+// Even so, there may still be an un-met need for accepting
+// either ptrdiff_t or size_t (signed or unsigned)
+// because Mike has seen the optimizer resist removing bound checks when sign-conversion is in play.
+// "Only size_t" is meeting today's need
+// and no solution is known that avoids 2^D overloads for D dimensions
+// while offering multiple subscript types and staying assertion-free.
+//
+// This approach, of avoiding traits entirely, is likely incompatible with the original desire
+// to have one recursive multidimensional subscript operator (TRY_BROKEN_DESIRED_MD_SUBSCRIPT).
+// To make a single declaration work,
+// we would probably have to get better at coaxing the optimizer into inlining thunks.
+forall( [N2], S2*, [N1], S1*, Timmed1, Tbase )
+static inline Timmed1 & ?[?]( arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ) & this, size_t ix2, size_t ix1 ) {
+        return this[ix2][ix1];
+}
+forall( [N3], S3*, [N2], S2*, [N1], S1*, Timmed1, Tbase )
+static inline Timmed1 & ?[?]( arpk( N3, S3, arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ), Tbase ) & this, size_t ix3, size_t ix2, size_t ix1 ) {
+        return this[ix3][ix2][ix1];
+}
+forall( [N4], S4*, [N3], S3*, [N2], S2*, [N1], S1*, Timmed1, Tbase )
+static inline Timmed1 & ?[?]( arpk( N4, S4, arpk( N3, S3, arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ), Tbase ), Tbase ) & this, size_t ix4, size_t ix3, size_t ix2, size_t ix1 ) {
+        return this[ix4][ix3][ix2][ix1];
+}
 #endif

tests/array-collections/array-md-sbscr-cases.cfa

-              r0210a543
+              reb0d9b7
+}
+// common function body, working on parameter wxyz, but for wxyz being different types
+#define CHECK_NUM_SUBSCR_TYPE_COMPAT                                         \
+    valExpected = getMagicNumber(2, 3, 4, 5);                                \
+    assert(( wxyz[2] [3] [4] [5] == valExpected ));                          \
+    assert(( wxyz[2,  3] [4] [5] == valExpected ));                          \
+    assert(( wxyz[2] [3,  4] [5] == valExpected ));                          \
+    assert(( wxyz[2] [3] [4,  5] == valExpected ));                          \
+    assert(( wxyz[2,  3,  4] [5] == valExpected ));                          \
+    assert(( wxyz[2] [3,  4,  5] == valExpected ));                          \
+    assert(( wxyz[2,  3,  4,  5] == valExpected ));                          \
+    for ( i; Nw ) {                                                          \
+        assert(( wxyz[ i, 3, 4, 5 ] == getMagicNumber(i, 3, 4, 5) ));        \
+    }                                                                        \
+    for ( i; Nx ) {                                                          \
+        assert(( wxyz[ 2, i, 4, 5 ] == getMagicNumber(2, i, 4, 5) ));        \
+    }                                                                        \
+    for ( i; Ny ) {                                                          \
+        assert(( wxyz[ 2, 3, i, 5 ] == getMagicNumber(2, 3, i, 5) ));        \
+    }                                                                        \
+    for ( i; Nz ) {                                                          \
+        assert(( wxyz[ 2, 3, 4, i ] == getMagicNumber(2, 3, 4, i) ));        \
+    }
+#define CHECK_NUM_SUBSCR_TYPE_COMPAT_ADDENDUM_RESHAPE                        \
+    for ( i; Nw ) {                                                          \
+        assert(( wxyz[ i, all, 4, 5 ][3] == getMagicNumber(i, 3, 4, 5) ));   \
+    }                                                                        \
+    for ( i; Nw ) {                                                          \
+        assert(( wxyz[ all, 3, 4, 5 ][i] == getMagicNumber(i, 3, 4, 5) ));   \
+    }
+// Low abstraction: simple declaration, cannot send a slice, can make a slice, runs fast
+forall( [Nw], [Nx], [Ny], [Nz] )
+void test_numSubscrTypeCompatibility_lo( array( float, Nw, Nx, Ny, Nz ) & wxyz ) {
+    CHECK_NUM_SUBSCR_TYPE_COMPAT
+    CHECK_NUM_SUBSCR_TYPE_COMPAT_ADDENDUM_RESHAPE
+}
+// Medium abstraction: complex declaration, can send or make a slice, runs fast
+forall( [Nw], Sw*
+      , [Nx], Sx*
+      , [Ny], Sy*
+      , [Nz], Sz* )
+void test_numSubscrTypeCompatibility_mid(
+        arpk( Nw, Sw,
+        arpk( Nx, Sx,
+        arpk( Ny, Sy,
+        arpk( Nz, Sz, float, float)
+                           , float)
+                           , float)
+                           , float) & wxyz
+    ) {
+    CHECK_NUM_SUBSCR_TYPE_COMPAT
+    CHECK_NUM_SUBSCR_TYPE_COMPAT_ADDENDUM_RESHAPE
+}
+// High abstraction: mid-complexity declaration, can send a slice or a non-arpk, cannot make a slice, may not run fast
+forall( [Nw], Awxyz &
+      , [Nx], Axyz &
+      , [Ny], Ayz &
+      , [Nz], Az &
+      | ar( Awxyz, Axyz, Nw )
+      | ar( Axyz, Ayz, Nx )
+      | ar( Ayz, Az, Ny )
+      | ar( Az, float, Nz ) )
+void test_numSubscrTypeCompatibility_hi( Awxyz & wxyz ) {
+    CHECK_NUM_SUBSCR_TYPE_COMPAT
+}
 forall( [Nw], [Nx], [Ny], [Nz] )
 void test_numSubscrTypeCompatibility( tag(Nw), tag(Nx), tag(Ny), tag(Nz) ) {
 …
     fillHelloData(wxyz);
+    valExpected = getMagicNumber(2, 3, 4, 5);
+    assert(( wxyz [2] [3] [4] [5]  == valExpected ));
+    assert(( wxyz[2,  3][4] [5]  == valExpected ));
+    assert(( wxyz [2][3,  4][5]  == valExpected ));
+    assert(( wxyz [2] [3][4,  5] == valExpected ));
+    assert(( wxyz[2,  3,  4][5]  == valExpected ));
+    assert(( wxyz [2][3,  4,  5] == valExpected ));
+    assert(( wxyz[2,  3,  4,  5] == valExpected ));
+    for ( i; Nw ) {
+        assert(( wxyz[ i, 3, 4, 5 ] == getMagicNumber(i, 3, 4, 5) ));
+    }
+    for ( i; Nx ) {
+        assert(( wxyz[ 2, i, 4, 5 ] == getMagicNumber(2, i, 4, 5) ));
+    }
+    for ( i; Ny ) {
+        assert(( wxyz[ 2, 3, i, 5 ] == getMagicNumber(2, 3, i, 5) ));
+    }
+    for ( i; Nz ) {
+        assert(( wxyz[ 2, 3, 4, i ] == getMagicNumber(2, 3, 4, i) ));
+    }
+    for ( i; Nw ) {
+        assert(( wxyz[ i, all, 4, 5 ][3] == getMagicNumber(i, 3, 4, 5) ));
+    }
+    for ( i; Nw ) {
+        assert(( wxyz[ all, 3, 4, 5 ][i] == getMagicNumber(i, 3, 4, 5) ));
+    }
+    test_numSubscrTypeCompatibility_lo ( wxyz );
+    test_numSubscrTypeCompatibility_mid( wxyz );
+    test_numSubscrTypeCompatibility_hi ( wxyz );
+}

Note: See TracChangeset for help on using the changeset viewer.