Changeset eb0d9b7


Ignore:
Timestamp:
Dec 20, 2025, 4:52:54 AM (6 hours ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master
Parents:
0210a543
Message:

Improve libcfa-array's bound-check removal and write that thesis section.

The libcfa change adds a more performant alternative for a subset of multidimensional indexing cases that were already functionally correct.
That the new alternative is more performant is not shown in the test suite.
There is an associated new high-performance option for passing an array-or-slice to a function.
The added test cases cover those options.

The added in-thesis demos rely on the new more-performant alternative for multidimensional indexing.

Files:
9 added
3 deleted
4 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/Makefile

    r0210a543 reb0d9b7  
    1111BibRep = ../../bibliography
    1212
     13strip-top-dir = $(foreach p,$1,$(patsubst %/,%,$(subst $(firstword $(subst /, ,$(p)))/,, $(p))))
     14
     15.SUFFIXES:  # disable make built-in rules
     16
    1317TeXSRC = ${wildcard *.tex}
    1418PicSRC = ${notdir ${wildcard ${Pictures}/*.png}} ${notdir ${wildcard ${Pictures}/*.fig}}
     
    1923PgmSRC = ${notdir ${wildcard ${Programs}/*}}
    2024RunPgmSRC = ${notdir ${wildcard ${Programs}/*.run.*}}
     25NondemoPgmSRC = ${call strip-top-dir,${wildcard ${Programs}/*/*.c*}}
    2126BibSRC = ${wildcard *.bib}
    2227
     
    5762# File Dependencies
    5863
    59 ${DOCUMENT}: ${TeXSRC} $(RunPgmOut) ${DemoPgmOut} ${PlotSRC} ${PicSRC} ${BibSRC} ${BibRep}/pl.bib ${LaTMac}/common.tex Makefile | ${Build}
     64${DOCUMENT}: ${TeXSRC} $(RunPgmOut) ${DemoPgmOut} ${NondemoPgmSRC} ${PlotSRC} ${PicSRC} ${BibSRC} ${BibRep}/pl.bib ${LaTMac}/common.tex Makefile | ${Build}
    6065        echo ${PicSRC}
    6166        echo ${GraphSRC_OLD}
  • doc/theses/mike_brooks_MMath/array.tex

    r0210a543 reb0d9b7  
    9292is a sufficient postcondition.
    9393In an imperative language like C and \CFA, it is also necessary to discuss side effects, for which an even heavier formalism, like separation logic, is required.
    94 Secondly, TODO: bash Rust.
    95 % TODO: cite the crap out of these claims.
     94Secondly: ... bash Rust.
     95... cite the crap out of these claims.
    9696\end{comment}
    9797
     
    698698An expression-compatibility question is a new addition to the \CFA compiler, and occurs in the context of dimension expressions, and possibly enumerations assigns, which must be unique.
    699699
    700 % TODO: ensure these compiler implementation matters are treated under \CFA compiler background: type unification, cost calculation, GenPoly.
     700% ... ensure these compiler implementation matters are treated under \CFA compiler background: type unification, cost calculation, GenPoly.
    701701
    702702The relevant technical component of the \CFA compiler is the standard type-unification within the type resolver.
     
    732732\end{cfa}
    733733when resolving a function call to @g@, the first unification stage compares the type @T@ of the parameter with @array( float, n + 1 )@, of the argument.
    734 \PAB{TODO: finish.}
    735734
    736735The actual rules for comparing two dimension expressions are conservative.
     
    738737is to imply, ``all else being equal, allow an array with length calculated by $e_1$
    739738to be passed to a function expecting a length-$e_2$ array.''\footnote{
    740         TODO: Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
     739        ... Deal with directionality, that I'm doing exact-match, no ``at least as long as,'' no subtyping.
    741740        Should it be an earlier scoping principle?  Feels like it should matter in more places than here.}
    742741So, a ``yes'' answer must represent a guarantee that both expressions evaluate the
     
    752751So, a variable and a literal can never match.
    753752
    754 TODO: Discuss the interaction of this dimension hoisting with the challenge of extra unification for cost calculation
     753... Discuss the interaction of this dimension hoisting with the challenge of extra unification for cost calculation
    755754\end{comment}
    756755
     
    10251024
    10261025
    1027 \section{Bound Checks, Added and Removed}
    1028 
     1026\section{Zero Overhead}
     1027
     1028At runtime, the \CFA array is exactly a C array.
    10291029\CFA array subscripting is protected with runtime bound checks.
    1030 The array dependent-typing provides information to the C optimizer allowing it remove many of the bound checks.
    1031 This section provides a demonstration of the effect.
    1032 
    1033 The experiment compares the \CFA array system with the padded-room system [TODO:xref] most typically exemplified by Java arrays, but also reflected in the \CC pattern where restricted vector usage models a checked array.
    1034 The essential feature of this padded-room system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
    1035 The experiment compares with the \CC version to keep access to generated assembly code simple.
     1030The array dependent-typing provides information to the C optimizer, allowing it remove many of the bound checks.
     1031This section provides a demonstration of these effects.
     1032
     1033The experiment compares the \CFA array system with the simple-safety system most typically exemplified by Java arrays (\VRef[Section]{JavaCompare}), but also reflected in the \CC pattern where restricted vector usage models a checked array (\VRef[Section]{CppCompare}).
     1034The essential feature of this simple system is the one-to-one correspondence between array instances and the symbolic bounds on which dynamic checks are based.
     1035The experiment uses the \CC version to simplify access to generated assembly code.
     1036While ``\CC'' labels a participant, it is really the simple-safety system (of @vector@ with @.at@) whose limitaitons are being explained, and not limitations of \CC optimization.
    10361037
    10371038As a control case, a simple loop (with no reused dimension sizes) is seen to get the same optimization treatment in both the \CFA and \CC versions.
    1038 When the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code.
     1039Firstly, when the programmer treats the array's bound correctly (making the subscript ``obviously fine''), no dynamic bound check is observed in the program's optimized assembly code.
    10391040But when the bounds are adjusted, such that the subscript is possibly invalid, the bound check appears in the optimized assembly, ready to catch an occurrence the mistake.
    10401041
    1041 TODO: paste source and assembly codes
    1042 
    1043 Incorporating reuse among dimension sizes is seen to give \CFA an advantage at being optimized.
    1044 The case is naive matrix multiplication over a row-major encoding.
    1045 
    1046 TODO: paste source codes
    1047 
     1042\begin{figure}
     1043\setlength{\tabcolsep}{10pt}
     1044\begin{tabular}{llll@{}}
     1045\multicolumn{1}{c}{C} &
     1046\multicolumn{1}{c}{\CFA} &
     1047\multicolumn{1}{c}{\CC (nested vector)}
     1048\\[1em]
     1049\lstinput{20-28}{ar-bchk/control.c} &
     1050\lstinput{20-28}{ar-bchk/control.cfa} &
     1051\lstinput{20-28}{ar-bchk/control.cc}
     1052\\
     1053\multicolumn{1}{c}{(a)} &
     1054\multicolumn{1}{c}{(b)} &
     1055\multicolumn{1}{c}{(c)}
     1056\\[1em]
     1057\multicolumn{4}{l}{
     1058        \includegraphics[page=1,trim=0 9.125in 2in 0,clip]{ar-bchk}
     1059}
     1060\\
     1061\multicolumn{1}{c}{(d)} &
     1062\multicolumn{1}{c}{(e)} &
     1063\multicolumn{1}{c}{(f)}
     1064\\[1em]
     1065\multicolumn{4}{l}{
     1066        \includegraphics[page=2,trim=0 8.75in 2in 0,clip]{ar-bchk}
     1067}
     1068\\
     1069\multicolumn{1}{c}{(g)} &
     1070\multicolumn{1}{c}{(h)} &
     1071\multicolumn{1}{c}{(i)}
     1072\end{tabular}
     1073\caption{Overhead comparison, control case.
     1074All three programs/compilers generate equally performant code when the programmer ``self-checks'' bounds via a loop's condition.
     1075Yet both \CFA and \CC versions generate code that is slower and safer than C's when the programmer might overrun the bounds.}
     1076\label{f:ovhd-ctl}
     1077\end{figure}
     1078
     1079\VRef[Figure]{f:ovhd-ctl} gives a control-case example, of summing values in an array.
     1080Each column shows a this program in a different language: C (with no bound checking ever, a,~d,~g), \CFA (b,~e,~h), and \CC (c,~f,~i).
     1081The source-code functions in (a, b, c) can be compiled to have either correct or incorrect uses of bounds.
     1082When compiling for correct bound use, the @BND@ macro passes its argument through, so the loops cover exactly the passed array sizes.
     1083When compiling for incorrect bound use (a programming error), the @BND@ macro hardcodes the loops for 100 iterations, which implies out-of-bound access attempts when the passed array has fewer than 100 elements.
     1084The assembly code excerpts show optimized translations, for correct-bound mode in (d, e, f) and incorrect-bound mode in (g, h, i).
     1085Because incorrect-bound mode hardcodes 100 iterations, the loop always executes a first time, so this mode does not have the @n == 0@ branch seen in correct-bound mode.
     1086For C, this difference is the only (d--g) difference.
     1087For correct bounds, the \CFA translation equals the C translation, \ie~there is no (d--e) difference.
     1088It is practically so for \CC too, modulo the additional indirection of dereferencing into the vector's auxiliary allocation.
     1089
     1090Differences arise when the bound-incorrect programs are passed an array shorter than 100.
     1091The C version, (g), proceeds with undefined behaviour, reading past the end of the passed array.
     1092The \CFA and \CC versions, (h, i), flag the mistake with the runtime check, the @i >= n@ branch.
     1093This check is provided implicitly by their library types, @array@ and @vector@ respectively.
     1094The significant result here is these runtime checks being \emph{absent} from the bound-correct translations, (e, f).
     1095The code optimizer has removed them because, at the point where they would occur (immediately past @.L28@/@.L3@), it knows from the surrounding control flow either @i == 0 && 0 < n@ or @i < n@ (directly), \i.e. @i < n@ either way.
     1096So a repeat of this check, the branch and its consequent code (raise error) are all removable.
     1097
     1098Progressing from the control case, a deliberately bound-incorrect mode is no longer informative.
     1099Rather, given a (well-typed) program that does good work on good data, the issue is whether it is (determinably) bound-correct on all data.
     1100
     1101When dimension sizes get reused, \CFA has an advantage over \CC-vector at getting simply written programs well optimized.
     1102
     1103\begin{figure}
     1104\setlength{\tabcolsep}{10pt}
     1105\begin{tabular}{llll@{}}
     1106\multicolumn{1}{c}{C} &
     1107\multicolumn{1}{c}{\CFA} &
     1108\multicolumn{1}{c}{\CC (nested vector)}
     1109\\[1em]
     1110\lstinput{20-37}{ar-bchk/treatment.c} &
     1111\lstinput{20-37}{ar-bchk/treatment.cfa} &
     1112\lstinput{20-37}{ar-bchk/treatment.cc}
     1113\end{tabular}
     1114\caption{Overhead comparison, differentiation case, source.
     1115}
     1116\label{f:ovhd-treat-src}
     1117\end{figure}
     1118
     1119\begin{figure}
     1120\newcolumntype{C}[1]{>{\centering\arraybackslash}p{#1}}
     1121\begin{tabular}{C{1.5in}C{1.5in}C{1.5in}p{1in}}
     1122C &
     1123\CFA &
     1124\CC (nested vector)
     1125\\[1em]
     1126\multicolumn{4}{l}{
     1127        \includegraphics[page=3,trim=0 4in 2in 0,clip]{ar-bchk}
     1128}
     1129\end{tabular}
     1130\caption{Overhead comparison, differentiation case, assembly.
     1131}
     1132\label{f:ovhd-treat-asm}
     1133\end{figure}
     1134
     1135The example case of \VRef[Figures]{f:ovhd-treat-src} and \VRef{f:ovhd-treat-asm} is simple matrix multiplication over a row-major encoding.
     1136Simple means coding directly to the intuition of the mathematical definition, without trying to optimize for memory layout.
     1137In the assembly code of \VRef[Figures]{f:ovhd-treat-asm}, the looping pattern of \VRef[Figure]{f:ovhd-ctl} (d, e, f), ``Skip aheas on zero; loop back for next,'' occurs with three nesting levels.
     1138Simultaneously, the dynamic bound-check pattern of \VRef[Figure]{f:ovhd-ctl} (h,~i), ``Get out on invalid,'' occurs, targeting @.L7@, @.L9@ and @.L8@.
     1139
     1140Here, \VRef[Figures]{f:ovhd-treat-asm} shows the \CFA solution optimizing into practically the C solution, while the \CC solution shows added runtime bound checks.
     1141Like in \VRef[Figure]{f:ovhd-ctl} (e), the significance is the \emph{absence} of library-imposed runtime checks, even though the source code is working through the the \CFA @array@ library.
     1142The optimizer removed the library-imposed checks because the data strructure @array@-of-@array@ is constained by its type to be shaped correctly for the intuitive looping.
     1143In \CC, the same constraint does not apply to @vector@-of-@vector@.
     1144Because every individual @vector@ carries its own size, two types of bound mistakes are possible.
     1145
     1146Firstly, the three structures received may not be matrices at all, per the obvious/dense/total interpretation; rather, any one might be ragged-right in its rows.
     1147The \CFA solution guards against this possibility by encoding the minor length (number of columns) in the major element (row) type.
     1148In @res@, for example, each of the @M@ rows is @array(float, N)@, guaranteeing @N@ cells within it.
     1149Or more technically, guaranteeing @N@ as the basis for the imposed bound check \emph{of every row.}
     1150
     1151The second type of \CC bound mistake is that its types do not impose the mathematically familiar constraint of $(M \times P) (P \times N) \rightarrow (M \times N)$.
     1152Even assuming away the first issue, \eg that in @lhs@, all minor/cell counts agree, the data format allows the @rhs@ major/row count to disagree.
     1153Or, the possibility that the row count @res.size()@ disagrees with the row count @lhs.size()@ illustrates this bound-mistake type in isolation.
     1154The \CFA solution guards against this possibility by keeping length information separate from the array data, and therefore eligible for sharing.
     1155This capability lets \CFA escape the one-to-one correspondence between array instances and symbolic bounds, where this correspondence leaves a \CC-vector programmer stuck with a matrix representation that repeats itself.
     1156
     1157It is important to clarify that the \CFA solution does not become unsafe (like C) in losing its dynamic checks, even though it does become fast (as C) in losing them.
     1158The dynamic chekcs were dismissed as unnecessary \emph{because} the program was safe to begin with.
     1159
     1160To regain performance, a \CC programmer is left needing to state appropriate assertions or assumptions, to convince the optimizer to dismiss the runtime checks.
     1161Especially considering that two of them are in the inner-most loop.
     1162The solution is nontrivial.
     1163It requires doing the work of the inner-loop checks as a preflight step.
     1164But this work still takes looping; doing it upfront gives too much separation for the optimizer to see ``has been checked already'' in the deep loop.
     1165So, the programmer must restate the preflight observation within the deep loop, but this time as an unchecked assumption.
     1166Such assumptions are risky because they introduce further undefined behaviour when misused.
     1167Only the programmer's discipline remains to ensure this work is done without error.
     1168
     1169The \CFA solution lets a simply stated program have dynamic guards that catch bugs, while letting a simply stated bug-free program run as fast as the unguarded C equivalent.
     1170
     1171\begin{comment}
     1172The ragged-right issue brings with it a source-of-truth difficulty: Where, in the \CC version, is one to find the value of $N$?  $M$ can come from either @res@'s or @lhs@'s major/row count, and checking these for equality is straightforward.  $P$ can come from @rhs@'s major/row count.  But $N$ is only available from columns, \ie minor/cell counts, which are ragged.  So any choice of initial source of truth, \eg
     1173\end{comment}
    10481174
    10491175\section{Array Lifecycle}
     
    12961422\section{Array Comparison}
    12971423
    1298 
    12991424\subsection{Rust}
    13001425
     
    14131538
    14141539\subsection{Java}
     1540\label{JavaCompare}
     1541
    14151542
    14161543Java arrays are references, so multi-dimension arrays are arrays-of-arrays \see{\VRef{toc:mdimpl}}.
     
    15011628
    15021629\subsection{\CC}
     1630\label{CppCompare}
    15031631
    15041632Because C arrays are difficult and dangerous, the mantra for \CC programmers is to use @std::vector@ in place of the C array.
    1505 While the vector size can grow and shrink dynamically, \vs a fixed-size dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration (one dynamic call) to avoid using @push_back@.
     1633While the vector size can grow and shrink dynamically, \vs an unchanging dynamic size with VLAs, the cost of this extra feature is mitigated by preallocating the maximum size (like the VLA) at the declaration.
     1634So, it costs one upfront dynamic allocation and avoids growing the arry through pushing.
    15061635\begin{c++}
    15071636vector< vector< int > > m( 5, vector<int>(8) ); // initialize size of 5 x 8 with 6 dynamic allocations
    15081637\end{c++}
    15091638Multidimensional arrays are arrays-of-arrays with associated costs.
    1510 Each @vector@ array has an array descriptor contain the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@.
    1511 Used with these restrictions, out-of-bound accesses are caught, and in-bound accesses never exercise the vector's ability to grow, preventing costly reallocate and copy, and never invalidate references to contained values.
    1512 This scheme matches Java's safety and expressiveness exactly, but with the inherent costs.
    1513 
     1639Each @vector@ array has an array descriptor containing the dimension, which allows bound checked using @x.at(i)@ in place of @x[i]@.
     1640Used with these restrictions, out-of-bound accesses are caught, and in-bound accesses never exercise the vector's ability to grow, preventing costly reallocate-and-copy, and never invalidating references to contained values.
     1641
     1642This scheme matches a Java array's safety and expressiveness exactly, with the same inherent costs.
     1643Notably, a \CC vector of vectors does not provide contiguous element storage (even when upfront allocation is done carefully) because a vector puts its elements in an auxiliary allocation.
     1644So, in spite of @vector< vector< int > >@ appearing to be ``everything by value,'' it is still a first array whose elements include pointers to further arrays.
     1645
     1646\CC has options for working with memory-adjacent data in desirable ways, particularly in recent revisions.
     1647But none makes an allocation with a dynamically given fixed size less awkward than the vector arrangement just described.
     1648
     1649\CC~26 adds @std::inplace_vector@, which provides an interesting vector--array hybrid,\footnote{
     1650  Like a vector, it lets a user construct elements in a loop, rather than imposing uniform construction.
     1651  Yet it preserves \lstinline{std::array}'s ability to be entirely stack-allocated, by avoiding an auxiliary elements' allocation.
     1652} but does not change the fundamental limit of \lstinline{std::array}, that the length, being a template parameter, must be a static value.
     1653
     1654\CC~20's @std::span@ is a view that unifies true arrays, vectors, static sizes and dynamic sizes, under a common API that adds bound checking.
     1655When wrapping a vector, bound checking occurs on regular subscripting; one needn't remember to use @.at@.
     1656When wrapping a locally declared fixed-size array, bound communication is implicit.
     1657But it has a soundness gap by offering construction from pointer and user-given length.
     1658
     1659\CC~23's @std::mdspan@ adds multidimensional indexing and reshaping capabilities analogous to those built into \CFA's @array@.
     1660Like @span@, it works over multiple underlying container types.
     1661But neither @span@ nor @mdspan@ augments the available allocation options.
     1662
     1663Thus, these options do not offer an allocation with a dynamically given fixed size.
     1664And furthermore, they do not provide any improvement to the C flexible array member pattern, for making a dynamic amount of storage contiguous with its header, as do \CFA's accordions.
    15141665
    15151666\subsection{Levels of Dependently Typed Arrays}
    15161667
     1668TODO: fix the position; checked c does not do linear types
    15171669\CFA's array is the first lightweight application of dependently-typed bound tracking to an extension of C.
    15181670Other extensions of C that apply dependently-typed bound tracking are heavyweight, in that the bound tracking is part of a linearly-typed ownership-system, which further helps guarantee statically the validity of every pointer deference.
     
    15481700it can also do these other cool checks, but watch how I can mess with its conservativeness and termination
    15491701
    1550 Two current, state-of-the-art array languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, offer novel contributions concerning similar, restricted dependent types for tracking array length.
     1702Two contemporary array-centric languages, Dex\cite{arr:dex:long} and Futhark\cite{arr:futhark:tytheory}, contribute similar, restricted dependent types for tracking array length.
    15511703Unlike \CFA, both are garbage-collected functional languages.
    15521704Because they are garbage-collected, referential integrity is built-in, meaning that the heavyweight analysis, that \CFA aims to avoid, is unnecessary.
     
    15591711There is a particular emphasis on an existential type, enabling callee-determined return shapes.
    15601712
    1561 
    1562 Dex uses a novel conception of size, embedding its quantitative information completely into an ordinary type.
    1563 
    1564 Futhark and full-strength dependently typed languages treat array sizes are ordinary values.
     1713Dex uses an Ada-style conception of size, embedding quantitative information completely into an ordinary type.
     1714
     1715Futhark and full-strength dependently typed languages treat array sizes as ordinary values.
    15651716Futhark restricts these expressions syntactically to variables and constants, while a full-strength dependent system does not.
    15661717
    1567 \CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet has no instances.
     1718\CFA's hybrid presentation, @forall( [N] )@, has @N@ belonging to the type system, yet no objects of type @[N]@ occur.
    15681719Belonging to the type system means it is inferred at a call site and communicated implicitly, like in Dex and unlike in Futhark.
    15691720Having no instances means there is no type for a variable @i@ that constrains @i@ to be in the range for @N@, unlike Dex, [TODO: verify], but like Futhark.
     
    15741725
    15751726In Ada and Dex, an array is conceived as a function whose domain must satisfy only certain structural assumptions, while in C, \CC, Java, Futhark and \CFA today, the domain is a prefix of the natural numbers.
    1576 The generality has obvious aesthetic benefits for programmers working on scheduling resources to weekdays, and for programmers who prefer to count from an initial number of their own choosing.
     1727The Ada--Dex generality has aesthetic benefits for certain programmers.  For those working on scheduling resources to weekdays:
     1728For those who prefer to count from an initial number of their own choosing:
    15771729
    15781730This change of perspective also lets us remove ubiquitous dynamic bound checks.
     
    16101762(unsafe_from_ordinal a (idiv o bs), unsafe_from_ordinal b (rem o bs))
    16111763\end{cfa}
     1764% fix mike's syntax highlighter
    16121765and by a user-provided adapter expression at the call site that shows how to indexing with a tuple is backed by indexing each dimension at a time
    16131766\begin{cfa}
     
    17021855Using a compiler-produced value eliminates an opportunity for user error.
    17031856
    1704 TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
     1857...someday... fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
    17051858
    17061859Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
     
    17111864(*ar2)[5];
    17121865\end{cfa}
    1713 Using ``reference to array'' works at resolving this issue.  TODO: discuss connection with Doug-Lea \CC proposal.
     1866Using ``reference to array'' works at resolving this issue.  ...someday... discuss connection with Doug-Lea \CC proposal.
    17141867\begin{cfa}
    17151868tm (&ar3)[10] = *alloc();
     
    17191872
    17201873Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
    1721 TODO xref C standard does not claim that @ar1@ may be subscripted,
     1874...someday... xref C standard does not claim that @ar1@ may be subscripted,
    17221875because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
    17231876But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
     
    17251878
    17261879The ``reference to array'' type has its sore spots too.
    1727 TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
    1728 
    1729 TODO: I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?
     1880...someday... see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
     1881
     1882...someday... I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?
    17301883\end{comment}
  • libcfa/src/collections/array.hfa

    r0210a543 reb0d9b7  
    242242}
    243243
     244// Further form of -[-,-,-] that avoids using the trait system.
     245// Above overloads work for any type with (recursively valid) subscript operator,
     246// provided said subscript is passed as an assertion.
     247// Below works only on arpk variations but never passes its subscript though an assertion.
     248//
     249// When arpk implements the trait used above,
     250// the critical assertion is backed by a nontrivial thunk.
     251// There is no "thunk problem" (lifetime) issue, when used as shown in the test suite.
     252// But the optimizer has shown difficulty removing these thunks in cases where "it should,"
     253// i.e. when all user code is in one compilation unit.
     254// Not that every attempt at removing such a thunk fails; cases have been found going both ways.
     255// Cases have been found with unnecessary bound-checks removed successfully,
     256// on user code written against the overloads below,
     257// but where these bound checks (which occur within `call`ed thunks) are not removed,
     258// on user code written against the overloads above.
     259//
     260// The overloads below provide specializations of the above
     261// that are a little harder to use than the ones above,
     262// but where array API erasure has been seen to be more effective.
     263// Note that the style below does not appeal to a case where thunk inlining is more effective;
     264// rather, it simply does not rely on thunks in the first place.
     265//
     266// Both usage styles are shown in test array-md-sbscr-cases#numSubscrTypeCompatibility,
     267// with the more general one above being "high abstraction,"
     268// and the more performant one below being "mid abstraction" and "low abstraction."
     269//
     270// A breadth of index types is not given here (providing -[size_t,size_t,...] only)
     271// because these declarations are not feeding a trait, so safe implicit arithmetic conversion kiks in.
     272// Even so, there may still be an un-met need for accepting
     273// either ptrdiff_t or size_t (signed or unsigned)
     274// because Mike has seen the optimizer resist removing bound checks when sign-conversion is in play.
     275// "Only size_t" is meeting today's need
     276// and no solution is known that avoids 2^D overloads for D dimensions
     277// while offering multiple subscript types and staying assertion-free.
     278//
     279// This approach, of avoiding traits entirely, is likely incompatible with the original desire
     280// to have one recursive multidimensional subscript operator (TRY_BROKEN_DESIRED_MD_SUBSCRIPT).
     281// To make a single declaration work,
     282// we would probably have to get better at coaxing the optimizer into inlining thunks.
     283
     284forall( [N2], S2*, [N1], S1*, Timmed1, Tbase )
     285static inline Timmed1 & ?[?]( arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ) & this, size_t ix2, size_t ix1 ) {
     286        return this[ix2][ix1];
     287}
     288
     289forall( [N3], S3*, [N2], S2*, [N1], S1*, Timmed1, Tbase )
     290static inline Timmed1 & ?[?]( arpk( N3, S3, arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ), Tbase ) & this, size_t ix3, size_t ix2, size_t ix1 ) {
     291        return this[ix3][ix2][ix1];
     292}
     293
     294forall( [N4], S4*, [N3], S3*, [N2], S2*, [N1], S1*, Timmed1, Tbase )
     295static inline Timmed1 & ?[?]( arpk( N4, S4, arpk( N3, S3, arpk( N2, S2, arpk( N1, S1, Timmed1, Tbase ), Tbase ), Tbase ), Tbase ) & this, size_t ix4, size_t ix3, size_t ix2, size_t ix1 ) {
     296        return this[ix4][ix3][ix2][ix1];
     297}
     298
     299
     300
    244301#endif
    245302
  • tests/array-collections/array-md-sbscr-cases.cfa

    r0210a543 reb0d9b7  
    231231}
    232232
     233// common function body, working on parameter wxyz, but for wxyz being different types
     234#define CHECK_NUM_SUBSCR_TYPE_COMPAT                                         \
     235    valExpected = getMagicNumber(2, 3, 4, 5);                                \
     236    assert(( wxyz[2] [3] [4] [5] == valExpected ));                          \
     237    assert(( wxyz[2,  3] [4] [5] == valExpected ));                          \
     238    assert(( wxyz[2] [3,  4] [5] == valExpected ));                          \
     239    assert(( wxyz[2] [3] [4,  5] == valExpected ));                          \
     240    assert(( wxyz[2,  3,  4] [5] == valExpected ));                          \
     241    assert(( wxyz[2] [3,  4,  5] == valExpected ));                          \
     242    assert(( wxyz[2,  3,  4,  5] == valExpected ));                          \
     243    for ( i; Nw ) {                                                          \
     244        assert(( wxyz[ i, 3, 4, 5 ] == getMagicNumber(i, 3, 4, 5) ));        \
     245    }                                                                        \
     246    for ( i; Nx ) {                                                          \
     247        assert(( wxyz[ 2, i, 4, 5 ] == getMagicNumber(2, i, 4, 5) ));        \
     248    }                                                                        \
     249    for ( i; Ny ) {                                                          \
     250        assert(( wxyz[ 2, 3, i, 5 ] == getMagicNumber(2, 3, i, 5) ));        \
     251    }                                                                        \
     252    for ( i; Nz ) {                                                          \
     253        assert(( wxyz[ 2, 3, 4, i ] == getMagicNumber(2, 3, 4, i) ));        \
     254    }
     255#define CHECK_NUM_SUBSCR_TYPE_COMPAT_ADDENDUM_RESHAPE                        \
     256    for ( i; Nw ) {                                                          \
     257        assert(( wxyz[ i, all, 4, 5 ][3] == getMagicNumber(i, 3, 4, 5) ));   \
     258    }                                                                        \
     259    for ( i; Nw ) {                                                          \
     260        assert(( wxyz[ all, 3, 4, 5 ][i] == getMagicNumber(i, 3, 4, 5) ));   \
     261    }
     262
     263// Low abstraction: simple declaration, cannot send a slice, can make a slice, runs fast
     264forall( [Nw], [Nx], [Ny], [Nz] )
     265void test_numSubscrTypeCompatibility_lo( array( float, Nw, Nx, Ny, Nz ) & wxyz ) {
     266    CHECK_NUM_SUBSCR_TYPE_COMPAT
     267    CHECK_NUM_SUBSCR_TYPE_COMPAT_ADDENDUM_RESHAPE
     268}
     269
     270// Medium abstraction: complex declaration, can send or make a slice, runs fast
     271forall( [Nw], Sw*
     272      , [Nx], Sx*
     273      , [Ny], Sy*
     274      , [Nz], Sz* )
     275void test_numSubscrTypeCompatibility_mid(
     276        arpk( Nw, Sw,
     277        arpk( Nx, Sx,
     278        arpk( Ny, Sy,
     279        arpk( Nz, Sz, float, float)
     280                           , float)
     281                           , float)
     282                           , float) & wxyz
     283    ) {
     284    CHECK_NUM_SUBSCR_TYPE_COMPAT
     285    CHECK_NUM_SUBSCR_TYPE_COMPAT_ADDENDUM_RESHAPE
     286}
     287
     288// High abstraction: mid-complexity declaration, can send a slice or a non-arpk, cannot make a slice, may not run fast
     289forall( [Nw], Awxyz &
     290      , [Nx], Axyz &
     291      , [Ny], Ayz &
     292      , [Nz], Az &
     293      | ar( Awxyz, Axyz, Nw )
     294      | ar( Axyz, Ayz, Nx )
     295      | ar( Ayz, Az, Ny )
     296      | ar( Az, float, Nz ) )
     297void test_numSubscrTypeCompatibility_hi( Awxyz & wxyz ) {
     298    CHECK_NUM_SUBSCR_TYPE_COMPAT
     299}
     300
    233301forall( [Nw], [Nx], [Ny], [Nz] )
    234302void test_numSubscrTypeCompatibility( tag(Nw), tag(Nx), tag(Ny), tag(Nz) ) {
     
    237305    fillHelloData(wxyz);
    238306
    239     valExpected = getMagicNumber(2, 3, 4, 5);
    240     assert(( wxyz [2] [3] [4] [5]  == valExpected ));
    241     assert(( wxyz[2,  3][4] [5]  == valExpected ));
    242     assert(( wxyz [2][3,  4][5]  == valExpected ));
    243     assert(( wxyz [2] [3][4,  5] == valExpected ));
    244     assert(( wxyz[2,  3,  4][5]  == valExpected ));
    245     assert(( wxyz [2][3,  4,  5] == valExpected ));
    246     assert(( wxyz[2,  3,  4,  5] == valExpected ));
    247 
    248     for ( i; Nw ) {
    249         assert(( wxyz[ i, 3, 4, 5 ] == getMagicNumber(i, 3, 4, 5) ));
    250     }
    251 
    252     for ( i; Nx ) {
    253         assert(( wxyz[ 2, i, 4, 5 ] == getMagicNumber(2, i, 4, 5) ));
    254     }
    255 
    256     for ( i; Ny ) {
    257         assert(( wxyz[ 2, 3, i, 5 ] == getMagicNumber(2, 3, i, 5) ));
    258     }
    259 
    260     for ( i; Nz ) {
    261         assert(( wxyz[ 2, 3, 4, i ] == getMagicNumber(2, 3, 4, i) ));
    262     }
    263 
    264     for ( i; Nw ) {
    265         assert(( wxyz[ i, all, 4, 5 ][3] == getMagicNumber(i, 3, 4, 5) ));
    266     }
    267 
    268     for ( i; Nw ) {
    269         assert(( wxyz[ all, 3, 4, 5 ][i] == getMagicNumber(i, 3, 4, 5) ));
    270     }
     307    test_numSubscrTypeCompatibility_lo ( wxyz );
     308    test_numSubscrTypeCompatibility_mid( wxyz );
     309    test_numSubscrTypeCompatibility_hi ( wxyz );
    271310}
    272311
Note: See TracChangeset for help on using the changeset viewer.