-              r67748f9
+              r35fc819
 \label{c:Array}
 Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language, resulting in the largest proportion of runtime errors and security violations.
+Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language \see{\VRef{s:Array}}, resulting in the largest proportion of runtime errors and security violations.
 This chapter describes the new \CFA language and library features that introduce a length-checked array type, @array@, to the \CFA standard library~\cite{Cforall}.
 …
 \label{s:ArrayIntro}
+The new \CFA array is declared by instantiating the generic @array@ type,
+much like instantiating any other standard-library generic type (such as \CC @vector@),
+though using a new style of generic parameter.
+The new \CFA array is declared by instantiating the generic @array@ type, much like instantiating any other standard-library generic type (such as \CC @vector@), using a new style of generic parameter.
 \begin{cfa}
 @array( float, 99 )@ x;                                 $\C[2.5in]{// x contains 99 floats}$
 …
 void f( @array( float, 42 )@ & p ) {}   $\C{// p accepts 42 floats}$
 f( x );                                                                 $\C{// statically rejected: type lengths are different, 99 != 42}$
 test2.cfa:3:1 error: Invalid application of existing declaration(s) in expression.
 Applying untyped:  Name: f ... to:  Name: x
 …
 g( x, 0 );                                                              $\C{// T is float, N is 99, dynamic subscript check succeeds}$
 g( x, 1000 );                                                   $\C{// T is float, N is 99, dynamic subscript check fails}$
 Cforall Runtime error: subscript 1000 exceeds dimension range [0,99) $for$ array 0x555555558020.
 \end{cfa}
 …
 forall( T & | sized(T) )
 T * alloc() {
         return @(T *)@malloc( @sizeof(T)@ );
+        return @(T *)@malloc( @sizeof(T)@ );    // C malloc
+}
 \end{cfa}
 …
 The loops follow the familiar pattern of using the variable @dim@ to iterate through the arrays.
 Most importantly, the type system implicitly captures @dim@ at the call of @f@ and makes it available throughout @f@ as @N@.
 The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @dim@ at @f@'s declaration of @ret@.
+The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@; @N@ adapting in the same way at @f@'s loop bound; and a pass-thru use of @dim@ at @f@'s declaration of @ret@.
 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage.
 The result is a significant improvement in safety and usability.
 …
 \end{itemize}
+\VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}.
+\VRef[Figure]{f:TemplateVsGenericType} shows @N@ is not the same as a @size_t@ declaration in a \CC \lstinline[language=C++]{template}.\footnote{
+The \CFA program requires a snapshot declaration for \lstinline{n} to compile, as described at the end of \Vref{s:ArrayTypingC}.}
 \begin{enumerate}[leftmargin=*]
 \item
 The \CC template @N@ can only be a compile-time value, while the \CFA @N@ may be a runtime value.
 \item
 \CC does not allow a template function to be nested, while \CFA lets its polymorphic functions to be nested.
+\CC does not allow a template function to be nested, while \CFA allows polymorphic functions to be nested.
 Hence, \CC precludes a simple form of information hiding.
 \item
 …
 % mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v10
 \end{enumerate}
+The \CC template @array@ type mitigates points \VRef[]{p:DimensionPassing} and \VRef[]{p:ArrayCopy}, but it is also trying to accomplish a similar mechanism to \CFA @array@.
+The \CC template @std::array@ tries to accomplish a similar mechanism to \CFA @array@.
+It is an aggregate type with the same semantics as a @struct@ holding a C-style array \see{\VRef{s:ArraysCouldbeValues}}, which mitigates points \VRef[]{p:DimensionPassing} and \VRef[]{p:ArrayCopy}.
 \begin{figure}
 \begin{tabular}{@{}l@{\hspace{20pt}}l@{}}
+\begin{tabular}{@{}ll@{}}
 \begin{c++}
 …
+}
 int main() {
         int ret[10], x[10];
         for ( int i = 0; i < 10; i += 1 ) x[i] = i;
         @copy<int, 10 >( ret, x );@
         for ( int i = 0; i < 10; i += 1 )
+        const size_t  n = 10;   // must be constant
+        int ret[n], x[n];
+        for ( int i = 0; i < n; i += 1 ) x[i] = i;
+        @copy<int, n >( ret, x );@
+        for ( int i = 0; i < n; i += 1 )
                 cout << ret[i] << ' ';
         cout << endl;
 …
                 for ( i; N ) ret[i] = x[i];
+        }
         const int n = promptForLength();
+        size_t  n;
+        sin | n;
         array( int, n ) ret, x;
         for ( i; n ) x[i] = i;
 …
 When the argument lengths themselves are statically unknown,
 the static check is conservative and, as always, \CFA's casting lets the programmer use knowledge not shared with the type system.
+\begin{tabular}{@{\hspace{0.5in}}l@{\hspace{1in}}l@{}}
+\lstinput{90-97}{hello-array.cfa}
+&
+\lstinput{110-117}{hello-array.cfa}
+\end{tabular}
+\noindent
+This static check's full rules are presented in \VRef[Section]{s:ArrayTypingC}.
+\lstinput{90-96}{hello-array.cfa}
+This static check's rules are presented in \VRef[Section]{s:ArrayTypingC}.
 Orthogonally, the \CFA array type works within generic \emph{types}, \ie @forall@-on-@struct@.
 The same argument safety and the associated implicit communication of array length occur.
 Preexisting \CFA allowed aggregate types to be generalized with type parameters, enabling parameterizing of element types.
 This has been extended to allow parameterizing by dimension.
 Doing so gives a refinement of C's ``flexible array member''~\cite[\S~6.7.2.1.18]{C11}.
+This feature is extended to allow parameterizing by dimension.
+Doing so gives a refinement of C's ``flexible array member''~\cite[\S~6.7.2.1.18]{C11}:
 \begin{cfa}
 struct S {
 …
 \end{cfa}
 This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members.
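For contrast, the following is a minimal sketch of the plain-C flexible-array-member pattern that this refinement improves on. The names @Prefs@, @prefs_new@, and @prefs_get@ are hypothetical and do not come from the example program; the sketch only illustrates where the manual size arithmetic and hand-written bound check appear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical record with a C99 flexible array member.
struct Prefs {
    size_t len;     // length must be stored and maintained by hand
    int items[];    // flexible array member: the type carries no length
};

// Allocate a Prefs holding len items; the caller must free() the result.
struct Prefs * prefs_new( size_t len ) {
    // manual size arithmetic: header plus len trailing elements
    struct Prefs * p = malloc( sizeof(struct Prefs) + len * sizeof(int) );
    if ( p == NULL ) return NULL;
    p->len = len;
    for ( size_t i = 0; i < len; i += 1 ) p->items[i] = 0;
    return p;
}

// Checked read: the bound check is the programmer's job, not the type system's.
int prefs_get( const struct Prefs * p, size_t i ) {
    assert( i < p->len );
    return p->items[i];
}
```

Nothing in the C type of @items@ ties it to @len@, so an incorrect size expression in @prefs_new@ or a forgotten check in @prefs_get@ compiles silently.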
 Finally, inputs and outputs are given at the bottom for different sized schools.
+Finally, inputs and outputs are given on the right for different-sized schools.
 The example program prints the courses in each student's preferred order, all using the looked-up display names.
 \begin{figure}
+\begin{cquote}
+\lstinput{50-55}{hello-accordion.cfa}
+\begin{lrbox}{\myboxA}
+\begin{tabular}{@{}l@{}}
+\lstinput{50-55}{hello-accordion.cfa} \\
 \lstinput{90-98}{hello-accordion.cfa}
+\ \\
+@$ cat school1@
+\lstinput{}{school1}
+@$ ./a.out < school1@
+\lstinput{}{school1.out}
+@$ cat school2@
+\lstinput{}{school2}
+@$ ./a.out < school2@
+\lstinput{}{school2.out}
+\end{cquote}
+\end{tabular}
+\end{lrbox}
+\begin{lrbox}{\myboxB}
+\begin{tabular}{@{}l@{}}
+@$ cat school1@ \\
+\lstinputlisting{school1} \\
+@$ ./a.out < school1@ \\
+\lstinputlisting{school1.out} \\
+@$ cat school2@ \\
+\lstinputlisting{school2} \\
+@$ ./a.out < school2@ \\
+\lstinputlisting{school2.out}
+\end{tabular}
+\end{lrbox}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{@{}ll@{}}
+\usebox\myboxA
+&
+\usebox\myboxB
+\end{tabular}
 \caption{\lstinline{School} Example, Input and Output}
 …
 When a function operates on a @School@ structure, the type system handles its memory layout transparently.
 \lstinput{30-37}{hello-accordion.cfa}
+\lstinput{30-36}{hello-accordion.cfa}
 In the example, function @getPref@ returns, for the student at position @is@, the position of their @pref@\textsuperscript{th}-favoured class.
 …
 \section{Dimension Parameter Implementation}
 The core of the preexisting \CFA compiler already had the ``heavy equipment'' needed to provide the feature set just reviewed (up to bugs in cases not yet exercised).
+The core of the preexisting \CFA compiler already has the ``heavy equipment'' needed to provide the feature set just reviewed (up to bugs in cases not yet exercised).
 To apply this equipment in tracking array lengths, I encoded a dimension (array's length) as a type.
 The type in question does not describe any data that the program actually uses at runtime.
 …
 \begin{itemize}[leftmargin=*]
 \item
         Resolver provided values for a used declaration's type-system variables, gathered from type information in scope at the usage site.
 \item
         The box pass, encoding information about type parameters into ``extra'' regular parameters/arguments on declarations and calls.
+        Resolver provided values for a declaration's type-system variables, gathered from type information in scope at the usage site.
+\item
+        The box pass, encoding information about type parameters into ``extra'' regular parameters and arguments on declarations and calls.
         Notably, it conveys the size of a type @foo@ as a @__sizeof_foo@ parameter, and rewrites the @sizeof(foo)@ expression as @__sizeof_foo@, \ie a use of the parameter.
 \end{itemize}
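The effect of the box pass can be pictured with a hand-written C analogue. This is only a sketch under assumptions: the functions @alloc_boxed@ and @assign_boxed@ are hypothetical, and the parameter @sizeof_T@ stands in for the compiler-generated @__sizeof_foo@ style of parameter (the double-underscore spelling is avoided because such identifiers are reserved in user-written C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical lowering of a polymorphic  T * alloc()  once T is erased:
// the size of T arrives as an ordinary extra parameter.
void * alloc_boxed( size_t sizeof_T ) {
    return malloc( sizeof_T );      // sizeof(T) rewritten as a use of the parameter
}

// Hypothetical lowering of a polymorphic assignment: with T erased,
// values are handled as raw bytes of the communicated size.
void assign_boxed( size_t sizeof_T, void * dst, const void * src ) {
    memcpy( dst, src, sizeof_T );
}
```

A call site that knows the concrete type supplies the extra argument, \eg @alloc_boxed( sizeof(double) )@, which mirrors how the resolver's chosen type is turned into a regular argument.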
 …
 The rules for resolution had to be restricted slightly, in order to achieve important refusal cases.
 This work is detailed in \VRef[Section]{s:ArrayTypingC}.
 However, the resolution--boxing scheme, in its preexisting state, was already equipped to work on (desugared) dimension parameters.
+However, the resolution--boxing scheme, in its preexisting state, is equipped to work on (desugared) dimension parameters.
 The following discussion explains the desugaring and how correctly lowered code results.
 …
 \end{enumerate}
 The chosen solution is to encode the value @N@ \emph{as a type}, so items 1 and 2 are immediately available for free.
 Item 3 needs a way to recover the encoded value from a (valid) type (and to reject invalid types occurring here).
+Item 3 needs a way to recover the encoded value from a (valid) type and to reject invalid types.
 Item 4 needs a way to produce a type that encodes the given value.
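One way to picture items 3 and 4 in plain C is to encode a dimension as a type whose size equals the dimension value. The encoding below (@char[n]@ recovered with @sizeof@) and the names @DIM_TYPE@, @DIM_VALUE@, and @sum_dim@ are assumptions made for illustration; this passage does not state which encoding the compiler uses.

```c
#include <assert.h>
#include <stddef.h>

// Item 4 (value to type): a type whose size equals the dimension value.
#define DIM_TYPE( n ) char[ n ]
// Item 3 (type to value): recover the dimension from the encoding type.
#define DIM_VALUE( T ) sizeof( T )

// sizeof(char) is 1 by definition, so the round trip is exact.
_Static_assert( DIM_VALUE( DIM_TYPE( 42 ) ) == 42, "dimension round trip" );

// A function that is polymorphic in a dimension, after lowering: the size of
// the encoding type is passed like the size of any other type parameter.
float sum_dim( size_t sizeof_N, const float * data ) {
    float total = 0;
    for ( size_t i = 0; i < sizeof_N; i += 1 ) total += data[i];
    return total;
}
```

With this encoding, a size-passing scheme needs no special case for dimensions: the caller writes @sum_dim( DIM_VALUE( DIM_TYPE( 4 ) ), data )@ and the callee sees an ordinary size value.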
 …
         The type @thing(N)@ is (replaced by @void *@, but thereby effectively) gone.
 \item
         The @sout...@ expression (being an application of the @?|?@ operator) has a regular variable (parameter) usage for its second argument.
+        The @sout...@ expression has a regular variable (parameter) usage for its second argument.
 \item
         Information about the particular @thing@ instantiation (value 10) is moved, from the type, to a regular function-call argument.
 …
 \begin{cfa}
 enum { n = 42 };
 float x[@n@];   // or just 42
 float (*xp1)[@42@] = &x;    // accept
 float (*xp2)[@999@] = &x;   // reject
+float x[@n@];   $\C{// or just 42}$
+float (*xp1)[@42@] = &x;    $\C{// accept}$
+float (*xp2)[@999@] = &x;   $\C{// reject}$
 warning: initialization of 'float (*)[999]' from incompatible pointer type 'float (*)[42]'
 \end{cfa}
 When a variable is involved, C and \CFA take two different approaches.
 Today's C compilers accept the following without warning.
+Today's C compilers accept the following without a warning.
 \begin{cfa}
 static const int n = 42;
 …
 The way the \CFA array is implemented, the type analysis for this case reduces to a case similar to the earlier C version.
 The \CFA compiler's compatibility analysis proceeds as:
 \begin{itemize}[parsep=0pt]
+\begin{itemize}[leftmargin=*,parsep=0pt]
 \item
         Is @array( float, 999 )@ type-compatible with @array( float, n )@?
 …
         in order to preserve the length information that powers runtime bound-checking.}
 Therefore, the need to upgrade legacy C code is low.
 Finally, if this incompatibility is a problem onboarding C programs to \CFA, it is should be possible to change the C type check to a warning rather than an error, acting as a \emph{lint} of the original code for a missing type annotation.
+Finally, if this incompatibility is a problem onboarding C programs to \CFA, it should be possible to change the C type check to a warning rather than an error, acting as a \emph{lint} of the original code for a missing type annotation.
 To handle two occurrences of the same variable, more information is needed, \eg, this is fine,
 …
 int n = 42;
 float x[@n@];
 float (*xp)[@n@] = x;   // accept
+float (*xp)[@n@] = x;   $\C{// accept}$
 \end{cfa}
 where @n@ remains fixed across a contiguous declaration context.
 However, intervening dynamic statement cause failures.
+However, intervening dynamic statements can cause failures.
 \begin{cfa}
 int n = 42;
 float x[@n@];
 @n@ = 999; // dynamic change
 float (*xp)[@n@] = x;   // reject
 \end{cfa}
 However, side-effects can occur in a contiguous declaration context.
+@n@ = 999; $\C{// dynamic change}$
+float (*xp)[@n@] = x;   $\C{// reject}$
+\end{cfa}
+As well, side-effects can even occur in a contiguous declaration context.
 \begin{cquote}
 \setlength{\tabcolsep}{20pt}
 …
 void f() {
         float x[@n@] = { g() };
         float (*xp)[@n@] = x;   // reject
+        float (*xp)[@n@] = x;                   // reject
+}
 \end{cfa}
 …
 int @n@ = 42;
 void g() {
         @n@ = 99;
+        @n@ = 999;              // accept
+}
 …
 The issue here is that knowledge needed to make a correct decision is hidden by separate compilation.
 Even within a translation unit, static analysis might not be able to provide all the information.
 However, if the example uses @const@, the check is possible.
+However, if the example uses @const@, the check is possible even though the value is unknown.
 \begin{cquote}
 \setlength{\tabcolsep}{20pt}
 …
 void f() {
         float x[n] = { g() };
         float (*xp)[n] = x;   // reject
+        float (*xp)[n] = x;             // accept
+}
 \end{cfa}
 …
 @const@ int n = 42;
 void g() {
         @n = 99@; // allowed
+        @n = 999@;              // reject
+}
 …
 \end{comment}
 The conservatism of the new rule set can leave a programmer needing a recourse, when needing to use a dimension expression whose stability argument is more subtle than current-state analysis.
+The conservatism of the new rule set can leave a programmer requiring a recourse, when needing to use a dimension expression whose stability argument is more subtle than current-state analysis.
 This recourse is to declare an explicit constant for the dimension value.
 Consider these two dimension expressions, whose uses are rejected by the blunt current-state rules:
 …
 void f( int @&@ nr, @const@ int nv ) {
         float x[@nr@];
         float (*xp)[@nr@] = &x;   // reject: nr varying (no references)
+        float (*xp)[@nr@] = &x;                 // reject: nr varying (no references)
         float y[@nv + 1@];
         float (*yp)[@nv + 1@] = &y;   // reject: ?+? unpredictable (no functions)
+        float (*yp)[@nv + 1@] = &y;             // reject: ?+? unpredictable (no functions)
+}
 \end{cfa}
 Yet, both dimension expressions are reused safely.
 The @nr@ reference is never written, not volatile meaning no implicit code (load) between declarations, and control does not leave the function between the uses.
+The @nr@ reference is never written, there is no implicit code (load) between declarations, and control does not leave the function between the uses.
 As well, the built-in @?+?@ function is predictable.
 To make these cases work, the programmer must add the following constant declarations (a cast does not work):
 …
         @const int nx@ = nr;
         float x[nx];
         float (*xp)[nx] = & x;   // accept
+        float (*xp)[nx] = & x;                  // accept
         @const int ny@ = nv + 1;
         float y[ny];
         float (*yp)[ny] = & y;   // accept
+        float (*yp)[ny] = & y;                  // accept
+}
 \end{cfa}
 …
 \end{cfa}
 Dimension hoisting already existed in the \CFA compiler.
 But its was buggy, particularly with determining, ``Can hoisting the expression be skipped here?'', for skipping this hoisting is clearly desirable in some cases.
+However, it was buggy, particularly with determining, ``Can hoisting the expression be skipped here?'', for skipping this hoisting is clearly desirable in some cases.
 For example, skipping is desirable when a programmer has already hoisted the expression as an optimization to preclude duplicate code (expression) and/or expression evaluation.
 In the new implementation, these cases are correct, harmonized with the accept/reject criteria.
 …
 \item
 Flexible-stride memory:
 this model has complete independence between subscripting ordering and memory layout, offering the ability to slice by (provide an index for) any dimension, \eg slice a plane, row, or column, \eg @c[3][*][*]@, @c[3][4][*]@, @c[3][*][5]@.
+this model has complete independence between subscript ordering and memory layout, offering the ability to slice by (provide an index for) any dimension, \eg slice a row, column, or plane, \eg @c[3][4][*]@, @c[3][*][5]@, @c[3][*][*]@.
 \item
 Fixed-stride memory:
 …
 Style 3 is the inevitable target of any array implementation.
 The hardware offers this model to the C compiler, with bytes as the unit of displacement.
 C offers this model to its programmer as pointer arithmetic, with arbitrary sizes as the unit.
+C offers this model to programmers as pointer arithmetic, with arbitrary sizes as the unit.
 Casting a multidimensional array as a single-dimensional array/pointer, then using @x[i]@ syntax to access its elements, is still a form of pointer arithmetic.
+Now stepping into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that even the interface is quite low-level.
+A C/\CFA array interface includes the resulting memory layout.
+The defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix.
+The required memory shape of such a slice is fixed, before any discussion of implementation.
+The implementation presented here is how the \CFA array-library wrangles the C type system, to make it do memory steps that are consistent with this layout while not affecting legacy C programs.
+To step into the implementation of \CFA's new type-1 multidimensional arrays in terms of C's existing type-2 multidimensional arrays, it helps to clarify that the interface is low-level, \ie a C/\CFA array interface includes the resulting memory layout.
+Specifically, the defining requirement of a type-2 system is the ability to slice a column from a column-finest matrix.
+Hence, the required memory shape of such a slice is fixed, before any discussion of implementation.
+The implementation presented here is how the \CFA array-library wrangles the C type system to make it do memory steps that are consistent with this layout while not affecting legacy C programs.
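The fixed memory shape of a column slice can be sketched in plain C as a base pointer plus a stride. This is not the \CFA array-library implementation; @Slice1d@, @column_of@, and @slice_at@ are hypothetical names, and the matrix is assumed to be stored with the column index varying fastest.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical strided view: a base element plus a step counted in elements.
struct Slice1d {
    float * base;
    size_t stride;  // elements between consecutive entries of the slice
    size_t len;     // number of entries in the slice
};

// View column c of a rows-by-cols matrix (column index varies fastest)
// without copying any elements.
struct Slice1d column_of( float * m, size_t rows, size_t cols, size_t c ) {
    assert( c < cols );
    struct Slice1d s = { m + c, cols, rows };
    return s;
}

// Address of entry i of the slice; this is still pointer arithmetic.
float * slice_at( struct Slice1d s, size_t i ) {
    assert( i < s.len );
    return s.base + i * s.stride;
}
```

For a 3-by-4 matrix, column 2 occupies element offsets 2, 6, and 10, so the slice is noncontiguous and its stride is determined entirely by the layout of the enclosing matrix.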
 % TODO: do I have/need a presentation of just this layout, just the semantics of -[all]?
 …
 \lstinput[aboveskip=0pt]{145-145}{hello-md.cfa}
 The nontrivial slicing in this example now allows passing a \emph{noncontiguous} slice to @print1d@, where the new-array library provides a ``subscript by all'' operation for this purpose.
 In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for such a value, implementing the @ar@ trait.
+In a multi-dimensional subscript operation, any dimension given as @all@ is a placeholder, \ie ``not yet subscripted by a value'', waiting for a value implementing the @ar@ trait.
 \lstinput{150-151}{hello-md.cfa}

Changeset 35fc819 for doc/theses/mike_brooks_MMath/array.tex

doc/theses/mike_brooks_MMath/array.tex
