Context Navigation

← Previous Change
Next Change →

Changeset 720eec9 for doc/theses

Timestamp:

Oct 25, 2024, 4:09:56 PM (3 weeks ago)

Author:

Michael Brooks <mlbrooks@…>

Branches:

Children:

Parents:

Message:

Thesis, array, proofread intro sections.

Location:

doc/theses/mike_brooks_MMath

Files:

: 4 edited

array.tex (modified) (15 diffs)
programs/hello-accordion.cfa (modified) (2 diffs)
programs/hello-array.cfa (modified) (2 diffs)
uw-ethesis.bib (modified) (1 diff)

Legend:

: Unmodified
: Added
: Removed

doc/theses/mike_brooks_MMath/array.tex

-                      rd031f7f
+                      r720eec9
 \section{Introduction}
+Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations.
+This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}.
+Specifically, a new \CFA array is declared:
+\label{s:ArrayIntro}
+Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language, resulting in the largest proportion of runtime errors and security violations.
+This chapter describes the new \CFA language and library features that introduce a length-checked array type to the \CFA standard library~\cite{Cforall}.
+Specifically, a new \CFA array is declared by instantiating the generic @array@ type,
+much like instantiating any other standard-library generic type (such as @dlist@),
+though using a new style of generic parameter.
 \begin{cfa}
 @array( float, 99 )@ x;                                 $\C[2.75in]{// x contains 99 floats}$
 \end{cfa}
 using generic type @array@ with arguments @float@ and @99@.
 A function @f@ is declared with an @array@ parameter of length @42@.
+Here, the arguments to the @array@ type are @float@ (element type) and @99@ (length).
+When this type is used as a function parameter, the type-system requires that a call's argument matches, down to the length.
 \begin{cfa}
 void f( @array( float, 42 )@ & p ) {}   $\C{// p accepts 42 floats}$
 …
 Applying untyped:  Name: f ... to:  Name: x
 \end{cfa}
+The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match.
+Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system.
+Here, the function @f@'s parameter @p@ is declared with length 42.
+The call @f( x )@, with the argument being the previously-declared object, is invalid, because the @array@ lengths @99@ and @42@ do not match.
+A function declaration can be polymorphic over these @array@ arguments by using the @forall@ declaration prefix.
+This function @g@'s takes arbitrary type parameter @T@ (familiar) and \emph{dimension parameter} @N@ (new).
+A dimension paramter represents a to-be-determined count of elements, managed by the type system.
 \begin{cfa}
 forall( T, @[N]@ )
 …
 \end{cfa}
 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@.
 Inferring values for @T@ and @N@ is implicit without programmer involvement.
 Furthermore, the runtime subscript @x[0]@ (parameter @i@ is @0@) in @g@ is valid because @0@ is in the dimension range $[0,99)$ of argument @x@.
 The call @g( x, 1000 )@ is also valid;
 however, the runtime subscript @x[1000]@ is invalid (generates a subscript error) because @1000@ is outside the dimension range $[0,99)$ of argument @x@.
 The generic @array@ type is similar to the C array type, which \CFA inherits from C.
 Its runtime characteristics are often identical, and some features are available in both.
 For example, assume a caller can instantiates @N@ with 42 in the following (details to follow).
+Inferring values for @T@ and @N@ is implicit, without programmer involvement.
+Furthermore, in this case, the runtime subscript @x[0]@ (parameter @i@ being @0@) in @g@ is valid because 0 is in the dimension range $[0,99)$ of argument @x@.
+The call @g( x, 1000 )@ is also accepted through compile time;
+however, this case's subscript, @x[1000]@, generates an error, because @1000@ is outside the dimension range $[0,99)$ of argument @x@.
+The generic @array@ type is comparable to the C array type, which \CFA inherits from C.
+Their runtime characteristics are often identical, and some features are available in both.
+For example, assume a caller instantiates @N@ with 42 (discussion about how to follow) in:
 \begin{cfa}
 forall( [N] )
 …
 \end{cfa}
 Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@.
 The two variables have identical size and layout; they both encapsulate 42-float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers.
+The two variables have identical size and layout; they both encapsulate 42-float stack allocations, with no additional ``bookkeeping'' allocations or headers.
 Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA.
 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart.
 A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@);
 then, the library @array@ type can be removed giving \CFA a largely uniform array type.
 At present, the C syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the discussion;
+A future goal (TODO xref) is to provide the new features upon a built-in type whose syntax approaches C's (declaration style of @x1@).
+Then, the library @array@ type could be removed, giving \CFA a largely uniform array type.
+At present, the C-syntax array gets partial support for the new features, so the generic @array@ is used exclusively when introducing features;
 feature support and C compatibility are revisited in Section ? TODO.
 Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@).
 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays.
+Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref.
+Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity of the C array.
+In all discussion following, ``C array'' means the types like that of @x@ and ``\CFA array'' means the standard-library @array@ type (instantiations), like the type of @x2@.
 My contributions in this chapter are:
 …
+General dependent typing allows the type system to encode arbitrary predicates (e.g. behavioural specifications for functions),
+which is an anti-goal for my work.
+Firstly, this application is strongly associated with pure functional languages,
+where a characterization of the return value (giving it a precise type, generally dependent upon the parameters)
+is a sufficient postcondition.
+In an imperative language like C and \CFA, it is also necessary to discuss side effects,
+for which an even heavier formalism, like separation logic, is required.
+Secondly, TODO: bash Rust.
+TODO: cite the crap out of these claims.
 \section{Features added}
+This section presents motivating examples for the new array type, demonstrating the syntax and semantics of the generic @array@.
+As stated, the core capability of the new array is tracking all dimensions in the type system, where dynamic dimensions are represented using type variables.
+The definition of type variables preceding object declarations makes the array dimension lexically referenceable where it is needed.
+For example, a declaration can share one length, @N@, among a pair of parameters and the return.
+This section shows more about using the \CFA array and dimension parameters, demonstrating their syntax and semantics by way of motivating examples.
+As stated, the core capability of the new array is tracking all dimensions within the type system, where dynamic dimensions are represented using type variables.
+By declaring type variables at the front of object declarations, an array dimension is lexically referenceable where it is needed.
+For example, a declaration can share one length, @N@, among a pair of parameters and the return,
+meaning that it requires both input arrays to be of the same length, and guarantees that the result with be of that length as well.
 \lstinput{10-17}{hello-array.cfa}
+Here, the function @f@ does a pointwise comparison, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array.
+The dynamic allocation of the @ret@ array by @alloc@ uses the parameterized dimension information in its implicit @_Alignof@ and @sizeof@ determinations, and casting the return type.
+\begin{cfa}
+This function @f@ does a pointwise comparison of its two input arrays, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array.
+The dynamic allocation of the @ret@ array by preexisting @alloc@ uses the parameterized dimension information implicitly within its @sizeof@ determination, and casts the return type.
+Note that alloc only sees one whole type for its @T@ (which is @f@'s @array(bool, N)@); this type's size is a computation based on @N@.
+\begin{cfa}
+// simplification
 static inline forall( T & | sized(T) )
+T * alloc( size_t dim ) {
+        if ( _Alignof(T) <= libAlign() ) return (T *)aalloc( dim, sizeof(T) ); // calloc without zero fill
+        else return (T *)amemalign( _Alignof(T), dim, sizeof(T) ); // array memalign
+}
+\end{cfa}
+Here, the type system deduces from the left-hand side of the assignment the type @array(bool, N)@, and forwards it as the type variable @T@ for the generic @alloc@ function, making it available in its body.
+Hence, preexisting \CFA behaviour is leveraged here, both in the return-type polymorphism, and the @sized(T)@-aware standard-library @alloc@ routine.
+T * alloc() {
+        return (T *)malloc( sizeof(T) );
+}
+\end{cfa}
 This example illustrates how the new @array@ type plugs into existing \CFA behaviour by implementing necessary @sized@ assertions needed by other types.
 (@sized@ implies a concrete \vs abstract type with a compile-time size.)
+(@sized@ implies a concrete \vs abstract type with a runtime-available size, exposed as @sizeof@.)
 As a result, there is significant programming safety by making the size accessible and implicit, compared with C's @calloc@ and non-array supporting @memalign@, which take an explicit length parameter not managed by the type system.
 …
 \end{figure}
 \VRef[Figure]{f:fHarness} shows a harness that uses function @f@ to illustrate how dynamic values are fed into the @array@ type.
+\VRef[Figure]{f:fHarness} shows a harness that uses function @f@, illustrating how dynamic values are fed into the @array@ type.
 Here, the dimension of arrays @x@, @y@, and @result@ is specified from a command-line value, @dim@, and these arrays are allocated on the stack.
 Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values.
 …
 The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @dim@ at @f@'s declaration of @ret@.
 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage.
 These benefits cannot be underestimated.
+The result is a significant improvement in safety and usability.
 In general, the @forall( ..., [N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and within a function.
 The syntactic form is chosen to parallel other @forall@ forms:
 \begin{cfa}
 forall( @[N]@ ) ...     $\C[1.5in]{// array kind}$
 forall( T & ) ...       $\C{// reference kind (dtype)}$
 forall( T ) ...         $\C{// value kind (otype)}\CRT$
+forall( @[N]@ ) ...     $\C[1.5in]{// dimension}$
+forall( T & ) ...       $\C{// opaque datatype (formerly, "dtype")}$
+forall( T ) ...         $\C{// value datatype (formerly, "otype")}\CRT$
 \end{cfa}
 % The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance.
 …
 \begin{itemize}
 \item
 @[N]@ within a forall declares the type variable @N@ to be a managed length.
 \item
 The type of @N@ within code is @size_t@.
 \item
 The value of @N@ within code is the acquired length derived from the usage site, \ie generic declaration or function call.
 \item
 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ objects.
+@[N]@ within a @forall@ declares the type variable @N@ to be a managed length.
+\item
+@N@ can be used an expression of type @size_t@ within the declared function body.
+\item
+The value of an @N@-expression is the acquired length, derived from the usage site, \ie generic declaration or function call.
+\item
+@array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@-typed objects.
 \end{itemize}
 …
 \begin{enumerate}[leftmargin=*]
 \item
+The \CC template @N@ is a compile-time value, while the \CFA @N@ is a runtime value.
+The \CC template @N@ can only be compile-time value, while the \CFA @N@ may be a runtime value.
+% agreed, though already said
+\item
+\CC does not allow a template function to be nested, while \CFA lests its polymorphic functions to be nested.
+% why is this important?
 \item
 The \CC template @N@ must be passed explicitly at the call, unless @N@ has a default value, even when \CC can deduct the type of @T@.
 The \CFA @N@ is part of the array type and passed implicitly at the call.
+% fixed by comparing to std::array
+% mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v2
 \item
 \CC cannot have an array of references, but can have an array of pointers.
 …
 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing.
 The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer.
+% not really about forall-N vs template-N
+% any better CFA support is how Rob left references, not what Mike did to arrays
+% https://stackoverflow.com/questions/1164266/why-are-arrays-of-references-illegal
+% https://stackoverflow.com/questions/922360/why-cant-i-make-a-vector-of-references
 \item
 C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency).
+% fixed by comparing to std::array
+% mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v10
 \end{enumerate}
+TODO: settle Mike's concerns with this comparison (perhaps, remove)
 \begin{figure}
 …
+}
 int main() {
         int ret[10], x[10];
         for ( int i = 0; i < 10; i += 1 ) x[i] = i;
 …
 \begin{cfa}
 int main() {
         @forall( T, [N] )@   // nested function
+        @forall( T, [N] )@              // nested function
         void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) {
+                for ( i; 10 ) ret[i] = x[i];
+        }
+        array( int, 10 ) ret, x;
+        for ( i; 10 ) x[i] = i;
+                for ( i; N ) ret[i] = x[i];
+        }
+        const int n = promptForLength();
+        array( int, n ) ret, x;
+        for ( i; n ) x[i] = i;
         @copy( ret,  x );@
         for ( i; 10 )
+        for ( i; n )
                 sout | ret[i] | nonl;
         sout | nl;
 …
 \end{cfa}
 \end{tabular}
 \caption{\CC \lstinline[language=C++]{template} \vs \CFA generic type }
+\caption{\lstinline{N}-style paramters, for \CC template \vs \CFA generic type }
 \label{f:TemplateVsGenericType}
 \end{figure}
+Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length.
+As stated previous, a compile-time error occurs when attempting to call @f@ with arrays of differing lengths.
+% removing leading whitespace
+\lstinput[tabsize=1]{52-53}{hello-array.cfa}
+\lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa}
+C allows casting to assert knowledge not shared with the type system.
+\lstinput{70-74}{hello-array.cfa}
+Orthogonally, the new @array@ type works with \CFA's generic types, providing argument safety and the associated implicit communication of array length.
+Specifically, \CFA allows aggregate types to be generalized with multiple type parameters, including parameterized element types and lengths.
+Doing so gives a refinement of C's ``flexible array member'' pattern, allowing nesting structures with array members anywhere within the structures.
+Just as the first example in \VRef[Section]{s:ArrayIntro} shows a compile-time rejection of a length mismatch,
+so are length mismatches stopped when they invlove dimension parameters.
+While \VRef[Figure]{f:fHarness} shows successfully calling a function @f@ expecting two arrays of the same length,
+\begin{cfa}
+array( bool, N ) & f( array( float, N ) &, array( float, N ) & );
+\end{cfa}
+a static rejection occurs when attempting to call @f@ with arrays of potentially differing lengths.
+\lstinput[tabsize=1]{70-74}{hello-array.cfa}
+When the argument lengths themselves are statically unknown,
+the static check is conservative and, as always, \CFA's casting lets the programmer use knowledge not shared with the type system.
+\begin{tabular}{@{\hspace{0.5in}}l@{\hspace{1in}}l@{}}
+\lstinput{90-97}{hello-array.cfa}
+&
+\lstinput{110-117}{hello-array.cfa}
+\end{tabular}
+\noindent
+This static check's full rules are presented in \VRef[Section]{s:ArrayTypingC}.
+Orthogonally, the \CFA array type works within generic \emph{types}, \ie @forall@-on-@struct@.
+The same argument safety and the associated implicit communication of array length occurs.
+Preexisting \CFA allowed aggregate types to be generalized with type parameters, enabling parameterizing for element types.
+Now, \CFA also allows parameterizing them by length.
+Doing so gives a refinement of C's ``flexible array member'' pattern[TODO: cite ARM 6.7.2.1 pp18]\cite{arr:gnu-flex-mbr}.
+While a C flexible array member can only occur at the end of the enclosing structure,
+\CFA allows length-parameterized array members to be nested at arbitrary locations.
+This flexibility, in turn, allows for multiple array members.
 \lstinput{10-15}{hello-accordion.cfa}
 This structure's layout has the starting offset of @studentIds@ varying in generic parameter @C@, and the offset of @preferences@ varying in both generic parameters.
+This structure's layout has the starting offset of @studentIds@ varying according to the generic parameter @C@, and the offset of @preferences@ varying according to both generic parameters.
 For a function that operates on a @School@ structure, the type system handles this memory layout transparently.
 \lstinput{40-45}{hello-accordion.cfa}
+\VRef[Figure]{f:checkHarness} shows the @School@ harness and results with different array sizes, where multidimensional arrays are discussed next.
+\VRef[Figure]{f:checkHarness} shows the @School@ harness and results with different array sizes.
+Note the declaration of the @school@ variable.
+It is on the stack and its initialization does not use any casting or size arithmetic.
+Both of these points are impossible with a C flexible array member.
+When heap allocation is preferred, the original pattern still applies.
+\begin{cfa}
+School( classes, students ) * sp = alloc();
+\end{cfa}
+This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members.
 \begin{figure}
 …
 \begin{tabular}{@{}ll@{\hspace{25pt}}l@{}}
 \begin{tabular}{@{}p{3.25in}@{}}
 \lstinput{60-66}{hello-accordion.cfa}
+\lstinput{60-64}{hello-accordion.cfa}
 \vspace*{-3pt}
 \lstinput{73-80}{hello-accordion.cfa}
 …
 \section{Typing of C Arrays}
+\label{s:ArrayTypingC}
 Essential in giving a guarantee of accurate length is the compiler's ability

doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa

-                      rd031f7f
+                      r720eec9
 forall( [C], [S] )
 void init( @School( C, S ) & classes@, int class, int student, int pref ) with( classes ) {
         classIds[class] = class; $\C{// handle dynamic offsets of fields within structure}$
         studentIds[student] = student;
         preferences[class][student] = pref;
+void put( @School( C, S ) & s@, int class, int student, int pref ) {
+        s.classIds[class] = class; $\C{// fields' offsets are dynamic }$
+        s.studentIds[student] = student;
+        s.preferences[class][student] = pref;
+}
 …
         sin | classes | students;
         @School( classes, students ) school;@
         int class, student, preference;
         // read data into school calling init
         // for each student's class/preferences
         try {
                 for ( ) {
                         sin | class | student | preference;
                         init( school, class, student, preference );
+                }
         } catch( end_of_file * ) {}
+        // elided: read data into school, calling put
+        {       int class, student, preference;
+                // for each student's class/preferences
+                try {
+                        for ( ) {
+                                sin | class | student | preference;
+                                put( school, class, student, preference );
+                        }
+                } catch( end_of_file * ) {}        }
         for ( s; students ) {
                 sout | "student" | s | nonl;

doc/theses/mike_brooks_MMath/programs/hello-array.cfa

-                      rd031f7f
+                      r720eec9
 */
+#ifdef SHOWERR1
 void fred() {
+        array( float, @10@ ) x;
+        array( float, @20@ ) y;
+        f( x, x );              $\C[0.5in]{// ok}$
+        f( y, y );              $\C{// ok}$
+        f( x, y );              $\C{// error}\CRT$
+}
+forall( [M], [N] )
+void bad( array(float, M) &x,
+                array(float, N) &y ) {
+        f( x, x );              $\C[0.5in]{// ok}$
+        f( y, y );              $\C{// ok}$
+        f( x, y );              $\C{// error}\CRT$
+}
+#endif
+forall( [M], [N] )
+void bad_fixed( array( float, M ) & x,
+                array( float, N ) & y ) {
+        f( x, x );              $\C[0.5in]{// ok}$
+        f( y, y );              $\C{// ok}$
+        if ( M == N )
+                f( x, @(array( float, M ) &)@y ); $\C{// ok}$
+}
+void fred_ok_only() {
         array( float, @10@ ) x;
         array( float, @20@ ) y;
 …
+}
-#ifdef SHOWERR1
-forall( [M], [N] )
-void bad( array(float, M) &x, array(float, N) &y ) {
-        f( x, x );              $\C[1.5in]{// ok}$
-        f( y, y );              $\C{// ok}$
-        f( x, y );              $\C{// error}\CRT$
+}
-#endif
 forall( [M], [N] )
+void bad_fixed( array( float, M ) & x, array( float, N ) & y ) {
+        if ( M == N )
+                f( x, @(array( float, M ) &)@y ); $\C{// cast y to matching type}$
+void bad_ok_only( array(float, M) &x,
+                array(float, N) &y ) {
+        f( x, x );
+        f( y, y );
+//      f( x, y );
+}
 // Local Variables: //

doc/theses/mike_brooks_MMath/uw-ethesis.bib

-                      rd031f7f
+                      r720eec9
     year      = {2018},
+}
+% --------------------------------------------------
+% C Array facts
+@misc{arr:gnu-flex-mbr,
+    title       = {Arrays of Length Zero},
+    howpublished= {\url{https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html}},
+}
 % --------------------------------------------------

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: