Changeset 720eec9
- Timestamp:
- Oct 25, 2024, 4:09:56 PM (6 days ago)
- Branches:
- master
- Children:
- 33474e6
- Parents:
- d031f7f
- Location:
- doc/theses/mike_brooks_MMath
- Files:
-
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
rd031f7f r720eec9 4 4 5 5 \section{Introduction} 6 7 Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations. 8 This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}. 9 10 Specifically, a new \CFA array is declared: 6 \label{s:ArrayIntro} 7 8 Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language, resulting in the largest proportion of runtime errors and security violations. 9 This chapter describes the new \CFA language and library features that introduce a length-checked array type to the \CFA standard library~\cite{Cforall}. 10 11 Specifically, a new \CFA array is declared by instantiating the generic @array@ type, 12 much like instantiating any other standard-library generic type (such as @dlist@), 13 though using a new style of generic parameter. 11 14 \begin{cfa} 12 15 @array( float, 99 )@ x; $\C[2.75in]{// x contains 99 floats}$ 13 16 \end{cfa} 14 using generic type @array@ with arguments @float@ and @99@.15 A function @f@ is declared with an @array@ parameter of length @42@.17 Here, the arguments to the @array@ type are @float@ (element type) and @99@ (length). 18 When this type is used as a function parameter, the type-system requires that a call's argument matches, down to the length. 16 19 \begin{cfa} 17 20 void f( @array( float, 42 )@ & p ) {} $\C{// p accepts 42 floats}$ … … 21 24 Applying untyped: Name: f ... to: Name: x 22 25 \end{cfa} 23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match. 24 25 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system. 26 Here, the function @f@'s parameter @p@ is declared with length 42. 27 The call @f( x )@, with the argument being the previously-declared object, is invalid, because the @array@ lengths @99@ and @42@ do not match. 28 29 A function declaration can be polymorphic over these @array@ arguments by using the @forall@ declaration prefix. 30 This function @g@'s takes arbitrary type parameter @T@ (familiar) and \emph{dimension parameter} @N@ (new). 31 A dimension paramter represents a to-be-determined count of elements, managed by the type system. 26 32 \begin{cfa} 27 33 forall( T, @[N]@ ) … … 35 41 \end{cfa} 36 42 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@. 37 Inferring values for @T@ and @N@ is implicit without programmer involvement.38 Furthermore, the runtime subscript @x[0]@ (parameter @i@ is @0@) in @g@ is valid because @0@is in the dimension range $[0,99)$ of argument @x@.39 The call @g( x, 1000 )@ is also valid;40 however, th e runtime subscript @x[1000]@ is invalid (generates a subscript error)because @1000@ is outside the dimension range $[0,99)$ of argument @x@.41 42 The generic @array@ type is similarto the C array type, which \CFA inherits from C.43 Itsruntime characteristics are often identical, and some features are available in both.44 For example, assume a caller can instantiates @N@ with 42 in the following (details to follow).43 Inferring values for @T@ and @N@ is implicit, without programmer involvement. 44 Furthermore, in this case, the runtime subscript @x[0]@ (parameter @i@ being @0@) in @g@ is valid because 0 is in the dimension range $[0,99)$ of argument @x@. 45 The call @g( x, 1000 )@ is also accepted through compile time; 46 however, this case's subscript, @x[1000]@, generates an error, because @1000@ is outside the dimension range $[0,99)$ of argument @x@. 47 48 The generic @array@ type is comparable to the C array type, which \CFA inherits from C. 49 Their runtime characteristics are often identical, and some features are available in both. 50 For example, assume a caller instantiates @N@ with 42 (discussion about how to follow) in: 45 51 \begin{cfa} 46 52 forall( [N] ) … … 51 57 \end{cfa} 52 58 Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@. 53 The two variables have identical size and layout; they both encapsulate 42-float , stack \vs heap allocationswith no additional ``bookkeeping'' allocations or headers.59 The two variables have identical size and layout; they both encapsulate 42-float stack allocations, with no additional ``bookkeeping'' allocations or headers. 54 60 Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA. 55 61 56 62 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart. 57 A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@);58 then, the library @array@ type can be removedgiving \CFA a largely uniform array type.59 At present, the C syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the discussion;63 A future goal (TODO xref) is to provide the new features upon a built-in type whose syntax approaches C's (declaration style of @x1@). 64 Then, the library @array@ type could be removed, giving \CFA a largely uniform array type. 65 At present, the C-syntax array gets partial support for the new features, so the generic @array@ is used exclusively when introducing features; 60 66 feature support and C compatibility are revisited in Section ? TODO. 61 67 62 68 Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@). 63 69 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays. 64 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref. 70 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity of the C array. 71 72 In all discussion following, ``C array'' means the types like that of @x@ and ``\CFA array'' means the standard-library @array@ type (instantiations), like the type of @x2@. 65 73 66 74 My contributions in this chapter are: … … 81 89 82 90 91 92 General dependent typing allows the type system to encode arbitrary predicates (e.g. behavioural specifications for functions), 93 which is an anti-goal for my work. 94 Firstly, this application is strongly associated with pure functional languages, 95 where a characterization of the return value (giving it a precise type, generally dependent upon the parameters) 96 is a sufficient postcondition. 97 In an imperative language like C and \CFA, it is also necessary to discuss side effects, 98 for which an even heavier formalism, like separation logic, is required. 99 Secondly, TODO: bash Rust. 100 TODO: cite the crap out of these claims. 101 102 103 83 104 \section{Features added} 84 105 85 This section presents motivating examples for the new array type, demonstrating the syntax and semantics of the generic @array@. 86 As stated, the core capability of the new array is tracking all dimensions in the type system, where dynamic dimensions are represented using type variables. 87 88 The definition of type variables preceding object declarations makes the array dimension lexically referenceable where it is needed. 89 For example, a declaration can share one length, @N@, among a pair of parameters and the return. 106 This section shows more about using the \CFA array and dimension parameters, demonstrating their syntax and semantics by way of motivating examples. 107 As stated, the core capability of the new array is tracking all dimensions within the type system, where dynamic dimensions are represented using type variables. 108 109 By declaring type variables at the front of object declarations, an array dimension is lexically referenceable where it is needed. 110 For example, a declaration can share one length, @N@, among a pair of parameters and the return, 111 meaning that it requires both input arrays to be of the same length, and guarantees that the result with be of that length as well. 90 112 \lstinput{10-17}{hello-array.cfa} 91 Here, the function @f@ does a pointwise comparison, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array. 92 The dynamic allocation of the @ret@ array by @alloc@ uses the parameterized dimension information in its implicit @_Alignof@ and @sizeof@ determinations, and casting the return type. 93 \begin{cfa} 113 This function @f@ does a pointwise comparison of its two input arrays, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array. 114 The dynamic allocation of the @ret@ array by preexisting @alloc@ uses the parameterized dimension information implicitly within its @sizeof@ determination, and casts the return type. 115 Note that alloc only sees one whole type for its @T@ (which is @f@'s @array(bool, N)@); this type's size is a computation based on @N@. 116 \begin{cfa} 117 // simplification 94 118 static inline forall( T & | sized(T) ) 95 T * alloc( size_t dim ) { 96 if ( _Alignof(T) <= libAlign() ) return (T *)aalloc( dim, sizeof(T) ); // calloc without zero fill 97 else return (T *)amemalign( _Alignof(T), dim, sizeof(T) ); // array memalign 98 } 99 \end{cfa} 100 Here, the type system deduces from the left-hand side of the assignment the type @array(bool, N)@, and forwards it as the type variable @T@ for the generic @alloc@ function, making it available in its body. 101 Hence, preexisting \CFA behaviour is leveraged here, both in the return-type polymorphism, and the @sized(T)@-aware standard-library @alloc@ routine. 119 T * alloc() { 120 return (T *)malloc( sizeof(T) ); 121 } 122 \end{cfa} 102 123 This example illustrates how the new @array@ type plugs into existing \CFA behaviour by implementing necessary @sized@ assertions needed by other types. 103 (@sized@ implies a concrete \vs abstract type with a compile-time size.)124 (@sized@ implies a concrete \vs abstract type with a runtime-available size, exposed as @sizeof@.) 104 125 As a result, there is significant programming safety by making the size accessible and implicit, compared with C's @calloc@ and non-array supporting @memalign@, which take an explicit length parameter not managed by the type system. 105 126 … … 111 132 \end{figure} 112 133 113 \VRef[Figure]{f:fHarness} shows a harness that uses function @f@ to illustratehow dynamic values are fed into the @array@ type.134 \VRef[Figure]{f:fHarness} shows a harness that uses function @f@, illustrating how dynamic values are fed into the @array@ type. 114 135 Here, the dimension of arrays @x@, @y@, and @result@ is specified from a command-line value, @dim@, and these arrays are allocated on the stack. 115 136 Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values. … … 119 140 The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @dim@ at @f@'s declaration of @ret@. 120 141 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage. 121 The se benefits cannot be underestimated.142 The result is a significant improvement in safety and usability. 122 143 123 144 In general, the @forall( ..., [N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and within a function. 124 145 The syntactic form is chosen to parallel other @forall@ forms: 125 146 \begin{cfa} 126 forall( @[N]@ ) ... $\C[1.5in]{// array kind}$127 forall( T & ) ... $\C{// reference kind (dtype)}$128 forall( T ) ... $\C{// value kind (otype)}\CRT$147 forall( @[N]@ ) ... $\C[1.5in]{// dimension}$ 148 forall( T & ) ... $\C{// opaque datatype (formerly, "dtype")}$ 149 forall( T ) ... $\C{// value datatype (formerly, "otype")}\CRT$ 129 150 \end{cfa} 130 151 % The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance. … … 132 153 \begin{itemize} 133 154 \item 134 @[N]@ within a foralldeclares the type variable @N@ to be a managed length.135 \item 136 The type of @N@ within code is @size_t@.137 \item 138 The value of @N@ within code is the acquired lengthderived from the usage site, \ie generic declaration or function call.139 \item 140 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ objects.155 @[N]@ within a @forall@ declares the type variable @N@ to be a managed length. 156 \item 157 @N@ can be used an expression of type @size_t@ within the declared function body. 158 \item 159 The value of an @N@-expression is the acquired length, derived from the usage site, \ie generic declaration or function call. 160 \item 161 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@-typed objects. 141 162 \end{itemize} 142 163 … … 144 165 \begin{enumerate}[leftmargin=*] 145 166 \item 146 The \CC template @N@ is a compile-time value, while the \CFA @N@ is a runtime value. 167 The \CC template @N@ can only be compile-time value, while the \CFA @N@ may be a runtime value. 168 % agreed, though already said 169 \item 170 \CC does not allow a template function to be nested, while \CFA lests its polymorphic functions to be nested. 171 % why is this important? 147 172 \item 148 173 The \CC template @N@ must be passed explicitly at the call, unless @N@ has a default value, even when \CC can deduct the type of @T@. 149 174 The \CFA @N@ is part of the array type and passed implicitly at the call. 175 % fixed by comparing to std::array 176 % mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v2 150 177 \item 151 178 \CC cannot have an array of references, but can have an array of pointers. … … 153 180 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing. 154 181 The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer. 182 % not really about forall-N vs template-N 183 % any better CFA support is how Rob left references, not what Mike did to arrays 184 % https://stackoverflow.com/questions/1164266/why-are-arrays-of-references-illegal 185 % https://stackoverflow.com/questions/922360/why-cant-i-make-a-vector-of-references 155 186 \item 156 187 C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency). 188 % fixed by comparing to std::array 189 % mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v10 157 190 \end{enumerate} 191 TODO: settle Mike's concerns with this comparison (perhaps, remove) 158 192 159 193 \begin{figure} … … 166 200 } 167 201 int main() { 202 168 203 int ret[10], x[10]; 169 204 for ( int i = 0; i < 10; i += 1 ) x[i] = i; … … 177 212 \begin{cfa} 178 213 int main() { 179 @forall( T, [N] )@ 214 @forall( T, [N] )@ // nested function 180 215 void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) { 181 for ( i; 10 ) ret[i] = x[i]; 182 } 183 184 array( int, 10 ) ret, x; 185 for ( i; 10 ) x[i] = i; 216 for ( i; N ) ret[i] = x[i]; 217 } 218 219 const int n = promptForLength(); 220 array( int, n ) ret, x; 221 for ( i; n ) x[i] = i; 186 222 @copy( ret, x );@ 187 for ( i; 10)223 for ( i; n ) 188 224 sout | ret[i] | nonl; 189 225 sout | nl; … … 191 227 \end{cfa} 192 228 \end{tabular} 193 \caption{\ CC \lstinline[language=C++]{template}\vs \CFA generic type }229 \caption{\lstinline{N}-style paramters, for \CC template \vs \CFA generic type } 194 230 \label{f:TemplateVsGenericType} 195 231 \end{figure} 196 232 197 Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length. 198 As stated previous, a compile-time error occurs when attempting to call @f@ with arrays of differing lengths. 199 % removing leading whitespace 200 \lstinput[tabsize=1]{52-53}{hello-array.cfa} 201 \lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa} 202 C allows casting to assert knowledge not shared with the type system. 203 \lstinput{70-74}{hello-array.cfa} 204 205 Orthogonally, the new @array@ type works with \CFA's generic types, providing argument safety and the associated implicit communication of array length. 206 Specifically, \CFA allows aggregate types to be generalized with multiple type parameters, including parameterized element types and lengths. 207 Doing so gives a refinement of C's ``flexible array member'' pattern, allowing nesting structures with array members anywhere within the structures. 233 Just as the first example in \VRef[Section]{s:ArrayIntro} shows a compile-time rejection of a length mismatch, 234 so are length mismatches stopped when they invlove dimension parameters. 235 While \VRef[Figure]{f:fHarness} shows successfully calling a function @f@ expecting two arrays of the same length, 236 \begin{cfa} 237 array( bool, N ) & f( array( float, N ) &, array( float, N ) & ); 238 \end{cfa} 239 a static rejection occurs when attempting to call @f@ with arrays of potentially differing lengths. 240 \lstinput[tabsize=1]{70-74}{hello-array.cfa} 241 When the argument lengths themselves are statically unknown, 242 the static check is conservative and, as always, \CFA's casting lets the programmer use knowledge not shared with the type system. 243 \begin{tabular}{@{\hspace{0.5in}}l@{\hspace{1in}}l@{}} 244 \lstinput{90-97}{hello-array.cfa} 245 & 246 \lstinput{110-117}{hello-array.cfa} 247 \end{tabular} 248 249 \noindent 250 This static check's full rules are presented in \VRef[Section]{s:ArrayTypingC}. 251 252 Orthogonally, the \CFA array type works within generic \emph{types}, \ie @forall@-on-@struct@. 253 The same argument safety and the associated implicit communication of array length occurs. 254 Preexisting \CFA allowed aggregate types to be generalized with type parameters, enabling parameterizing for element types. 255 Now, \CFA also allows parameterizing them by length. 256 Doing so gives a refinement of C's ``flexible array member'' pattern[TODO: cite ARM 6.7.2.1 pp18]\cite{arr:gnu-flex-mbr}. 257 While a C flexible array member can only occur at the end of the enclosing structure, 258 \CFA allows length-parameterized array members to be nested at arbitrary locations. 259 This flexibility, in turn, allows for multiple array members. 208 260 \lstinput{10-15}{hello-accordion.cfa} 209 This structure's layout has the starting offset of @studentIds@ varying in generic parameter @C@, and the offset of @preferences@ varying inboth generic parameters.261 This structure's layout has the starting offset of @studentIds@ varying according to the generic parameter @C@, and the offset of @preferences@ varying according to both generic parameters. 210 262 For a function that operates on a @School@ structure, the type system handles this memory layout transparently. 211 263 \lstinput{40-45}{hello-accordion.cfa} 212 \VRef[Figure]{f:checkHarness} shows the @School@ harness and results with different array sizes, where multidimensional arrays are discussed next. 264 \VRef[Figure]{f:checkHarness} shows the @School@ harness and results with different array sizes. 265 Note the declaration of the @school@ variable. 266 It is on the stack and its initialization does not use any casting or size arithmetic. 267 Both of these points are impossible with a C flexible array member. 268 When heap allocation is preferred, the original pattern still applies. 269 \begin{cfa} 270 School( classes, students ) * sp = alloc(); 271 \end{cfa} 272 This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members. 273 213 274 214 275 \begin{figure} … … 216 277 \begin{tabular}{@{}ll@{\hspace{25pt}}l@{}} 217 278 \begin{tabular}{@{}p{3.25in}@{}} 218 \lstinput{60-6 6}{hello-accordion.cfa}279 \lstinput{60-64}{hello-accordion.cfa} 219 280 \vspace*{-3pt} 220 281 \lstinput{73-80}{hello-accordion.cfa} … … 233 294 234 295 \section{Typing of C Arrays} 296 \label{s:ArrayTypingC} 235 297 236 298 Essential in giving a guarantee of accurate length is the compiler's ability -
doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa
rd031f7f r720eec9 39 39 40 40 forall( [C], [S] ) 41 void init( @School( C, S ) & classes@, int class, int student, int pref ) with( classes) {42 classIds[class] = class; $\C{// handle dynamic offsets of fields within structure}$43 s tudentIds[student] = student;44 preferences[class][student] = pref;41 void put( @School( C, S ) & s@, int class, int student, int pref ) { 42 s.classIds[class] = class; $\C{// fields' offsets are dynamic }$ 43 s.studentIds[student] = student; 44 s.preferences[class][student] = pref; 45 45 } 46 46 … … 62 62 sin | classes | students; 63 63 @School( classes, students ) school;@ 64 int class, student, preference;65 // read data into school calling init66 // for each student's class/preferences67 try {68 for ( ) {69 sin | class | student | preference;70 init( school, class, student, preference );71 }72 } catch( end_of_file * ) {}64 // elided: read data into school, calling put 65 { int class, student, preference; 66 // for each student's class/preferences 67 try { 68 for ( ) { 69 sin | class | student | preference; 70 put( school, class, student, preference ); 71 } 72 } catch( end_of_file * ) {} } 73 73 for ( s; students ) { 74 74 sout | "student" | s | nonl; -
doc/theses/mike_brooks_MMath/programs/hello-array.cfa
rd031f7f r720eec9 49 49 */ 50 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 #ifdef SHOWERR1 65 51 66 void fred() { 67 68 69 70 array( float, @10@ ) x; 71 array( float, @20@ ) y; 72 f( x, x ); $\C[0.5in]{// ok}$ 73 f( y, y ); $\C{// ok}$ 74 f( x, y ); $\C{// error}\CRT$ 75 76 77 78 79 } 80 81 82 83 84 85 86 87 88 89 90 forall( [M], [N] ) 91 void bad( array(float, M) &x, 92 array(float, N) &y ) { 93 f( x, x ); $\C[0.5in]{// ok}$ 94 f( y, y ); $\C{// ok}$ 95 96 f( x, y ); $\C{// error}\CRT$ 97 } 98 99 #endif 100 101 102 103 104 105 106 107 108 109 110 forall( [M], [N] ) 111 void bad_fixed( array( float, M ) & x, 112 array( float, N ) & y ) { 113 f( x, x ); $\C[0.5in]{// ok}$ 114 f( y, y ); $\C{// ok}$ 115 if ( M == N ) 116 f( x, @(array( float, M ) &)@y ); $\C{// ok}$ 117 } 118 119 120 121 122 123 124 void fred_ok_only() { 52 125 array( float, @10@ ) x; 53 126 array( float, @20@ ) y; … … 57 130 } 58 131 59 #ifdef SHOWERR160 forall( [M], [N] )61 void bad( array(float, M) &x, array(float, N) &y ) {62 f( x, x ); $\C[1.5in]{// ok}$63 f( y, y ); $\C{// ok}$64 f( x, y ); $\C{// error}\CRT$65 }66 #endif67 68 69 132 70 133 forall( [M], [N] ) 71 void bad_fixed( array( float, M ) & x, array( float, N ) & y ) { 72 if ( M == N ) 73 f( x, @(array( float, M ) &)@y ); $\C{// cast y to matching type}$ 134 void bad_ok_only( array(float, M) &x, 135 array(float, N) &y ) { 136 f( x, x ); 137 f( y, y ); 138 139 // f( x, y ); 74 140 } 141 75 142 76 143 // Local Variables: // -
doc/theses/mike_brooks_MMath/uw-ethesis.bib
rd031f7f r720eec9 17 17 year = {2018}, 18 18 } 19 20 % -------------------------------------------------- 21 % C Array facts 22 23 @misc{arr:gnu-flex-mbr, 24 title = {Arrays of Length Zero}, 25 howpublished= {\url{https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html}}, 26 } 27 19 28 20 29 % --------------------------------------------------
Note: See TracChangeset
for help on using the changeset viewer.