Changes in / [33474e6:bf91d1d]
- Location:
- doc/theses/mike_brooks_MMath
- Files:
-
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
TabularUnified doc/theses/mike_brooks_MMath/array.tex ¶
r33474e6 rbf91d1d 4 4 5 5 \section{Introduction} 6 \label{s:ArrayIntro} 7 8 Arrays in C are possibly the single most misunderstood and incorrectly used feature in the language, resulting in the largest proportion of runtime errors and security violations. 9 This chapter describes the new \CFA language and library features that introduce a length-checked array type to the \CFA standard library~\cite{Cforall}. 10 11 Specifically, a new \CFA array is declared by instantiating the generic @array@ type, 12 much like instantiating any other standard-library generic type (such as @dlist@), 13 though using a new style of generic parameter. 6 7 Arrays in C are possible the single most misunderstood and incorrectly used features in the language, resulting in the largest proportion of runtime errors and security violations. 8 This chapter describes the new \CFA language and library features that introduce a length-checked array-type to the \CFA standard library~\cite{Cforall}. 9 10 Specifically, a new \CFA array is declared: 14 11 \begin{cfa} 15 12 @array( float, 99 )@ x; $\C[2.75in]{// x contains 99 floats}$ 16 13 \end{cfa} 17 Here, the arguments to the @array@ type are @float@ (element type) and @99@ (length).18 When this type is used as a function parameter, the type-system requires that a call's argument matches, down to the length.14 using generic type @array@ with arguments @float@ and @99@. 15 A function @f@ is declared with an @array@ parameter of length @42@. 19 16 \begin{cfa} 20 17 void f( @array( float, 42 )@ & p ) {} $\C{// p accepts 42 floats}$ … … 24 21 Applying untyped: Name: f ... to: Name: x 25 22 \end{cfa} 26 Here, the function @f@'s parameter @p@ is declared with length 42. 27 The call @f( x )@, with the argument being the previously-declared object, is invalid, because the @array@ lengths @99@ and @42@ do not match. 28 29 A function declaration can be polymorphic over these @array@ arguments by using the @forall@ declaration prefix. 30 This function @g@'s takes arbitrary type parameter @T@ (familiar) and \emph{dimension parameter} @N@ (new). 31 A dimension paramter represents a to-be-determined count of elements, managed by the type system. 23 The call @f( x )@ is invalid because the @array@ lengths @99@ and @42@ do not match. 24 25 Next, function @g@ introduces a @forall@ prefix on type parameter @T@ and arbitrary \emph{dimension parameter} @N@, the new feature that represents a count of elements managed by the type system. 32 26 \begin{cfa} 33 27 forall( T, @[N]@ ) … … 41 35 \end{cfa} 42 36 The call @g( x, 0 )@ is valid because @g@ accepts any length of array, where the type system infers @float@ for @T@ and length @99@ for @N@. 43 Inferring values for @T@ and @N@ is implicit ,without programmer involvement.44 Furthermore, in this case, the runtime subscript @x[0]@ (parameter @i@ being @0@) in @g@ is valid because 0is in the dimension range $[0,99)$ of argument @x@.45 The call @g( x, 1000 )@ is also accepted through compile time;46 however, th is case's subscript, @x[1000]@, generates an error,because @1000@ is outside the dimension range $[0,99)$ of argument @x@.47 48 The generic @array@ type is comparableto the C array type, which \CFA inherits from C.49 Theirruntime characteristics are often identical, and some features are available in both.50 For example, assume a caller instantiates @N@ with 42 (discussion about how to follow) in:37 Inferring values for @T@ and @N@ is implicit without programmer involvement. 38 Furthermore, the runtime subscript @x[0]@ (parameter @i@ is @0@) in @g@ is valid because @0@ is in the dimension range $[0,99)$ of argument @x@. 39 The call @g( x, 1000 )@ is also valid; 40 however, the runtime subscript @x[1000]@ is invalid (generates a subscript error) because @1000@ is outside the dimension range $[0,99)$ of argument @x@. 41 42 The generic @array@ type is similar to the C array type, which \CFA inherits from C. 43 Its runtime characteristics are often identical, and some features are available in both. 44 For example, assume a caller can instantiates @N@ with 42 in the following (details to follow). 51 45 \begin{cfa} 52 46 forall( [N] ) … … 57 51 \end{cfa} 58 52 Both of the locally-declared array variables, @x1@ and @x2@, have 42 elements, each element being a @float@. 59 The two variables have identical size and layout; they both encapsulate 42-float stack allocations,with no additional ``bookkeeping'' allocations or headers.53 The two variables have identical size and layout; they both encapsulate 42-float, stack \vs heap allocations with no additional ``bookkeeping'' allocations or headers. 60 54 Providing this explicit generic approach requires a significant extension to the \CFA type system to support a full-feature, safe, efficient (space and time) array-type, which forms the foundation for more complex array forms in \CFA. 61 55 62 56 Admittedly, the @array@ library type (type for @x2@) is syntactically different from its C counterpart. 63 A future goal (TODO xref) is to provide the new features upon a built-in type whose syntax approaches C's (declaration style of @x1@).64 Then, the library @array@ type could be removed,giving \CFA a largely uniform array type.65 At present, the C -syntax array gets partial support for the new features, so the generic @array@ is used exclusively when introducing features;57 A future goal (TODO xref) is to provide a built-in array type with syntax approaching C's (type for @x1@); 58 then, the library @array@ type can be removed giving \CFA a largely uniform array type. 59 At present, the C syntax @array@ is only partially supported, so the generic @array@ is used exclusively in the discussion; 66 60 feature support and C compatibility are revisited in Section ? TODO. 67 61 68 62 Offering the @array@ type, as a distinct alternative to the C array, is consistent with \CFA's goal of backwards compatibility, \ie virtually all existing C (@gcc@) programs can be compiled by \CFA with only a small number of changes, similar to \CC (@g++@). 69 63 However, a few compatibility-breaking changes to the behaviour of the C array are necessary, both as an implementation convenience and to fix C's lax treatment of arrays. 70 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity of the C array. 71 72 In all discussion following, ``C array'' means the types like that of @x@ and ``\CFA array'' means the standard-library @array@ type (instantiations), like the type of @x2@. 64 Hence, the @array@ type is an opportunity to start from a clean slate and show a cohesive selection of features, making it unnecessary to deal with every inherited complexity introduced by the C array TODO xref. 73 65 74 66 My contributions in this chapter are: … … 89 81 90 82 91 92 General dependent typing allows the type system to encode arbitrary predicates (e.g. behavioural specifications for functions),93 which is an anti-goal for my work.94 Firstly, this application is strongly associated with pure functional languages,95 where a characterization of the return value (giving it a precise type, generally dependent upon the parameters)96 is a sufficient postcondition.97 In an imperative language like C and \CFA, it is also necessary to discuss side effects,98 for which an even heavier formalism, like separation logic, is required.99 Secondly, TODO: bash Rust.100 TODO: cite the crap out of these claims.101 102 103 104 83 \section{Features added} 105 84 106 This section shows more about using the \CFA array and dimension parameters, demonstrating their syntax and semantics by way of motivating examples. 107 As stated, the core capability of the new array is tracking all dimensions within the type system, where dynamic dimensions are represented using type variables. 108 109 By declaring type variables at the front of object declarations, an array dimension is lexically referenceable where it is needed. 110 For example, a declaration can share one length, @N@, among a pair of parameters and the return, 111 meaning that it requires both input arrays to be of the same length, and guarantees that the result with be of that length as well. 85 This section presents motivating examples for the new array type, demonstrating the syntax and semantics of the generic @array@. 86 As stated, the core capability of the new array is tracking all dimensions in the type system, where dynamic dimensions are represented using type variables. 87 88 The definition of type variables preceding object declarations makes the array dimension lexically referenceable where it is needed. 89 For example, a declaration can share one length, @N@, among a pair of parameters and the return. 112 90 \lstinput{10-17}{hello-array.cfa} 113 This function @f@ does a pointwise comparison of its two input arrays, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array. 114 The dynamic allocation of the @ret@ array by preexisting @alloc@ uses the parameterized dimension information implicitly within its @sizeof@ determination, and casts the return type. 115 Note that alloc only sees one whole type for its @T@ (which is @f@'s @array(bool, N)@); this type's size is a computation based on @N@. 116 \begin{cfa} 117 // simplification 91 Here, the function @f@ does a pointwise comparison, checking if each pair of numbers is within half a percent of each other, returning the answers in a newly allocated @bool@ array. 92 The dynamic allocation of the @ret@ array by @alloc@ uses the parameterized dimension information in its implicit @_Alignof@ and @sizeof@ determinations, and casting the return type. 93 \begin{cfa} 118 94 static inline forall( T & | sized(T) ) 119 T * alloc() { 120 return (T *)malloc( sizeof(T) ); 121 } 122 \end{cfa} 95 T * alloc( size_t dim ) { 96 if ( _Alignof(T) <= libAlign() ) return (T *)aalloc( dim, sizeof(T) ); // calloc without zero fill 97 else return (T *)amemalign( _Alignof(T), dim, sizeof(T) ); // array memalign 98 } 99 \end{cfa} 100 Here, the type system deduces from the left-hand side of the assignment the type @array(bool, N)@, and forwards it as the type variable @T@ for the generic @alloc@ function, making it available in its body. 101 Hence, preexisting \CFA behaviour is leveraged here, both in the return-type polymorphism, and the @sized(T)@-aware standard-library @alloc@ routine. 123 102 This example illustrates how the new @array@ type plugs into existing \CFA behaviour by implementing necessary @sized@ assertions needed by other types. 124 (@sized@ implies a concrete \vs abstract type with a runtime-available size, exposed as @sizeof@.)103 (@sized@ implies a concrete \vs abstract type with a compile-time size.) 125 104 As a result, there is significant programming safety by making the size accessible and implicit, compared with C's @calloc@ and non-array supporting @memalign@, which take an explicit length parameter not managed by the type system. 126 105 … … 132 111 \end{figure} 133 112 134 \VRef[Figure]{f:fHarness} shows a harness that uses function @f@ , illustratinghow dynamic values are fed into the @array@ type.113 \VRef[Figure]{f:fHarness} shows a harness that uses function @f@ to illustrate how dynamic values are fed into the @array@ type. 135 114 Here, the dimension of arrays @x@, @y@, and @result@ is specified from a command-line value, @dim@, and these arrays are allocated on the stack. 136 115 Then the @x@ array is initialized with decreasing values, and the @y@ array with amounts offset by constant @0.005@, giving relative differences within tolerance initially and diverging for later values. … … 140 119 The example shows @dim@ adapting into a type-system managed length at the declarations of @x@, @y@, and @result@, @N@ adapting in the same way at @f@'s loop bound, and a pass-thru use of @dim@ at @f@'s declaration of @ret@. 141 120 Except for the lifetime-management issue of @result@, \ie explicit @free@, this program has eliminated both the syntactic and semantic problems associated with C arrays and their usage. 142 The result is a significant improvement in safety and usability.121 These benefits cannot be underestimated. 143 122 144 123 In general, the @forall( ..., [N] )@ participates in the user-relevant declaration of the name @N@, which becomes usable in parameter/return declarations and within a function. 145 124 The syntactic form is chosen to parallel other @forall@ forms: 146 125 \begin{cfa} 147 forall( @[N]@ ) ... $\C[1.5in]{// dimension}$148 forall( T & ) ... $\C{// opaque datatype (formerly, "dtype")}$149 forall( T ) ... $\C{// value datatype (formerly, "otype")}\CRT$126 forall( @[N]@ ) ... $\C[1.5in]{// array kind}$ 127 forall( T & ) ... $\C{// reference kind (dtype)}$ 128 forall( T ) ... $\C{// value kind (otype)}\CRT$ 150 129 \end{cfa} 151 130 % The notation @array(thing, N)@ is a single-dimensional case, giving a generic type instance. … … 153 132 \begin{itemize} 154 133 \item 155 @[N]@ within a @forall@declares the type variable @N@ to be a managed length.156 \item 157 @N@ can be used an expression of type @size_t@ within the declared function body.158 \item 159 The value of an @N@-expression is the acquired length,derived from the usage site, \ie generic declaration or function call.160 \item 161 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ -typedobjects.134 @[N]@ within a forall declares the type variable @N@ to be a managed length. 135 \item 136 The type of @N@ within code is @size_t@. 137 \item 138 The value of @N@ within code is the acquired length derived from the usage site, \ie generic declaration or function call. 139 \item 140 @array( thing, N0, N1, ... )@ is a multi-dimensional type wrapping $\prod_i N_i$ adjacent occurrences of @thing@ objects. 162 141 \end{itemize} 163 142 … … 165 144 \begin{enumerate}[leftmargin=*] 166 145 \item 167 The \CC template @N@ can only be compile-time value, while the \CFA @N@ may be a runtime value. 168 % agreed, though already said 169 \item 170 \CC does not allow a template function to be nested, while \CFA lests its polymorphic functions to be nested. 171 % why is this important? 146 The \CC template @N@ is a compile-time value, while the \CFA @N@ is a runtime value. 172 147 \item 173 148 The \CC template @N@ must be passed explicitly at the call, unless @N@ has a default value, even when \CC can deduct the type of @T@. 174 149 The \CFA @N@ is part of the array type and passed implicitly at the call. 175 % fixed by comparing to std::array176 % mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v2177 150 \item 178 151 \CC cannot have an array of references, but can have an array of pointers. … … 180 153 In the \CC example, the arrays fall back on C arrays, which have a duality with references with respect to automatic dereferencing. 181 154 The \CFA array is a contiguous object with an address, which can be stored as a reference or pointer. 182 % not really about forall-N vs template-N183 % any better CFA support is how Rob left references, not what Mike did to arrays184 % https://stackoverflow.com/questions/1164266/why-are-arrays-of-references-illegal185 % https://stackoverflow.com/questions/922360/why-cant-i-make-a-vector-of-references186 155 \item 187 156 C/\CC arrays cannot be copied, while \CFA arrays can be copied, making them a first-class object (although array copy is often avoided for efficiency). 188 % fixed by comparing to std::array189 % mycode/arrr/thesis-examples/check-peter/cs-cpp.cpp, v10190 157 \end{enumerate} 191 TODO: settle Mike's concerns with this comparison (perhaps, remove)192 158 193 159 \begin{figure} … … 200 166 } 201 167 int main() { 202 203 168 int ret[10], x[10]; 204 169 for ( int i = 0; i < 10; i += 1 ) x[i] = i; … … 212 177 \begin{cfa} 213 178 int main() { 214 @forall( T, [N] )@ 179 @forall( T, [N] )@ // nested function 215 180 void copy( array( T, @N@ ) & ret, array( T, @N@ ) & x ) { 216 for ( i; N ) ret[i] = x[i]; 217 } 218 219 const int n = promptForLength(); 220 array( int, n ) ret, x; 221 for ( i; n ) x[i] = i; 181 for ( i; 10 ) ret[i] = x[i]; 182 } 183 184 array( int, 10 ) ret, x; 185 for ( i; 10 ) x[i] = i; 222 186 @copy( ret, x );@ 223 for ( i; n)187 for ( i; 10 ) 224 188 sout | ret[i] | nonl; 225 189 sout | nl; … … 227 191 \end{cfa} 228 192 \end{tabular} 229 \caption{\ lstinline{N}-style paramters, for \CC template\vs \CFA generic type }193 \caption{\CC \lstinline[language=C++]{template} \vs \CFA generic type } 230 194 \label{f:TemplateVsGenericType} 231 195 \end{figure} 232 196 233 Just as the first example in \VRef[Section]{s:ArrayIntro} shows a compile-time rejection of a length mismatch, 234 so are length mismatches stopped when they invlove dimension parameters. 235 While \VRef[Figure]{f:fHarness} shows successfully calling a function @f@ expecting two arrays of the same length, 236 \begin{cfa} 237 array( bool, N ) & f( array( float, N ) &, array( float, N ) & ); 238 \end{cfa} 239 a static rejection occurs when attempting to call @f@ with arrays of potentially differing lengths. 240 \lstinput[tabsize=1]{70-74}{hello-array.cfa} 241 When the argument lengths themselves are statically unknown, 242 the static check is conservative and, as always, \CFA's casting lets the programmer use knowledge not shared with the type system. 243 \begin{tabular}{@{\hspace{0.5in}}l@{\hspace{1in}}l@{}} 244 \lstinput{90-97}{hello-array.cfa} 245 & 246 \lstinput{110-117}{hello-array.cfa} 247 \end{tabular} 248 249 \noindent 250 This static check's full rules are presented in \VRef[Section]{s:ArrayTypingC}. 251 252 Orthogonally, the \CFA array type works within generic \emph{types}, \ie @forall@-on-@struct@. 253 The same argument safety and the associated implicit communication of array length occurs. 254 Preexisting \CFA allowed aggregate types to be generalized with type parameters, enabling parameterizing for element types. 255 Now, \CFA also allows parameterizing them by length. 256 Doing so gives a refinement of C's ``flexible array member'' pattern[TODO: cite ARM 6.7.2.1 pp18]\cite{arr:gnu-flex-mbr}. 257 While a C flexible array member can only occur at the end of the enclosing structure, 258 \CFA allows length-parameterized array members to be nested at arbitrary locations. 259 This flexibility, in turn, allows for multiple array members. 197 Continuing the discussion of \VRef[Figure]{f:fHarness}, the example has @f@ expecting two arrays of the same length. 198 As stated previous, a compile-time error occurs when attempting to call @f@ with arrays of differing lengths. 199 % removing leading whitespace 200 \lstinput[tabsize=1]{52-53}{hello-array.cfa} 201 \lstinput[tabsize=1,aboveskip=0pt]{62-64}{hello-array.cfa} 202 C allows casting to assert knowledge not shared with the type system. 203 \lstinput{70-74}{hello-array.cfa} 204 205 Orthogonally, the new @array@ type works with \CFA's generic types, providing argument safety and the associated implicit communication of array length. 206 Specifically, \CFA allows aggregate types to be generalized with multiple type parameters, including parameterized element types and lengths. 207 Doing so gives a refinement of C's ``flexible array member'' pattern, allowing nesting structures with array members anywhere within the structures. 260 208 \lstinput{10-15}{hello-accordion.cfa} 261 This structure's layout has the starting offset of @studentIds@ varying according to the generic parameter @C@, and the offset of @preferences@ varying according toboth generic parameters.209 This structure's layout has the starting offset of @studentIds@ varying in generic parameter @C@, and the offset of @preferences@ varying in both generic parameters. 262 210 For a function that operates on a @School@ structure, the type system handles this memory layout transparently. 263 211 \lstinput{40-45}{hello-accordion.cfa} 264 \VRef[Figure]{f:checkHarness} shows the @School@ harness and results with different array sizes. 265 Note the declaration of the @school@ variable. 266 It is on the stack and its initialization does not use any casting or size arithmetic. 267 Both of these points are impossible with a C flexible array member. 268 When heap allocation is preferred, the original pattern still applies. 269 \begin{cfa} 270 School( classes, students ) * sp = alloc(); 271 \end{cfa} 272 This ability to avoid casting and size arithmetic improves safety and usability over C flexible array members. 273 212 \VRef[Figure]{f:checkHarness} shows the @School@ harness and results with different array sizes, where multidimensional arrays are discussed next. 274 213 275 214 \begin{figure} … … 277 216 \begin{tabular}{@{}ll@{\hspace{25pt}}l@{}} 278 217 \begin{tabular}{@{}p{3.25in}@{}} 279 \lstinput{60-6 4}{hello-accordion.cfa}218 \lstinput{60-66}{hello-accordion.cfa} 280 219 \vspace*{-3pt} 281 220 \lstinput{73-80}{hello-accordion.cfa} … … 294 233 295 234 \section{Typing of C Arrays} 296 \label{s:ArrayTypingC}297 235 298 236 Essential in giving a guarantee of accurate length is the compiler's ability -
TabularUnified doc/theses/mike_brooks_MMath/programs/hello-accordion.cfa ¶
r33474e6 rbf91d1d 39 39 40 40 forall( [C], [S] ) 41 void put( @School( C, S ) & s@, int class, int student, int pref) {42 s.classIds[class] = class; $\C{// fields' offsets are dynamic}$43 s .studentIds[student] = student;44 s.preferences[class][student] = pref;41 void init( @School( C, S ) & classes@, int class, int student, int pref ) with( classes ) { 42 classIds[class] = class; $\C{// handle dynamic offsets of fields within structure}$ 43 studentIds[student] = student; 44 preferences[class][student] = pref; 45 45 } 46 46 … … 62 62 sin | classes | students; 63 63 @School( classes, students ) school;@ 64 // elided: read data into school, calling put65 { int class, student, preference;66 67 68 69 70 put( school, class, student, preference );71 72 } catch( end_of_file * ) {}}64 int class, student, preference; 65 // read data into school calling init 66 // for each student's class/preferences 67 try { 68 for ( ) { 69 sin | class | student | preference; 70 init( school, class, student, preference ); 71 } 72 } catch( end_of_file * ) {} 73 73 for ( s; students ) { 74 74 sout | "student" | s | nonl; -
TabularUnified doc/theses/mike_brooks_MMath/programs/hello-array.cfa ¶
r33474e6 rbf91d1d 49 49 */ 50 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 #ifdef SHOWERR165 66 51 void fred() { 67 68 69 70 array( float, @10@ ) x;71 array( float, @20@ ) y;72 f( x, x ); $\C[0.5in]{// ok}$73 f( y, y ); $\C{// ok}$74 f( x, y ); $\C{// error}\CRT$75 76 77 78 79 }80 81 82 83 84 85 86 87 88 89 90 forall( [M], [N] )91 void bad( array(float, M) &x,92 array(float, N) &y ) {93 f( x, x ); $\C[0.5in]{// ok}$94 f( y, y ); $\C{// ok}$95 96 f( x, y ); $\C{// error}\CRT$97 }98 99 #endif100 101 102 103 104 105 106 107 108 109 110 forall( [M], [N] )111 void bad_fixed( array( float, M ) & x,112 array( float, N ) & y ) {113 f( x, x ); $\C[0.5in]{// ok}$114 f( y, y ); $\C{// ok}$115 if ( M == N )116 f( x, @(array( float, M ) &)@y ); $\C{// ok}$117 }118 119 120 121 122 123 124 void fred_ok_only() {125 52 array( float, @10@ ) x; 126 53 array( float, @20@ ) y; … … 130 57 } 131 58 59 #ifdef SHOWERR1 60 forall( [M], [N] ) 61 void bad( array(float, M) &x, array(float, N) &y ) { 62 f( x, x ); $\C[1.5in]{// ok}$ 63 f( y, y ); $\C{// ok}$ 64 f( x, y ); $\C{// error}\CRT$ 65 } 66 #endif 67 68 132 69 133 70 forall( [M], [N] ) 134 void bad_ok_only( array(float, M) &x, 135 array(float, N) &y ) { 136 f( x, x ); 137 f( y, y ); 138 139 // f( x, y ); 71 void bad_fixed( array( float, M ) & x, array( float, N ) & y ) { 72 if ( M == N ) 73 f( x, @(array( float, M ) &)@y ); $\C{// cast y to matching type}$ 140 74 } 141 142 75 143 76 // Local Variables: // -
TabularUnified doc/theses/mike_brooks_MMath/uw-ethesis.bib ¶
r33474e6 rbf91d1d 17 17 year = {2018}, 18 18 } 19 20 % --------------------------------------------------21 % C Array facts22 23 @misc{arr:gnu-flex-mbr,24 title = {Arrays of Length Zero},25 howpublished= {\url{https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html}},26 }27 28 19 29 20 % --------------------------------------------------
Note: See TracChangeset
for help on using the changeset viewer.