Changeset 486caad for doc/theses/mike_brooks_MMath
- Timestamp:
- Mar 25, 2024, 7:15:30 PM (2 years ago)
- Branches:
- master, stuck-waitfor-destruct
- Children:
- d734fa1
- Parents:
- df78cce (diff), bf050c5 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)links above to see all the changes relative to each parent. - Location:
- doc/theses/mike_brooks_MMath
- Files:
-
- 4 edited
-
array.tex (modified) (1 diff)
-
background.tex (modified) (2 diffs)
-
intro.tex (modified) (2 diffs)
-
uw-ethesis.tex (modified) (1 diff)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
rdf78cce r486caad 510 510 511 511 \subsection{Retire pointer arithmetic} 512 513 514 \section{\CFA} 515 516 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \\ 517 moved from background chapter \\ 518 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \\ 519 520 Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it. 521 (For later: That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.) 522 523 \subsection{\CFA features interacting with arrays} 524 525 Prior work on \CFA included making C arrays, as used in C code from the wild, 526 work, if this code is fed into @cfacc@. 527 The quality of this this treatment was fine, with no more or fewer bugs than is typical. 528 529 More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features. 530 531 A notable success was with the \CFA @alloc@ function, 532 which type information associated with a polymorphic return type 533 replaces @malloc@'s use of programmer-supplied size information. 534 \begin{cfa} 535 // C, library 536 void * malloc( size_t ); 537 // C, user 538 struct tm * el1 = malloc( sizeof(struct tm) ); 539 struct tm * ar1 = malloc( 10 * sizeof(struct tm) ); 540 541 // CFA, library 542 forall( T * ) T * alloc(); 543 // CFA, user 544 tm * el2 = alloc(); 545 tm (*ar2)[10] = alloc(); 546 \end{cfa} 547 The alloc polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument. 548 This compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type. 549 Using a compiler-produced value eliminates an opportunity for user error. 550 551 TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa 552 553 Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug. 554 In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@. 555 They are not subscripted in the same way. 556 \begin{cfa} 557 ar1[5]; 558 (*ar2)[5]; 559 \end{cfa} 560 Using ``reference to array'' works at resolving this issue. TODO: discuss connection with Doug-Lea \CC proposal. 561 \begin{cfa} 562 tm (&ar3)[10] = *alloc(); 563 ar3[5]; 564 \end{cfa} 565 The implicit size communication to @alloc@ still works in the same ways as for @ar2@. 566 567 Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one. 568 TODO xref C standard does not claim that @ar1@ may be subscripted, 569 because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.'' 570 But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls, 571 where the type requested is an array, making the result, much more obviously, an array object. 572 573 The ``reference to array'' type has its sore spots too. 574 TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@) 575 576 TODO: I fixed a bug associated with using an array as a T. I think. Did I really? What was the bug? -
doc/theses/mike_brooks_MMath/background.tex
rdf78cce r486caad 1 1 \chapter{Background} 2 2 3 This chapter states facts about the prior work, upon which my contributions build. 4 Each receives a justification of the extent to which its statement is phrased to provoke controversy or surprise. 5 6 \section{C} 7 8 \subsection{Common knowledge} 9 10 The reader is assumed to have used C or \CC for the coursework of at least four university-level courses, or have equivalent experience. 11 The current discussion introduces facts, unaware of which, such a functioning novice may be operating. 12 13 % TODO: decide if I'm also claiming this collection of facts, and test-oriented presentation is a contribution; if so, deal with (not) arguing for its originality 14 15 \subsection{Convention: C is more touchable than its standard} 16 17 When it comes to explaining how C works, I like illustrating definite program semantics. 18 I prefer doing so, over a quoting manual's suggested programmer's intuition, or showing how some compiler writers chose to model their problem. 19 To illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and I show its behaviour. 20 21 This behaviour is typically one of 22 \begin{itemize} 23 \item my statement that the compiler accepts or rejects the program 24 \item the program's printed output, which I show 25 \item my implied assurance that its assertions do not fail when run 26 \end{itemize} 27 28 The compiler whose program semantics is shown is 29 \begin{cfa} 30 $ gcc --version 31 gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 32 \end{cfa} 33 running on Architecture @x86_64@, with the same environment targeted. 34 35 Unless explicit discussion ensues about differences among compilers or with (versions of) the standard, it is further implied that there exists a second version of GCC and some version of Clang, running on and for the same platform, that give substantially similar behaviour. 36 In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard. 37 38 39 \subsection{C reports many ill-typed expressions as warnings} 40 41 These attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed. 42 \lstinput{12-15}{bkgd-c-tyerr.c} 43 with warnings: 44 \begin{cfa} 45 warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)' 46 warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *' 47 \end{cfa} 48 Similarly, 49 \lstinput{17-19}{bkgd-c-tyerr.c} 50 with warning: 51 \begin{cfa} 52 warning: passing argument 1 of 'f' from incompatible pointer type 53 note: expected 'void (*)(void)' but argument is of type 'float *' 54 \end{cfa} 55 with a segmentation fault at runtime. 56 57 That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@. 58 Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition. 59 60 A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program. 61 The behaviour (whose absence is unprovable) is neither minor nor unlikely. 62 The rejection shows that the program is ill-typed. 63 64 Yet, the rejection presents as a GCC warning. 65 66 In the discussion following, ``ill-typed'' means giving a nonzero @gcc -Werror@ exit condition with a message that discusses typing. 67 68 *1 TAPL-pg1 definition of a type system 69 70 71 \section{C Arrays} 72 73 \subsection{C has an array type (!)} 3 Since this work builds on C, it is necessary to explain the C mechanisms and their shortcomings for array, linked list, and string, 4 5 6 \section{Array} 74 7 75 8 When a programmer works with an array, C semantics provide access to a type that is different in every way from ``pointer to its first element.'' … … 511 444 512 445 513 \section{\CFA} 514 515 Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it. 516 (For later: That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.) 517 518 \subsection{\CFA features interacting with arrays} 519 520 Prior work on \CFA included making C arrays, as used in C code from the wild, 521 work, if this code is fed into @cfacc@. 522 The quality of this this treatment was fine, with no more or fewer bugs than is typical. 523 524 More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features. 525 526 A notable success was with the \CFA @alloc@ function, 527 which type information associated with a polymorphic return type 528 replaces @malloc@'s use of programmer-supplied size information. 529 \begin{cfa} 530 // C, library 531 void * malloc( size_t ); 532 // C, user 533 struct tm * el1 = malloc( sizeof(struct tm) ); 534 struct tm * ar1 = malloc( 10 * sizeof(struct tm) ); 535 536 // CFA, library 537 forall( T * ) T * alloc(); 538 // CFA, user 539 tm * el2 = alloc(); 540 tm (*ar2)[10] = alloc(); 541 \end{cfa} 542 The alloc polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument. 543 This compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type. 544 Using a compiler-produced value eliminates an opportunity for user error. 545 546 TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa 547 548 Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug. 549 In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@. 550 They are not subscripted in the same way. 551 \begin{cfa} 552 ar1[5]; 553 (*ar2)[5]; 554 \end{cfa} 555 Using ``reference to array'' works at resolving this issue. TODO: discuss connection with Doug-Lea \CC proposal. 556 \begin{cfa} 557 tm (&ar3)[10] = *alloc(); 558 ar3[5]; 559 \end{cfa} 560 The implicit size communication to @alloc@ still works in the same ways as for @ar2@. 561 562 Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one. 563 TODO xref C standard does not claim that @ar1@ may be subscripted, 564 because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.'' 565 But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls, 566 where the type requested is an array, making the result, much more obviously, an array object. 567 568 The ``reference to array'' type has its sore spots too. 569 TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@) 570 571 TODO: I fixed a bug associated with using an array as a T. I think. Did I really? What was the bug? 446 \section{Linked List} 447 448 449 \section{String} -
doc/theses/mike_brooks_MMath/intro.tex
rdf78cce r486caad 1 1 \chapter{Introduction} 2 3 All modern programming languages provide three high-level containers (collection): array, linked-list, and string. 4 Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list. 2 5 3 6 \cite{Blache19} … … 5 8 \cite{Ruef19} 6 9 7 \section{Array s}10 \section{Array} 8 11 9 \section{Strings} 12 Array provides a homogeneous container with $O(1)$ access to elements using subscripting. 13 The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation. 14 For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap. 15 16 17 \section{Linked List} 18 19 Linked-list provides a homogeneous container with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations. 20 Subscripting by value is sometimes available, \eg hash table. 21 Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes). 22 If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack; 23 otherwise, elements are heap allocated and explicitly/implicitly managed. 24 25 26 \section{String} 27 28 String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters. 29 What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements. 30 Nevertheless, subscripting is often available. 31 The cost of string operations is less important than the power of the block operation to accomplish complex manipulation. 32 The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages. 33 34 35 \section{Motivation} 36 37 The goal of this work is to introduce safe and complex versions of array, link-lists, and string into the programming language \CFA~\cite{CFA}, which is based on C. 38 Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design. 39 Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built. 40 41 42 \subsection{C?} 43 44 Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}. 45 However, most programming languages are only partially explained by standard's manuals. 46 When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}. 47 Often other C compilers must \emph{ape} @gcc@ because a large part of the C library (runtime) system contains @gcc@ features. 48 While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour. 49 These example programs show 50 \begin{itemize} 51 \item the compiler accepts or rejects certain syntax, 52 \item prints output to buttress a claim of behaviour, 53 \item executes without triggering any embedded assertions testing pre/post-assertions or invariants. 54 \end{itemize} 55 This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures. 56 Any discovered anomalies among compilers or versions is discussed. 57 In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard. 58 59 60 \subsection{Ill-Typed Expressions} 61 62 C reports many ill-typed expressions as warnings. 63 For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed. 64 \lstinput{12-15}{bkgd-c-tyerr.c} 65 with warnings: 66 \begin{cfa} 67 warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)' 68 warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *' 69 \end{cfa} 70 Similarly, 71 \lstinput{17-19}{bkgd-c-tyerr.c} 72 with warning: 73 \begin{cfa} 74 warning: passing argument 1 of 'f' from incompatible pointer type 75 note: expected 'void (*)(void)' but argument is of type 'float *' 76 \end{cfa} 77 with a segmentation fault at runtime. 78 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness. 79 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings. 80 In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing. 81 Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors. 82 83 % That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@. 84 % Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition. 85 86 % A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program. 87 % The behaviour (whose absence is unprovable) is neither minor nor unlikely. 88 % The rejection shows that the program is ill-typed. 89 % 90 % Yet, the rejection presents as a GCC warning. 91 % *1 TAPL-pg1 definition of a type system 92 10 93 11 94 \section{Contributions} 95 96 \subsection{Linked List} 97 98 \subsection{Array} 99 100 \subsection{String} -
doc/theses/mike_brooks_MMath/uw-ethesis.tex
rdf78cce r486caad 218 218 \input{intro} 219 219 \input{background} 220 \input{array} 220 221 \input{list} 221 \input{array}222 222 \input{string} 223 223 \input{conclusion}
Note:
See TracChangeset
for help on using the changeset viewer.