Ignore:
Timestamp:
Mar 25, 2024, 7:15:30 PM (2 years ago)
Author:
JiadaL <j82liang@…>
Branches:
master, stuck-waitfor-destruct
Children:
d734fa1
Parents:
df78cce (diff), bf050c5 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc/theses/mike_brooks_MMath
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/array.tex

    rdf78cce r486caad  
    510510
    511511\subsection{Retire pointer arithmetic}
     512
     513
     514\section{\CFA}
     515
     516XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \\
     517moved from background chapter \\
     518XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \\
     519
     520Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it.
     521(For later:  That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.)
     522
     523\subsection{\CFA features interacting with arrays}
     524
     525Prior work on \CFA included making C arrays, as used in C code from the wild,
     526work, if this code is fed into @cfacc@.
     527The quality of this this treatment was fine, with no more or fewer bugs than is typical.
     528
     529More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features.
     530
     531A notable success was with the \CFA @alloc@ function,
     532which type information associated with a polymorphic return type
     533replaces @malloc@'s use of programmer-supplied size information.
     534\begin{cfa}
     535// C, library
     536void * malloc( size_t );
     537// C, user
     538struct tm * el1 = malloc(      sizeof(struct tm) );
     539struct tm * ar1 = malloc( 10 * sizeof(struct tm) );
     540
     541// CFA, library
     542forall( T * ) T * alloc();
     543// CFA, user
     544tm * el2 = alloc();
     545tm (*ar2)[10] = alloc();
     546\end{cfa}
     547The alloc polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument.
     548This compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type.
     549Using a compiler-produced value eliminates an opportunity for user error.
     550
     551TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
     552
     553Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
     554In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@.
     555They are not subscripted in the same way.
     556\begin{cfa}
     557ar1[5];
     558(*ar2)[5];
     559\end{cfa}
     560Using ``reference to array'' works at resolving this issue.  TODO: discuss connection with Doug-Lea \CC proposal.
     561\begin{cfa}
     562tm (&ar3)[10] = *alloc();
     563ar3[5];
     564\end{cfa}
     565The implicit size communication to @alloc@ still works in the same ways as for @ar2@.
     566
     567Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
     568TODO xref C standard does not claim that @ar1@ may be subscripted,
     569because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
     570But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
     571where the type requested is an array, making the result, much more obviously, an array object.
     572
     573The ``reference to array'' type has its sore spots too.
     574TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
     575
     576TODO: I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?
  • doc/theses/mike_brooks_MMath/background.tex

    rdf78cce r486caad  
    11\chapter{Background}
    22
    3 This chapter states facts about the prior work, upon which my contributions build.
    4 Each receives a justification of the extent to which its statement is phrased to provoke controversy or surprise.
    5 
    6 \section{C}
    7 
    8 \subsection{Common knowledge}
    9 
    10 The reader is assumed to have used C or \CC for the coursework of at least four university-level courses, or have equivalent experience.
    11 The current discussion introduces facts, unaware of which, such a functioning novice may be operating.
    12 
    13 % TODO: decide if I'm also claiming this collection of facts, and test-oriented presentation is a contribution; if so, deal with (not) arguing for its originality
    14 
    15 \subsection{Convention: C is more touchable than its standard}
    16 
    17 When it comes to explaining how C works, I like illustrating definite program semantics.
    18 I prefer doing so, over a quoting manual's suggested programmer's intuition, or showing how some compiler writers chose to model their problem.
    19 To illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and I show its behaviour.
    20 
    21 This behaviour is typically one of
    22 \begin{itemize}
    23         \item my statement that the compiler accepts or rejects the program
    24         \item the program's printed output, which I show
    25         \item my implied assurance that its assertions do not fail when run
    26 \end{itemize}
    27 
    28 The compiler whose program semantics is shown is
    29 \begin{cfa}
    30 $ gcc --version
    31 gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    32 \end{cfa}
    33 running on Architecture @x86_64@, with the same environment targeted.
    34 
    35 Unless explicit discussion ensues about differences among compilers or with (versions of) the standard, it is further implied that there exists a second version of GCC and some version of Clang, running on and for the same platform, that give substantially similar behaviour.
    36 In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
    37 
    38 
    39 \subsection{C reports many ill-typed expressions as warnings}
    40 
    41 These attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
    42 \lstinput{12-15}{bkgd-c-tyerr.c}
    43 with warnings:
    44 \begin{cfa}
    45 warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
    46 warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
    47 \end{cfa}
    48 Similarly,
    49 \lstinput{17-19}{bkgd-c-tyerr.c}
    50 with warning:
    51 \begin{cfa}
    52 warning: passing argument 1 of 'f' from incompatible pointer type
    53 note: expected 'void (*)(void)' but argument is of type 'float *'
    54 \end{cfa}
    55 with a segmentation fault at runtime.
    56 
    57 That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
    58 Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
    59 
    60 A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
    61 The behaviour (whose absence is unprovable) is neither minor nor unlikely.
    62 The rejection shows that the program is ill-typed.
    63 
    64 Yet, the rejection presents as a GCC warning.
    65 
    66 In the discussion following, ``ill-typed'' means giving a nonzero @gcc -Werror@ exit condition with a message that discusses typing.
    67 
    68 *1  TAPL-pg1 definition of a type system
    69 
    70 
    71 \section{C Arrays}
    72 
    73 \subsection{C has an array type (!)}
     3Since this work builds on C, it is necessary to explain the C mechanisms and their shortcomings for array, linked list, and string,
     4
     5
     6\section{Array}
    747
    758When a programmer works with an array, C semantics provide access to a type that is different in every way from ``pointer to its first element.''
     
    511444
    512445
    513 \section{\CFA}
    514 
    515 Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it.
    516 (For later:  That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.)
    517 
    518 \subsection{\CFA features interacting with arrays}
    519 
    520 Prior work on \CFA included making C arrays, as used in C code from the wild,
    521 work, if this code is fed into @cfacc@.
    522 The quality of this this treatment was fine, with no more or fewer bugs than is typical.
    523 
    524 More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features.
    525 
    526 A notable success was with the \CFA @alloc@ function,
    527 which type information associated with a polymorphic return type
    528 replaces @malloc@'s use of programmer-supplied size information.
    529 \begin{cfa}
    530 // C, library
    531 void * malloc( size_t );
    532 // C, user
    533 struct tm * el1 = malloc(      sizeof(struct tm) );
    534 struct tm * ar1 = malloc( 10 * sizeof(struct tm) );
    535 
    536 // CFA, library
    537 forall( T * ) T * alloc();
    538 // CFA, user
    539 tm * el2 = alloc();
    540 tm (*ar2)[10] = alloc();
    541 \end{cfa}
    542 The alloc polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument.
    543 This compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type.
    544 Using a compiler-produced value eliminates an opportunity for user error.
    545 
    546 TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
    547 
    548 Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
    549 In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@.
    550 They are not subscripted in the same way.
    551 \begin{cfa}
    552 ar1[5];
    553 (*ar2)[5];
    554 \end{cfa}
    555 Using ``reference to array'' works at resolving this issue.  TODO: discuss connection with Doug-Lea \CC proposal.
    556 \begin{cfa}
    557 tm (&ar3)[10] = *alloc();
    558 ar3[5];
    559 \end{cfa}
    560 The implicit size communication to @alloc@ still works in the same ways as for @ar2@.
    561 
    562 Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
    563 TODO xref C standard does not claim that @ar1@ may be subscripted,
    564 because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
    565 But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
    566 where the type requested is an array, making the result, much more obviously, an array object.
    567 
    568 The ``reference to array'' type has its sore spots too.
    569 TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
    570 
    571 TODO: I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?
     446\section{Linked List}
     447
     448
     449\section{String}
  • doc/theses/mike_brooks_MMath/intro.tex

    rdf78cce r486caad  
    11\chapter{Introduction}
     2
     3All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
     4Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
    25
    36\cite{Blache19}
     
    58\cite{Ruef19}
    69
    7 \section{Arrays}
     10\section{Array}
    811
    9 \section{Strings}
     12Array provides a homogeneous container with $O(1)$ access to elements using subscripting.
     13The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
     14For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
     15
     16
     17\section{Linked List}
     18
     19Linked-list provides a homogeneous container with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
     20Subscripting by value is sometimes available, \eg hash table.
     21Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
     22If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack;
     23otherwise, elements are heap allocated and explicitly/implicitly managed.
     24
     25
     26\section{String}
     27
     28String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
     29What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements.
     30Nevertheless, subscripting is often available.
     31The cost of string operations is less important than the power of the block operation to accomplish complex manipulation.
     32The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
     33
     34
     35\section{Motivation}
     36
     37The goal of this work is to introduce safe and complex versions of array, link-lists, and string into the programming language \CFA~\cite{CFA}, which is based on C.
     38Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
     39Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
     40
     41
     42\subsection{C?}
     43
     44Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}.
     45However, most programming languages are only partially explained by standard's manuals.
     46When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
     47Often other C compilers must \emph{ape} @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
     48While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
     49These example programs show
     50\begin{itemize}
     51        \item the compiler accepts or rejects certain syntax,
     52        \item prints output to buttress a claim of behaviour,
     53        \item executes without triggering any embedded assertions testing pre/post-assertions or invariants.
     54\end{itemize}
     55This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
     56Any discovered anomalies among compilers or versions is discussed.
     57In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
     58
     59
     60\subsection{Ill-Typed Expressions}
     61
     62C reports many ill-typed expressions as warnings.
     63For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
     64\lstinput{12-15}{bkgd-c-tyerr.c}
     65with warnings:
     66\begin{cfa}
     67warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
     68warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
     69\end{cfa}
     70Similarly,
     71\lstinput{17-19}{bkgd-c-tyerr.c}
     72with warning:
     73\begin{cfa}
     74warning: passing argument 1 of 'f' from incompatible pointer type
     75note: expected 'void (*)(void)' but argument is of type 'float *'
     76\end{cfa}
     77with a segmentation fault at runtime.
     78Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness.
     79Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings.
     80In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
     81Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
     82
     83% That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
     84% Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
     85
     86% A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
     87% The behaviour (whose absence is unprovable) is neither minor nor unlikely.
     88% The rejection shows that the program is ill-typed.
     89%
     90% Yet, the rejection presents as a GCC warning.
     91% *1  TAPL-pg1 definition of a type system
     92
    1093
    1194\section{Contributions}
     95
     96\subsection{Linked List}
     97
     98\subsection{Array}
     99
     100\subsection{String}
  • doc/theses/mike_brooks_MMath/uw-ethesis.tex

    rdf78cce r486caad  
    218218\input{intro}
    219219\input{background}
     220\input{array}
    220221\input{list}
    221 \input{array}
    222222\input{string}
    223223\input{conclusion}
Note: See TracChangeset for help on using the changeset viewer.