Context Navigation

← Previous Change
Next Change →

Changeset 486caad for doc/theses/mike_brooks_MMath

Timestamp:

Mar 25, 2024, 7:15:30 PM (2 years ago)

Author:

JiadaL <j82liang@…>

Branches:

master, stuck-waitfor-destruct

Children:

Parents:

df78cce (diff), bf050c5 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:

doc/theses/mike_brooks_MMath

Files:

: 4 edited

array.tex (modified) (1 diff)
background.tex (modified) (2 diffs)
intro.tex (modified) (2 diffs)
uw-ethesis.tex (modified) (1 diff)

Legend:

: Unmodified
: Added
: Removed

doc/theses/mike_brooks_MMath/array.tex

-              rdf78cce
+              r486caad
 \subsection{Retire pointer arithmetic}
+\section{\CFA}
+XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \\
+moved from background chapter \\
+XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \\
+Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it.
+(For later:  That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.)
+\subsection{\CFA features interacting with arrays}
+Prior work on \CFA included making C arrays, as used in C code from the wild,
+work, if this code is fed into @cfacc@.
+The quality of this this treatment was fine, with no more or fewer bugs than is typical.
+More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features.
+A notable success was with the \CFA @alloc@ function,
+which type information associated with a polymorphic return type
+replaces @malloc@'s use of programmer-supplied size information.
+\begin{cfa}
+// C, library
+void * malloc( size_t );
+// C, user
+struct tm * el1 = malloc(      sizeof(struct tm) );
+struct tm * ar1 = malloc( 10 * sizeof(struct tm) );
+// CFA, library
+forall( T * ) T * alloc();
+// CFA, user
+tm * el2 = alloc();
+tm (*ar2)[10] = alloc();
+\end{cfa}
+The alloc polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument.
+This compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type.
+Using a compiler-produced value eliminates an opportunity for user error.
+TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
+Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
+In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@.
+They are not subscripted in the same way.
+\begin{cfa}
+ar1[5];
+(*ar2)[5];
+\end{cfa}
+Using ``reference to array'' works at resolving this issue.  TODO: discuss connection with Doug-Lea \CC proposal.
+\begin{cfa}
+tm (&ar3)[10] = *alloc();
+ar3[5];
+\end{cfa}
+The implicit size communication to @alloc@ still works in the same ways as for @ar2@.
+Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
+TODO xref C standard does not claim that @ar1@ may be subscripted,
+because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
+But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
+where the type requested is an array, making the result, much more obviously, an array object.
+The ``reference to array'' type has its sore spots too.
+TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
+TODO: I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?

doc/theses/mike_brooks_MMath/background.tex

-              rdf78cce
+              r486caad
 \chapter{Background}
+This chapter states facts about the prior work, upon which my contributions build.
+Each receives a justification of the extent to which its statement is phrased to provoke controversy or surprise.
+\section{C}
+\subsection{Common knowledge}
+The reader is assumed to have used C or \CC for the coursework of at least four university-level courses, or have equivalent experience.
+The current discussion introduces facts, unaware of which, such a functioning novice may be operating.
+% TODO: decide if I'm also claiming this collection of facts, and test-oriented presentation is a contribution; if so, deal with (not) arguing for its originality
+\subsection{Convention: C is more touchable than its standard}
+When it comes to explaining how C works, I like illustrating definite program semantics.
+I prefer doing so, over a quoting manual's suggested programmer's intuition, or showing how some compiler writers chose to model their problem.
+To illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and I show its behaviour.
+This behaviour is typically one of
+\begin{itemize}
+        \item my statement that the compiler accepts or rejects the program
+        \item the program's printed output, which I show
+        \item my implied assurance that its assertions do not fail when run
+\end{itemize}
+The compiler whose program semantics is shown is
+\begin{cfa}
+$ gcc --version
+gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
+\end{cfa}
+running on Architecture @x86_64@, with the same environment targeted.
+Unless explicit discussion ensues about differences among compilers or with (versions of) the standard, it is further implied that there exists a second version of GCC and some version of Clang, running on and for the same platform, that give substantially similar behaviour.
+In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
+\subsection{C reports many ill-typed expressions as warnings}
+These attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
+\lstinput{12-15}{bkgd-c-tyerr.c}
+with warnings:
+\begin{cfa}
+warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
+warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
+\end{cfa}
+Similarly,
+\lstinput{17-19}{bkgd-c-tyerr.c}
+with warning:
+\begin{cfa}
+warning: passing argument 1 of 'f' from incompatible pointer type
+note: expected 'void (*)(void)' but argument is of type 'float *'
+\end{cfa}
+with a segmentation fault at runtime.
+That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
+Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
+A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
+The behaviour (whose absence is unprovable) is neither minor nor unlikely.
+The rejection shows that the program is ill-typed.
+Yet, the rejection presents as a GCC warning.
+In the discussion following, ``ill-typed'' means giving a nonzero @gcc -Werror@ exit condition with a message that discusses typing.
+*1  TAPL-pg1 definition of a type system
+\section{C Arrays}
+\subsection{C has an array type (!)}
+Since this work builds on C, it is necessary to explain the C mechanisms and their shortcomings for array, linked list, and string,
+\section{Array}
 When a programmer works with an array, C semantics provide access to a type that is different in every way from ``pointer to its first element.''
 …
+\section{\CFA}
+Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it.
+(For later:  That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.)
+\subsection{\CFA features interacting with arrays}
+Prior work on \CFA included making C arrays, as used in C code from the wild,
+work, if this code is fed into @cfacc@.
+The quality of this this treatment was fine, with no more or fewer bugs than is typical.
+More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features.
+A notable success was with the \CFA @alloc@ function,
+which type information associated with a polymorphic return type
+replaces @malloc@'s use of programmer-supplied size information.
+\begin{cfa}
+// C, library
+void * malloc( size_t );
+// C, user
+struct tm * el1 = malloc(      sizeof(struct tm) );
+struct tm * ar1 = malloc( 10 * sizeof(struct tm) );
+// CFA, library
+forall( T * ) T * alloc();
+// CFA, user
+tm * el2 = alloc();
+tm (*ar2)[10] = alloc();
+\end{cfa}
+The alloc polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument.
+This compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type.
+Using a compiler-produced value eliminates an opportunity for user error.
+TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa
+Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
+In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@.
+They are not subscripted in the same way.
+\begin{cfa}
+ar1[5];
+(*ar2)[5];
+\end{cfa}
+Using ``reference to array'' works at resolving this issue.  TODO: discuss connection with Doug-Lea \CC proposal.
+\begin{cfa}
+tm (&ar3)[10] = *alloc();
+ar3[5];
+\end{cfa}
+The implicit size communication to @alloc@ still works in the same ways as for @ar2@.
+Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
+TODO xref C standard does not claim that @ar1@ may be subscripted,
+because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
+But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
+where the type requested is an array, making the result, much more obviously, an array object.
+The ``reference to array'' type has its sore spots too.
+TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)
+TODO: I fixed a bug associated with using an array as a T.  I think.  Did I really?  What was the bug?
+\section{Linked List}
+\section{String}

doc/theses/mike_brooks_MMath/intro.tex

-              rdf78cce
+              r486caad
 \chapter{Introduction}
+All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
+Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
 \cite{Blache19}
 …
 \cite{Ruef19}
 \section{Arrays}
+\section{Array}
+\section{Strings}
+Array provides a homogeneous container with $O(1)$ access to elements using subscripting.
+The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
+For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
+\section{Linked List}
+Linked-list provides a homogeneous container with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
+Subscripting by value is sometimes available, \eg hash table.
+Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
+If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack;
+otherwise, elements are heap allocated and explicitly/implicitly managed.
+\section{String}
+String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
+What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements.
+Nevertheless, subscripting is often available.
+The cost of string operations is less important than the power of the block operation to accomplish complex manipulation.
+The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
+\section{Motivation}
+The goal of this work is to introduce safe and complex versions of array, link-lists, and string into the programming language \CFA~\cite{CFA}, which is based on C.
+Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
+Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
+\subsection{C?}
+Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}.
+However, most programming languages are only partially explained by standard's manuals.
+When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
+Often other C compilers must \emph{ape} @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
+While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
+These example programs show
+\begin{itemize}
+        \item the compiler accepts or rejects certain syntax,
+        \item prints output to buttress a claim of behaviour,
+        \item executes without triggering any embedded assertions testing pre/post-assertions or invariants.
+\end{itemize}
+This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
+Any discovered anomalies among compilers or versions is discussed.
+In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
+\subsection{Ill-Typed Expressions}
+C reports many ill-typed expressions as warnings.
+For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
+\lstinput{12-15}{bkgd-c-tyerr.c}
+with warnings:
+\begin{cfa}
+warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
+warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
+\end{cfa}
+Similarly,
+\lstinput{17-19}{bkgd-c-tyerr.c}
+with warning:
+\begin{cfa}
+warning: passing argument 1 of 'f' from incompatible pointer type
+note: expected 'void (*)(void)' but argument is of type 'float *'
+\end{cfa}
+with a segmentation fault at runtime.
+Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness.
+Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings.
+In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
+Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
+% That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
+% Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
+% A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
+% The behaviour (whose absence is unprovable) is neither minor nor unlikely.
+% The rejection shows that the program is ill-typed.
+%
+% Yet, the rejection presents as a GCC warning.
+% *1  TAPL-pg1 definition of a type system
 \section{Contributions}
+\subsection{Linked List}
+\subsection{Array}
+\subsection{String}

doc/theses/mike_brooks_MMath/uw-ethesis.tex

-              rdf78cce
+              r486caad
 \input{intro}
 \input{background}
+\input{array}
 \input{list}
-\input{array}
 \input{string}
 \input{conclusion}

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: