source: doc/theses/andrew_beach_MMath/intro.tex @ 814f87d

ADTast-experimentalenumforall-pointer-decaypthread-emulationqualifiedEnum
Last change on this file since 814f87d was 814f87d, checked in by Andrew Beach <ajbeach@…>, 3 years ago

Andrew MMath: Updated thesis with Yizhou Zhang's feedback.

  • Property mode set to 100644
File size: 15.1 KB
RevLine 
[e8a7ca2]1\chapter{Introduction}
2
[471ff17]3% The highest level overview of Cforall and EHMs. Get this done right away.
[cb6b8cb]4This thesis covers the design and implementation of the exception handling
[21f2e92]5mechanism (EHM) of
[6071efc]6\CFA (pronounced sea-for-all and may be written Cforall or CFA).
[cb6b8cb]7\CFA is a new programming language that extends C, which maintains
[6071efc]8backwards-compatibility while introducing modern programming features.
9Adding exception handling to \CFA gives it new ways to handle errors and
[cb6b8cb]10make large control-flow jumps.
[471ff17]11
12% Now take a step back and explain what exceptions are generally.
[21f2e92]13Exception handling provides dynamic inter-function control flow.
[cb6b8cb]14A language's EHM is a combination of language syntax and run-time
15components that construct, raise, propagate and handle exceptions,
16to provide all of that control flow.
[553f8abe]17There are two forms of exception handling covered in this thesis:
18termination, which acts as a multi-level return,
19and resumption, which is a dynamic function call.
[cb6b8cb]20% About other works:
21Often, when this separation is not made, termination exceptions are assumed
22as they are more common and may be the only form of handling provided in
23a language.
[553f8abe]24
[cb6b8cb]25All types of exception handling link a raise with a handler.
26Both operations are usually language primitives, although raises can be
[9cdfa5fb]27treated as a function that takes an exception argument.
28Handlers are more complex, as they are added to and removed from the stack
29during execution, must specify what they can handle and must give the code to
[cb6b8cb]30handle the exception.
31
32Exceptions work with different execution models but for the descriptions
33that follow a simple call stack, with functions added and removed in a
34first-in-last-out order, is assumed.
35
36Termination exception handling searches the stack for the handler, then
37unwinds the stack to where the handler was found before calling it.
38The handler is run inside the function that defined it and when it finishes
39it returns control to that function.
[e46ea00]40\begin{center}
[75f8e04]41\input{termination}
[432bffe]42
43\medskip
44\input{termhandle.pstex_t}
[e46ea00]45\end{center}
[6aa84e0]46\todo*{Can I make the new diagrams fit the old style?}
[e46ea00]47
[cb6b8cb]48Resumption exception handling searches the stack for a handler and then calls
49it without removing any other stack frames.
50The handler is run on top of the existing stack, often as a new function or
51closure capturing the context in which the handler was defined.
[9cdfa5fb]52After the handler has finished running, it returns control to the function
[cb6b8cb]53that preformed the raise, usually starting after the raise.
54\begin{center}
55\input{resumption}
[432bffe]56
57\medskip
58\input{resumhandle.pstex_t}
[cb6b8cb]59\end{center}
[e46ea00]60
[553f8abe]61Although a powerful feature, exception handling tends to be complex to set up
[9cdfa5fb]62and expensive to use,
[cb6b8cb]63so it is often limited to unusual or ``exceptional" cases.
[9cdfa5fb]64The classic example is error handling; exceptions can be used to
[cb6b8cb]65remove error handling logic from the main execution path, and pay
[553f8abe]66most of the cost only when the error actually occurs.
[e8a7ca2]67
[471ff17]68\section{Thesis Overview}
[21f2e92]69This work describes the design and implementation of the \CFA EHM.
[553f8abe]70The \CFA EHM implements all of the common exception features (or an
[e8a7ca2]71equivalent) found in most other EHMs and adds some features of its own.
[9cdfa5fb]72The design of all the features had to be adapted to \CFA's feature set, as
[e8a7ca2]73some of the underlying tools used to implement and express exception handling
74in other languages are absent in \CFA.
[9cdfa5fb]75Still, the resulting syntax resembles that of other languages:
[f42a6b8]76\begin{cfa}
77try {
[e8a7ca2]78        ...
79        T * object = malloc(request_size);
80        if (!object) {
[f42a6b8]81                throw OutOfMemory{fixed_allocation, request_size};
[e8a7ca2]82        }
83        ...
[f42a6b8]84} catch (OutOfMemory * error) {
[e8a7ca2]85        ...
86}
[f42a6b8]87\end{cfa}
[e8a7ca2]88% A note that yes, that was a very fast overview.
[471ff17]89The design and implementation of all of \CFA's EHM's features are
[553f8abe]90described in detail throughout this thesis, whether they are a common feature
[e8a7ca2]91or one unique to \CFA.
92
93% The current state of the project and what it contributes.
[cb6b8cb]94All of these features have been implemented in \CFA,
95covering both changes to the compiler and the run-time.
96In addition, a suite of test cases and performance benchmarks were created
[9cdfa5fb]97alongside the implementation.
[f42a6b8]98The implementation techniques are generally applicable in other programming
[553f8abe]99languages and much of the design is as well.
[cb6b8cb]100Some parts of the EHM use other features unique to \CFA and would be
[f42a6b8]101harder to replicate in other programming languages.
102
[e46ea00]103The contributions of this work are:
104\begin{enumerate}
[553f8abe]105\item Designing \CFA's exception handling mechanism, adapting designs from
[cb6b8cb]106other programming languages and creating new features.
107\item Implementing stack unwinding and the \CFA EHM, including updating
108the \CFA compiler and the run-time environment.
[9cdfa5fb]109\item Designing and implementing a prototype virtual system.
[553f8abe]110% I think the virtual system and per-call site default handlers are the only
111% "new" features, everything else is a matter of implementation.
[cb6b8cb]112\item Creating tests to check the behaviour of the EHM.
[9cdfa5fb]113\item Creating benchmarks to check the performance of the EHM,
[cb6b8cb]114as compared to other languages.
[e46ea00]115\end{enumerate}
116
[cb6b8cb]117The rest of this thesis is organized as follows.
118The current state of exceptions is covered in \autoref{s:background}.
[9cdfa5fb]119The existing state of \CFA is covered in \autoref{c:existing}.
[cb6b8cb]120New EHM features are introduced in \autoref{c:features},
121covering their usage and design.
122That is followed by the implementation of these features in
[553f8abe]123\autoref{c:implement}.
[cb6b8cb]124Performance results are examined in \autoref{c:performance}.
[f42a6b8]125Possibilities to extend this project are discussed in \autoref{c:future}.
[cb6b8cb]126Finally, the project is summarized in \autoref{c:conclusion}.
[471ff17]127
128\section{Background}
129\label{s:background}
130
[cb6b8cb]131Exception handling has been examined before in programming languages,
132with papers on the subject dating back 70s.\cite{Goodenough75}
133Early exceptions were often treated as signals, which carried no information
[6cf21ed8]134except their identity.
135Ada originally used this system\cite{Ada}, but now allows for a string
136message as a payload\cite{Ada12}.
[fcaa1e4]137
138The modern flag-ship for termination exceptions is \Cpp,
[471ff17]139which added them in its first major wave of non-object-orientated features
[6cf21ed8]140in 1990.\cite{CppHistory}
[cb6b8cb]141Many EHMs have special exception types,
142however \Cpp has the ability to use any type as an exception.
143These were found to be not very useful and have been pushed aside for classes
144inheriting from
[fcaa1e4]145\code{C++}{std::exception}.
[9cdfa5fb]146Although there is a special catch-all syntax (@catch(...)@), there are no
[cb6b8cb]147operations that can be performed on the caught value, not even type inspection.
[9cdfa5fb]148Instead, the base exception-type \code{C++}{std::exception} defines common
[cb6b8cb]149functionality (such as
[f42a6b8]150the ability to describe the reason the exception was raised) and all
[cb6b8cb]151exceptions have this functionality.
152That trade-off, restricting usable types to gain guaranteed functionality,
153is almost universal now, as without some common functionality it is almost
154impossible to actually handle any errors.
[fcaa1e4]155
[6cf21ed8]156Java was the next popular language to use exceptions.\cite{Java8}
[9cdfa5fb]157Its exception system largely reflects that of \Cpp, except that it requires
[f42a6b8]158you throw a child type of \code{Java}{java.lang.Throwable}
[fcaa1e4]159and it uses checked exceptions.
[cb6b8cb]160Checked exceptions are part of a function's interface,
161the exception signature of the function.
[9cdfa5fb]162Every exception that could be raised from a function, either directly or
[cb6b8cb]163because it is not handled from a called function, is given.
164Using this information, it is possible to statically verify if any given
[9cdfa5fb]165exception is handled, and guarantee that no exception will go unhandled.
[cb6b8cb]166Making exception information explicit improves clarity and safety,
167but can slow down or restrict programming.
168For example, programming high-order functions becomes much more complex
169if the argument functions could raise exceptions.
170However, as odd it may seem, the worst problems are rooted in the simple
171inconvenience of writing and updating exception signatures.
172This has caused Java programmers to develop multiple programming ``hacks''
173to circumvent checked exceptions, negating their advantages.
174One particularly problematic example is the ``catch-and-ignore'' pattern,
175where an empty handler is used to handle an exception without doing any
176recovery or repair. In theory that could be good enough to properly handle
177the exception, but more often is used to ignore an exception that the       
[9cdfa5fb]178programmer does not feel is worth the effort of handling, for instance if
[cb6b8cb]179they do not believe it will ever be raised.
[9cdfa5fb]180If they are incorrect, the exception will be silenced, while in a similar
[cb6b8cb]181situation with unchecked exceptions the exception would at least activate   
[9cdfa5fb]182the language's unhandled exception code (usually, a program abort with an
[cb6b8cb]183error message).
[471ff17]184
185%\subsection
[cb6b8cb]186Resumption exceptions are less popular,
[9cdfa5fb]187although resumption is as old as termination; that is, few
[fcaa1e4]188programming languages have implemented them.
[471ff17]189% http://bitsavers.informatik.uni-stuttgart.de/pdf/xerox/parc/techReports/
190%   CSL-79-3_Mesa_Language_Manual_Version_5.0.pdf
[6cf21ed8]191Mesa is one programming language that did.\cite{Mesa} Experience with Mesa
[f42a6b8]192is quoted as being one of the reasons resumptions were not
[471ff17]193included in the \Cpp standard.
194% https://en.wikipedia.org/wiki/Exception_handling
[9cdfa5fb]195Since then, resumptions have been ignored in main-stream programming languages.
[cb6b8cb]196However, resumption is being revisited in the context of decades of other
197developments in programming languages.
198While rejecting resumption may have been the right decision in the past,
199the situation has changed since then.
[9cdfa5fb]200Some developments, such as the functional programming equivalent to resumptions,
[cb6b8cb]201algebraic effects\cite{Zhang19}, are enjoying success.
[9cdfa5fb]202A complete reexamination of resumption is beyond this thesis,
203but their reemergence is enough reason to try them in \CFA.
[fcaa1e4]204% Especially considering how much easier they are to implement than
[cb6b8cb]205% termination exceptions and how much Peter likes them.
[471ff17]206
207%\subsection
[f42a6b8]208Functional languages tend to use other solutions for their primary error
[cb6b8cb]209handling mechanism, but exception-like constructs still appear.
210Termination appears in the error construct, which marks the result of an
211expression as an error; then the result of any expression that tries to use
212it also results in an error, and so on until an appropriate handler is reached.
213Resumption appears in algebraic effects, where a function dispatches its
[fcaa1e4]214side-effects to its caller for handling.
[471ff17]215
216%\subsection
[9cdfa5fb]217More recently exceptions, seem to be vanishing from newer programming
[f42a6b8]218languages, replaced by ``panic".
[cb6b8cb]219In Rust, a panic is just a program level abort that may be implemented by
[6cf21ed8]220unwinding the stack like in termination exception
221handling.\cite{RustPanicMacro}\cite{RustPanicModule}
[814f87d]222Go's panic though is very similar to a termination, except it only supports
[fcaa1e4]223a catch-all by calling \code{Go}{recover()}, simplifying the interface at
[6cf21ed8]224the cost of flexibility.\cite{Go:2021}
[471ff17]225
226%\subsection
[9cdfa5fb]227As exception handling's most common use cases are in error handling,
[cb6b8cb]228here are some other ways to handle errors with comparisons with exceptions.
[471ff17]229\begin{itemize}
230\item\emph{Error Codes}:
[cb6b8cb]231This pattern has a function return an enumeration (or just a set of fixed
232values) to indicate if an error has occurred and possibly which error it was.
233
234Error codes mix exceptional/error and normal values, enlarging the range of
235possible return values. This can be addressed with multiple return values
236(or a tuple) or a tagged union.
237However, the main issue with error codes is forgetting to check them,
238which leads to an error being quietly and implicitly ignored.
239Some new languages and tools will try to issue warnings when an error code
240is discarded to avoid this problem.
241Checking error codes also bloats the main execution path,
[9cdfa5fb]242especially if the error is not handled immediately and has to be passed
[cb6b8cb]243through multiple functions before it is addressed.
[471ff17]244
245\item\emph{Special Return with Global Store}:
[cb6b8cb]246Similar to the error codes pattern but the function itself only returns
[9cdfa5fb]247that there was an error,
248and stores the reason for the error in a fixed global location.
249For example, many routines in the C standard library will only return some
[cb6b8cb]250error value (such as -1 or a null pointer) and the error code is written into
251the standard variable @errno@.
[471ff17]252
[cb6b8cb]253This approach avoids the multiple results issue encountered with straight
254error codes but otherwise has the same disadvantages and more.
255Every function that reads or writes to the global store must agree on all
256possible errors and managing it becomes more complex with concurrency.
[471ff17]257
258\item\emph{Return Union}:
[cb6b8cb]259This pattern replaces error codes with a tagged union.
[471ff17]260Success is one tag and the errors are another.
261It is also possible to make each possible error its own tag and carry its own
[9cdfa5fb]262additional information, but the two-branch format is easy to make generic
[471ff17]263so that one type can be used everywhere in error handling code.
264
[cb6b8cb]265This pattern is very popular in any functional or semi-functional language
266with primitive support for tagged unions (or algebraic data types).
[814f87d]267Return unions can also be expressed as monads (evaluation in a context)
268and often are in languages with special syntax for monadic evaluation,
269such as Haskell's \code{haskell}{do} blocks.
270
[cb6b8cb]271The main advantage is that an arbitrary object can be used to represent an
[9cdfa5fb]272error, so it can include a lot more information than a simple error code.
[cb6b8cb]273The disadvantages include that the it does have to be checked along the main
[9cdfa5fb]274execution, and if there aren't primitive tagged unions proper, usage can be
[cb6b8cb]275hard to enforce.
[814f87d]276% We need listing Rust/rust to format code snippets from it.
277% Rust's \code{rust}{Result<T, E>}
278
279Return unions as monads will result in the same code, but can hide most
280of the work to propagate errors in simple cases. The code to actually handle
281the errors, or to interact with other monads (a common case in these
282languages) still has to be written by hand.
[471ff17]283
284\item\emph{Handler Functions}:
[cb6b8cb]285This pattern associates errors with functions.
286On error, the function that produced the error calls another function to
[471ff17]287handle it.
288The handler function can be provided locally (passed in as an argument,
289either directly as as a field of a structure/object) or globally (a global
290variable).
[cb6b8cb]291C++ uses this approach as its fallback system if exception handling fails,
[9cdfa5fb]292such as \snake{std::terminate} and, for a time,
293\snake{std::unexpected}.\footnote{\snake{std::unexpected} was part of the
294Dynamic Exception Specification, which has been removed from the standard
295as of C++20.\cite{CppExceptSpec}}
[f42a6b8]296
[cb6b8cb]297Handler functions work a lot like resumption exceptions,
298but without the dynamic search for a handler.
299Since setting up the handler can be more complex/expensive,
300especially when the handler has to be passed through multiple layers of
301function calls, but cheaper (constant time) to call,
302they are more suited to more frequent (less exceptional) situations.
[471ff17]303\end{itemize}
304
305%\subsection
[cb6b8cb]306Because of their cost, exceptions are rarely used for hot paths of execution.
307Hence, there is an element of self-fulfilling prophecy as implementation
308techniques have been focused on making them cheap to set-up,
309happily making them expensive to use in exchange.
310This difference is less important in higher-level scripting languages,
[9cdfa5fb]311where using exceptions for other tasks is more common.
[6cf21ed8]312An iconic example is Python's
[9cdfa5fb]313\code{Python}{StopIteration}\cite{PythonExceptions} exception, that
[cb6b8cb]314is thrown by an iterator to indicate that it is exhausted.
[9cdfa5fb]315When paired with Python's iterator-based for-loop, this will be thrown every
[6cf21ed8]316time the end of the loop is reached.\cite{PythonForLoop}
Note: See TracBrowser for help on using the repository browser.