source: doc/theses/andrew_beach_MMath/intro.tex @ 5d3d281

Last change on this file since 5d3d281 was 86bd8538, checked in by Andrew Beach <ajbeach@…>, 3 years ago

Andrew MMath: Hopefully the last updates from the readers.

  • Property mode set to 100644
File size: 17.3 KB
Line 
1\chapter{Introduction}
2
3% The highest level overview of Cforall and EHMs. Get this done right away.
4This thesis covers the design and implementation of the exception handling
5mechanism (EHM) of
6\CFA (pronounced sea-for-all and may be written Cforall or CFA).
7\CFA is a new programming language that extends C, which maintains
8backwards-compatibility while introducing modern programming features.
9Adding exception handling to \CFA gives it new ways to handle errors and
10make large control-flow jumps.
11
12% Now take a step back and explain what exceptions are generally.
13Exception handling provides dynamic inter-function control flow.
14A language's EHM is a combination of language syntax and run-time
15components that construct, raise, propagate and handle exceptions,
16to provide all of that control flow.
17There are two forms of exception handling covered in this thesis:
18termination, which acts as a multi-level return,
19and resumption, which is a dynamic function call.
20% About other works:
21Often, when this separation is not made, termination exceptions are assumed
22as they are more common and may be the only form of handling provided in
23a language.
24
25All types of exception handling link a raise with a handler.
26Both operations are usually language primitives, although raises can be
27treated as a function that takes an exception argument.
28Handlers are more complex, as they are added to and removed from the stack
29during execution, must specify what they can handle and must give the code to
30handle the exception.
31
32Exceptions work with different execution models but for the descriptions
33that follow a simple call stack, with functions added and removed in a
34first-in-last-out order, is assumed.
35
36Termination exception handling searches the stack for the handler, then
37unwinds the stack to where the handler was found before calling it.
38The handler is run inside the function that defined it and when it finishes
39it returns control to that function.
40\begin{center}
41%\input{termination}
42%
43%\medskip
44\input{termhandle.pstex_t}
45% I hate these diagrams, but I can't access xfig to fix them and they are
46% better than the alternative.
47\end{center}
48
49Resumption exception handling searches the stack for a handler and then calls
50it without removing any other stack frames.
51The handler is run on top of the existing stack, often as a new function or
52closure capturing the context in which the handler was defined.
53After the handler has finished running, it returns control to the function
54that preformed the raise, usually starting after the raise.
55\begin{center}
56%\input{resumption}
57%
58%\medskip
59\input{resumhandle.pstex_t}
60% The other one.
61\end{center}
62
63Although a powerful feature, exception handling tends to be complex to set up
64and expensive to use,
65so it is often limited to unusual or ``exceptional" cases.
66The classic example is error handling; exceptions can be used to
67remove error handling logic from the main execution path, and pay
68most of the cost only when the error actually occurs.
69
70\section{Thesis Overview}
71This work describes the design and implementation of the \CFA EHM.
72The \CFA EHM implements all of the common exception features (or an
73equivalent) found in most other EHMs and adds some features of its own.
74The design of all the features had to be adapted to \CFA's feature set, as
75some of the underlying tools used to implement and express exception handling
76in other languages are absent in \CFA.
77Still, the resulting syntax resembles that of other languages:
78\begin{cfa}
79try {
80        ...
81        T * object = malloc(request_size);
82        if (!object) {
83                throw OutOfMemory{fixed_allocation, request_size};
84        }
85        ...
86} catch (OutOfMemory * error) {
87        ...
88}
89\end{cfa}
90% A note that yes, that was a very fast overview.
91The design and implementation of all of \CFA's EHM's features are
92described in detail throughout this thesis, whether they are a common feature
93or one unique to \CFA.
94
95% The current state of the project and what it contributes.
96All of these features have been implemented in \CFA,
97covering both changes to the compiler and the run-time.
98In addition, a suite of test cases and performance benchmarks were created
99alongside the implementation.
100The implementation techniques are generally applicable in other programming
101languages and much of the design is as well.
102Some parts of the EHM use other features unique to \CFA and would be
103harder to replicate in other programming languages.
104
105The contributions of this work are:
106\begin{enumerate}
107\item Designing \CFA's exception handling mechanism, adapting designs from
108other programming languages and creating new features.
109\item Implementing stack unwinding and the \CFA EHM, including updating
110the \CFA compiler and the run-time environment.
111\item Designing and implementing a prototype virtual system.
112% I think the virtual system and per-call site default handlers are the only
113% "new" features, everything else is a matter of implementation.
114\item Creating tests to check the behaviour of the EHM.
115\item Creating benchmarks to check the performance of the EHM,
116as compared to other languages.
117\end{enumerate}
118
119The rest of this thesis is organized as follows.
120The current state of exceptions is covered in \autoref{s:background}.
121The existing state of \CFA is covered in \autoref{c:existing}.
122New EHM features are introduced in \autoref{c:features},
123covering their usage and design.
124That is followed by the implementation of these features in
125\autoref{c:implement}.
126Performance results are examined in \autoref{c:performance}.
127Possibilities to extend this project are discussed in \autoref{c:future}.
128Finally, the project is summarized in \autoref{c:conclusion}.
129
130\section{Background}
131\label{s:background}
132
133Exception handling has been examined before in programming languages,
134with papers on the subject dating back 70s.\cite{Goodenough75}
135Early exceptions were often treated as signals, which carried no information
136except their identity.
137Ada originally used this system\cite{Ada}, but now allows for a string
138message as a payload\cite{Ada12}.
139
140The modern flagship for termination exceptions -- if one exists -- is \Cpp,
141which added them in its first major wave of non-object-orientated features
142in 1990.\cite{CppHistory}
143Many EHMs have special exception types,
144however \Cpp has the ability to use any type as an exception.
145These were found to be not very useful and have been pushed aside for classes
146inheriting from
147\code{C++}{std::exception}.
148Although there is a special catch-all syntax (@catch(...)@), there are no
149operations that can be performed on the caught value, not even type inspection.
150Instead, the base exception-type \code{C++}{std::exception} defines common
151functionality (such as
152the ability to describe the reason the exception was raised) and all
153exceptions have this functionality.
154That trade-off, restricting usable types to gain guaranteed functionality,
155is almost universal now, as without some common functionality it is almost
156impossible to actually handle any errors.
157
158Java was the next popular language to use exceptions.\cite{Java8}
159Its exception system largely reflects that of \Cpp, except that it requires
160you throw a child type of \code{Java}{java.lang.Throwable}
161and it uses checked exceptions.
162Checked exceptions are part of a function's interface,
163the exception signature of the function.
164Every exception that could be raised from a function, either directly or
165because it is not handled from a called function, is given.
166Using this information, it is possible to statically verify if any given
167exception is handled, and guarantee that no exception will go unhandled.
168Making exception information explicit improves clarity and safety,
169but can slow down or restrict programming.
170For example, programming high-order functions becomes much more complex
171if the argument functions could raise exceptions.
172However, as odd it may seem, the worst problems are rooted in the simple
173inconvenience of writing and updating exception signatures.
174This has caused Java programmers to develop multiple programming ``hacks''
175to circumvent checked exceptions, negating their advantages.
176One particularly problematic example is the ``catch-and-ignore'' pattern,
177where an empty handler is used to handle an exception without doing any
178recovery or repair. In theory that could be good enough to properly handle
179the exception, but more often is used to ignore an exception that the       
180programmer does not feel is worth the effort of handling, for instance if
181they do not believe it will ever be raised.
182If they are incorrect, the exception will be silenced, while in a similar
183situation with unchecked exceptions the exception would at least activate   
184the language's unhandled exception code (usually, a program abort with an
185error message).
186
187%\subsection
188Resumption exceptions are less popular,
189although resumption is as old as termination; that is, few
190programming languages have implemented them.
191% http://bitsavers.informatik.uni-stuttgart.de/pdf/xerox/parc/techReports/
192%   CSL-79-3_Mesa_Language_Manual_Version_5.0.pdf
193Mesa is one programming language that did.\cite{Mesa} Experience with Mesa
194is quoted as being one of the reasons resumptions were not
195included in the \Cpp standard.
196% https://en.wikipedia.org/wiki/Exception_handling
197Since then, resumptions have been ignored in mainstream programming languages.
198However, resumption is being revisited in the context of decades of other
199developments in programming languages.
200While rejecting resumption may have been the right decision in the past,
201the situation has changed since then.
202Some developments, such as the functional programming equivalent to resumptions,
203algebraic effects\cite{Zhang19}, are enjoying success.
204A complete reexamination of resumption is beyond this thesis,
205but their reemergence is enough reason to try them in \CFA.
206% Especially considering how much easier they are to implement than
207% termination exceptions and how much Peter likes them.
208
209%\subsection
210Functional languages tend to use other solutions for their primary error
211handling mechanism, but exception-like constructs still appear.
212Termination appears in the error construct, which marks the result of an
213expression as an error; then the result of any expression that tries to use
214it also results in an error, and so on until an appropriate handler is reached.
215Resumption appears in algebraic effects, where a function dispatches its
216side-effects to its caller for handling.
217
218%\subsection
219More recently, exceptions seem to be vanishing from newer programming
220languages, replaced by ``panic".
221In Rust, a panic is just a program level abort that may be implemented by
222unwinding the stack like in termination exception
223handling.\cite{RustPanicMacro}\cite{RustPanicModule}
224Go's panic though is very similar to a termination, except it only supports
225a catch-all by calling \code{Go}{recover()}, simplifying the interface at
226the cost of flexibility.\cite{Go:2021}
227
228%\subsection
229As exception handling's most common use cases are in error handling,
230here are some other ways to handle errors with comparisons with exceptions.
231\begin{itemize}
232\item\emph{Error Codes}:
233This pattern has a function return an enumeration (or just a set of fixed
234values) to indicate if an error has occurred and possibly which error it was.
235
236Error codes mix exceptional/error and normal values, enlarging the range of
237possible return values. This can be addressed with multiple return values
238(or a tuple) or a tagged union.
239However, the main issue with error codes is forgetting to check them,
240which leads to an error being quietly and implicitly ignored.
241Some new languages and tools will try to issue warnings when an error code
242is discarded to avoid this problem.
243Checking error codes also bloats the main execution path,
244especially if the error is not handled immediately and has to be passed
245through multiple functions before it is addressed.
246
247Here is an example of the pattern in Bash, where commands can only  ``return"
248numbers and most output is done through streams of text.
249\begin{lstlisting}[language=bash,escapechar={}]
250# Immediately after running a command:
251case $? in
2520)
253        # Success
254        ;;
2551)
256        # Error Code 1
257        ;;
2582|3)
259        # Error Code 2 or Error Code 3
260        ;;
261# Add more cases as needed.
262asac
263\end{lstlisting}
264
265\item\emph{Special Return with Global Store}:
266Similar to the error codes pattern but the function itself only returns
267that there was an error,
268and stores the reason for the error in a fixed global location.
269For example, many routines in the C standard library will only return some
270error value (such as -1 or a null pointer) and the error code is written into
271the standard variable @errno@.
272
273This approach avoids the multiple results issue encountered with straight
274error codes as only a single error value has to be returned,
275but otherwise has the same disadvantages and more.
276Every function that reads or writes to the global store must agree on all
277possible errors and managing it becomes more complex with concurrency.
278
279This example shows some of what has to be done to robustly handle a C
280standard library function that reports errors this way.
281\begin{lstlisting}[language=C]
282// Now a library function can set the error.
283int handle = open(path_name, flags);
284if (-1 == handle) {
285        switch (errno) {
286    case ENAMETOOLONG:
287                // path_name is a bad argument.
288                break;
289        case ENFILE:
290                // A system resource has been exausted.
291                break;
292        // And many more...
293    }
294}
295\end{lstlisting}
296% cite open man page?
297
298\item\emph{Return Union}:
299This pattern replaces error codes with a tagged union.
300Success is one tag and the errors are another.
301It is also possible to make each possible error its own tag and carry its own
302additional information, but the two-branch format is easy to make generic
303so that one type can be used everywhere in error handling code.
304
305This pattern is very popular in any functional or semi-functional language
306with primitive support for tagged unions (or algebraic data types).
307Return unions can also be expressed as monads (evaluation in a context)
308and often are in languages with special syntax for monadic evaluation,
309such as Haskell's \code{haskell}{do} blocks.
310
311The main advantage is that an arbitrary object can be used to represent an
312error, so it can include a lot more information than a simple error code.
313The disadvantages include that the it does have to be checked along the main
314execution, and if there aren't primitive tagged unions proper, usage can be
315hard to enforce.
316% We need listing Rust/rust to format code snippets from it.
317% Rust's \code{rust}{Result<T, E>}
318
319This is a simple example of examining the result of a failing function in
320Haskell, using its \code{haskell}{Either} type.
321Examining \code{haskell}{error} further would likely involve more matching,
322but the type of \code{haskell}{error} is user defined so there are no
323general cases.
324\begin{lstlisting}[language=haskell]
325case failingFunction argA argB of
326    Right value -> -- Use the successful computed value.
327    Left error -> -- Handle the produced error.
328\end{lstlisting}
329
330Return unions as monads will result in the same code, but can hide most
331of the work to propagate errors in simple cases. The code to actually handle
332the errors, or to interact with other monads (a common case in these
333languages) still has to be written by hand.
334
335If \code{haskell}{failingFunction} is implemented with two helpers that
336use the same error type, then it can be implemented with a \code{haskell}{do}
337block.
338\begin{lstlisting}[language=haskell,literate={}]
339failingFunction x y = do
340        z <- helperOne x
341        helperTwo y z
342\end{lstlisting}
343
344\item\emph{Handler Functions}:
345This pattern associates errors with functions.
346On error, the function that produced the error calls another function to
347handle it.
348The handler function can be provided locally (passed in as an argument,
349either directly as as a field of a structure/object) or globally (a global
350variable).
351C++ uses this approach as its fallback system if exception handling fails,
352such as \snake{std::terminate} and, for a time,
353\snake{std::unexpected}.\footnote{\snake{std::unexpected} was part of the
354Dynamic Exception Specification, which has been removed from the standard
355as of C++20.\cite{CppExceptSpec}}
356
357Handler functions work a lot like resumption exceptions,
358but without the dynamic search for a handler.
359Since setting up the handler can be more complex/expensive,
360especially when the handler has to be passed through multiple layers of
361function calls, but cheaper (constant time) to call,
362they are more suited to more frequent (less exceptional) situations.
363Although, in \Cpp and other languages that do not have checked exceptions,
364they can actually be enforced by the type system be more reliable.
365
366This is a more local example in \Cpp, using a function to provide
367a default value for a mapping.
368\begin{lstlisting}[language=C++]
369ValueT Map::key_or_default(KeyT key, ValueT(*make_default)(KeyT)) {
370        ValueT * value = find_value(key);
371        if (nullptr != value) {
372                return *value;
373        } else {
374                return make_default(key);
375        }
376}
377\end{lstlisting}
378\end{itemize}
379
380%\subsection
381Because of their cost, exceptions are rarely used for hot paths of execution.
382Hence, there is an element of self-fulfilling prophecy as implementation
383techniques have been focused on making them cheap to set up,
384happily making them expensive to use in exchange.
385This difference is less important in higher-level scripting languages,
386where using exceptions for other tasks is more common.
387An iconic example is Python's
388\code{Python}{StopIteration}\cite{PythonExceptions} exception, that
389is thrown by an iterator to indicate that it is exhausted.
390When paired with Python's iterator-based for-loop, this will be thrown every
391time the end of the loop is reached.\cite{PythonForLoop}
Note: See TracBrowser for help on using the repository browser.