source: doc/theses/andrew_beach_MMath/existing.tex @ 4969efd

ADTast-experimentalenumforall-pointer-decayjacob/cs343-translationnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since 4969efd was 4ed7946e, checked in by Peter A. Buhr <pabuhr@…>, 3 years ago

proofread Andrew's thesis chapters

  • Property mode set to 100644
File size: 14.3 KB
Line 
1\chapter{\CFA Existing Features}
2\label{c:existing}
3
4\CFA is an open-source project extending ISO C with
5modern safety and productivity features, while still ensuring backwards
6compatibility with C and its programmers.  \CFA is designed to have an
7orthogonal feature-set based closely on the C programming paradigm
8(non-object-oriented) and these features can be added incrementally to an
9existing C code-base allowing programmers to learn \CFA on an as-needed basis.
10
11Only those \CFA features pertaining to this thesis are discussed.  Many of the
12\CFA syntactic and semantic features used in the thesis should be fairly
13obvious to the reader.
14
15\section{Overloading and \lstinline{extern}}
16\CFA has extensive overloading, allowing multiple definitions of the same name
17to be defined~\cite{Moss18}.
18\begin{cfa}
19char i; int i; double i;
20int f(); double f();
21void g( int ); void g( double );
22\end{cfa}
23This feature requires name mangling so the assembly symbols are unique for
24different overloads. For compatibility with names in C, there is also a syntax
25to disable name mangling. These unmangled names cannot be overloaded but act as
26the interface between C and \CFA code.  The syntax for disabling/enabling
27mangling is:
28\begin{cfa}
29// name mangling on by default
30int i; // _X1ii_1
31@extern "C"@ {  // disables name mangling
32        int j; // j
33        @extern "Cforall"@ {  // enables name mangling
34                int k; // _X1ki_1
35        }
36        // revert to no name mangling
37}
38// revert to name mangling
39\end{cfa}
40Both forms of @extern@ affect all the declarations within their nested lexical
41scope and transition back to the previous mangling state when the lexical scope
42ends.
43
44\section{Reference Type}
45\CFA adds a reference type to C as an auto-dereferencing pointer.
46They work very similarly to pointers.
47Reference-types are written the same way as a pointer-type but each
48asterisk (@*@) is replaced with a ampersand (@&@);
49this includes cv-qualifiers and multiple levels of reference, \eg:
50
51\begin{minipage}{0,5\textwidth}
52With references:
53\begin{cfa}
54int i, j;
55int & ri = i;
56int && rri = ri;
57rri = 3;
58&ri = &j; // reference assignment
59ri = 5;
60\end{cfa}
61\end{minipage}
62\begin{minipage}{0,5\textwidth}
63With pointers:
64\begin{cfa}
65int i, j;
66int * pi = &i
67int ** ppi = &pi;
68**ppi = 3;
69pi = &j; // pointer assignment
70*pi = 5;
71\end{cfa}
72\end{minipage}
73
74References are intended for cases where you would want to use pointers but would
75be dereferencing them (almost) every usage.
76In most cases a reference can just be thought of as a pointer that
77automatically puts a dereference in front of each of its uses (per-level of
78reference).
79The address-of operator (@&@) acts as an escape and removes one of the
80automatic dereference operations.
81Mutable references may be assigned by converting them to a pointer
82with a @&@ and then assigning a pointer to them, as in @&ri = &j;@ above.
83
84\section{Operators}
85
86In general, operator names in \CFA are constructed by bracketing an operator
87token with @?@, which indicates the position of the arguments. For example,
88infixed multiplication is @?*?@ while prefix dereference is @*?@.
89This syntax make it easy to tell the difference between prefix operations
90(such as @++?@) and post-fix operations (@?++@).
91
92An operator name may describe any function signature (it is just a name) but
93only certain signatures may be called in operator form.
94\begin{cfa}
95int ?+?( int i, int j, int k ) { return i + j + k; }
96{
97        sout | ?+?( 3, 4, 5 ); // no infix form
98}
99\end{cfa}
100Some ``near-misses" for unary/binary operator prototypes generate warnings.
101
102Both constructors and destructors are operators, which means they are
103functions with special operator names rather than type names in \Cpp. The
104special operator names may be used to call the functions explicitly (not
105allowed in \Cpp for constructors).
106
107The special name for a constructor is @?{}@, where the name @{}@ comes from the
108initialization syntax in C, \eg @Structure s = {...}@.
109% That initialization syntax is also the operator form.
110\CFA generates a constructor call each time a variable is declared,
111passing the initialization arguments to the constructor.
112\begin{cfa}
113struct Structure { ... };
114void ?{}(Structure & this) { ... }
115{
116        Structure a;
117        Structure b = {};
118}
119void ?{}(Structure & this, char first, int num) { ... }
120{
121        Structure c = {'a', 2};
122}
123\end{cfa}
124Both @a@ and @b@ are initialized with the first constructor,
125while @c@ is initialized with the second.
126Currently, there is no general way to skip initialization.
127
128% I don't like the \^{} symbol but $^\wedge$ isn't better.
129Similarly, destructors use the special name @^?{}@ (the @^@ has no special
130meaning).  Normally, they are implicitly called on a variable when it goes out
131of scope but they can be called explicitly as well.
132\begin{cfa}
133void ^?{}(Structure & this) { ... }
134{
135        Structure d;
136} // <- implicit destructor call
137\end{cfa}
138
139Whenever a type is defined, \CFA creates a default zero-argument
140constructor, a copy constructor, a series of argument-per-field constructors
141and a destructor. All user constructors are defined after this.
142Because operators are never part of the type definition they may be added
143at any time, including on built-in types.
144
145\section{Polymorphism}
146\CFA uses parametric polymorphism to create functions and types that are
147defined over multiple types. \CFA polymorphic declarations serve the same role
148as \Cpp templates or Java generics. The ``parametric'' means the polymorphism is
149accomplished by passing argument operations to associate \emph{parameters} at
150the call site, and these parameters are used in the function to differentiate
151among the types the function operates on.
152
153Polymorphic declarations start with a universal @forall@ clause that goes
154before the standard (monomorphic) declaration. These declarations have the same
155syntax except they may use the universal type names introduced by the @forall@
156clause.  For example, the following is a polymorphic identity function that
157works on any type @T@:
158\begin{cfa}
159forall( T ) T identity( T val ) { return val; }
160int forty_two = identity( 42 );
161char capital_a = identity( 'A' );
162\end{cfa}
163Each use of a polymorphic declaration resolves its polymorphic parameters
164(in this case, just @T@) to concrete types (@int@ in the first use and @char@
165in the second).
166
167To allow a polymorphic function to be separately compiled, the type @T@ must be
168constrained by the operations used on @T@ in the function body. The @forall@
169clause is augmented with a list of polymorphic variables (local type names)
170and assertions (constraints), which represent the required operations on those
171types used in a function, \eg:
172\begin{cfa}
173forall( T | { void do_once(T); } )
174void do_twice(T value) {
175        do_once(value);
176        do_once(value);
177}
178\end{cfa}
179
180A polymorphic function can be used in the same way as a normal function.  The
181polymorphic variables are filled in with concrete types and the assertions are
182checked. An assertion is checked by verifying each assertion operation (with
183all the variables replaced with the concrete types from the arguments) is
184defined at a call site.
185\begin{cfa}
186void do_once(int i) { ... }
187int i;
188do_twice(i);
189\end{cfa}
190Any object with a type fulfilling the assertion may be passed as an argument to
191a @do_twice@ call.
192
193Note, a function named @do_once@ is not required in the scope of @do_twice@ to
194compile it, unlike \Cpp template expansion. Furthermore, call-site inferencing
195allows local replacement of the most specific parametric functions needs for a
196call.
197\begin{cfa}
198void do_once(double y) { ... }
199int quadruple(int x) {
200        void do_once(int y) { y = y * 2; } // replace global do_once
201        do_twice(x); // use local do_once
202        do_twice(x + 1.5); // use global do_once
203        return x;
204}
205\end{cfa}
206Specifically, the complier deduces that @do_twice@'s T is an integer from the
207argument @x@. It then looks for the most \emph{specific} definition matching the
208assertion, which is the nested integral @do_once@ defined within the
209function. The matched assertion function is then passed as a function pointer
210to @do_twice@ and called within it.  The global definition of @do_once@ is used
211for the second call because the float-point argument is a better match.
212
213To avoid typing long lists of assertions, constraints can be collect into
214convenient packages called a @trait@, which can then be used in an assertion
215instead of the individual constraints.
216\begin{cfa}
217trait done_once(T) {
218        void do_once(T);
219}
220\end{cfa}
221and the @forall@ list in the previous example is replaced with the trait.
222\begin{cfa}
223forall(dtype T | done_once(T))
224\end{cfa}
225In general, a trait can contain an arbitrary number of assertions, both
226functions and variables, and are usually used to create a shorthand for, and
227give descriptive names to, common groupings of assertions describing a certain
228functionality, like @sumable@, @listable@, \etc.
229
230Polymorphic structures and unions are defined by qualifying the aggregate type
231with @forall@. The type variables work the same except they are used in field
232declarations instead of parameters, returns, and local variable declarations.
233\begin{cfa}
234forall(dtype T)
235struct node {
236        node(T) * next;
237        T * data;
238}
239node(int) inode;
240\end{cfa}
241The generic type @node(T)@ is an example of a polymorphic type usage.  Like \Cpp
242template usage, a polymorphic type usage must specify a type parameter.
243
244There are many other polymorphism features in \CFA but these are the ones used
245by the exception system.
246
247\section{Control Flow}
248\CFA has a number of advanced control-flow features: @generator@, @coroutine@, @monitor@, @mutex@ parameters, and @thread@.
249The two features that interact with
250the exception system are @coroutine@ and @thread@; they and their supporting
251constructs are described here.
252
253\subsection{Coroutine}
254A coroutine is a type with associated functions, where the functions are not
255required to finish execution when control is handed back to the caller. Instead
256they may suspend execution at any time and be resumed later at the point of
257last suspension. (Generators are stackless and coroutines are stackful.) These
258types are not concurrent but share some similarities along with common
259underpinnings, so they are combined with the \CFA threading library. Further
260discussion in this section only refers to the coroutine because generators are
261similar.
262
263In \CFA, a coroutine is created using the @coroutine@ keyword, which is an
264aggregate type like @struct,@ except the structure is implicitly modified by
265the compiler to satisfy the @is_coroutine@ trait; hence, a coroutine is
266restricted by the type system to types that provide this special trait.  The
267coroutine structure acts as the interface between callers and the coroutine,
268and its fields are used to pass information in and out of coroutine interface
269functions.
270
271Here is a simple example where a single field is used to pass (communicate) the
272next number in a sequence.
273\begin{cfa}
274coroutine CountUp {
275        unsigned int next;
276}
277CountUp countup;
278\end{cfa}
279Each coroutine has a @main@ function, which takes a reference to a coroutine
280object and returns @void@.
281\begin{cfa}[numbers=left]
282void main(CountUp & this) {
283        for (unsigned int next = 0 ; true ; ++next) {
284                next = up;
285                suspend;$\label{suspend}$
286        }
287}
288\end{cfa}
289In this function, or functions called by this function (helper functions), the
290@suspend@ statement is used to return execution to the coroutine's caller
291without terminating the coroutine's function.
292
293A coroutine is resumed by calling the @resume@ function, \eg @resume(countup)@.
294The first resume calls the @main@ function at the top. Thereafter, resume calls
295continue a coroutine in the last suspended function after the @suspend@
296statement, in this case @main@ line~\ref{suspend}.  The @resume@ function takes
297a reference to the coroutine structure and returns the same reference. The
298return value allows easy access to communication variables defined in the
299coroutine object. For example, the @next@ value for coroutine object @countup@
300is both generated and collected in the single expression:
301@resume(countup).next@.
302
303\subsection{Monitor and Mutex Parameter}
304Concurrency does not guarantee ordering; without ordering results are
305non-deterministic. To claw back ordering, \CFA uses monitors and @mutex@
306(mutual exclusion) parameters. A monitor is another kind of aggregate, where
307the compiler implicitly inserts a lock and instances are compatible with
308@mutex@ parameters.
309
310A function that requires deterministic (ordered) execution, acquires mutual
311exclusion on a monitor object by qualifying an object reference parameter with
312@mutex@.
313\begin{cfa}
314void example(MonitorA & mutex argA, MonitorB & mutex argB);
315\end{cfa}
316When the function is called, it implicitly acquires the monitor lock for all of
317the mutex parameters without deadlock.  This semantics means all functions with
318the same mutex type(s) are part of a critical section for objects of that type
319and only one runs at a time.
320
321\subsection{Thread}
322Functions, generators, and coroutines are sequential so there is only a single
323(but potentially sophisticated) execution path in a program. Threads introduce
324multiple execution paths that continue independently.
325
326For threads to work safely with objects requires mutual exclusion using
327monitors and mutex parameters. For threads to work safely with other threads,
328also requires mutual exclusion in the form of a communication rendezvous, which
329also supports internal synchronization as for mutex objects. For exceptions,
330only two basic thread operations are important: fork and join.
331
332Threads are created like coroutines with an associated @main@ function:
333\begin{cfa}
334thread StringWorker {
335        const char * input;
336        int result;
337};
338void main(StringWorker & this) {
339        const char * localCopy = this.input;
340        // ... do some work, perhaps hashing the string ...
341        this.result = result;
342}
343{
344        StringWorker stringworker; // fork thread running in "main"
345} // <- implicitly join with thread / wait for completion
346\end{cfa}
347The thread main is where a new thread starts execution after a fork operation
348and then the thread continues executing until it is finished. If another thread
349joins with an executing thread, it waits until the executing main completes
350execution. In other words, everything a thread does is between a fork and join.
351
352From the outside, this behaviour is accomplished through creation and
353destruction of a thread object.  Implicitly, fork happens after a thread
354object's constructor is run and join happens before the destructor runs. Join
355can also be specified explicitly using the @join@ function to wait for a
356thread's completion independently from its deallocation (\ie destructor
357call). If @join@ is called explicitly, the destructor does not implicitly join.
Note: See TracBrowser for help on using the repository browser.