source: doc/theses/andrew_beach_MMath/existing.tex @ 4bb5d36

ADTast-experimentalpthread-emulationqualifiedEnum
Last change on this file since 4bb5d36 was 6aa84e0, checked in by Andrew Beach <ajbeach@…>, 3 years ago

Andrew MMath: Removed (updated one) the remaining \todo items.

  • Property mode set to 100644
File size: 14.8 KB
RevLine 
[382edbe]1\chapter{\CFA{} Existing Features}
[553f8abe]2\label{c:existing}
[f28fdee]3
[382edbe]4\CFA is an open-source project extending ISO C with
[6c79bef]5modern safety and productivity features, while still ensuring backwards
6compatibility with C and its programmers.  \CFA is designed to have an
7orthogonal feature-set based closely on the C programming paradigm
[9cdfa5fb]8(non-object-oriented), and these features can be added incrementally to an
9existing C code-base,
10allowing programmers to learn \CFA on an as-needed basis.
[6c79bef]11
[382edbe]12Only those \CFA features pertaining to this thesis are discussed.
[be497c6]13A familiarity with
[382edbe]14C or C-like languages is assumed.
[6c79bef]15
[9af0fe2d]16\section{Overloading and \lstinline{extern}}
[6c79bef]17\CFA has extensive overloading, allowing multiple definitions of the same name
[67c6a47]18to be defined~\cite{Moss18}.
[f42a6b8]19\begin{cfa}
20char i; int i; double i;
21int f(); double f();
22void g( int ); void g( double );
23\end{cfa}
[6c79bef]24This feature requires name mangling so the assembly symbols are unique for
25different overloads. For compatibility with names in C, there is also a syntax
26to disable name mangling. These unmangled names cannot be overloaded but act as
27the interface between C and \CFA code.  The syntax for disabling/enabling
28mangling is:
[f28fdee]29\begin{cfa}
[edc6ea2]30// name mangling on by default
[6c79bef]31int i; // _X1ii_1
[21f2e92]32extern "C" {  // disables name mangling
[6c79bef]33        int j; // j
[21f2e92]34        extern "Cforall" {  // enables name mangling
[6c79bef]35                int k; // _X1ki_1
36        }
[edc6ea2]37        // revert to no name mangling
[6e7b969]38}
[edc6ea2]39// revert to name mangling
[6c79bef]40\end{cfa}
41Both forms of @extern@ affect all the declarations within their nested lexical
42scope and transition back to the previous mangling state when the lexical scope
43ends.
44
45\section{Reference Type}
[03c0e44]46\CFA adds a reference type to C as an auto-dereferencing pointer.
47They work very similarly to pointers.
[9cdfa5fb]48Reference-types are written the same way as pointer-types, but each
[03c0e44]49asterisk (@*@) is replaced with a ampersand (@&@);
[9cdfa5fb]50this includes cv-qualifiers (\snake{const} and \snake{volatile})
51and multiple levels of reference.
[03c0e44]52
[9cdfa5fb]53Generally, references act like pointers with an implicit dereferencing
[382edbe]54operation added to each use of the variable.
55These automatic dereferences may be disabled with the address-of operator
56(@&@).
[21f2e92]57
[382edbe]58% Check to see if these are generating errors.
59\begin{minipage}{0,5\textwidth}
[03c0e44]60With references:
61\begin{cfa}
62int i, j;
63int & ri = i;
64int && rri = ri;
65rri = 3;
[f42a6b8]66&ri = &j;
[03c0e44]67ri = 5;
68\end{cfa}
69\end{minipage}
[382edbe]70\begin{minipage}{0,5\textwidth}
[03c0e44]71With pointers:
[6c79bef]72\begin{cfa}
73int i, j;
[03c0e44]74int * pi = &i
75int ** ppi = &pi;
76**ppi = 3;
[21f2e92]77pi = &j;
[03c0e44]78*pi = 5;
[f28fdee]79\end{cfa}
[03c0e44]80\end{minipage}
[6e7b969]81
[be497c6]82References are intended to be used when the indirection of a pointer is
83required, but the address is not as important as the value and dereferencing
84is the common usage.
[382edbe]85Mutable references may be assigned to by converting them to a pointer
[be497c6]86with a @&@ and then assigning a pointer to them, as in @&ri = &j;@ above.
[6e7b969]87
[382edbe]88\section{Operators}
[6c79bef]89
[be497c6]90\CFA implements operator overloading by providing special names, where
91operator expressions are translated into function calls using these names.
92An operator name is created by taking the operator symbols and joining them with
[1e567ab]93@?@s to show where the arguments go.
[382edbe]94For example,
[be497c6]95infixed multiplication is @?*?@, while prefix dereference is @*?@.
[9cdfa5fb]96This syntax makes it easy to tell the difference between prefix operations
97(such as @++?@) and postfix operations (@?++@).
[f42a6b8]98
[be497c6]99As an example, here are the addition and equality operators for a point type.
[382edbe]100\begin{cfa}
[6071efc]101point ?+?(point a, point b) { return point{a.x + b.x, a.y + b.y}; }
[be497c6]102int ?==?(point a, point b) { return a.x == b.x && a.y == b.y; }
[382edbe]103{
104        assert(point{1, 2} + point{3, 4} == point{4, 6});
105}
106\end{cfa}
[9cdfa5fb]107Note that this syntax works effectively as a textual transformation;
[be497c6]108the compiler converts all operators into functions and then resolves them
109normally. This means any combination of types may be used,
110although nonsensical ones (like @double ?==?(point, int);@) are discouraged.
111This feature is also used for all builtin operators as well,
112although those are implicitly provided by the language.
[382edbe]113
114%\subsection{Constructors and Destructors}
[be497c6]115In \CFA, constructors and destructors are operators, which means they are
[9cdfa5fb]116functions with special operator names, rather than type names as in \Cpp.
[be497c6]117Both constructors and destructors can be implicity called by the compiler,
118however the operator names allow explicit calls.
[f42a6b8]119% Placement new means that this is actually equivant to C++.
[382edbe]120
[21f2e92]121The special name for a constructor is @?{}@, which comes from the
[382edbe]122initialization syntax in C, \eg @Example e = { ... }@.
[be497c6]123\CFA generates a constructor call each time a variable is declared,
124passing the initialization arguments to the constructor.
[edc6ea2]125\begin{cfa}
[21f2e92]126struct Example { ... };
127void ?{}(Example & this) { ... }
[f42a6b8]128{
129        Example a;
130        Example b = {};
131}
[21f2e92]132void ?{}(Example & this, char first, int num) { ... }
[f42a6b8]133{
134        Example c = {'a', 2};
135}
[edc6ea2]136\end{cfa}
[f42a6b8]137Both @a@ and @b@ will be initalized with the first constructor,
[be497c6]138@b@ because of the explicit call and @a@ implicitly.
139@c@ will be initalized with the second constructor.
[9cdfa5fb]140Currently, there is no general way to skip initialization.
[be497c6]141% I don't use @= anywhere in the thesis.
[f42a6b8]142
[6c79bef]143% I don't like the \^{} symbol but $^\wedge$ isn't better.
[be497c6]144Similarly, destructors use the special name @^?{}@ (the @^@ has no special
[382edbe]145meaning).
[6c79bef]146\begin{cfa}
[21f2e92]147void ^?{}(Example & this) { ... }
[6c79bef]148{
[f42a6b8]149        Example d;
[be497c6]150        ^?{}(d);
151
152        Example e;
153} // Implicit call of ^?{}(e);
[6c79bef]154\end{cfa}
[edc6ea2]155
[be497c6]156Whenever a type is defined, \CFA creates a default zero-argument
[edc6ea2]157constructor, a copy constructor, a series of argument-per-field constructors
158and a destructor. All user constructors are defined after this.
[6e7b969]159
160\section{Polymorphism}
[6c79bef]161\CFA uses parametric polymorphism to create functions and types that are
162defined over multiple types. \CFA polymorphic declarations serve the same role
[29c9b23]163as \Cpp templates or Java generics. The ``parametric'' means the polymorphism is
[6c79bef]164accomplished by passing argument operations to associate \emph{parameters} at
165the call site, and these parameters are used in the function to differentiate
166among the types the function operates on.
167
168Polymorphic declarations start with a universal @forall@ clause that goes
169before the standard (monomorphic) declaration. These declarations have the same
170syntax except they may use the universal type names introduced by the @forall@
171clause.  For example, the following is a polymorphic identity function that
172works on any type @T@:
173\begin{cfa}
[edc6ea2]174forall( T ) T identity( T val ) { return val; }
175int forty_two = identity( 42 );
176char capital_a = identity( 'A' );
[6c79bef]177\end{cfa}
[382edbe]178Each use of a polymorphic declaration resolves its polymorphic parameters
[edc6ea2]179(in this case, just @T@) to concrete types (@int@ in the first use and @char@
180in the second).
[6e7b969]181
[6c79bef]182To allow a polymorphic function to be separately compiled, the type @T@ must be
183constrained by the operations used on @T@ in the function body. The @forall@
[382edbe]184clause is augmented with a list of polymorphic variables (local type names)
[6c79bef]185and assertions (constraints), which represent the required operations on those
186types used in a function, \eg:
[f28fdee]187\begin{cfa}
[382edbe]188forall( T | { void do_once(T); } )
[6c79bef]189void do_twice(T value) {
190        do_once(value);
191        do_once(value);
[6e7b969]192}
[f28fdee]193\end{cfa}
[6c79bef]194
195A polymorphic function can be used in the same way as a normal function.  The
196polymorphic variables are filled in with concrete types and the assertions are
197checked. An assertion is checked by verifying each assertion operation (with
198all the variables replaced with the concrete types from the arguments) is
199defined at a call site.
[edc6ea2]200\begin{cfa}
201void do_once(int i) { ... }
202int i;
203do_twice(i);
204\end{cfa}
[9cdfa5fb]205Any value with a type fulfilling the assertion may be passed as an argument to
[edc6ea2]206a @do_twice@ call.
[6c79bef]207
208Note, a function named @do_once@ is not required in the scope of @do_twice@ to
[29c9b23]209compile it, unlike \Cpp template expansion. Furthermore, call-site inferencing
[be497c6]210allows local replacement of the specific parametric functions needs for a
[6c79bef]211call.
[f28fdee]212\begin{cfa}
[edc6ea2]213void do_once(double y) { ... }
[6e7b969]214int quadruple(int x) {
[382edbe]215        void do_once(int & y) { y = y * 2; }
[21f2e92]216        do_twice(x);
[6c79bef]217        return x;
[6e7b969]218}
[f28fdee]219\end{cfa}
[6c79bef]220Specifically, the complier deduces that @do_twice@'s T is an integer from the
[21f2e92]221argument @x@. It then looks for the most specific definition matching the
[6c79bef]222assertion, which is the nested integral @do_once@ defined within the
223function. The matched assertion function is then passed as a function pointer
[21f2e92]224to @do_twice@ and called within it.
[9cdfa5fb]225The global definition of @do_once@ is ignored, however if @quadruple@ took a
[be497c6]226@double@ argument, then the global definition would be used instead as it
[6cf21ed8]227would then be a better match.\cite{Moss19}
[6c79bef]228
[be497c6]229To avoid typing long lists of assertions, constraints can be collected into
[9cdfa5fb]230a convenient package called a @trait@, which can then be used in an assertion
[6c79bef]231instead of the individual constraints.
[f28fdee]232\begin{cfa}
[6c79bef]233trait done_once(T) {
234        void do_once(T);
[6e7b969]235}
[f28fdee]236\end{cfa}
[6c79bef]237and the @forall@ list in the previous example is replaced with the trait.
[f28fdee]238\begin{cfa}
[edc6ea2]239forall(dtype T | done_once(T))
[f28fdee]240\end{cfa}
[6c79bef]241In general, a trait can contain an arbitrary number of assertions, both
242functions and variables, and are usually used to create a shorthand for, and
243give descriptive names to, common groupings of assertions describing a certain
[9cdfa5fb]244functionality, like @summable@, @listable@, \etc.
[6c79bef]245
[be497c6]246Polymorphic structures and unions are defined by qualifying an aggregate type
[6c79bef]247with @forall@. The type variables work the same except they are used in field
[9cdfa5fb]248declarations instead of parameters, returns and local variable declarations.
[f28fdee]249\begin{cfa}
[edc6ea2]250forall(dtype T)
[6e7b969]251struct node {
[9b0bb79]252        node(T) * next;
[edc6ea2]253        T * data;
[be497c6]254};
[edc6ea2]255node(int) inode;
[f28fdee]256\end{cfa}
[edc6ea2]257The generic type @node(T)@ is an example of a polymorphic type usage.  Like \Cpp
258template usage, a polymorphic type usage must specify a type parameter.
[6e7b969]259
[6c79bef]260There are many other polymorphism features in \CFA but these are the ones used
261by the exception system.
[6e7b969]262
[67c6a47]263\section{Control Flow}
[9cdfa5fb]264\CFA has a number of advanced control-flow features: @generator@, @coroutine@,
265@monitor@, @mutex@ parameters, and @thread@.
[67c6a47]266The two features that interact with
267the exception system are @coroutine@ and @thread@; they and their supporting
[6c79bef]268constructs are described here.
269
270\subsection{Coroutine}
271A coroutine is a type with associated functions, where the functions are not
[9cdfa5fb]272required to finish execution when control is handed back to the caller.
273Instead,
[6c79bef]274they may suspend execution at any time and be resumed later at the point of
[9cdfa5fb]275last suspension.
276Coroutine
[6c79bef]277types are not concurrent but share some similarities along with common
[9cdfa5fb]278underpinnings, so they are combined with the \CFA threading library.
279% I had mention of generators, but they don't actually matter here.
[6c79bef]280
281In \CFA, a coroutine is created using the @coroutine@ keyword, which is an
282aggregate type like @struct,@ except the structure is implicitly modified by
283the compiler to satisfy the @is_coroutine@ trait; hence, a coroutine is
284restricted by the type system to types that provide this special trait.  The
285coroutine structure acts as the interface between callers and the coroutine,
286and its fields are used to pass information in and out of coroutine interface
287functions.
288
289Here is a simple example where a single field is used to pass (communicate) the
290next number in a sequence.
[f28fdee]291\begin{cfa}
[6e7b969]292coroutine CountUp {
[9b0bb79]293        unsigned int next;
[be497c6]294};
[6c79bef]295CountUp countup;
[f28fdee]296\end{cfa}
[67c6a47]297Each coroutine has a @main@ function, which takes a reference to a coroutine
[6c79bef]298object and returns @void@.
[382edbe]299%[numbers=left] Why numbers on this one?
[f42a6b8]300\begin{cfa}
[edc6ea2]301void main(CountUp & this) {
[f42a6b8]302        for (unsigned int next = 0 ; true ; ++next) {
[be497c6]303                this.next = next;
[edc6ea2]304                suspend;$\label{suspend}$
[6c79bef]305        }
[6e7b969]306}
[f28fdee]307\end{cfa}
[6c79bef]308In this function, or functions called by this function (helper functions), the
[f42a6b8]309@suspend@ statement is used to return execution to the coroutine's caller
310without terminating the coroutine's function.
[6c79bef]311
312A coroutine is resumed by calling the @resume@ function, \eg @resume(countup)@.
313The first resume calls the @main@ function at the top. Thereafter, resume calls
314continue a coroutine in the last suspended function after the @suspend@
[be497c6]315statement. In this case there is only one and, hence, the difference between
316subsequent calls is the state of variables inside the function and the
317coroutine object.
318The return value of @resume@ is a reference to the coroutine, to make it
319convent to access fields of the coroutine in the same expression.
320Here is a simple example in a helper function:
321\begin{cfa}
322unsigned int get_next(CountUp & this) {
323        return resume(this).next;
324}
325\end{cfa}
326
[9cdfa5fb]327When the main function returns, the coroutine halts and can no longer be
[be497c6]328resumed.
[6e7b969]329
[67c6a47]330\subsection{Monitor and Mutex Parameter}
[9cdfa5fb]331Concurrency does not guarantee ordering; without ordering, results are
[6c79bef]332non-deterministic. To claw back ordering, \CFA uses monitors and @mutex@
333(mutual exclusion) parameters. A monitor is another kind of aggregate, where
334the compiler implicitly inserts a lock and instances are compatible with
335@mutex@ parameters.
336
[9cdfa5fb]337A function that requires deterministic (ordered) execution acquires mutual
[6c79bef]338exclusion on a monitor object by qualifying an object reference parameter with
[9cdfa5fb]339the @mutex@ qualifier.
[f42a6b8]340\begin{cfa}
341void example(MonitorA & mutex argA, MonitorB & mutex argB);
342\end{cfa}
[6c79bef]343When the function is called, it implicitly acquires the monitor lock for all of
344the mutex parameters without deadlock.  This semantics means all functions with
345the same mutex type(s) are part of a critical section for objects of that type
346and only one runs at a time.
[6e7b969]347
[67c6a47]348\subsection{Thread}
[9cdfa5fb]349Functions, generators and coroutines are sequential, so there is only a single
[6c79bef]350(but potentially sophisticated) execution path in a program. Threads introduce
351multiple execution paths that continue independently.
[6e7b969]352
[6c79bef]353For threads to work safely with objects requires mutual exclusion using
[9cdfa5fb]354monitors and mutex parameters. For threads to work safely with other threads
[6c79bef]355also requires mutual exclusion in the form of a communication rendezvous, which
[67c6a47]356also supports internal synchronization as for mutex objects. For exceptions,
357only two basic thread operations are important: fork and join.
[6e7b969]358
[6c79bef]359Threads are created like coroutines with an associated @main@ function:
[f28fdee]360\begin{cfa}
[6e7b969]361thread StringWorker {
[6c79bef]362        const char * input;
363        int result;
[6e7b969]364};
365void main(StringWorker & this) {
[6c79bef]366        const char * localCopy = this.input;
367        // ... do some work, perhaps hashing the string ...
368        this.result = result;
[6e7b969]369}
[6c79bef]370{
371        StringWorker stringworker; // fork thread running in "main"
[be497c6]372} // Implicit call to join(stringworker), waits for completion.
[f28fdee]373\end{cfa}
[6c79bef]374The thread main is where a new thread starts execution after a fork operation
375and then the thread continues executing until it is finished. If another thread
376joins with an executing thread, it waits until the executing main completes
377execution. In other words, everything a thread does is between a fork and join.
378
379From the outside, this behaviour is accomplished through creation and
380destruction of a thread object.  Implicitly, fork happens after a thread
381object's constructor is run and join happens before the destructor runs. Join
382can also be specified explicitly using the @join@ function to wait for a
383thread's completion independently from its deallocation (\ie destructor
384call). If @join@ is called explicitly, the destructor does not implicitly join.
Note: See TracBrowser for help on using the repository browser.