# Changeset 6c79bef for doc/theses/andrew_beach_MMath/existing.tex

Ignore:
Timestamp:
Jan 22, 2021, 12:02:59 PM
Branches:
arm-eh, jacob/cs343-translation, master, new-ast-unique-expr
Children:
4706098
Parents:
62c0f8a
Message:

 r62c0f8a \chapter{\CFA Existing Features}
\CFA (C-for-all)~\cite{Cforall} is an open-source project extending ISO C with modern safety and productivity features, while still ensuring backwards compatibility with C and its programmers.
\CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm (non-object-oriented) and these features can be added incrementally to an existing C code-base, allowing programmers to learn \CFA on an as-needed basis. Only those \CFA features pertinent to this thesis are discussed. Many of the \CFA syntactic and semantic features used in the thesis should be fairly obvious to the reader.

\section{Overloading and \lstinline|extern|}
\CFA has extensive overloading, allowing multiple definitions of the same name.~\cite{Moss18}
\begin{cfa}
char i; int i; double i;                        $\C[3.75in]{// variable overload}$
int f(); double f();                            $\C{// return overload}$
void g( int ); void g( double );        $\C{// parameter overload}\CRT$
\end{cfa}
This feature requires name mangling so the assembly symbols are unique for different overloads. For compatibility with names in C, there is also a syntax to disable name mangling. These unmangled names cannot be overloaded but act as the interface between C and \CFA code. The syntax for disabling/enabling mangling is:
\begin{cfa}
// name mangling
int i; // _X1ii_1
@extern "C"@ {  // no name mangling
	int j; // j
	@extern "Cforall"@ {  // name mangling
		int k; // _X1ki_1
	} // no name mangling
} // name mangling
\end{cfa}
Both forms of @extern@ affect all the declarations within their nested lexical scope and transition back to the previous mangling state when the lexical scope ends.

\section{Reference Type}
\CFA adds a rebindable reference type to C, which is more expressive than the \CC reference. Multi-level references are allowed and act like auto-dereferenced pointers, using the ampersand (@&@) instead of the pointer asterisk (@*@). \CFA references may also be mutable or non-mutable. If mutable, a reference variable may be assigned to using the address-of operator (@&@), which converts the reference to a pointer.
\begin{cfa}
int i, j;
int @&@ ri = i, @&&@ rri = ri;
rri = 3;  // auto-dereference assign to i
@&@ri = @&@j; // rebindable
ri = 5;   // assign to j
\end{cfa}

\section{Constructors and Destructors}
Both constructors and destructors are operators, which means they are functions with special operator names rather than type names as in \CC. The special operator names may be used to call the functions explicitly (not allowed in \CC for constructors). In general, operator names in \CFA are constructed by bracketing an operator token with @?@, which indicates where the arguments go. For example, infixed multiplication is @?*?@ while prefix dereference is @*?@. This syntax makes it easy to tell the difference between prefix operations (such as @++?@) and postfix operations (@?++@). The special name for a constructor is @?{}@, which comes from the initialization syntax in C. The special name for a destructor is @^{}@, where the @^@ has no special meaning.
% I don't like the \^{} symbol but $^\wedge$ isn't better.
\begin{cfa}
struct T { ... };
void ?@{}@(@T &@ this, ...) { ... }  // constructor
void ?@^{}@(@T &@ this, ...) { ... } // destructor
{
	T s = @{@ ... @}@;  // same constructor/initialization braces
} // destructor call automatically generated
\end{cfa}
The first parameter is a reference parameter to the type for the constructor/destructor. Constructors may have multiple parameters. The compiler implicitly matches an overloaded constructor @void ?{}(T &, ...);@ to an object declaration with associated initialization, and generates a construction call after the object is allocated. When an object goes out of scope, the matching overloaded destructor @void ^?{}(T &);@ is called. Without explicit definition, \CFA creates a default and copy constructor, destructor and assignment (like \CC). It is possible to define constructors/destructors for basic and existing types.

\section{Polymorphism}
\CFA uses parametric polymorphism to create functions and types that are defined over multiple types. \CFA polymorphic declarations serve the same role as \CC templates or Java generics. The ``parametric'' means the polymorphism is accomplished by passing argument operations to associate \emph{parameters} at the call site, and these parameters are used in the function to differentiate among the types the function operates on. Polymorphic declarations start with a universal @forall@ clause that goes before the standard (monomorphic) declaration. These declarations have the same syntax except they may use the universal type names introduced by the @forall@ clause. For example, the following is a polymorphic identity function that works on any type @T@:
\begin{cfa}
@forall( T )@ @T@ identity( @T@ val ) { return val; }
int forty_two = identity( 42 ); // T bound to int, forty_two == 42
\end{cfa}
To allow a polymorphic function to be separately compiled, the type @T@ must be constrained by the operations used on @T@ in the function body. The @forall@ clause is augmented with a list of polymorphic variables (local type names) and assertions (constraints), which represent the required operations on those types used in a function, \eg:
\begin{cfa}
forall( T @| { void do_once(T); }@) // assertion
void do_twice(T value) {
	do_once(value);
	do_once(value);
}
void do_once(int i) { ... }  // provide assertion
int i;
do_twice(i); // implicitly pass assertion do_once to do_twice
\end{cfa}
Any object with a type fulfilling the assertion may be passed as an argument to a @do_twice@ call. A polymorphic function can be used in the same way as a normal function. The polymorphic variables are filled in with concrete types and the assertions are checked. An assertion is checked by verifying each assertion operation (with all the variables replaced with the concrete types from the arguments) is defined at a call site.
Note, a function named @do_once@ is not required in the scope of @do_twice@ to compile it, unlike \CC template expansion. Furthermore, call-site inferencing allows local replacement of the most specific parametric function needed for a call.
\begin{cfa}
void do_once(double y) { ... } // global
int quadruple(int x) {
	void do_once(int y) { y = y * 2; } // local
	do_twice(x); // using local "do_once"
	return x;
}
\end{cfa}
Specifically, the compiler deduces that @do_twice@'s T is an integer from the argument @x@. It then looks for the most specific definition matching the assertion, which is the nested integral @do_once@ defined within the function.
The matched assertion function is then passed as a function pointer to @do_twice@ and called within it. To avoid typing long lists of assertions, constraints can be collected into convenient packages called a @trait@, which can then be used in an assertion instead of the individual constraints.
\begin{cfa}
trait done_once(T) {
	void do_once(T);
}
\end{cfa}
and the @forall@ list in the previous example is replaced with the trait.
\begin{cfa}
forall(dtype T | @done_once(T)@)
\end{cfa}
In general, a trait can contain an arbitrary number of assertions, both functions and variables, and is usually used to create a shorthand for, and give descriptive names to, common groupings of assertions describing a certain functionality, like @sumable@, @listable@, \etc.

Polymorphic structures and unions are defined by qualifying the aggregate type with @forall@. The type variables work the same except they are used in field declarations instead of parameters, returns, and local variable declarations.
\begin{cfa}
forall(dtype T) struct node {
	node(T) * next;  // generic linked node
	T * data;
}
\end{cfa}
The generic type @node(T)@ is an example of a polymorphic-type usage. Like \CC template usage, a polymorphic-type usage must specify a type parameter. There are many other polymorphism features in \CFA but these are the ones used by the exception system.

\section{Concurrency}
\CFA has a number of concurrency features: @thread@, @monitor@, @mutex@ parameters, @coroutine@ and @generator@. The two features that interact with the exception system are @thread@ and @coroutine@; they and their supporting constructs are described here.

\subsection{Coroutine}
A coroutine is a type with associated functions, where the functions are not required to finish execution when control is handed back to the caller. Instead they may suspend execution at any time and be resumed later at the point of last suspension. (Generators are stackless and coroutines are stackful.) These types are not concurrent but share some similarities along with common underpinnings, so they are combined with the \CFA threading library. Further discussion in this section only refers to the coroutine because generators are similar.

In \CFA, a coroutine is created using the @coroutine@ keyword, which is an aggregate type like @struct@, except the structure is implicitly modified by the compiler to satisfy the @is_coroutine@ trait; hence, a coroutine is restricted by the type system to types that provide this special trait.
The coroutine structure acts as the interface between callers and the coroutine, and its fields are used to pass information in and out of coroutine interface functions. Here is a simple example where a single field is used to pass (communicate) the next number in a sequence.
\begin{cfa}
coroutine CountUp {
	unsigned int next; // communication variable
}
CountUp countup;
\end{cfa}
Each coroutine has a @main@ function, which takes a reference to a coroutine object and returns @void@.
\begin{cfa}[numbers=left]
void main(@CountUp & this@) { // argument matches trait is_coroutine
	unsigned int up = 0;  // retained between calls
	while (true) {
		next = up; // make "up" available outside function
		@suspend;@$\label{suspend}$
		up += 1;
	}
}
\end{cfa}
In this function, or functions called by this function (helper functions), the @suspend@ statement is used to return execution to the coroutine's caller without terminating the coroutine. A coroutine is resumed by calling the @resume@ function, \eg @resume(countup)@.
The first resume calls the @main@ function at the top. Thereafter, resume calls continue a coroutine in the last suspended function after the @suspend@ statement, in this case @main@ line~\ref{suspend}. The @resume@ function takes a reference to the coroutine structure and returns the same reference. The return value allows easy access to communication variables defined in the coroutine object. For example, the @next@ value for coroutine object @countup@ is both generated and collected in the single expression: @resume(countup).next@.

\subsection{Monitors and Mutex}
Concurrency does not guarantee ordering; without ordering results are non-deterministic. To claw back ordering, \CFA uses monitors and @mutex@ (mutual exclusion) parameters. A monitor is another kind of aggregate, where the compiler implicitly inserts a lock and instances are compatible with @mutex@ parameters. A function that requires deterministic (ordered) execution acquires mutual exclusion on a monitor object by qualifying an object reference parameter with @mutex@.
\begin{cfa}
void example(MonitorA & @mutex@ argA, MonitorB & @mutex@ argB);
\end{cfa}
When the function is called, it implicitly acquires the monitor lock for all of the mutex parameters without deadlock. This semantics means all functions with the same mutex type(s) are part of a critical section for objects of that type and only one runs at a time.
\subsection{Threads}
Functions, generators, and coroutines are sequential, so there is only a single (but potentially sophisticated) execution path in a program. Threads introduce multiple execution paths that continue independently. For threads to work safely with objects, mutual exclusion is required, using monitors and mutex parameters. For threads to work safely with other threads, mutual exclusion is also required, in the form of a communication rendezvous, which also supports internal synchronization as for mutex objects. For exceptions, only the two basic thread operations are important: fork and join. Threads are created like coroutines with an associated @main@ function:
\begin{cfa}
thread StringWorker {
	const char * input;
	int result;
};
void main(StringWorker & this) {
	const char * localCopy = this.input;
	// ... do some work, perhaps hashing the string ...
	this.result = result;
}
{
	StringWorker stringworker; // fork thread running in "main"
} // implicitly join with thread $\Rightarrow$ wait for completion
\end{cfa}
The thread main is where a new thread starts execution after a fork operation and the thread continues executing until it is finished. If another thread joins with an executing thread, it waits until the executing main completes execution. In other words, everything a thread does is between a fork and join. From the outside, this behaviour is accomplished through creation and destruction of a thread object. Implicitly, fork happens after a thread object's constructor is run and join happens before the destructor runs. Join can also be specified explicitly using the @join@ function to wait for a thread's completion independently from its deallocation (\ie destructor call). If @join@ is called explicitly, the destructor does not implicitly join.