[29c9b23] | 1 | \chapter{\CFA Existing Features} |
---|
[553f8abe] | 2 | \label{c:existing} |
---|
[f28fdee] | 3 | |
---|
[4ed7946e] | 4 | \CFA is an open-source project extending ISO C with |
---|
[6c79bef] | 5 | modern safety and productivity features, while still ensuring backwards |
---|
| 6 | compatibility with C and its programmers. \CFA is designed to have an |
---|
| 7 | orthogonal feature-set based closely on the C programming paradigm |
---|
| 8 | (non-object-oriented) and these features can be added incrementally to an |
---|
| 9 | existing C code-base allowing programmers to learn \CFA on an as-needed basis. |
---|
| 10 | |
---|
[4ed7946e] | 11 | Only those \CFA features pertaining to this thesis are discussed. Many of the |
---|
[6c79bef] | 12 | \CFA syntactic and semantic features used in the thesis should be fairly |
---|
| 13 | obvious to the reader. |
---|
| 14 | |
---|
[9af0fe2d] | 15 | \section{Overloading and \lstinline{extern}} |
---|
[6c79bef] | 16 | \CFA has extensive overloading, allowing multiple definitions of the same |
---|
[67c6a47] | 17 | name~\cite{Moss18}. |
---|
[f28fdee] | 18 | \begin{cfa} |
---|
[df24d37] | 19 | char i; int i; double i; |
---|
| 20 | int f(); double f(); |
---|
| 21 | void g( int ); void g( double ); |
---|
[f28fdee] | 22 | \end{cfa} |
---|
[6c79bef] | 23 | This feature requires name mangling so the assembly symbols are unique for |
---|
| 24 | different overloads. For compatibility with names in C, there is also a syntax |
---|
| 25 | to disable name mangling. These unmangled names cannot be overloaded but act as |
---|
| 26 | the interface between C and \CFA code. The syntax for disabling/enabling |
---|
| 27 | mangling is: |
---|
[f28fdee] | 28 | \begin{cfa} |
---|
[edc6ea2] | 29 | // name mangling on by default |
---|
[6c79bef] | 30 | int i; // _X1ii_1 |
---|
[4ed7946e] | 31 | @extern "C"@ { // disables name mangling |
---|
[6c79bef] | 32 | int j; // j |
---|
[4ed7946e] | 33 | @extern "Cforall"@ { // enables name mangling |
---|
[6c79bef] | 34 | int k; // _X1ki_1 |
---|
| 35 | } |
---|
[edc6ea2] | 36 | // revert to no name mangling |
---|
[6e7b969] | 37 | } |
---|
[edc6ea2] | 38 | // revert to name mangling |
---|
[6c79bef] | 39 | \end{cfa} |
---|
| 40 | Both forms of @extern@ affect all the declarations within their nested lexical |
---|
| 41 | scope and transition back to the previous mangling state when the lexical scope |
---|
| 42 | ends. |
---|
| 43 | |
---|
| 44 | \section{Reference Type} |
---|
[03c0e44] | 45 | \CFA adds a reference type to C as an auto-dereferencing pointer. |
---|
| 46 | References work very similarly to pointers. |
---|
[4ed7946e] | 47 | Reference-types are written the same way as a pointer-type but each |
---|
[03c0e44] | 48 | asterisk (@*@) is replaced with an ampersand (@&@); |
---|
[4ed7946e] | 49 | this includes cv-qualifiers and multiple levels of reference, \eg: |
---|
[03c0e44] | 50 | |
---|
[4ed7946e] | 51 | \begin{minipage}{0,5\textwidth} |
---|
[03c0e44] | 52 | With references: |
---|
| 53 | \begin{cfa} |
---|
| 54 | int i, j; |
---|
| 55 | int & ri = i; |
---|
| 56 | int && rri = ri; |
---|
| 57 | rri = 3; |
---|
[4ed7946e] | 58 | &ri = &j; // reference assignment |
---|
[03c0e44] | 59 | ri = 5; |
---|
| 60 | \end{cfa} |
---|
| 61 | \end{minipage} |
---|
[4ed7946e] | 62 | \begin{minipage}{0,5\textwidth} |
---|
[03c0e44] | 63 | With pointers: |
---|
[6c79bef] | 64 | \begin{cfa} |
---|
| 65 | int i, j; |
---|
[03c0e44] | 66 | int * pi = &i; |
---|
| 67 | int ** ppi = &pi; |
---|
| 68 | **ppi = 3; |
---|
[4ed7946e] | 69 | pi = &j; // pointer assignment |
---|
[03c0e44] | 70 | *pi = 5; |
---|
[f28fdee] | 71 | \end{cfa} |
---|
[03c0e44] | 72 | \end{minipage} |
---|
[6e7b969] | 73 | |
---|
[4ed7946e] | 74 | References are intended for cases where you would want to use pointers but would |
---|
| 75 | be dereferencing them on (almost) every use. |
---|
| 76 | In most cases a reference can just be thought of as a pointer that |
---|
| 77 | automatically puts a dereference in front of each of its uses (per-level of |
---|
| 78 | reference). |
---|
| 79 | The address-of operator (@&@) acts as an escape and removes one of the |
---|
| 80 | automatic dereference operations. |
---|
| 81 | Mutable references may be assigned by converting them to a pointer |
---|
| 82 | with a @&@ and then assigning a pointer to them, as in @&ri = &j;@ above. |
---|
[6e7b969] | 83 | |
---|
[4ed7946e] | 84 | \section{Operators} |
---|
[6c79bef] | 85 | |
---|
| 86 | In general, operator names in \CFA are constructed by bracketing an operator |
---|
[edc6ea2] | 87 | token with @?@, which indicates the position of the arguments. For example, |
---|
| 88 | infixed multiplication is @?*?@ while prefix dereference is @*?@. |
---|
| 89 | This syntax makes it easy to tell the difference between prefix operations |
---|
| 90 | (such as @++?@) and postfix operations (@?++@). |
---|
[6c79bef] | 91 | |
---|
[4ed7946e] | 92 | An operator name may describe any function signature (it is just a name) but |
---|
| 93 | only certain signatures may be called in operator form. |
---|
[edc6ea2] | 94 | \begin{cfa} |
---|
[4ed7946e] | 95 | int ?+?( int i, int j, int k ) { return i + j + k; } |
---|
[edc6ea2] | 96 | { |
---|
[4ed7946e] | 97 | sout | ?+?( 3, 4, 5 ); // no infix form |
---|
[edc6ea2] | 98 | } |
---|
[4ed7946e] | 99 | \end{cfa} |
---|
| 100 | Some ``near-misses'' for unary/binary operator prototypes generate warnings. |
---|
| 101 | |
---|
| 102 | Both constructors and destructors are operators, which means they are |
---|
| 103 | functions with special operator names rather than type names as in \Cpp. The |
---|
| 104 | special operator names may be used to call the functions explicitly (not |
---|
| 105 | allowed in \Cpp for constructors). |
---|
| 106 | |
---|
| 107 | The special name for a constructor is @?{}@, where the name @{}@ comes from the |
---|
| 108 | initialization syntax in C, \eg @Structure s = {...}@. |
---|
| 109 | % That initialization syntax is also the operator form. |
---|
| 110 | \CFA generates a constructor call each time a variable is declared, |
---|
| 111 | passing the initialization arguments to the constructor. |
---|
| 112 | \begin{cfa} |
---|
| 113 | struct Structure { ... }; |
---|
| 114 | void ?{}(Structure & this) { ... } |
---|
[edc6ea2] | 115 | { |
---|
[4ed7946e] | 116 | Structure a; |
---|
| 117 | Structure b = {}; |
---|
| 118 | } |
---|
| 119 | void ?{}(Structure & this, char first, int num) { ... } |
---|
| 120 | { |
---|
| 121 | Structure c = {'a', 2}; |
---|
[edc6ea2] | 122 | } |
---|
| 123 | \end{cfa} |
---|
[4ed7946e] | 124 | Both @a@ and @b@ are initialized with the first constructor, |
---|
| 125 | while @c@ is initialized with the second. |
---|
| 126 | Currently, there is no general way to skip initialization. |
---|
[edc6ea2] | 127 | |
---|
[6c79bef] | 128 | % I don't like the \^{} symbol but $^\wedge$ isn't better. |
---|
[4ed7946e] | 129 | Similarly, destructors use the special name @^?{}@ (the @^@ has no special |
---|
| 130 | meaning). Normally, they are implicitly called on a variable when it goes out |
---|
| 131 | of scope but they can be called explicitly as well. |
---|
[6c79bef] | 132 | \begin{cfa} |
---|
[4ed7946e] | 133 | void ^?{}(Structure & this) { ... } |
---|
[6c79bef] | 134 | { |
---|
[4ed7946e] | 135 | Structure d; |
---|
[edc6ea2] | 136 | } // <- implicit destructor call |
---|
[6c79bef] | 137 | \end{cfa} |
---|
[edc6ea2] | 138 | |
---|
[4ed7946e] | 139 | Whenever a type is defined, \CFA creates a default zero-argument |
---|
[edc6ea2] | 140 | constructor, a copy constructor, a series of argument-per-field constructors |
---|
| 141 | and a destructor. All user constructors are defined after this. |
---|
| 142 | Because operators are never part of the type definition they may be added |
---|
| 143 | at any time, including on built-in types. |
---|
[6e7b969] | 144 | |
---|
| 145 | \section{Polymorphism} |
---|
[6c79bef] | 146 | \CFA uses parametric polymorphism to create functions and types that are |
---|
| 147 | defined over multiple types. \CFA polymorphic declarations serve the same role |
---|
[29c9b23] | 148 | as \Cpp templates or Java generics. The ``parametric'' means the polymorphism is |
---|
[6c79bef] | 149 | accomplished by passing argument operations to associated \emph{parameters} at |
---|
| 150 | the call site, and these parameters are used in the function to differentiate |
---|
| 151 | among the types the function operates on. |
---|
| 152 | |
---|
| 153 | Polymorphic declarations start with a universal @forall@ clause that goes |
---|
| 154 | before the standard (monomorphic) declaration. These declarations have the same |
---|
| 155 | syntax except they may use the universal type names introduced by the @forall@ |
---|
| 156 | clause. For example, the following is a polymorphic identity function that |
---|
| 157 | works on any type @T@: |
---|
| 158 | \begin{cfa} |
---|
[edc6ea2] | 159 | forall( T ) T identity( T val ) { return val; } |
---|
| 160 | int forty_two = identity( 42 ); |
---|
| 161 | char capital_a = identity( 'A' ); |
---|
[6c79bef] | 162 | \end{cfa} |
---|
[4ed7946e] | 163 | Each use of a polymorphic declaration resolves its polymorphic parameters |
---|
[edc6ea2] | 164 | (in this case, just @T@) to concrete types (@int@ in the first use and @char@ |
---|
| 165 | in the second). |
---|
[6e7b969] | 166 | |
---|
[6c79bef] | 167 | To allow a polymorphic function to be separately compiled, the type @T@ must be |
---|
| 168 | constrained by the operations used on @T@ in the function body. The @forall@ |
---|
[4ed7946e] | 169 | clause is augmented with a list of polymorphic variables (local type names) |
---|
[6c79bef] | 170 | and assertions (constraints), which represent the required operations on those |
---|
| 171 | types used in a function, \eg: |
---|
[f28fdee] | 172 | \begin{cfa} |
---|
[4ed7946e] | 173 | forall( T | { void do_once(T); } ) |
---|
[6c79bef] | 174 | void do_twice(T value) { |
---|
| 175 | do_once(value); |
---|
| 176 | do_once(value); |
---|
[6e7b969] | 177 | } |
---|
[f28fdee] | 178 | \end{cfa} |
---|
[6c79bef] | 179 | |
---|
| 180 | A polymorphic function can be used in the same way as a normal function. The |
---|
| 181 | polymorphic variables are filled in with concrete types and the assertions are |
---|
| 182 | checked. An assertion is checked by verifying each assertion operation (with |
---|
| 183 | all the variables replaced with the concrete types from the arguments) is |
---|
| 184 | defined at a call site. |
---|
[edc6ea2] | 185 | \begin{cfa} |
---|
| 186 | void do_once(int i) { ... } |
---|
| 187 | int i; |
---|
| 188 | do_twice(i); |
---|
| 189 | \end{cfa} |
---|
| 190 | Any object with a type fulfilling the assertion may be passed as an argument to |
---|
| 191 | a @do_twice@ call. |
---|
[6c79bef] | 192 | |
---|
| 193 | Note, a function named @do_once@ is not required in the scope of @do_twice@ to |
---|
[29c9b23] | 194 | compile it, unlike \Cpp template expansion. Furthermore, call-site inferencing |
---|
[6c79bef] | 195 | allows local replacement of the most specific parametric functions needed for a |
---|
| 196 | call. |
---|
[f28fdee] | 197 | \begin{cfa} |
---|
[edc6ea2] | 198 | void do_once(double y) { ... } |
---|
[6e7b969] | 199 | int quadruple(int x) { |
---|
[4ed7946e] | 200 | void do_once(int y) { y = y * 2; } // replace global do_once |
---|
| 201 | do_twice(x); // use local do_once |
---|
| 202 | do_twice(x + 1.5); // use global do_once |
---|
[6c79bef] | 203 | return x; |
---|
[6e7b969] | 204 | } |
---|
[f28fdee] | 205 | \end{cfa} |
---|
[6c79bef] | 206 | Specifically, the compiler deduces that @do_twice@'s @T@ is an integer from the |
---|
[4ed7946e] | 207 | argument @x@. It then looks for the most \emph{specific} definition matching the |
---|
[6c79bef] | 208 | assertion, which is the nested integral @do_once@ defined within the |
---|
| 209 | function. The matched assertion function is then passed as a function pointer |
---|
[4ed7946e] | 210 | to @do_twice@ and called within it. The global definition of @do_once@ is used |
---|
| 211 | for the second call because the floating-point argument is a better match. |
---|
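The function-pointer passing described above can be pictured with a small C sketch. It is only an illustration under stated assumptions, not the code the \CFA compiler generates: the names @do_twice_lowered@, @do_once_int@, @do_once_double@ and @quadruple_lowered@ are hypothetical, the value of unknown type is passed by address through a @void *@, and counters stand in for the elided bodies of @do_once@.

```c
/* Hypothetical lowering of
 *   forall( T | { void do_once(T); } ) void do_twice(T value);
 * The assertion do_once becomes an explicit function-pointer parameter,
 * and the value of unknown type T is passed by address. */
static void do_twice_lowered(void *value, void (*do_once)(void *)) {
    do_once(value);
    do_once(value);
}

/* Counters so the effect of each call is observable. */
static int int_calls = 0;
static int double_calls = 0;

/* Stand-ins for the two overloads of do_once. */
static void do_once_int(void *value) { (void)value; ++int_calls; }
static void do_once_double(void *value) { (void)value; ++double_calls; }

/* What the two call sites in quadruple might look like after resolution. */
static int quadruple_lowered(int x) {
    double tmp = x + 1.5;
    do_twice_lowered(&x, do_once_int);       /* T = int: nested do_once */
    do_twice_lowered(&tmp, do_once_double);  /* T = double: global do_once */
    return x;
}
```

In this sketch the choice of @do_once@ is made entirely at the call site, which is why the body of the polymorphic function can be compiled without any @do_once@ in scope.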
[6c79bef] | 212 | |
---|
| 213 | To avoid typing long lists of assertions, constraints can be collected into a |
---|
| 214 | convenient package called a @trait@, which can then be used in an assertion |
---|
| 215 | instead of the individual constraints. |
---|
[f28fdee] | 216 | \begin{cfa} |
---|
[6c79bef] | 217 | trait done_once(T) { |
---|
| 218 | void do_once(T); |
---|
[6e7b969] | 219 | } |
---|
[f28fdee] | 220 | \end{cfa} |
---|
[6c79bef] | 221 | and the @forall@ list in the previous example is replaced with the trait. |
---|
[f28fdee] | 222 | \begin{cfa} |
---|
[edc6ea2] | 223 | forall( T | done_once(T) ) |
---|
[f28fdee] | 224 | \end{cfa} |
---|
[6c79bef] | 225 | In general, a trait can contain an arbitrary number of assertions, both |
---|
| 226 | functions and variables, and is usually used to create a shorthand for, and |
---|
| 227 | give descriptive names to, common groupings of assertions describing a certain |
---|
| 228 | functionality, like @sumable@, @listable@, \etc. |
---|
| 229 | |
---|
| 230 | Polymorphic structures and unions are defined by qualifying the aggregate type |
---|
| 231 | with @forall@. The type variables work the same except they are used in field |
---|
| 232 | declarations instead of parameters, returns, and local variable declarations. |
---|
[f28fdee] | 233 | \begin{cfa} |
---|
[edc6ea2] | 234 | forall(dtype T) |
---|
[6e7b969] | 235 | struct node { |
---|
[9b0bb79] | 236 | node(T) * next; |
---|
[edc6ea2] | 237 | T * data; |
---|
[6e7b969] | 238 | }; |
---|
[edc6ea2] | 239 | node(int) inode; |
---|
[f28fdee] | 240 | \end{cfa} |
---|
[edc6ea2] | 241 | The generic type @node(T)@ is an example of a polymorphic type usage. Like \Cpp |
---|
| 242 | template usage, a polymorphic type usage must specify a type parameter. |
---|
[6e7b969] | 243 | |
---|
[6c79bef] | 244 | There are many other polymorphism features in \CFA but these are the ones used |
---|
| 245 | by the exception system. |
---|
[6e7b969] | 246 | |
---|
[67c6a47] | 247 | \section{Control Flow} |
---|
| 248 | \CFA has a number of advanced control-flow features: @generator@, @coroutine@, @monitor@, @mutex@ parameters, and @thread@. |
---|
| 249 | The two features that interact with |
---|
| 250 | the exception system are @coroutine@ and @thread@; they and their supporting |
---|
[6c79bef] | 251 | constructs are described here. |
---|
| 252 | |
---|
| 253 | \subsection{Coroutine} |
---|
| 254 | A coroutine is a type with associated functions, where the functions are not |
---|
| 255 | required to finish execution when control is handed back to the caller. Instead |
---|
| 256 | they may suspend execution at any time and be resumed later at the point of |
---|
| 257 | last suspension. (Generators are stackless and coroutines are stackful.) These |
---|
| 258 | types are not concurrent but share some similarities along with common |
---|
| 259 | underpinnings, so they are combined with the \CFA threading library. Further |
---|
| 260 | discussion in this section only refers to the coroutine because generators are |
---|
| 261 | similar. |
---|
| 262 | |
---|
| 263 | In \CFA, a coroutine is created using the @coroutine@ keyword, which is an |
---|
| 264 | aggregate type like @struct@, except the structure is implicitly modified by |
---|
| 265 | the compiler to satisfy the @is_coroutine@ trait; hence, a coroutine is |
---|
| 266 | restricted by the type system to types that provide this special trait. The |
---|
| 267 | coroutine structure acts as the interface between callers and the coroutine, |
---|
| 268 | and its fields are used to pass information in and out of coroutine interface |
---|
| 269 | functions. |
---|
| 270 | |
---|
| 271 | Here is a simple example where a single field is used to pass (communicate) the |
---|
| 272 | next number in a sequence. |
---|
[f28fdee] | 273 | \begin{cfa} |
---|
[6e7b969] | 274 | coroutine CountUp { |
---|
[9b0bb79] | 275 | unsigned int next; |
---|
[6e7b969] | 276 | }; |
---|
[6c79bef] | 277 | CountUp countup; |
---|
[f28fdee] | 278 | \end{cfa} |
---|
[67c6a47] | 279 | Each coroutine has a @main@ function, which takes a reference to a coroutine |
---|
[6c79bef] | 280 | object and returns @void@. |
---|
[4ed7946e] | 281 | \begin{cfa}[numbers=left] |
---|
[edc6ea2] | 282 | void main(CountUp & this) { |
---|
[4ed7946e] | 283 | for (unsigned int up = 0 ; true ; ++up) { |
---|
[edc6ea2] | 284 | this.next = up; |
---|
| 285 | suspend;$\label{suspend}$ |
---|
[6c79bef] | 286 | } |
---|
[6e7b969] | 287 | } |
---|
[f28fdee] | 288 | \end{cfa} |
---|
[6c79bef] | 289 | In this function, or functions called by this function (helper functions), the |
---|
| 290 | @suspend@ statement is used to return execution to the coroutine's caller |
---|
[67c6a47] | 291 | without terminating the coroutine's function. |
---|
[6c79bef] | 292 | |
---|
| 293 | A coroutine is resumed by calling the @resume@ function, \eg @resume(countup)@. |
---|
| 294 | The first resume calls the @main@ function at the top. Thereafter, resume calls |
---|
| 295 | continue a coroutine in the last suspended function after the @suspend@ |
---|
| 296 | statement, in this case @main@ line~\ref{suspend}. The @resume@ function takes |
---|
| 297 | a reference to the coroutine structure and returns the same reference. The |
---|
| 298 | return value allows easy access to communication variables defined in the |
---|
| 299 | coroutine object. For example, the @next@ value for coroutine object @countup@ |
---|
| 300 | is both generated and collected in the single expression: |
---|
| 301 | @resume(countup).next@. |
---|
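The communication pattern of @resume(countup).next@ can be imitated in plain C. The sketch below is a stackless emulation under stated assumptions, closer to a generator than to a real stackful \CFA coroutine: the fields @up@ and @started@ are hypothetical additions that save the loop state which a coroutine would keep on its own stack, and this @resume@ is an ordinary function rather than the \CFA library routine.

```c
#include <stdbool.h>

/* Hypothetical stackless stand-in for the CountUp coroutine. */
struct CountUp {
    unsigned int next;  /* communication field, as in the coroutine */
    unsigned int up;    /* saved loop variable of main */
    bool started;       /* false until the first resume */
};

/* Emulates resume: runs main up to its next suspend and returns the
 * same object so the caller can read the communication field. */
static struct CountUp *resume(struct CountUp *this) {
    if (!this->started) {
        this->started = true;  /* first resume: enter the loop with up = 0 */
        this->up = 0;
    } else {
        ++this->up;            /* later resumes: continue after the suspend */
    }
    this->next = this->up;
    return this;
}
```

With a zero-initialized object, successive evaluations of @resume(&countup)->next@ yield 0, 1, 2, mirroring how the returned reference gives direct access to the communication field.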
[6e7b969] | 302 | |
---|
[67c6a47] | 303 | \subsection{Monitor and Mutex Parameter} |
---|
[6c79bef] | 304 | Concurrency does not guarantee ordering; without ordering, results are |
---|
| 305 | non-deterministic. To claw back ordering, \CFA uses monitors and @mutex@ |
---|
| 306 | (mutual exclusion) parameters. A monitor is another kind of aggregate, where |
---|
| 307 | the compiler implicitly inserts a lock and instances are compatible with |
---|
| 308 | @mutex@ parameters. |
---|
| 309 | |
---|
| 310 | A function that requires deterministic (ordered) execution acquires mutual |
---|
| 311 | exclusion on a monitor object by qualifying an object reference parameter with |
---|
| 312 | @mutex@. |
---|
| 313 | \begin{cfa} |
---|
[edc6ea2] | 314 | void example(MonitorA & mutex argA, MonitorB & mutex argB); |
---|
[6c79bef] | 315 | \end{cfa} |
---|
| 316 | When the function is called, it implicitly acquires the monitor lock for all of |
---|
| 317 | the mutex parameters without deadlock. This semantics means all functions with |
---|
| 318 | the same mutex type(s) are part of a critical section for objects of that type |
---|
| 319 | and only one runs at a time. |
---|
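One common way to acquire several locks without deadlock is to take them in a fixed global order. The C sketch below uses POSIX mutexes and orders the locks by address. It is an assumption-laden illustration of the idea, not a description of how the \CFA runtime implements @mutex@ parameters: @struct Monitor@, @lock_both@ and @unlock_both@ are hypothetical names, and a single monitor type stands in for both @MonitorA@ and @MonitorB@.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical monitor: user data plus the lock a compiler would insert. */
struct Monitor {
    pthread_mutex_t lock;
    int value;
};

/* Acquire both locks in address order, so two callers passing the same
 * monitors in opposite argument order cannot deadlock each other. */
static void lock_both(struct Monitor *a, struct Monitor *b) {
    if ((uintptr_t)a > (uintptr_t)b) { struct Monitor *t = a; a = b; b = t; }
    pthread_mutex_lock(&a->lock);
    if (a != b) pthread_mutex_lock(&b->lock);
}

/* Release in the reverse of the acquisition order. */
static void unlock_both(struct Monitor *a, struct Monitor *b) {
    if ((uintptr_t)a > (uintptr_t)b) { struct Monitor *t = a; a = b; b = t; }
    if (a != b) pthread_mutex_unlock(&b->lock);
    pthread_mutex_unlock(&a->lock);
}

/* Stand-in for: void example(MonitorA & mutex argA, MonitorB & mutex argB); */
static void example(struct Monitor *argA, struct Monitor *argB) {
    lock_both(argA, argB);   /* implicit on entry in the mutex version */
    ++argA->value;
    ++argB->value;
    unlock_both(argA, argB); /* implicit on exit in the mutex version */
}
```

Calling @example(&m1, &m2)@ and @example(&m2, &m1)@ from different threads then contends on the locks in the same order, which is the property the paragraph above relies on.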
[6e7b969] | 320 | |
---|
[67c6a47] | 321 | \subsection{Thread} |
---|
[6c79bef] | 322 | Functions, generators, and coroutines are sequential so there is only a single |
---|
| 323 | (but potentially sophisticated) execution path in a program. Threads introduce |
---|
| 324 | multiple execution paths that continue independently. |
---|
[6e7b969] | 325 | |
---|
[6c79bef] | 326 | Working safely with shared objects requires threads to use mutual exclusion, |
---|
| 327 | provided by monitors and mutex parameters. Working safely with other threads |
---|
| 328 | also requires mutual exclusion, in the form of a communication rendezvous, which |
---|
[67c6a47] | 329 | also supports internal synchronization as for mutex objects. For exceptions, |
---|
| 330 | only two basic thread operations are important: fork and join. |
---|
[6e7b969] | 331 | |
---|
[6c79bef] | 332 | Threads are created like coroutines with an associated @main@ function: |
---|
[f28fdee] | 333 | \begin{cfa} |
---|
[6e7b969] | 334 | thread StringWorker { |
---|
[6c79bef] | 335 | const char * input; |
---|
| 336 | int result; |
---|
[6e7b969] | 337 | }; |
---|
| 338 | void main(StringWorker & this) { |
---|
[6c79bef] | 339 | const char * localCopy = this.input; |
---|
| 340 | // ... do some work, perhaps hashing the string ... |
---|
| 341 | this.result = result; // result is computed by the elided work above |
---|
[6e7b969] | 342 | } |
---|
[6c79bef] | 343 | { |
---|
| 344 | StringWorker stringworker; // fork thread running in "main" |
---|
[9b0bb79] | 345 | } // <- implicitly join with thread / wait for completion |
---|
[f28fdee] | 346 | \end{cfa} |
---|
[6c79bef] | 347 | The thread main is where a new thread starts execution after a fork operation |
---|
| 348 | and then the thread continues executing until it is finished. If another thread |
---|
| 349 | joins with an executing thread, it waits until the executing main completes |
---|
| 350 | execution. In other words, everything a thread does is between a fork and join. |
---|
| 351 | |
---|
| 352 | From the outside, this behaviour is accomplished through creation and |
---|
| 353 | destruction of a thread object. Implicitly, fork happens after a thread |
---|
| 354 | object's constructor is run and join happens before the destructor runs. Join |
---|
| 355 | can also be specified explicitly using the @join@ function to wait for a |
---|
| 356 | thread's completion independently from its deallocation (\ie destructor |
---|
| 357 | call). If @join@ is called explicitly, the destructor does not implicitly join. |
---|
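The fork-after-construction and join-before-destruction behaviour can be approximated in C with POSIX threads. The sketch below is a rough analogy under stated assumptions, not the \CFA implementation: the @handle@ field and the functions @string_worker_ctor@, @string_worker_dtor@ and @string_worker_main@ are hypothetical, and the string length is used as placeholder work.

```c
#include <pthread.h>
#include <string.h>

struct StringWorker {
    const char *input;
    int result;
    pthread_t handle;  /* hypothetical field for the underlying thread */
};

/* Thread main: starts running after the fork in the constructor. */
static void *string_worker_main(void *arg) {
    struct StringWorker *this = arg;
    this->result = (int)strlen(this->input);  /* placeholder work */
    return NULL;
}

/* "Constructor": initialize the fields, then fork. Returns 0 on success. */
static int string_worker_ctor(struct StringWorker *this, const char *input) {
    this->input = input;
    this->result = 0;
    return pthread_create(&this->handle, NULL, string_worker_main, this);
}

/* "Destructor": join, so the object outlives its thread. Returns 0 on success. */
static int string_worker_dtor(struct StringWorker *this) {
    return pthread_join(this->handle, NULL);
}
```

Here the caller must invoke the two functions by hand, whereas in \CFA the same ordering follows from the declaration and the end of the enclosing scope.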