# Changeset 07bc165

Ignore:
Timestamp:
Jul 10, 2016, 12:56:11 PM (6 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, ctor, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, memory, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
6e4b913
Parents:
0b4d93ab
Message:

update section on pointer/reference and LaTeX macros

Location:
doc
Files:
2 edited

### Legend:

Unmodified
 r0b4d93ab %% Created On       : Wed Apr  6 14:53:29 2016 %% Last Modified By : Peter A. Buhr %% Last Modified On : Thu Jul  7 08:25:37 2016 %% Update Count     : 1099 %% Last Modified On : Sun Jul 10 12:52:09 2016 %% Update Count     : 1200 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} \CFA\footnote{Pronounced C-for-all'', and written \CFA, CFA, or \CFL.} is a modern general-purpose programming-language, designed an as evolutionary step forward from the C programming language. The syntax of the \CFA language builds from C, and should look immediately familiar to C/\CC programmers. \CFA\footnote{Pronounced C-for-all'', and written \CFA, CFA, or \CFL.} is a modern general-purpose programming-language, designed as an evolutionary step forward from the C programming language. The syntax of the \CFA language builds from C, and should look immediately familiar to C/\Index*[C++]{\CC} programmers. % Any language feature that is not described here can be assumed to be using the standard C11 syntax. \CFA adds many modern programming-language features that directly lead to increased \emph{safety} and \emph{productivity}, while maintaining interoperability with existing C programs and achieving C performance. Like C, \CFA is a statically typed, procedural language with a low-overhead runtime, meaning there is no global garbage-collection. The primary new features include parametric-polymorphism routines and types, exceptions, concurrency, and modules. The primary new features include parametric-polymorphic routines and types, exceptions, concurrency, and modules. One of the main design philosophies of \CFA is to describe not prescribe'', which means \CFA tries to provide a pathway from low-level C programming to high-level \CFA programming, but it does not force programmers to do the right thing''. instead, a programmer evolves an existing C program into \CFA by incrementally incorporating \CFA features. New programs can be written in \CFA using a combination of C and \CFA features. \CC had a similar goal 30 years ago, but has struggled over the intervening time to incorporate modern programming-language features because of early design choices. \Index*[C++]{\CC} had a similar goal 30 years ago, but has struggled over the intervening time to incorporate modern programming-language features because of early design choices. \CFA has 30 years of hindsight and a clean starting point. Like \CC, there may be both an old and new ways to achieve the same effect. Like \Index*[C++]{\CC}, there may be both an old and new ways to achieve the same effect. For example, the following programs compare the \CFA and C I/O mechanisms. \begin{quote2} \end{quote2} Both programs output the same result. While the \CFA I/O looks similar to the \CC output style, there are important differences, such as automatic spacing between variables as in Python (see also~\VRef{s:IOLibrary}). While the \CFA I/O looks similar to the \Index*[C++]{\CC} output style, there are important differences, such as automatic spacing between variables as in \Index*{Python} (see also~\VRef{s:IOLibrary}). This document is a user manual for the \CFA programming language, targeted at \CFA programmers. The \CFA project started with K-W C~\cite{Buhr94a,Till89}, which extended C with new declaration syntax, multiple return values from routines, and extended assignment capabilities using the notion of tuples. (See~\cite{Werther96} for some similar work, but for \CC.) The original \CFA project~\cite{Ditchfield92} extended the C type system with parametric polymorphism and overloading, as opposed to the \CC approach of object-oriented extensions to the C type-system. (See~\cite{Werther96} for some similar work, but for \Index*[C++]{\CC}.) The original \CFA project~\cite{Ditchfield92} extended the C type system with parametric polymorphism and overloading, as opposed to the \Index*[C++]{\CC} approach of object-oriented extensions to the C type-system. A first implementation of the core Cforall language was created~\cite{Bilson03,Esteves04}, but at the time there was little interesting in extending C, so work did not continue. As the saying goes, What goes around, comes around.'', and there is now renewed interest in the C programming language because of legacy code-bases, so the \CFA project has been restarted. For system programming, where direct access to hardware and dealing with real-time issues is a requirement, C is usually the language of choice. As well, there are millions of lines of C legacy code, forming the base for many software development projects (especially on UNIX systems). The TIOBE index (\url{http://www.tiobe.com/tiobe_index}) for March 2016 shows programming-language popularity, with Java 20.5\%, C 14.5\%, \CC 6.7\%, \CS 4.3\%, Python 4.3\%, and all other programming languages below 3\%. The TIOBE index (\url{http://www.tiobe.com/tiobe_index}) for March 2016 shows programming-language popularity, with \Index*{Java} 20.5\%, C 14.5\%, \Index*[C++]{\CC} 6.7\%, \CS 4.3\%, \Index*{Python} 4.3\%, and all other programming languages below 3\%. As well, for 30 years, C has been the number 1 and 2 most popular programming language: \begin{center} \end{tabular} \end{center} Hence, C is still an extremely important programming language, with double the usage of \CC, where \CC itself is largely C code. Hence, C is still an extremely important programming language, with double the usage of \Index*[C++]{\CC}, where \CC itself is largely C code. Finally, love it or hate it, C has been an important and influential part of computer science for 40 years and it appears it will continue to be for many more years. Unfortunately, C has too many problems and omissions to make it an acceptable programming language for modern needs. however, it largely extended the language, and did not address many existing problems.\footnote{% Two important existing problems addressed were changing the type of character literals from ©int© to ©char© and enumerator from ©int© to the type of its enumerators.} Fortran~\cite{Fortran08}, Ada~\cite{Ada12}, and Cobol~\cite{Cobol14} are examples of programming languages that took an evolutionary approach, where modern language features (e.g., objects, concurrency) are added and problems fixed within the framework of the existing language. Java~\cite{Java8}, Go~\cite{Go}, Rust~\cite{Rust} and D~\cite{D} are examples of the revolutionary approach for modernizing C/\CC, resulting in a new language rather than an extension of the descendent. \Index*{Fortran}~\cite{Fortran08}, \Index*{Ada}~\cite{Ada12}, and \Index*{Cobol}~\cite{Cobol14} are examples of programming languages that took an evolutionary approach, where modern language features (e.g., objects, concurrency) are added and problems fixed within the framework of the existing language. \Index*{Java}~\cite{Java8}, \Index*{Go}~\cite{Go}, \Index*{Rust}~\cite{Rust} and \Index*{D}~\cite{D} are examples of the revolutionary approach for modernizing C/\CC, resulting in a new language rather than an extension of the descendent. These languages have different syntax and semantics from C, and do not interoperate directly with C, largely because of garbage collection. As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten. In fact, one of the biggest issues for any new programming language is establishing a minimum level of library code to support a large body of activities. Language developers often state that adequate library support takes more work than designing and implementing the language itself. Like \CC, \CFA starts with immediate access to all exiting C libraries, and in many cases, can easily wrap library routines with simpler and safer interfaces, at very low cost. Like \Index*[C++]{\CC}, \CFA starts with immediate access to all exiting C libraries, and in many cases, can easily wrap library routines with simpler and safer interfaces, at very low cost. Hence, \CFA begins by leveraging the large repository of C libraries with little cost. char abs( char ); extern "C" { int abs( int );                                 // use default C routine for int int abs( int );                                 §\C{// use default C routine for int}§ } // extern "C" long int abs( long int ); \end{lstlisting} The problem is the name clash between the library routine ©abs© and the \CFA names ©abs©. Hence, names appearing in an ©extern "C"© block have \newterm{C linkage}. Then overloading polymorphism uses a mechanism called \newterm{name mangling} to create unique names that are different from C names, which are not mangled. Hence, names appearing in an ©extern "C"© block have \newterm*{C linkage}. Then overloading polymorphism uses a mechanism called \newterm{name mangling}\index{mangling!name} to create unique names that are different from C names, which are not mangled. Hence, there is the same need as in \CC, to know if a name is a C or \CFA name, so it can be correctly formed. There is no way around this problem, other than C's approach of creating unique names for each pairing of operation and type. \section[Compiling CFA Program]{Compiling \CFA Program} The command ©cfa© is used to compile \CFA program(s), and is based on the GNU ©gcc©\index{gcc} command, e.g.: The command ©cfa© is used to compile \CFA program(s), and is based on the GNU \Indexc{gcc} command, e.g.: \begin{lstlisting} cfa§\indexc{cfa}\index{compilation!cfa@©cfa©}§ [ gcc-options ] C/§\CFA§-files [ assembler/loader-files ] \end{lstlisting} By default, \CFA programs having the following ©gcc© flags turned on: \CFA programs having the following ©gcc© flags turned on: \begin{description} \item\hspace*{-0.6ex}\Indexc{-std=gnu99}\index{compilation option!-std=gnu99@{©-std=gnu99©}} \item \Indexc{-std=gnu99}\index{compilation option!-std=gnu99@{©-std=gnu99©}} The 1999 C standard plus GNU extensions. \item\hspace*{-0.6ex}\Indexc{-fgnu89-¶inline¶}\index{compilation option!-fgnu89-inline@{©-fgnu89-¶inline¶©}} \item \Indexc{-fgnu89-inline}\index{compilation option!-fgnu89-inline@{©-fgnu89-inline©}} Use the traditional GNU semantics for inline routines in C99 mode, which allows inline routines in header files. \end{description} The following new \CFA option is available: The following new \CFA options are available: \begin{description} \item\hspace*{-0.6ex}\Indexc{-CFA}\index{compilation option!-CFA@{©-CFA©}} \item \Indexc{-CFA}\index{compilation option!-CFA@©-CFA©} Only the C preprocessor and the \CFA translator steps are performed and the transformed program is written to standard output, which makes it possible to examine the code generated by the \CFA translator. The generated code started with the standard \CFA prelude. \item \Indexc{-debug}\index{compilation option!-debug@©-debug©} The program is linked with the debugging version of the runtime system. The debug version performs runtime checks to help during the debugging phase of a \CFA program, but substantially slows the execution of the program. The runtime checks should only be removed after the program is completely debugged. \textbf{This option is the default.} \item \Indexc{-nodebug}\index{compilation option!-nodebug@©-nodebug©} The program is linked with the non-debugging version of the unikernel or multikernel, so the execution of the program is faster. \Emph{However, no runtime checks or ©assert©s are performed so errors usually result in abnormal program termination.} \item \Indexc{-help}\index{compilation option!-help@©-help©} Information about the set of \CFA compilation flags is printed. \item \Indexc{-nohelp}\index{compilation option!-nohelp@©-nohelp©} Information about the set of \CFA compilation flags is not printed. \textbf{This option is the default.} \item \Indexc{-quiet}\index{compilation option!-quiet@©-quiet©} The \CFA compilation message is not printed at the beginning of a compilation. \item \Indexc{-noquiet}\index{compilation option!-noquiet@©-noquiet©} The \CFA compilation message is printed at the beginning of a compilation. \textbf{This option is the default.} \end{description} The following preprocessor variables are available: \begin{description} \item\hspace*{-0.6ex}\Indexc{__CFA__}\index{preprocessor variables!__CFA__@{©__CFA__©}} \item \Indexc{__CFA__}\index{preprocessor variables!__CFA__@{©__CFA__©}} is always available during preprocessing and its value is the current major \Index{version number} of \CFA.\footnote{ The C preprocessor allows only integer values in a preprocessor variable so a value like \Version'' is not allowed. Hence, the need to have three variables for the major, minor and patch version number.} \item\hspace*{-0.6ex}\Indexc{__CFA_MINOR__}\index{preprocessor variables!__CFA_MINOR__@{©__CFA_MINOR__©}} \item \Indexc{__CFA_MINOR__}\index{preprocessor variables!__CFA_MINOR__@{©__CFA_MINOR__©}} is always available during preprocessing and its value is the current minor \Index{version number} of \CFA. \item\hspace*{-0.6ex}\Indexc{__CFA_PATCH__}\index{preprocessor variables!__CFA_PATCH__@©__CFA_PATCH__©} \item \Indexc{__CFA_PATCH__}\index{preprocessor variables!__CFA_PATCH__@©__CFA_PATCH__©} is always available during preprocessing and its value is the current patch \Index{version number} of \CFA. \item\hspace*{-0.6ex}\Indexc{__CFA__}\index{preprocessor variables!__CFA__@©__CFA__©} and \Indexc{__CFORALL__}\index{preprocessor variables!__CFORALL__@©__CFORALL__©} are always available during preprocessing and have no value. \item \Indexc{__CFORALL__}\index{preprocessor variables!__CFORALL__@©__CFORALL__©} is always available during preprocessing and has no value. \end{description} These preprocessor variables allow conditional compilation of programs that must work differently in these situations. For example, to toggle between C and \CFA extensions, using the following: #endif \end{lstlisting} which conditionally includes the correct header file, if the program is compiled using ©gcc© or ©cfa©. which conditionally includes the correct header file, if the program is compiled using \Indexc{gcc} or \Indexc{cfa}. \end{enumerate} It is significantly easier to read and enter long constants when they are broken up into smaller groupings (most cultures use comma or period among digits for the same purpose). This extension is backwards compatible, matches with the use of underscore in variable names, and appears in Ada and Java 8. This extension is backwards compatible, matches with the use of underscore in variable names, and appears in \Index*{Ada} and \Index*{Java} 8. \end{tabular} \end{quote2} Is this an array of 5 pointers to integers or a pointer to an array of 5 integers? Is this an array of 5 pointers to integers or a \Index{pointer} to an array of 5 integers? The fact this declaration is unclear to many C programmers means there are \Index{productivity} and \Index{safety} issues even for basic programs. Another example of confusion results from the fact that a routine name and its parameters are embedded within the return type, mimicking the way the return value is used at the routine's call site. For example, a routine returning a pointer to an array of integers is defined and used in the following way: For example, a routine returning a \Index{pointer} to an array of integers is defined and used in the following way: \begin{lstlisting} int (*f())[5] {...};                    §\C{}§ The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type. In the following example, \R{red} is for the base type and \B{blue} is for the qualifiers. The \CFA declarations move the qualifiers to the left of the base type, i.e., blue to the left of the red, while the qualifiers have the same meaning but are ordered left to right to specify a variable's type. The \CFA declarations move the qualifiers to the left of the base type, i.e., move the blue to the left of the red, while the qualifiers have the same meaning but are ordered left to right to specify a variable's type. \begin{quote2} \begin{tabular}{@{}l@{\hspace{3em}}l@{}} % Specifically, the character ©*© is used to indicate a pointer, square brackets ©[©\,©]© are used to represent an array or function return value, and parentheses ©()© are used to indicate a routine parameter. However, unlike C, \CFA type declaration tokens are distributed across all variables in the declaration list. For instance, variables ©x© and ©y© of type pointer to integer are defined in \CFA as follows: For instance, variables ©x© and ©y© of type \Index{pointer} to integer are defined in \CFA as follows: \begin{quote2} \begin{tabular}{@{}l@{\hspace{3em}}l@{}} \end{tabular} \end{quote2} The downside of this semantics is the need to separate regular and pointer declarations: The downside of this semantics is the need to separate regular and \Index{pointer} declarations: \begin{quote2} \begin{tabular}{@{}l@{\hspace{3em}}l@{}} \end{tabular} \end{quote2} where the right example is how the compiler logically interpreters the variables in the left example. Since a variable name only points to one location during its lifetime, it is an \Index{immutable} pointer; where the right example is how the compiler logically interprets the variables in the left example. Since a variable name only points to one location during its lifetime, it is an \Index{immutable} \Index{pointer}; hence, variables ©x© and ©y© are constant pointers in the compiler interpretation. In general, variable addresses are stored in instructions instead of loaded independently, so an instruction fetch implicitly loads a variable's address. \end{tabular} \end{quote2} Finally, the immutable nature of a variable's address and the fact that there is no storage for a variable address means pointer assignment is impossible. Finally, the immutable nature of a variable's address and the fact that there is no storage for a variable address means pointer assignment\index{pointer!assignment}\index{assignment!pointer} is impossible. Therefore, the expression ©x = y© only has one meaning, ©*x = *y©, i.e., manipulate values, which is why explicitly writing the dereferences is unnecessary even though it occurs implicitly as part of instruction decoding. A pointer/reference is a generalization of a variable name, i.e., a mutable address that can point to more than one memory location during its lifetime. A \Index{pointer}/\Index{reference} is a generalization of a variable name, i.e., a mutable address that can point to more than one memory location during its lifetime. (Similarly, an integer variable can contain multiple integer literals during its lifetime versus an integer constant representing a single literal during its lifetime and may not occupy storage as the literal is embedded directly into instructions.) Hence, a pointer occupies memory to store its current address, and the pointer's value is loaded by dereferencing, e.g.: int x, y, ®*® p1, ®*® p2, ®**® p3; p1 = ®&®x;               // p1 points to x p2 = p1;                 // p2 also points to x p2 = p1;                 // p2 points to x p1 = ®&®y;               // p1 points to y p1 = p2 + 1;    // p1 points to y, pointer arithmetic p3 = &p2;               // p3 points to p2 \end{lstlisting} Notice, an address has a duality\index{address!duality}: a location in memory or the value at that location. In many cases, the compiler can infer the meaning: In many cases, a compiler might be able to infer the meaning: \begin{lstlisting} p2 = p1 + x;                                    §\C{// compiler infers *p2 = *p1 + x;}§ \end{lstlisting} because adding the arbitrary integer value in ©x© to the address of ©p1© and storing the resulting address into ©p2© is an unlikely operation. Algol68~\cite{Algol68} inferences pointer dereferencing to select the best meaning for each pointer usage. However, there are ambiguous cases, especially when \Index{pointer arithmetic} is possible, as in C: \Index*{Algol68}~\cite{Algol68} inferences pointer dereferencing to select the best meaning for each pointer usage. However, in C, the following cases are ambiguous, especially with pointer arithmetic: \begin{lstlisting} p1 = p2;                                                §\C{// p1 = p2\ \ or\ \ *p1 = *p2}§ \end{lstlisting} Most languages pick one meaning as the default and the programmer explicitly indicates the other meaning to resolve the address-duality ambiguity\index{address!duality ambiguity}. Most languages pick one meaning as the default and the programmer explicitly indicates the other meaning to resolve the address-duality ambiguity\index{address! ambiguity}. In C, the default meaning for pointers is to manipulate the pointer's address and the pointed-to value is explicitly accessed by the dereference operator ©*©. \begin{lstlisting} *p1 = *p1 + 1;                                  §\C{// pointed-to value assignment / operation}§ \end{lstlisting} which works well for situations where manipulation of addresses in the primary meaning and data is rarely accessed, such as storage management (©malloc©/©free©). which works well for situations where manipulation of addresses is the primary meaning and data is rarely accessed, such as storage management (©malloc©/©free©). However, in most other situations, the pointed-to value is requested more often than the pointer address. int x, y, ®&® r1, ®&® r2, ®&&® r3; ®&®r1 = &x;                                             §\C{// r1 points to x}§ ®&®r2 = &r1;                                    §\C{// r2 also points to x}§ ®&®r1 = &y;                                             §\C{// r2 also points to x}§ ®&®r1 = &r2 + 1;                                §\C{// r1 points to y, pointer arithmetic}§ ®&®r3 = ®&®&r2;                                 §\C{// r3 points to r2}§ ®&®r2 = &r1;                                    §\C{// r2 points to x}§ ®&®r1 = &y;                                             §\C{// r1 points to y}§ ®&&®r3 = ®&®&r2;                                §\C{// r3 points to r2}§ r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15); §\C{// implicit dereferencing}§ \end{lstlisting} Hence, assigning to a reference requires the address of the reference variable (\Index{lvalue}): \begin{lstlisting} (&®*®)r1 = &x;                                  §\C{// (\&*) cancel out giving variable r1 not the variable pointed-to by r1}§ (&®*®)r1 = &x;                                  §\C{// (\&*) cancel giving variable r1 not variable pointed-to by r1}§ \end{lstlisting} Similarly, the address of a reference can be obtained for assignment or computation (\Index{rvalue}): \begin{lstlisting} (&®*®)r3 = &(&®*®)r2;                   §\C{// (\&*) cancel out giving the address of variable r2}§ \end{lstlisting} \Index{Cancellation}\index{pointer!cancellation rule} works to arbitrary depth, and pointer and reference values are interchangeable because both contain addresses. (&(&®*®)®*®)r3 = &(&®*®)r2;             §\C{// (\&*) cancel giving address of r2, (\&(\&*)*) cancel giving variable r3}§ \end{lstlisting} Cancellation\index{cancellation!pointer/reference}\index{pointer!cancellation} works to arbitrary depth, and pointer and reference values are interchangeable because both contain addresses. \begin{lstlisting} int x, *p1 = &x, **p2 = &p1, ***p3 = &p2, &r1 = &x, &&r2 = &&r1, &&&r3 = &&r2; &r1 = x,    &&r2 = r1,   &&&r3 = r2; ***p3 = 3;                                              §\C{// change x}§ r3 = 3;                                                 §\C{// change x, ***r3}§ Finally, implicit dereferencing and cancellation are a static (compilation) phenomenon not a dynamic one. That is, all implicit dereferencing and any cancellation is carried out prior to the start of the program, so reference performance is equivalent to pointer performance. A programmer selects a pointer or reference type solely on whether the address is dereferences frequently or infrequently, which dictates the amount of direct aid from the compiler; A programmer selects a pointer or reference type solely on whether the address is dereferenced frequently or infrequently, which dictates the amount of direct aid from the compiler; otherwise, everything else is equal. Interestingly, \CC deals with the address duality by making the pointed-to value the default, and prevent\-ing changes to the reference address, which eliminates half of the duality. Java deals with the address duality by making address assignment the default and providing a \Index{clone} mechanism to change the pointed-to value. Interestingly, \Index*[C++]{\CC} deals with the address duality by making the pointed-to value the default, and prevent\-ing changes to the reference address, which eliminates half of the duality. \Index*{Java} deals with the address duality by making address assignment the default and requiring field assignment (direct or indirect via methods), i.e., there is no builtin bit-wise or method-wise assignment, which eliminates half of the duality. As for a pointer, a reference may have qualifiers: \begin{lstlisting} const int cx = 5;                               §\C{// cannot change cx;}§ const int & r3 = &cx;                   §\C{// cannot change what r3 is pointing to}§ ®&®r3 = &cx;                                    §\C{// can change r3}§ r3 = 7;                                                 §\C{// error, cannot change cx}§ int & const r4 = &x;                    §\C{// must be initialized, \CC reference}§ ®&®r4 = &x;                                             §\C{// error, cannot change r4}§ const int & const r5 = &cx;             §\C{// must be initialized, \CC reference}§ r5 = 7;                                                 §\C{// error, cannot change cx}§ ®&®r5 = &cx;                                    §\C{// error, cannot change r5}§ \end{lstlisting} Hence, for type ©& const©, there is no pointer assignment, so ©&r4 = &x© is disallowed, and \emph{the address value cannot be ©0©}. In effect, the compiler is managing the addresses fpr type ©& const© not the programmer. \Index{Initialization} is different than \Index{assignment} because initialization occurs on the empty (uninitialized) storage on an object, while assignment occurs on possibly initialized storage for an object. const int & cr = cx;                    §\C{// cannot change what cr points to}§ ®&®cr = &cx;                                    §\C{// can change cr}§ cr = 7;                                                 §\C{// error, cannot change cx}§ int & const rc = x;                             §\C{// must be initialized, \CC reference}§ ®&®rc = &x;                                             §\C{// error, cannot change rc}§ const int & const crc = cx;             §\C{// must be initialized, \CC reference}§ crc = 7;                                                §\C{// error, cannot change cx}§ ®&®crc = &cx;                                   §\C{// error, cannot change crc}§ \end{lstlisting} Hence, for type ©& const©, there is no pointer assignment, so ©&rc = &x© is disallowed, and \emph{the address value cannot be ©0© unless an arbitrary pointer is assigned to the reference}. In effect, the compiler is managing the addresses for type ©& const© not the programmer, and by a programming discipline of only using references with references, address errors can be prevented. \Index{Initialization} is different than \Index{assignment} because initialization occurs on the empty (uninitialized) storage on an object, while assignment occurs on possibly initialized storage of an object. There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, return/temporary binding. For reference initialization (like pointer), the initializing value must be an address (lvalue) not a value (rvalue). For reference initialization (like pointer), the initializing value must be an address (\Index{lvalue}) not a value (\Index{rvalue}). \begin{lstlisting} int * p = &x;                                   §\C{// both \&x and x are possible interpretations}§ int & r = x;                                    §\C{// x unlikely interpretation, because of auto-dereferencing}§ \end{lstlisting} Hence, the compiler implicitly inserts a reference operator, ©&©, before the initialization expression: Hence, the compiler implicitly inserts a reference operator, ©&©, before the initialization expression. Similarly, when a reference is used for a parameter/return type, the call-site argument does not require a reference operator. \begin{lstlisting} int & f( int & ri );                    §\C{// reference parameter and return}§ z = f( x ) + f( y );                    §\C{// reference operator added not required}§ \end{lstlisting} Within routine ©f©, it is possible to change the argument by changing the corresponding parameter, and parameter ©ri© can be locally reassigned within ©f©. int & f( int & rp );                    §\C{// reference parameter and return}§ z = f( x ) + f( y );                    §\C{// reference operator added}§ \end{lstlisting} Within routine ©f©, it is possible to change the argument by changing the corresponding parameter, and parameter ©rp© can be locally reassigned within ©f©. The return reference from ©f© is copied into a compiler generated temporary, which is treated as an initialization. When a pointer/reference parameter has a ©const© value (immutable), it is possible to pass literals and expressions. \begin{lstlisting} void f( ®const® int & cri ); void g( ®const® int * cri ); void f( ®const® int & crp ); void g( ®const® int * cpp ); f( 3 );                   g( &3 ); f( x + y );             g( &(x + y) ); \end{lstlisting} Here, the compiler passes the address to the literal 3 or the temporary for the expression ©x + y©, knowing the argument cannot be changed through the parameter. (The ©&© is necessary for the pointer parameter to make the types match, and is common requirement for a C programmer.) \CFA extends this semantics to a mutable pointer/reference parameter, and the compiler implicitly creates the necessary temporary (copying the argument), which is subsequently pointed to by the reference parameter and can be changed. \begin{lstlisting} void f( int & cri ); void g( int * cri ); (The ©&© is necessary for the pointer parameter to make the types match, and is a common requirement for a C programmer.) \CFA \emph{extends} this semantics to a mutable pointer/reference parameter, and the compiler implicitly creates the necessary temporary (copying the argument), which is subsequently pointed-to by the reference parameter and can be changed. \begin{lstlisting} void f( int & rp ); void g( int * pp ); f( 3 );                   g( &3 );              §\C{// compiler implicit generates temporaries}§ f( x + y );             g( &(x + y) );  §\C{// compiler implicit generates temporaries}§ \end{lstlisting} Essentially, there is an implicit rvalue to lvalue conversion in this case.\footnote{ This conversion attempts to address the \newterm{const Hell} problem, when the innocent addition of a ©const© qualifier causes a cascade of type failures, requiring an unknown number of additional ©const© qualifiers, until it is discovered a ©const© qualifier cannot be added and all the ©const© qualifiers must be removed.} The implicit conversion allows seamless calls to any routine without having to explicit name/copy the literal/expression to allow the call. While \CFA attempts to handle pointers and references in a uniform, symmetric manner, C handles routine pointers in an inconsistent way: a routine pointer is both a reference and a pointer (particle and wave). Essentially, there is an implicit \Index{rvalue} to \Index{lvalue} conversion in this case.\footnote{ This conversion attempts to address the \newterm{const hell} problem, when the innocent addition of a ©const© qualifier causes a cascade of type failures, requiring an unknown number of additional ©const© qualifiers, until it is discovered a ©const© qualifier cannot be added and all the ©const© qualifiers must be removed.} The implicit conversion allows seamless calls to any routine without having to explicitly name/copy the literal/expression to allow the call. While \CFA attempts to handle pointers and references in a uniform, symmetric manner, C handles routine variables in an inconsistent way: a routine variable is both a pointer and a reference (particle and wave). \begin{lstlisting} void f( int p ) {...} const void (&fp)( int ) = f; fp( 3 ); fp = ...                                                §\C{// change code not allowed}§ &fp = ...;                                              §\C{// change routine reference}§ fp = ...                                                §\C{// error, cannot change code}§ &fp = ...;                                              §\C{// changing routine reference}§ \end{lstlisting} because the value of the routine variable is a routine literal, i.e., the routine code is normally immutable during execution.\footnote{ Dynamic code rewriting is possible but only in special circumstances.} \CFA allows this additional use of references for routine variables in an attempt to give a more consistent meaning for routine variables. \CFA allows this additional use of references for routine variables in an attempt to give a more consistent meaning for them. In detail, the brackets, ©[]©, enclose the result type, where each return value is named and that name is a local variable of the particular return type.\footnote{ Michael Tiemann, with help from Doug Lea, provided named return values in g++, circa 1989.} \Index*{Michael Tiemann}, with help from \Index*{Doug Lea}, provided named return values in g++, circa 1989.} The value of each local return variable is automatically returned at routine termination. Declaration qualifiers can only appear at the start of a routine definition, e.g.: Given the \CFA restrictions above, both named and default arguments are backwards compatible. \CC only supports default arguments; Ada supports both named and default arguments. \Index*[C++]{\CC} only supports default arguments; \Index*{Ada} supports both named and default arguments. \end{figure} In the left example in C, types ©C©, ©U© and ©T© are implicitly hoisted outside of type ©S© into the containing block scope. In the right example in \CFA, the types are not hoisted and accessed using the field-selection operator ©.©'' for type qualification, as does Java, rather than the \CC type-selection operator ©::©''. In the right example in \CFA, the types are not hoisted and accessed using the field-selection operator ©.©'' for type qualification, as does \Index*{Java}, rather than the \CC type-selection operator ©::©''. Nested routines are not first-class, meaning a nested routine cannot be returned if it has references to variables in its enclosing blocks; the only exception is references to the external block of the translation unit, as these variables persist for the duration of the program. The following program in undefined in \CFA (and ©gcc©\index{gcc}) The following program in undefined in \CFA (and Indexc{gcc}) \begin{lstlisting} [* [int]( int )] foo() {                §\C{// int (*foo())( int )}§ [ §\emph{lvalue}§, ..., §\emph{lvalue}§ ] = §\emph{expr}§; \end{lstlisting} \index{lvalue} The left-hand side is a tuple of \emph{lvalues}, which is a list of expressions each yielding an address, i.e., any data object that can appear on the left-hand side of a conventional assignment statement. ©$\emph{expr}$© is any standard arithmetic expression. [ §\emph{lvalue}§, . . ., §\emph{lvalue}§ ] = [ §\emph{expr}§, . . ., §\emph{expr}§ ]; \end{lstlisting} \index{lvalue} The left-hand side is a tuple of \emph{lvalues}, and the right-hand side is a tuple of \emph{expr}s. Each \emph{expr} appearing on the righthand side of a multiple assignment statement is assigned to the corresponding \emph{lvalues} on the left-hand side of the statement using parallel semantics for each assignment. Often transferring into a block can bypass variable declaration and/or its initialization, which results in subsequent errors. There are virtually no positive arguments for this kind of control flow, and therefore, there is a strong impetus to eliminate it. Nevertheless, C does have an idiom where this capability is used, known as Duff's device''~\cite{Duff83}: Nevertheless, C does have an idiom where this capability is used, known as \Index*{Duff's device}''~\cite{Duff83}: \begin{lstlisting} register int n = (count + 7) / 8; the number of ©switch© statements is small, \item most ©switch© statements are well formed (i.e., no Duff's device), most ©switch© statements are well formed (i.e., no \Index*{Duff's device}), \item the ©default© clause is usually written as the last case-clause, The ability to fall-through is retained because it is a sufficient C-idiom that most C programmers simply expect it, and its absence might discourage these programmers from using the ©choose© statement. \item Eliminating Duff's device is straightforward and only invalidates a small amount of very questionable code. Eliminating \Index*{Duff's device} is straightforward and only invalidates a small amount of very questionable code. The solution is to allow ©case© clauses to only appear at the same nesting level as the ©switch© body, as is done in most other programming languages with ©switch© statements. \item \CFA supports C initialization of structures, but it also adds constructors for more advanced initialization. Additionally, \CFA adds destructors that are called when a variable is de-allocated (variable goes out of scope or object is deleted). These functions take a reference to the structure as a parameter (see References for more information). These functions take a reference to the structure as a parameter (see References for more information). \begin{figure} \section{References} \CFA supports reference types similar to rvalue references in \CC. A reference is essentially an alias for a variable, allowing multiple names to refer to the same object. A reference can be used as a safer alternative to a pointer, because it can be used to pass a variable by reference, but it must always reference an object. It cannot be NULL, it must be assigned a value at initialization, and the object it references cannot change once it is set. By introducing references in parameter types, users are given an easy way to pass a value by reference, without the need for NULL pointer checks. In structures, a reference can replace a pointer to an object that should always have a valid value. When initializing a reference, \CFA uses a different syntax which differentiates reference initialization from assignment to a reference. The ©&© is used on both sides of the expression to clarify that the address of the reference is being set to the address of the variable to which it refers. \begin{figure} \begin{lstlisting} // parameter p is a reference to a Point void movePointUp(Point &p) { p.y += 1.0; // Uses dot-notation to access fields } Point p1 = ...; ColoredPoint cp1 = ...; movePoint(p1); // reference to p1 passed to movePoint movePoint(cp1); // reference to cp1 passed to movePoint // a ListElement cannot be created without a valid list struct ListElement { int element; List &list; // a list element has a reference to the list } // The constructor must initialize the reference void ?{}(ListElement &le, int e, List &l) { le.element = e; &le.list = &l; // initialize the reference } ListElement e1{88, numberList}; // uses constructor ListElement e2; // compiler error: uninitialized reference Listing 10: References \end{lstlisting} \end{figure} \end{comment} The operations ©&&©, ©||©, and ©!© can be applied to any scalar arguments and are defined in terms of comparison against 0 (ex. ©(a && b)© becomes ©(a != 0 && b != 0)©). In C, the integer constants 0 and 1 suffice because the integer promotion rules can convert them to any arithmetic type, and the rules for pointer expressions treat constant expressions evaluating to 0 as a special case. However, user-defined arithmetic types often need the equivalent of a 1 or 0 for their functions or operators, polymorphic functions often need 0 and 1 constants of a type matching their polymorphic parameters, and user-defined pointer-like types may need a null value. Defining special constants for a user-defined type is more efficient than defining a conversion to the type from ©_Bool©. In C, the integer constants 0 and 1 suffice because the integer promotion rules can convert them to any arithmetic type, and the rules for pointer expressions treat constant expressions evaluating to 0 as a special case. However, user-defined arithmetic types often need the equivalent of a 1 or 0 for their functions or operators, polymorphic functions often need 0 and 1 constants of a type matching their polymorphic parameters, and user-defined pointer-like types may need a null value. Defining special constants for a user-defined type is more efficient than defining a conversion to the type from ©_Bool©. Why just 0 and 1? Why not other integers? No other integers have special status in C. \begin{quote2} \begin{tabular}{@{}l@{\hspace{3em}}ll@{}} \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CC}} & \multicolumn{1}{c}{©gcc©}\index{gcc} \\ \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CC}} & \multicolumn{1}{c}{Indexc{gcc}} \\ \begin{lstlisting} In \CFA, ©typedef© provides a mechanism to alias long type names with short ones, both globally and locally, but not eliminate the use of the short name. ©gcc© provides ©typeof© to declare a secondary variable from a primary variable. \Indexc{gcc} provides ©typeof© to declare a secondary variable from a primary variable. \CFA also relies heavily on the specification of the left-hand side of assignment for type inferencing, so in many cases it is crucial to specify the type of the left-hand side to select the correct type of the right-hand expression. Only for overloaded routines with the same return type is variable type-inferencing possible. Generics allow programmers to use type variables in place of concrete types so that the code can be reused with multiple types. The type parameters can be restricted to satisfy a set of constraints. This enables \CFA to build fully compiled generic functions and types, unlike other languages like \CC where templates are expanded or must be explicitly instantiated. This enables \CFA to build fully compiled generic functions and types, unlike other languages like \Index*[C++]{\CC} where templates are expanded or must be explicitly instantiated. \subsection{Generic Functions} Generic functions in \CFA are similar to template functions in \CC, and will sometimes be expanded into specialized versions, just like in \CC. Generic functions in \CFA are similar to template functions in \Index*[C++]{\CC}, and will sometimes be expanded into specialized versions, just like in \CC. The difference, however, is that generic functions in \CFA can also be separately compiled, using function pointers for callers to pass in all needed functionality for the given type. This means that compiled libraries can contain generic functions that can be used by programs linked with them (statically or dynamically). Generic types are defined using the same mechanisms as those described above for generic functions. This feature allows users to create types that have one or more fields that use generic parameters as types, similar to a template classes in \CC. This feature allows users to create types that have one or more fields that use generic parameters as types, similar to a template classes in \Index*[C++]{\CC}. For example, to make a generic linked list, a placeholder is created for the type of the elements, so that the specific type of the elements in the list need not be specified when defining the list. In C, something like this would have to be done using void pointers and unsafe casting. Throwing an exception terminates execution of the current block, invokes the destructors of variables that are local to the block, and propagates the exception to the parent block. The exception is immediately re-thrown from the parent block unless it is caught as described below. \CFA uses keywords similar to \CC for exception handling. \CFA uses keywords similar to \Index*[C++]{\CC} for exception handling. An exception is thrown using a throw statement, which accepts one argument. \end{lstlisting} While this sytact is awkward, it is unlikely many programers will name fields of a structure 0 or 1. Like the \CC lexical problem with closing template-syntax, e.g, ©Foo>®©, this issue can be solved with a more powerful lexer/parser. Like the \Index*[C++]{\CC} lexical problem with closing template-syntax, e.g, ©Foo>®©, this issue can be solved with a more powerful lexer/parser. There are several ambiguous cases with operator identifiers, e.g., ©int *?*?()©, where the string ©*?*?© can be lexed as ©*©/©?*?© or ©*?©/©*?©. This enables a very familiar interface to all programmers, even those with no parallel programming experience. It also allows the compiler to do static type checking of all communication, a very important safety feature. This controlled communication with type safety has some similarities with channels in Go, and can actually implement This controlled communication with type safety has some similarities with channels in \Index*{Go}, and can actually implement channels exactly, as well as create additional communication patterns that channels cannot. Mutex objects, monitors, are used to contain mutual exclusion within an object and synchronization across concurrent threads. There are two levels of encapsulation in \CFA, module and package. A module is a logical grouping of functionality that can be easily pulled into another project, much like a module in Python or a package in Go. A module is a logical grouping of functionality that can be easily pulled into another project, much like a module in \Index*{Python} or a package in \Index*{Go}. A module forms a namespace to limit the visibility and prevent naming conflicts of variables. Furthermore, a module is an independent translation unit, which can be compiled separately to accelerate the compilation speed. A package is a physical grouping of one or more modules that is used for code distribution and version management. Package is also the level of granularity at which dependences are managed. A package is similar to the Crate in Rust. A package is similar to the Crate in \Index*{Rust}. \subsection{No Declarations, No Header Files} In C and \CC, it is necessary to declare or define every global variable, global function, and type before it is used in each file. In C and \Index*[C++]{\CC}, it is necessary to declare or define every global variable, global function, and type before it is used in each file. Header files and a preprocessor are normally used to avoid repeating code. Thus, many variables, functions, and types are described twice, which exposes an opportunity for errors and causes additional maintenance work. This information is then stored in the object files for each module, in a format that can quickly be read by the compiler, and stored at the top of the file, for quick access. In addition to the user productivity improvements, this simple change also improves compile time, by saving the information in a simple machine readable format, instead of making the compiler parse the same information over and over from a header file. This seems like a minor change, but according to (Pike, Go at Google: Language Design in the Service of Software Engineering), this simple change can cause massive reductions in compile time. This seems like a minor change, but according to (Pike, \Index*{Go} at Google: Language Design in the Service of Software Engineering), this simple change can cause massive reductions in compile time. In \CFA, multiple definitions are not necessary. In developing \CFA, many other languages were consulted for ideas, constructs, and syntax. Therefore, it is important to show how these languages each compare with Do. In this section, \CFA is compared with what the writers of this document consider to be the closest competitors of Do: \CC, Go, Rust, and D. In this section, \CFA is compared with what the writers of this document consider to be the closest competitors of Do: \Index*[C++]{\CC}, \Index*{Go}, \Index*{Rust}, and \Index*{D}. {% local change to lstlising to reduce font size \lstset{basicstyle=\linespread{0.9}\sf\relsize{-2}} \subsubsection{Constructors and Destructors} \lstset{basicstyle=\sf\relsize{-2}} \begin{flushleft} \end{flushleft} \lstset{basicstyle=\sf\relsize{-1}} }% local change to lstlising to reduce font size \subsubsection[C++]{\CC} \CC is a general-purpose programming language. \Index*[C++]{\CC} is a general-purpose programming language. It has imperative, object-oriented and generic programming features, while also providing facilities for low-level memory manipulation. (Wikipedia) Classes in \CC also enable inheritance among types. Instead of inheritance, \CFA embraces composition and interfaces to achieve the same goals with more flexibility. There are many studies and articles comparing inheritance and composition (or is-a versus has-a relationships), so we will not go into more detail here (Venners, 1998) (Pike, Go at Google: Language Design in the Service of Software Engineering , 2012). There are many studies and articles comparing inheritance and composition (or is-a versus has-a relationships), so we will not go into more detail here (Venners, 1998) (Pike, \Index*{Go} at Google: Language Design in the Service of Software Engineering , 2012). Overloading in \CFA is very similar to overloading in \CC, with the exception of the additional use, in \CFA, of the return type to differentiate between overloaded functions. \subsubsection{Go} Go, also commonly referred to as golang, is a programming language developed at Google in 2007 [.]. \Index*{Go}, also commonly referred to as golang, is a programming language developed at Google in 2007 [.]. It is a statically typed language with syntax loosely derived from that of C, adding garbage collection, type safety, some structural typing capabilities, additional built-in types such as variable-length arrays and key-value maps, and a large standard library. (Wikipedia) \subsubsection{Rust} Rust is a general-purpose, multi-paradigm, compiled programming language developed by Mozilla Research. \Index*{Rust} is a general-purpose, multi-paradigm, compiled programming language developed by Mozilla Research. It is designed to be a "safe, concurrent, practical language", supporting pure-functional, concurrent-actor[dubious . discuss][citation needed], imperative-procedural, and object-oriented styles. \subsubsection{D} The D programming language is an object-oriented, imperative, multi-paradigm system programming The \Index*{D} programming language is an object-oriented, imperative, multi-paradigm system programming language created by Walter Bright of Digital Mars and released in 2001. [.] Though it originated as a re-engineering of \CC, D is a distinct language, having redesigned some core \CC features while also taking inspiration from other languages, notably Java, Python, Ruby, C\#, and Eiffel. Though it originated as a re-engineering of \CC, D is a distinct language, having redesigned some core \CC features while also taking inspiration from other languages, notably \Index*{Java}, \Index*{Python}, Ruby, C\#, and Eiffel. D and \CFA both start with C and add productivity features. \item[How widely used:] C programs rarely have nest types because they are equivalent to the hoisted version. \CFA is C \emph{incompatible} on this issue, and provides semantics similar to \CC. \CFA is C \emph{incompatible} on this issue, and provides semantics similar to \Index*[C++]{\CC}. Nested types are not hoisted and can be referenced using the field selection operator ©.©'', unlike the \CC scope-resolution operator ©::©''. Given that nested types in C are equivalent to not using them, i.e., they are essentially useless, it is unlikely there are any realistic usages that break because of this incompatibility. \index{input/output library} The goal for \CFA I/O is to make I/O as simple as possible for the general case, while fully supporting polmorphism and user defined types in a consistent way. The general case is printing out a sequence of variables separated by whitespace. The goal for the \CFA I/O is to make I/O as simple as possible in the common cases, while fully supporting polmorphism and user defined types in a consistent way. The common case is printing out a sequence of variables separated by whitespace. \begin{quote2} \begin{tabular}{@{}l@{\hspace{3em}}l@{}} \end{tabular} \end{quote2} The \CFA form is half as many characters, and is similar to \Index{Python} I/O with respect to implicit separators. The \CFA form has half as many characters as the \CC form, and is similar to \Index*{Python} I/O with respect to implicit separators. The logical-or operator is used because it is the lowest-priority overloadable operator, other than assignment. \subsection{malloc} \begin{lstlisting} \begin{lstlisting}[aboveskip=0pt,belowskip=0pt] forall( otype T ) T * malloc( void );§\indexc{malloc}§ forall( otype T ) T * malloc( char fill ); forall( otype T ) T * memset( T * ptr );                                // remove when default value available \end{lstlisting} \ \subsection{ato / strto} \begin{lstlisting} \begin{lstlisting}[aboveskip=0pt,belowskip=0pt] int ato( const char * ptr );§\indexc{ato}§ unsigned int ato( const char * ptr ); long double _Complex strto( const char * sptr, char ** eptr ); \end{lstlisting} \ \subsection{bsearch / qsort} \begin{lstlisting} \begin{lstlisting}[aboveskip=0pt,belowskip=0pt] forall( otype T | { int ?