  • doc/papers/general/Paper.tex

    r43bbdf3 r5ff188f  
    3939\newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
    4040%\newcommand{\TODO}[1]{} % TODO elided
    41 
    4241% Default underscore is too low and wide. Cannot use lstlisting "literate" as replacing underscore
    43 % removes it as a variable-name character so keywords in variables are highlighted. MUST APPEAR
    44 % AFTER HYPERREF.
    45 %\DeclareTextCommandDefault{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.1ex}}}
    46 \renewcommand{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.075ex}}}
     42% removes it as a variable-name character so keywords in variables are highlighted
     43\DeclareTextCommandDefault{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.1ex}}}
    4744
    4845\makeatletter
     
    5249\setlength{\parindentlnth}{\parindent}
    5350
    54 \newcommand{\LstKeywordStyle}[1]{{\lst@basicstyle{\lst@keywordstyle{#1}}}}
    55 \newcommand{\LstCommentStyle}[1]{{\lst@basicstyle{\lst@commentstyle{#1}}}}
    56 
    5751\newlength{\gcolumnposn}                                % temporary hack because lstlisting does not handle tabs correctly
    5852\newlength{\columnposn}
    5953\setlength{\gcolumnposn}{2.75in}
    6054\setlength{\columnposn}{\gcolumnposn}
    61 \newcommand{\C}[2][\@empty]{\ifx#1\@empty\else\global\setlength{\columnposn}{#1}\global\columnposn=\columnposn\fi\hfill\makebox[\textwidth-\columnposn][l]{\lst@basicstyle{\LstCommentStyle{#2}}}}
     55\newcommand{\C}[2][\@empty]{\ifx#1\@empty\else\global\setlength{\columnposn}{#1}\global\columnposn=\columnposn\fi\hfill\makebox[\textwidth-\columnposn][l]{\lst@commentstyle{#2}}}
    6256\newcommand{\CRT}{\global\columnposn=\gcolumnposn}
    63 
    64 % Denote newterms in particular font and index them without particular font and in lowercase, e.g., \newterm{abc}.
    65 % The option parameter provides an index term different from the new term, e.g., \newterm[\texttt{abc}]{abc}
    66 % The star version does not lowercase the index information, e.g., \newterm*{IBM}.
    67 \newcommand{\newtermFontInline}{\emph}
    68 \newcommand{\newterm}{\@ifstar\@snewterm\@newterm}
    69 \newcommand{\@newterm}[2][\@empty]{\lowercase{\def\temp{#2}}{\newtermFontInline{#2}}\ifx#1\@empty\index{\temp}\else\index{#1@{\protect#2}}\fi}
    70 \newcommand{\@snewterm}[2][\@empty]{{\newtermFontInline{#2}}\ifx#1\@empty\index{#2}\else\index{#1@{\protect#2}}\fi}
    7157
    7258% Latin abbreviation
    7359\newcommand{\abbrevFont}{\textit}       % set empty for no italics
    74 \newcommand{\EG}{\abbrevFont{e}.\abbrevFont{g}.}
    7560\newcommand*{\eg}{%
    76         \@ifnextchar{,}{\EG}%
    77                 {\@ifnextchar{:}{\EG}%
    78                         {\EG,\xspace}}%
     61        \@ifnextchar{,}{\abbrevFont{e}.\abbrevFont{g}.}%
     62                {\@ifnextchar{:}{\abbrevFont{e}.\abbrevFont{g}.}%
     63                        {\abbrevFont{e}.\abbrevFont{g}.,\xspace}}%
    7964}%
    80 \newcommand{\IE}{\abbrevFont{i}.\abbrevFont{e}.}
    8165\newcommand*{\ie}{%
    82         \@ifnextchar{,}{\IE}%
    83                 {\@ifnextchar{:}{\IE}%
    84                         {\IE,\xspace}}%
     66        \@ifnextchar{,}{\abbrevFont{i}.\abbrevFont{e}.}%
     67                {\@ifnextchar{:}{\abbrevFont{i}.\abbrevFont{e}.}%
     68                        {\abbrevFont{i}.\abbrevFont{e}.,\xspace}}%
    8569}%
    86 \newcommand{\ETC}{\abbrevFont{etc}}
    8770\newcommand*{\etc}{%
    88         \@ifnextchar{.}{\ETC}%
    89         {\ETC\xspace}%
     71        \@ifnextchar{.}{\abbrevFont{etc}}%
     72        {\abbrevFont{etc}.\xspace}%
    9073}%
    91 \newcommand{\ETAL}{\abbrevFont{et}\hspace{2pt}\abbrevFont{al}}
    92 \newcommand*{\etal}{%
    93         \@ifnextchar{.}{\protect\ETAL}%
    94                 {\abbrevFont{\protect\ETAL}.\xspace}%
    95 }%
    96 \newcommand{\VIZ}{\abbrevFont{viz}}
    97 \newcommand*{\viz}{%
    98         \@ifnextchar{.}{\VIZ}%
    99                 {\abbrevFont{\VIZ}.\xspace}%
     74\newcommand{\etal}{%
     75        \@ifnextchar{.}{\abbrevFont{et~al}}%
     76                {\abbrevFont{et al}.\xspace}%
    10077}%
    10178\makeatother
     
    10380% CFA programming language, based on ANSI C (with some gcc additions)
    10481\lstdefinelanguage{CFA}[ANSI]{C}{
    105         morekeywords={
    106                 _Alignas, _Alignof, __alignof, __alignof__, asm, __asm, __asm__, _At, __attribute,
    107                 __attribute__, auto, _Bool, catch, catchResume, choose, _Complex, __complex, __complex__,
    108                 __const, __const__, disable, dtype, enable, __extension__, fallthrough, fallthru,
    109                 finally, forall, ftype, _Generic, _Imaginary, inline, __label__, lvalue, _Noreturn, one_t,
    110                 otype, restrict, _Static_assert, throw, throwResume, trait, try, ttype, typeof, __typeof,
    111                 __typeof__, virtual, with, zero_t},
    112         moredirectives={defined,include_next}%
     82        morekeywords={_Alignas,_Alignof,__alignof,__alignof__,asm,__asm,__asm__,_At,_Atomic,__attribute,__attribute__,auto,
     83                _Bool,catch,catchResume,choose,_Complex,__complex,__complex__,__const,__const__,disable,dtype,enable,__extension__,
     84                fallthrough,fallthru,finally,forall,ftype,_Generic,_Imaginary,inline,__label__,lvalue,_Noreturn,one_t,otype,restrict,_Static_assert,
     85                _Thread_local,throw,throwResume,trait,try,ttype,typeof,__typeof,__typeof__,zero_t},
    11386}%
    11487
     
    11891basicstyle=\linespread{0.9}\sf,                                                 % reduce line spacing and use sanserif font
    11992stringstyle=\tt,                                                                                % use typewriter font
    120 tabsize=5,                                                                                              % N space tabbing
     93tabsize=4,                                                                                              % 4 space tabbing
    12194xleftmargin=\parindentlnth,                                                             % indent code to paragraph indentation
    12295%mathescape=true,                                                                               % LaTeX math escape in CFA code $...$
     
    128101belowskip=3pt,
    129102% replace/adjust listing characters that look bad in sanserif
    130 literate={-}{\makebox[1ex][c]{\raisebox{0.4ex}{\rule{0.8ex}{0.1ex}}}}1 {^}{\raisebox{0.6ex}{$\scriptscriptstyle\land\,$}}1
     103literate={-}{\makebox[1.4ex][c]{\raisebox{0.5ex}{\rule{1.2ex}{0.06ex}}}}1 {^}{\raisebox{0.6ex}{$\scriptscriptstyle\land\,$}}1
    131104        {~}{\raisebox{0.3ex}{$\scriptstyle\sim\,$}}1 % {`}{\ttfamily\upshape\hspace*{-0.1ex}`}1
    132         {<-}{$\leftarrow$}2 {=>}{$\Rightarrow$}2 {->}{\makebox[1ex][c]{\raisebox{0.4ex}{\rule{0.8ex}{0.075ex}}}\kern-0.2ex\textgreater}2,
     105        {<-}{$\leftarrow$}2 {=>}{$\Rightarrow$}2 {->}{\makebox[1.4ex][c]{\raisebox{0.5ex}{\rule{1.2ex}{0.06ex}}}\kern-0.3ex\textgreater}2,
    133106moredelim=**[is][\color{red}]{`}{`},
    134107}% lstset
     
    136109% inline code @...@
    137110\lstMakeShortInline@%
    138 
    139 \lstnewenvironment{cfa}[1][]
    140 {\lstset{#1}}
    141 {}
    142 \lstnewenvironment{C++}[1][]                            % use C++ style
    143 {\lstset{language=C++,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}}
    144 {}
    145 
    146111
    147112\title{Generic and Tuple Types with Efficient Dynamic Layout in \protect\CFA}
     
    183148The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects.
    184149This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
    185 The TIOBE~\cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \Csharp 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail.
     150The \cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \Csharp 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail.
    186151The top 3 rankings over the past 30 years are:
    187152\lstDeleteShortInline@%
     
    220185\label{sec:poly-fns}
    221186
    222 \CFA{}\hspace{1pt}'s polymorphism was originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}.
     187\CFA{}\hspace{1pt}'s polymorphism was originally formalized by \cite{Ditchfield92}, and first implemented by \cite{Bilson03}.
    223188The signature feature of \CFA is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96} with functions generalized using a @forall@ clause (giving the language its name):
    224189\begin{lstlisting}
     
    293258}
    294259\end{lstlisting}
    295 Within the block, the nested version of @?<?@ performs @?>?@ and this local version overrides the built-in @?<?@ so it is passed to @qsort@.
     260Within the block, the nested version of @<@ performs @>@ and this local version overrides the built-in @<@ so it is passed to @qsort@.
    296261Hence, programmers can easily form local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types.
    297262
     
    579544p`->0` = 5;                                                                     $\C{// change quotient}$
    580545bar( qr`.1`, qr );                                                      $\C{// pass remainder and quotient/remainder}$
    581 rem = [div( 13, 5 ), 42]`.0.1`;                         $\C{// access 2nd component of 1st component of tuple expression}$
     546rem = [42, div( 13, 5 )]`.0.1`;                         $\C{// access 2nd component of 1st component of tuple expression}$
    582547\end{lstlisting}
    583548
     
    765730\end{lstlisting}
    766731Hence, function parameter and return lists are flattened for the purposes of type unification allowing the example to pass expression resolution.
    767 This relaxation is possible by extending the thunk scheme described by Bilson~\cite{Bilson03}.
     732This relaxation is possible by extending the thunk scheme described by~\cite{Bilson03}.
    768733Whenever a candidate's parameter structure does not exactly match the formal parameter's structure, a thunk is generated to specialize calls to the actual function:
    769734\begin{lstlisting}
     
    937902
    938903
    939 \section{Control Structures}
    940 
    941 
    942 \subsection{\texorpdfstring{Labelled \LstKeywordStyle{continue} / \LstKeywordStyle{break}}{Labelled continue / break}}
    943 
    944 While C provides @continue@ and @break@ statements for altering control flow, both are restricted to one level of nesting for a particular control structure.
    945 Unfortunately, this restriction forces programmers to use @goto@ to achieve the equivalent control-flow for more than one level of nesting.
    946 To prevent having to switch to the @goto@, \CFA extends the @continue@ and @break@ with a target label to support static multi-level exit~\cite{Buhr85}, as in Java.
    947 For both @continue@ and @break@, the target label must be directly associated with a @for@, @while@ or @do@ statement;
    948 for @break@, the target label can also be associated with a @switch@, @if@ or compound (@{}@) statement.
    949 Figure~\ref{f:MultiLevelExit} shows @continue@ and @break@ indicating the specific control structure, and the corresponding C program using only @goto@ and labels.
    950 The innermost loop has 7 exit points, which cause continuation or termination of one or more of the 5 nested control-structures.
    951 
    952 \begin{figure}
    953 \lstDeleteShortInline@%
    954 \begin{tabular}{@{\hspace{\parindentlnth}}l@{\hspace{\parindentlnth}}l@{\hspace{\parindentlnth}}l@{}}
    955 \multicolumn{1}{@{\hspace{\parindentlnth}}c@{\hspace{\parindentlnth}}}{\textbf{\CFA}}   & \multicolumn{1}{@{\hspace{\parindentlnth}}c}{\textbf{C}}      \\
    956 \begin{cfa}
    957 `LC:` {
    958         ... $declarations$ ...
    959         `LS:` switch ( ... ) {
    960           case 3:
    961                 `LIF:` if ( ... ) {
    962                         `LF:` for ( ... ) {
    963                                 `LW:` while ( ... ) {
    964                                         ... break `LC`; ...
    965                                         ... break `LS`; ...
    966                                         ... break `LIF`; ...
    967                                         ... continue `LF`; ...
    968                                         ... break `LF`; ...
    969                                         ... continue `LW`; ...
    970                                         ... break `LW`; ...
    971                                 } // while
    972                         } // for
    973                 } else {
    974                         ... break `LIF`; ...
    975                 } // if
    976         } // switch
    977 } // compound
    978 \end{cfa}
    979 &
    980 \begin{cfa}
    981 {
    982         ... $declarations$ ...
    983         switch ( ... ) {
    984           case 3:
    985                 if ( ... ) {
    986                         for ( ... ) {
    987                                 while ( ... ) {
    988                                         ... goto `LC`; ...
    989                                         ... goto `LS`; ...
    990                                         ... goto `LIF`; ...
    991                                         ... goto `LFC`; ...
    992                                         ... goto `LFB`; ...
    993                                         ... goto `LWC`; ...
    994                                         ... goto `LWB`; ...
    995                                   `LWC:` ; } `LWB:` ;
    996                           `LFC:` ; } `LFB:` ;
    997                 } else {
    998                         ... goto `LIF`; ...
    999                 } `LIF:` ;
    1000         } `LS:` ;
    1001 } `LC:` ;
    1002 \end{cfa}
    1003 &
    1004 \begin{cfa}
    1005 
    1006 
    1007 
    1008 
    1009 
    1010 
    1011 
    1012 // terminate compound
    1013 // terminate switch
    1014 // terminate if
    1015 // continue loop
    1016 // terminate loop
    1017 // continue loop
    1018 // terminate loop
    1019 
    1020 
    1021 
    1022 // terminate if
    1023 
    1024 
    1025 
    1026 \end{cfa}
    1027 \end{tabular}
    1028 \lstMakeShortInline@%
    1029 \caption{Multi-level Exit}
    1030 \label{f:MultiLevelExit}
    1031 \end{figure}
    1032 
    1033 Both labelled @continue@ and @break@ are a @goto@ restricted in the following ways:
    1034 \begin{itemize}
    1035 \item
    1036 They cannot create a loop, which means only the looping constructs cause looping.
    1037 This restriction means all situations resulting in repeated execution are clearly delineated.
    1038 \item
    1039 They cannot branch into a control structure.
    1040 This restriction prevents missing declarations and/or initializations at the start of a control structure resulting in undefined behaviour.
    1041 \end{itemize}
    1042 The advantage of the labelled @continue@/@break@ is allowing static multi-level exits without having to use the @goto@ statement, and tying control flow to the target control structure rather than an arbitrary point in a program.
    1043 Furthermore, the location of the label at the \emph{beginning} of the target control structure informs the reader (eye candy) that complex control-flow is occurring in the body of the control structure.
    1044 With @goto@, the label is at the end of the control structure, which fails to convey this important clue early enough to the reader.
    1045 Finally, using an explicit target for the transfer instead of an implicit target allows new constructs to be added or removed without affecting existing constructs.
    1046 The implicit targets of the current @continue@ and @break@, \ie the closest enclosing loop or @switch@, change as certain constructs are added or removed.
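
For comparison, the following is a minimal plain-C sketch of a two-level exit written with @goto@, in the style of the right-hand column of Figure~\ref{f:MultiLevelExit}. The function name @count_clean_rows@, the labels @next_row@ and @done@, and the \CFA label @Rows@ mentioned in the comments are hypothetical names chosen for illustration; they do not come from the paper.

```c
#include <stddef.h>

/* Hypothetical example: count the rows of a row-major matrix that contain
   no negative entry, and stop scanning entirely at the first zero entry. */
int count_clean_rows( const int * m, size_t rows, size_t cols ) {
    int clean = 0;
    for ( size_t r = 0; r < rows; r += 1 ) {
        for ( size_t c = 0; c < cols; c += 1 ) {
            int v = m[r * cols + c];
            if ( v < 0 ) goto next_row;   /* labelled form: continue Rows; */
            if ( v == 0 ) goto done;      /* labelled form: break Rows;    */
        }
        clean += 1;                       /* reached only when the row is clean */
      next_row: ;                         /* label sits at the END of the loop body */
    }
  done:
    return clean;
}
```

Both labels appear after the code that jumps to them, so a reader sees the non-local control flow only on reaching the end of the loop, which is the readability point made above.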
    1047 
    1048 \TODO{choose and fallthrough here as well?}
    1049 
    1050 
    1051 \subsection{\texorpdfstring{\LstKeywordStyle{with} Clause / Statement}{with Clause / Statement}}
    1052 \label{s:WithClauseStatement}
    1053 
    1054 Grouping heterogeneous data into \newterm{aggregate}s is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers:
    1055 \begin{cfa}
    1056 struct S {                                                              $\C{// aggregate}$
    1057         char c;                                                         $\C{// fields}$
    1058         int i;
    1059         double d;
    1060 };
    1061 S s, as[10];
    1062 \end{cfa}
    1063 However, routines manipulating aggregates must repeat the aggregate name to access its fields:
    1064 \begin{cfa}
    1065 void f( S s ) {
    1066         `s.`c; `s.`i; `s.`d;                            $\C{// access containing fields}$
    1067 }
    1068 \end{cfa}
    1069 A similar situation occurs in object-oriented programming, \eg \CC:
    1070 \begin{C++}
    1071 class C {
    1072         char c;                                                         $\C{// fields}$
    1073         int i;
    1074         double d;
    1075         int mem() {                                                     $\C{// implicit "this" parameter}$
    1076                 `this->`c; `this->`i; `this->`d;$\C{// access containing fields}$
    1077         }
    1078 };
    1079 \end{C++}
    1080 Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of nested lexical-scoping.
    1081 
    1082 % In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided.
    1083 % In any programming language, some functions have a naturally close relationship with a particular data type.
    1084 % Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
    1085 % Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
    1086 % When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
    1087 %
    1088 % \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.}
    1089 
    1090 \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate qualification of fields by opening a scope containing the field identifiers.
    1091 Hence, the qualified fields become variables, making it easier to optimize field references in a block.
    1092 \begin{cfa}
    1093 void f( S s ) `with s` {                                $\C{// with clause}$
    1094         c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
    1095 }
    1096 \end{cfa}
    1097 and the equivalent for object-style programming is:
    1098 \begin{cfa}
    1099 int mem( S & this ) `with this` {               $\C{// with clause}$
    1100         c; i; d;                                                        $\C{\color{red}// this.c, this.i, this.d}$
    1101 }
    1102 \end{cfa}
    1103 The key generality over the object-oriented approach is that one aggregate parameter \lstinline[language=C++]@this@ is not treated specially over other aggregate parameters:
    1104 \begin{cfa}
    1105 struct T { double m, n; };
    1106 int mem( S & s, T & t ) `with s, t` {   $\C{// multiple aggregate parameters}$
    1107         c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
    1108         m; n;                                                           $\C{\color{red}// t.m, t.n}$
    1109 }
    1110 \end{cfa}
    1111 The equivalent object-oriented style is:
    1112 \begin{cfa}
    1113 int S::mem( T & t ) {                                   $\C{// multiple aggregate parameters}$
    1114         c; i; d;                                                        $\C{\color{red}// this-\textgreater{}c, this-\textgreater{}i, this-\textgreater{}d}$
    1115         `t.`m; `t.`n;
    1116 }
    1117 \end{cfa}
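
As a point of comparison, a plain-C rendering of a routine over two aggregates must qualify every field access, with neither parameter treated specially. The sketch below reuses the field names of @S@ and @T@ from the examples above; the body of @mem@ is a hypothetical computation invented only so the routine does something observable.

```c
struct S { char c; int i; double d; };
struct T { double m, n; };

/* Plain-C counterpart of a routine that opens both s and t:
   every field access carries an explicit "s->" or "t->" qualifier. */
double mem( struct S * s, struct T * t ) {
    s->i += 1;                  /* would be "i += 1" inside a with clause   */
    s->d = t->m + t->n;         /* would be "d = m + n" inside a with clause */
    return s->d + s->i;
}
```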
    1118 
    1119 The statement form is used within a block:
    1120 \begin{cfa}
    1121 int foo() {
    1122         struct S1 { ... } s1;
    1123         struct S2 { ... } s2;
    1124         `with s1` {                                             $\C{// with statement}$
    1125                 // access fields of s1 without qualification
    1126                 `with s2` {                                     $\C{// nesting}$
    1127                         // access fields of s1 and s2 without qualification
    1128                 }
    1129         }
    1130         `with s1, s2` {
    1131                 // access unambiguous fields of s1 and s2 without qualification
    1132         }
    1133 }
    1134 \end{cfa}
    1135 
    1136 When opening multiple structures, fields with the same name and type are ambiguous and must be fully qualified.
    1137 For fields with the same name but different type, context/cast can be used to disambiguate.
    1138 \begin{cfa}
    1139 struct S { int i; int j; double m; } a, c;
    1140 struct T { int i; int k; int m; } b, c;
    1141 `with a, b` {
    1142         j + k;                                                  $\C{// unambiguous, unique names define unique types}$
    1143         i;                                                              $\C{// ambiguous, same name and type}$
    1144         a.i + b.i;                                              $\C{// unambiguous, qualification defines unique names}$
    1145         m;                                                              $\C{// ambiguous, same name and no context to define unique type}$
    1146         m = 5.0;                                                $\C{// unambiguous, same name and context defines unique type}$
    1147         m = 1;                                                  $\C{// unambiguous, same name and context defines unique type}$
    1148 }
    1149 `with c` { ... }                                        $\C{// ambiguous, same name and no context}$
    1150 `with (S)c` { ... }                                     $\C{// unambiguous, same name and cast defines unique type}$
    1151 \end{cfa}
    1152 
    1153 The components in the "with" clause
    1154 
    1155   with a, b, c { ... }
    1156 
    1157 serve 2 purposes: each component provides a type and object. The type must be a
    1158 structure type. Enumerations are already opened, and I think a union is opened
    1159 to some extent, too. (Or is that just unnamed unions?) The object is the target
    1160 that the naked structure-fields apply to. The components are open in "parallel"
    1161 at the scope of the "with" clause/statement, so opening "a" does not affect
    1162 opening "b", etc. This semantic is different from Pascal, which nests the
    1163 openings.
    1164 
    1165 Having said the above, it seems reasonable to allow a "with" component to be an
    1166 expression. The type is the static expression-type and the object is the result
    1167 of the expression. Again, the type must be an aggregate. Expressions require
    1168 parentheses around the components.
    1169 
    1170   with( a, b, c ) { ... }
    1171 
    1172 Does this now make sense?
    1173 
    1174 Having written more CFA code, it is becoming clear to me that I *really* want
    1175 the "with" to be implemented because I hate having to type all those object
    1176 names for fields. It's a great way to drive people away from the language.
    1177 
    1178 
    1179 \subsection{Exception Handling ???}
    1180 
    1181 
    1182 \section{Declarations}
    1183 
    1184 It is important to the design team that \CFA subjectively ``feel like'' C to user programmers.
    1185 An important part of this subjective feel is maintaining C's procedural programming paradigm, as opposed to the object-oriented paradigm of other systems languages such as \CC and Rust.
    1186 Maintaining this procedural paradigm means that coding patterns that work in C will remain not only functional but idiomatic in \CFA, reducing the mental burden of retraining C programmers and switching between C and \CFA development.
    1187 Nonetheless, some features of object-oriented languages are undeniably convenient, and the \CFA design team has attempted to adapt them to a procedural paradigm so as to incorporate their benefits into \CFA; two of these features are resource management and name scoping.
    1188 
    1189 
    1190 \subsection{Alternative Declaration Syntax}
    1191 
    1192 
    1193 \subsection{References}
    1194 
    1195 All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
    1196 The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
    1197 Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses.
    1198 Within a lexical scope, lvalue expressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while @y@ is used in its value interpretation.
    1199 Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose.
    1200 In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
    1201 
    1202 \begin{cfa}
    1203 int x = 1, y = 2, * p1, * p2, ** p3;
    1204 p1 = &x;  $\C{// p1 points to x}$
    1205 p2 = &y;  $\C{// p2 points to y}$
    1206 p3 = &p1;  $\C{// p3 points to p1}$
    1207 \end{cfa}
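
The way C passes lvalues between contexts through pointers can be sketched with two small routines. The names @swap_ints@ and @retarget@ are hypothetical and used only for illustration.

```c
/* Pass two lvalues to another context by address. */
void swap_ints( int * a, int * b ) {
    int tmp = *a;       /* value interpretation of *a */
    *a = *b;            /* address interpretation of *a, value interpretation of *b */
    *b = tmp;
}

/* Change which object a caller-owned pointer designates, via a pointer to pointer. */
void retarget( int ** pp, int * target ) {
    *pp = target;       /* address interpretation of *pp */
}
```

Every access to the pointed-to object needs an explicit @*@, and every call site needs an explicit @&@.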
    1208 
    1209 Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
    1210 It would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as @*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);@, for both brevity and clarity.
    1211 However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
    1212 To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below:
    1213 
    1214 \begin{cfa}
    1215 int x = 1, y = 2, & r1, & r2, && r3;
    1216 &r1 = &x;  $\C{// r1 points to x}$
    1217 &r2 = &y;  $\C{// r2 points to y}$
    1218 &&r3 = &&r1;  $\C{// r3 points to r1}$
    1219 r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15);  $\C{// implicit dereferencing}$
    1220 \end{cfa}
    1221 
    1222 Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example.
    1223 Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value.
    1224 One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in the reference variable declaration, so the previous example implicitly acts like:
    1225 
    1226 \begin{cfa}
    1227 `*`r2 = ((`*`r1 + `*`r2) * (`**`r3 - `*`r1)) / (`**`r3 - 15);
    1228 \end{cfa}
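
The rewritten expression is ordinary C once the references are spelled as pointers. The sketch below evaluates it with the bindings from the pointer example, and also shows the pointer-arithmetic pitfall that rules out naive elision of @*@. The function names are hypothetical.

```c
/* Evaluate the running example with explicit dereferences. */
int eval_pointer_example( void ) {
    int x = 1, y = 2, * p1, * p2, ** p3;
    p1 = &x;            /* p1 points to x  */
    p2 = &y;            /* p2 points to y  */
    p3 = &p1;           /* p3 points to p1 */
    *p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);
    return y;           /* the assignment through p2 changed y */
}

/* The pitfall: both expressions are well defined but differ by a single '*'. */
int value_sum( const int * p, int n ) { return *p + n; }            /* adds n to the pointed-to value  */
const int * address_sum( const int * p, int n ) { return p + n; }   /* advances the pointer n elements */
```

With these bindings, @**p3@ and @*p1@ both denote @x@, so the numerator is zero and @y@ is assigned 0.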
    1229 
    1230 References in \CFA are similar to those in \CC, but with a couple of important improvements, both of which can be seen in the example above.
    1231 Firstly, \CFA does not forbid references to references, unlike \CC.
    1232 This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
    1233 
    1234 Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable.
    1235 This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime.
    1236 This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
    1237 In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address.
    1238 In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there.
    1239 The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above.
    1240 This rebinding can occur to an arbitrary depth of reference nesting; $n$ address-of operators applied to a reference nested $m$ times will produce an lvalue pointer nested $n$ times if $n \le m$ (note that $n = m+1$ is simply the usual C rvalue address-of operator applied to the $n = m$ case).
    1241 The explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x;@ or @(&(&`*`)`*`)r3 = &(&`*`)r1;@ or even @(&`*`)r2 = (&`*`)`*`r3;@ for @&r2 = &r3;@.
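
One way to read the rebinding rule is to model a reference as a pointer variable in plain C. This is only a sketch under that modelling assumption, not the compiler's actual translation; the name @r1_@ is a hypothetical stand-in for the storage behind a \CFA reference @r1@.

```c
/* Model: a CFA "int & r1" is stored as "int * r1_";
   using r1 means *r1_, and "&r1 = &x" means "r1_ = &x". */
int rebind_demo( void ) {
    int x = 1, y = 2;
    int * r1_ = &x;     /* &r1 = &x;   bind r1 to x                */
    *r1_ = 5;           /* r1 = 5;     implicit dereference, writes x */
    r1_ = &y;           /* &r1 = &y;   rebind r1 to y              */
    *r1_ = 7;           /* r1 = 7;     implicit dereference, writes y */
    return x * 10 + y;  /* x is 5 and y is 7 */
}
```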
    1242 
    1243 Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.
    1244 By analogy to pointers, \CFA references also allow cv-qualifiers:
    1245 
    1246 \begin{cfa}
    1247 const int cx = 5;               $\C{// cannot change cx}$
    1248 const int & cr = cx;    $\C{// cannot change cr's referred value}$
    1249 &cr = &cx;                              $\C{// rebinding cr allowed}$
    1250 cr = 7;                                 $\C{// ERROR, cannot change cr}$
    1251 int & const rc = x;             $\C{// must be initialized, like in \CC}$
    1252 &rc = &x;                               $\C{// ERROR, cannot rebind rc}$
    1253 rc = 7;                                 $\C{// x now equal to 7}$
    1254 \end{cfa}
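
Since the text states that references are converted to pointers internally, the listing above can be read through its pointer analogue. The following \CC sketch is only an illustration of that reading, not the \CFA compiler's actual output; the names @rebind_and_store@ and @qualifier_demo@ are hypothetical, and @cr@ is rebound to @x@ (rather than to @cx@ again) so that the effect is observable.

```cpp
#include <cassert>

// Hypothetical helper (not from the paper): emulates the CFA statements
// `&r = &target; r = value;` using an ordinary pointer.
int rebind_and_store(int *&r, int &target, int value) {
    r = &target;  // rebind: r now designates target
    *r = value;   // store through the rebound "reference"
    return *r;
}

// Pointer analogues of the cv-qualified references in the listing above.
int qualifier_demo() {
    const int cx = 5;
    int x = 0;

    const int *cr = &cx;  // like `const int & cr = cx;`
    cr = &x;              // rebinding cr is allowed
    // *cr = 7;           // rejected by the compiler: referent is const

    int *const rc = &x;   // like `int & const rc = x;`
    // rc = &x;           // rejected by the compiler: rc cannot be rebound
    *rc = 7;              // x is now 7

    assert(*cr == 7);     // cr observes the change made through rc
    return x;
}
```

The position of @const@ relative to @*@ plays the same role as its position relative to @&@ in the \CFA declarations: before it, the referent is read-only; after it, the binding is fixed.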
    1255 
    1256 \TODO{Pull more draft text from user manual; make sure to discuss initialization and reference conversions}
    1257 
    1258 
    1259 \subsection{Constructors and Destructors}
    1260 
    1261 One of the strengths of C is the control over memory management it gives programmers, allowing resource release to be more consistent and precisely timed than is possible with garbage-collected memory management.
    1262 However, this manual approach to memory management is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory.
    1263 \CC is well-known for an approach to manual memory management that addresses both these issues, Resource Acquisition Is Initialization (RAII), implemented by means of special \emph{constructor} and \emph{destructor} functions; we have implemented a similar feature in \CFA.
    1264 
    1265 \TODO{Fill out section. Mention field-constructors and at-equal escape hatch to C-style initialization. Probably pull some text from Rob's thesis for first draft.}
    1266 
    1267 
    1268 \subsection{Default Parameters}
    1269 
    1270 
    1271 \section{Literals}
    1272 
    1273 
    1274 \subsection{0/1}
    1275 
    1276 \TODO{Some text already at the end of Section~\ref{sec:poly-fns}}
    1277 
    1278 
    1279 \subsection{Units}
    1280 
    1281 An alternative call syntax, in which the literal argument precedes the routine name, converts basic literals into user literals.
    1282 
    1283 {\lstset{language=CFA,deletedelim=**[is][]{`}{`},moredelim=**[is][\color{red}]{@}{@}}
    1284 \begin{cfa}
    1285 struct Weight { double stones; };
    1286 
    1287 void ?{}( Weight & w ) { w.stones = 0; } $\C{// operations}$
    1288 void ?{}( Weight & w, double stones ) { w.stones = stones; }
    1289 Weight ?+?( Weight l, Weight r ) { return (Weight){ l.stones + r.stones }; }
    1290 
    1291 Weight @?`st@( double w ) { return (Weight){ w }; } $\C{// backquote for units}$
    1292 Weight @?`lb@( double w ) { return (Weight){ w / 14.0 }; }
    1293 Weight @?`kg@( double w ) { return (Weight){ w * 0.1575 }; }
    1294 
    1295 int main() {
    1296         Weight w, hw = { 14 };                  $\C{// 14 stone}$
    1297         w = 11@`st@ + 1@`lb@;
    1298         w = 70.3@`kg@;
    1299         w = 155@`lb@;
    1300         w = 0x_9b_u@`lb@;                               $\C{// hexadecimal unsigned weight (155)}$
    1301         w = 0_233@`lb@;                                 $\C{// octal weight (155)}$
    1302         w = 5@`st@ + 8@`kg@ + 25@`lb@ + hw;
    1303 }
    1304 \end{cfa}
    1305 }%
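
For comparison, the closest \CC counterpart is the user-defined literal. The sketch below restates the @Weight@ example under that mechanism; it is an analogy rather than a description of how \CFA implements the backquote syntax. The suffixes carry a leading underscore (@_st@, @_lb@, @_kg@) because \CC reserves other suffixes, and @nearly_equal@ and @mixed_units@ are hypothetical helpers added for illustration.

```cpp
#include <cmath>

struct Weight { double stones; };

constexpr Weight operator+(Weight l, Weight r) { return Weight{l.stones + r.stones}; }

// Each unit needs two overloads: one for integer and one for floating literals.
constexpr Weight operator""_st(unsigned long long w) { return Weight{static_cast<double>(w)}; }
constexpr Weight operator""_st(long double w) { return Weight{static_cast<double>(w)}; }
constexpr Weight operator""_lb(unsigned long long w) { return Weight{static_cast<double>(w) / 14.0}; }
constexpr Weight operator""_lb(long double w) { return Weight{static_cast<double>(w) / 14.0}; }
constexpr Weight operator""_kg(unsigned long long w) { return Weight{static_cast<double>(w) * 0.1575}; }
constexpr Weight operator""_kg(long double w) { return Weight{static_cast<double>(w) * 0.1575}; }

// Hypothetical helper for comparing weights despite floating-point rounding.
inline bool nearly_equal(Weight a, Weight b) { return std::fabs(a.stones - b.stones) < 1e-9; }

// Mirrors the last assignment in main() of the CFA listing.
inline Weight mixed_units() {
    Weight hw = {14.0};                  // 14 stone
    return 5_st + 8_kg + 25_lb + hw;
}
```

With these definitions, @0x9b_lb@ and @0233_lb@ denote the same weight as @155_lb@, matching the hexadecimal and octal cases in the listing.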
    1306 
    1307 
    1308904\section{Evaluation}
    1309905\label{sec:eval}
     
    14171013In contrast, \CFA has a single facility for polymorphic code supporting type-safe separate-compilation of polymorphic functions and generic (opaque) types, which uniformly leverage the C procedural paradigm.
    14181014The key mechanism to support separate compilation is \CFA's \emph{explicit} use of assumed properties for a type.
    1419 Until \CC concepts~\cite{C++Concepts} are standardized (anticipated for \CCtwenty), \CC provides no way to specify the requirements of a generic function in code beyond compilation errors during template expansion;
     1015Until \CC~\cite{C++Concepts} are standardized (anticipated for \CCtwenty), \CC provides no way to specify the requirements of a generic function in code beyond compilation errors during template expansion;
    14201016furthermore, \CC concepts are restricted to template polymorphism.
    14211017
     
    14251021In \CFA terms, all Cyclone polymorphism must be dtype-static.
    14261022While the Cyclone design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, it is more restrictive than \CFA's general model.
    1427 Smith and Volpano~\cite{Smith98} present Polymorphic C, an ML dialect with polymorphic functions, C-like syntax, and pointer types; it lacks many of C's features, however, most notably structure types, and so is not a practical C replacement.
    1428 
    1429 Objective-C~\cite{obj-c-book} is an industrially successful extension to C.
     1023\cite{Smith98} present Polymorphic C, an ML dialect with polymorphic functions and C-like syntax and pointer types; it lacks many of C's features, however, most notably structure types, and so is not a practical C replacement.
     1024
     1025\cite{obj-c-book} is an industrially successful extension to C.
    14301026However, Objective-C is a radical departure from C, using an object-oriented model with message-passing.
    14311027Objective-C did not support type-checked generics until recently \cite{xcode7}, historically using less-efficient runtime checking of object types.
    1432 The GObject~\cite{GObject} framework also adds object-oriented programming with runtime type-checking and reference-counting garbage-collection to C;
     1028The~\cite{GObject} framework also adds object-oriented programming with runtime type-checking and reference-counting garbage-collection to C;
    14331029these features are more intrusive additions than those provided by \CFA, in addition to the runtime overhead of reference-counting.
    1434 Vala~\cite{Vala} compiles to GObject-based C, adding the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code-bases.
     1030\cite{Vala} compiles to GObject-based C, adding the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code-bases.
    14351031Java~\cite{Java8} included generic types in Java~5, which are type-checked at compilation and type-erased at runtime, similar to \CFA's.
    14361032However, in Java, each object carries its own table of method pointers, while \CFA passes the method pointers separately to maintain a C-compatible layout.
    14371033Java is also a garbage-collected, object-oriented language, with the associated resource usage and C-interoperability burdens.
    14381034
    1439 D~\cite{D}, Go, and Rust~\cite{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go and \emph{traits} in Rust.
     1035D~\cite{D}, Go, and~\cite{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go and \emph{traits} in Rust.
    14401036However, each language represents a significant departure from C in terms of language model, and none has the same level of compatibility with C as \CFA.
    14411037D and Go are garbage-collected languages, imposing the associated runtime overhead.
     
    14581054\CCeleven introduced @std::tuple@ as a library variadic template structure.
    14591055Tuples are a generalization of @std::pair@, in that they allow for arbitrary length, fixed-size aggregation of heterogeneous values.
    1460 Operations include @std::get<N>@ to extract values, @std::tie@ to create a tuple of references used for assignment, and lexicographic comparisons.
     1056Operations include @std::get<N>@ to extract vales, @std::tie@ to create a tuple of references used for assignment, and lexicographic comparisons.
    14611057\CCseventeen proposes \emph{structured bindings}~\cite{Sutter15} to eliminate pre-declaring variables and use of @std::tie@ for binding the results.
    14621058This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism.
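
The \CC tuple operations named above can be sketched as follows. The @Record@ alias and the function names are hypothetical, chosen only for illustration; the structured-binding function assumes a compiler in \CCseventeen mode or later.

```cpp
#include <string>
#include <tuple>

// Hypothetical record type used only for illustration.
using Record = std::tuple<int, double, std::string>;

inline Record make_record() { return std::make_tuple(42, 2.5, std::string("cfa")); }

// std::get<N> extracts one element by position.
inline int first_via_get() { return std::get<0>(make_record()); }

// std::tie needs the variables declared beforehand.
inline double second_via_tie() {
    int i;
    double d;
    std::string s;
    std::tie(i, d, s) = make_record();
    return d;
}

// Structured bindings declare and bind in one step; the types are inferred.
inline std::string third_via_binding() {
    auto [i, d, s] = make_record();
    return s;
}

// Tuples compare lexicographically, element by element.
inline bool compares_lexicographically() {
    return std::make_tuple(1, 2) < std::make_tuple(1, 3);
}
```

In @third_via_binding@ the element types appear nowhere in the declaration, which is the documentation concern raised above for expressions with a non-obvious type.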
     
    14721068The goal of \CFA is to provide an evolutionary pathway for large C development-environments to be more productive and safer, while respecting the talent and skill of C programmers.
    14731069While other programming languages purport to be a better C, they are in fact new and interesting languages in their own right, but not C extensions.
    1474 The purpose of this paper is to introduce \CFA, and showcase language features that illustrate the \CFA type-system and approaches taken to achieve the goal of evolutionary C extension.
     1070The purpose of this paper is to introduce \CFA, and showcase two language features that illustrate the \CFA type-system and approaches taken to achieve the goal of evolutionary C extension.
    14751071The contributions are a powerful type-system using parametric polymorphism and overloading, generic types, and tuples, which all have complex interactions.
    14761072The work is a challenging design, engineering, and implementation exercise.