489 | | |
490 | | |
491 | | \section{Declarations} |
492 | | \label{s:Declarations} |
493 | | |
494 | | C declaration syntax is notoriously confusing and error prone. |
495 | | For example, many C programmers are confused by a declaration as simple as: |
496 | | \begin{quote2} |
497 | | \begin{tabular}{@{}ll@{}} |
498 | | \begin{cfa} |
499 | | int * x[5] |
500 | | \end{cfa} |
501 | | & |
502 | | \raisebox{-0.75\totalheight}{\input{Cdecl}} |
503 | | \end{tabular} |
504 | | \end{quote2} |
505 | | Is this an array of 5 pointers to integers or a \Index{pointer} to an array of 5 integers? |
506 | | The fact this declaration is unclear to many C programmers means there are \Index{productivity} and \Index{safety} issues even for basic programs. |
507 | | Another example of confusion results from the fact that a routine name and its parameters are embedded within the return type, mimicking the way the return value is used at the routine's call site. |
508 | | For example, a routine returning a \Index{pointer} to an array of integers is defined and used in the following way: |
509 | | \begin{cfa} |
510 | | int ®(*®f®())[®5®]® {...}; §\C{definition}§ |
511 | | ... ®(*®f®())[®3®]® += 1; §\C{usage}§ |
512 | | \end{cfa} |
513 | | Essentially, the return type is wrapped around the routine name in successive layers (like an \Index{onion}). |
514 | | While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice. |
515 | | |
516 | | \CFA provides its own type, variable and routine declarations, using a different syntax. |
517 | | The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type. |
518 | | In the following example, \R{red} is the base type and \B{blue} is qualifiers. |
519 | | The \CFA declarations move the qualifiers to the left of the base type, \ie move the blue to the left of the red, while the qualifiers have the same meaning but are ordered left to right to specify a variable's type. |
520 | | \begin{quote2} |
521 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
522 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
523 | | \begin{cfa} |
524 | | ß[5] *ß ®int® x1; |
525 | | ß* [5]ß ®int® x2; |
526 | | ß[* [5] int]ß f®( int p )®; |
527 | | \end{cfa} |
528 | | & |
529 | | \begin{cfa} |
530 | | ®int® ß*ß x1 ß[5]ß; |
531 | | ®int® ß(*ßx2ß)[5]ß; |
532 | | ßint (*ßf®( int p )®ß)[5]ß; |
533 | | \end{cfa} |
534 | | \end{tabular} |
535 | | \end{quote2} |
536 | | The only exception is \Index{bit field} specification, which always appear to the right of the base type. |
537 | | % Specifically, the character ©*© is used to indicate a pointer, square brackets ©[©\,©]© are used to represent an array or function return value, and parentheses ©()© are used to indicate a routine parameter. |
538 | | However, unlike C, \CFA type declaration tokens are distributed across all variables in the declaration list. |
539 | | For instance, variables ©x© and ©y© of type \Index{pointer} to integer are defined in \CFA as follows: |
540 | | \begin{quote2} |
541 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
542 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
543 | | \begin{cfa} |
544 | | ®*® int x, y; |
545 | | \end{cfa} |
546 | | & |
547 | | \begin{cfa} |
548 | | int ®*®x, ®*®y; |
549 | | \end{cfa} |
550 | | \end{tabular} |
551 | | \end{quote2} |
552 | | The downside of this semantics is the need to separate regular and \Index{pointer} declarations: |
553 | | \begin{quote2} |
554 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
555 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
556 | | \begin{cfa} |
557 | | ®*® int x; |
558 | | int y; |
559 | | \end{cfa} |
560 | | & |
561 | | \begin{cfa} |
562 | | int ®*®x, y; |
563 | | |
564 | | \end{cfa} |
565 | | \end{tabular} |
566 | | \end{quote2} |
567 | | which is \Index{prescribing} a safety benefit. |
568 | | Other examples are: |
569 | | \begin{quote2} |
570 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}} |
571 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\ |
572 | | \begin{cfa} |
573 | | [ 5 ] int z; |
574 | | [ 5 ] * char w; |
575 | | * [ 5 ] double v; |
576 | | struct s { |
577 | | int f0:3; |
578 | | * int f1; |
579 | | [ 5 ] * int f2; |
580 | | }; |
581 | | \end{cfa} |
582 | | & |
583 | | \begin{cfa} |
584 | | int z[ 5 ]; |
585 | | char * w[ 5 ]; |
586 | | double (* v)[ 5 ]; |
587 | | struct s { |
588 | | int f0:3; |
589 | | int * f1; |
590 | | int * f2[ 5 ] |
591 | | }; |
592 | | \end{cfa} |
593 | | & |
594 | | \begin{cfa} |
595 | | // array of 5 integers |
596 | | // array of 5 pointers to char |
597 | | // pointer to array of 5 doubles |
598 | | |
599 | | // common bit field syntax |
600 | | |
601 | | |
602 | | |
603 | | \end{cfa} |
604 | | \end{tabular} |
605 | | \end{quote2} |
606 | | |
607 | | All type qualifiers, \eg ©const©, ©volatile©, etc., are used in the normal way with the new declarations and also appear left to right, \eg: |
608 | | \begin{quote2} |
609 | | \begin{tabular}{@{}l@{\hspace{1em}}l@{\hspace{1em}}l@{}} |
610 | | \multicolumn{1}{c@{\hspace{1em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{1em}}}{\textbf{C}} \\ |
611 | | \begin{cfa} |
612 | | const * const int x; |
613 | | const * [ 5 ] const int y; |
614 | | \end{cfa} |
615 | | & |
616 | | \begin{cfa} |
617 | | int const * const x; |
618 | | const int (* const y)[ 5 ] |
619 | | \end{cfa} |
620 | | & |
621 | | \begin{cfa} |
622 | | // const pointer to const integer |
623 | | // const pointer to array of 5 const integers |
624 | | \end{cfa} |
625 | | \end{tabular} |
626 | | \end{quote2} |
627 | | All declaration qualifiers, \eg ©extern©, ©static©, etc., are used in the normal way with the new declarations but can only appear at the start of a \CFA routine declaration,\footnote{\label{StorageClassSpecifier} |
628 | | The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature.~\cite[\S~6.11.5(1)]{C11}} \eg: |
629 | | \begin{quote2} |
630 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}} |
631 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\ |
632 | | \begin{cfa} |
633 | | extern [ 5 ] int x; |
634 | | static * const int y; |
635 | | \end{cfa} |
636 | | & |
637 | | \begin{cfa} |
638 | | int extern x[ 5 ]; |
639 | | const int static * y; |
640 | | \end{cfa} |
641 | | & |
642 | | \begin{cfa} |
643 | | // externally visible array of 5 integers |
644 | | // internally visible pointer to constant int |
645 | | \end{cfa} |
646 | | \end{tabular} |
647 | | \end{quote2} |
648 | | |
649 | | The new declaration syntax can be used in other contexts where types are required, \eg casts and the pseudo-routine ©sizeof©: |
650 | | \begin{quote2} |
651 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
652 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
653 | | \begin{cfa} |
654 | | y = (®* int®)x; |
655 | | i = sizeof(®[ 5 ] * int®); |
656 | | \end{cfa} |
657 | | & |
658 | | \begin{cfa} |
659 | | y = (®int *®)x; |
660 | | i = sizeof(®int * [ 5 ]®); |
661 | | \end{cfa} |
662 | | \end{tabular} |
663 | | \end{quote2} |
664 | | |
665 | | Finally, new \CFA declarations may appear together with C declarations in the same program block, but cannot be mixed within a specific declaration. |
666 | | Therefore, a programmer has the option of either continuing to use traditional C declarations or take advantage of the new style. |
667 | | Clearly, both styles need to be supported for some time due to existing C-style header-files, particularly for UNIX systems. |
668 | | |
669 | | |
670 | | \section{Pointer/Reference} |
671 | | |
672 | | C provides a \newterm{pointer type}; |
673 | | \CFA adds a \newterm{reference type}. |
674 | | These types may be derived from an object or routine type, called the \newterm{referenced type}. |
675 | | Objects of these types contain an \newterm{address}, which is normally a location in memory, but may also address memory-mapped registers in hardware devices. |
676 | | An integer constant expression with the value 0, or such an expression cast to type ©void *©, is called a \newterm{null-pointer constant}.\footnote{ |
677 | | One way to conceptualize the null pointer is that no variable is placed at this address, so the null-pointer address can be used to denote an uninitialized pointer/reference object; |
678 | | \ie the null pointer is guaranteed to compare unequal to a pointer to any object or routine.} |
679 | | An address is \newterm{sound}, if it points to a valid memory location in scope, \ie within the program's execution-environment and has not been freed. |
680 | | Dereferencing an \newterm{unsound} address, including the null pointer, is \Index{undefined}, often resulting in a \Index{memory fault}. |
681 | | |
682 | | A program \newterm{object} is a region of data storage in the execution environment, the contents of which can represent values. |
683 | | In most cases, objects are located in memory at an address, and the variable name for an object is an implicit address to the object generated by the compiler and automatically dereferenced, as in: |
684 | | \begin{quote2} |
685 | | \begin{tabular}{@{}ll@{\hspace{2em}}l@{}} |
686 | | \begin{cfa} |
687 | | int x; |
688 | | x = 3; |
689 | | int y; |
690 | | y = x; |
691 | | \end{cfa} |
692 | | & |
693 | | \raisebox{-0.45\totalheight}{\input{pointer1}} |
694 | | & |
695 | | \begin{cfa} |
696 | | int * ®const® x = (int *)100 |
697 | | *x = 3; // implicit dereference |
698 | | int * ®const® y = (int *)104; |
699 | | *y = *x; // implicit dereference |
700 | | \end{cfa} |
701 | | \end{tabular} |
702 | | \end{quote2} |
703 | | where the right example is how the compiler logically interprets the variables in the left example. |
704 | | Since a variable name only points to one address during its lifetime, it is an \Index{immutable} \Index{pointer}; |
705 | | hence, the implicit type of pointer variables ©x© and ©y© are constant pointers in the compiler interpretation. |
706 | | In general, variable addresses are stored in instructions instead of loaded from memory, and hence may not occupy storage. |
707 | | These approaches are contrasted in the following: |
708 | | \begin{quote2} |
709 | | \begin{tabular}{@{}l|l@{}} |
710 | | \multicolumn{1}{c|}{explicit variable address} & \multicolumn{1}{c}{implicit variable address} \\ |
711 | | \hline |
712 | | \begin{cfa} |
713 | | lda r1,100 // load address of x |
714 | | ld r2,(r1) // load value of x |
715 | | lda r3,104 // load address of y |
716 | | st r2,(r3) // store x into y |
717 | | \end{cfa} |
718 | | & |
719 | | \begin{cfa} |
720 | | |
721 | | ld r2,(100) // load value of x |
722 | | |
723 | | st r2,(104) // store x into y |
724 | | \end{cfa} |
725 | | \end{tabular} |
726 | | \end{quote2} |
727 | | Finally, the immutable nature of a variable's address and the fact that there is no storage for the variable pointer means pointer assignment\index{pointer!assignment}\index{assignment!pointer} is impossible. |
728 | | Therefore, the expression ©x = y© has only one meaning, ©*x = *y©, \ie manipulate values, which is why explicitly writing the dereferences is unnecessary even though it occurs implicitly as part of \Index{instruction decoding}. |
729 | | |
730 | | A \Index{pointer}/\Index{reference} object is a generalization of an object variable-name, \ie a mutable address that can point to more than one memory location during its lifetime. |
731 | | (Similarly, an integer variable can contain multiple integer literals during its lifetime versus an integer constant representing a single literal during its lifetime, and like a variable name, may not occupy storage if the literal is embedded directly into instructions.) |
732 | | Hence, a pointer occupies memory to store its current address, and the pointer's value is loaded by dereferencing, \eg: |
733 | | \begin{quote2} |
734 | | \begin{tabular}{@{}l@{\hspace{2em}}l@{}} |
735 | | \begin{cfa} |
736 | | int x, y, ®*® p1, ®*® p2, ®**® p3; |
737 | | p1 = ®&®x; // p1 points to x |
738 | | p2 = p1; // p2 points to x |
739 | | p1 = ®&®y; // p1 points to y |
740 | | p3 = &p2; // p3 points to p2 |
741 | | \end{cfa} |
742 | | & |
743 | | \raisebox{-0.5\totalheight}{\input{pointer2.pstex_t}} |
744 | | \end{tabular} |
745 | | \end{quote2} |
746 | | |
747 | | Notice, an address has a \Index{duality}\index{address!duality}: a location in memory or the value at that location. |
748 | | In many cases, a compiler might be able to infer the best meaning for these two cases. |
749 | | For example, \Index*{Algol68}~\cite{Algol68} infers pointer dereferencing to select the best meaning for each pointer usage |
750 | | \begin{cfa} |
751 | | p2 = p1 + x; §\C{// compiler infers *p2 = *p1 + x;}§ |
752 | | \end{cfa} |
753 | | Algol68 infers the following dereferencing ©*p2 = *p1 + x©, because adding the arbitrary integer value in ©x© to the address of ©p1© and storing the resulting address into ©p2© is an unlikely operation. |
754 | | Unfortunately, automatic dereferencing does not work in all cases, and so some mechanism is necessary to fix incorrect choices. |
755 | | |
756 | | Rather than inferring dereference, most programming languages pick one implicit dereferencing semantics, and the programmer explicitly indicates the other to resolve address-duality. |
757 | | In C, objects of pointer type always manipulate the pointer object's address: |
758 | | \begin{cfa} |
759 | | p1 = p2; §\C{// p1 = p2\ \ rather than\ \ *p1 = *p2}§ |
760 | | p2 = p1 + x; §\C{// p2 = p1 + x\ \ rather than\ \ *p2 = *p1 + x}§ |
761 | | \end{cfa} |
762 | | even though the assignment to ©p2© is likely incorrect, and the programmer probably meant: |
763 | | \begin{cfa} |
764 | | p1 = p2; §\C{// pointer address assignment}§ |
765 | | ®*®p2 = ®*®p1 + x; §\C{// pointed-to value assignment / operation}§ |
766 | | \end{cfa} |
767 | | The C semantics work well for situations where manipulation of addresses is the primary meaning and data is rarely accessed, such as storage management (©malloc©/©free©). |
768 | | |
769 | | However, in most other situations, the pointed-to value is requested more often than the pointer address. |
770 | | \begin{cfa} |
771 | | *p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15); |
772 | | \end{cfa} |
773 | | In this case, it is tedious to explicitly write the dereferencing, and error prone when pointer arithmetic is allowed. |
774 | | It is better to have the compiler generate the dereferencing and have no implicit pointer arithmetic: |
775 | | \begin{cfa} |
776 | | p2 = ((p1 + p2) * (p3 - p1)) / (p3 - 15); |
777 | | \end{cfa} |
778 | | |
779 | | To support this common case, a reference type is introduced in \CFA, denoted by ©&©, which is the opposite dereference semantics to a pointer type, making the value at the pointed-to location the implicit semantics for dereferencing (similar but not the same as \CC \Index{reference type}s). |
780 | | \begin{cfa} |
781 | | int x, y, ®&® r1, ®&® r2, ®&&® r3; |
782 | | ®&®r1 = &x; §\C{// r1 points to x}§ |
783 | | ®&®r2 = &r1; §\C{// r2 points to x}§ |
784 | | ®&®r1 = &y; §\C{// r1 points to y}§ |
785 | | ®&&®r3 = ®&®&r2; §\C{// r3 points to r2}§ |
786 | | r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15); §\C{// implicit dereferencing}§ |
787 | | \end{cfa} |
788 | | Except for auto-dereferencing by the compiler, this reference example is the same as the previous pointer example. |
789 | | Hence, a reference behaves like the variable name for the current variable it is pointing-to. |
790 | | One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in a declaration, so the previous example becomes: |
791 | | \begin{cfa} |
792 | | ®*®r2 = ((®*®r1 + ®*®r2) ®*® (®**®r3 - ®*®r1)) / (®**®r3 - 15); |
793 | | \end{cfa} |
794 | | When a reference operation appears beside a dereference operation, \eg ©&*©, they cancel out. |
795 | | However, in C, the cancellation always yields a value (\Index{rvalue}).\footnote{ |
796 | | The unary ©&© operator yields the address of its operand. |
797 | | If the operand has type ``type'', the result has type ``pointer to type''. |
798 | | If the operand is the result of a unary ©*© operator, neither that operator nor the ©&© operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.~\cite[\S~6.5.3.2--3]{C11}} |
799 | | For a \CFA reference type, the cancellation on the left-hand side of assignment leaves the reference as an address (\Index{lvalue}): |
800 | | \begin{cfa} |
801 | | (&®*®)r1 = &x; §\C{// (\&*) cancel giving address in r1 not variable pointed-to by r1}§ |
802 | | \end{cfa} |
803 | | Similarly, the address of a reference can be obtained for assignment or computation (\Index{rvalue}): |
804 | | \begin{cfa} |
805 | | (&(&®*®)®*®)r3 = &(&®*®)r2; §\C{// (\&*) cancel giving address in r2, (\&(\&*)*) cancel giving address in r3}§ |
806 | | \end{cfa} |
807 | | Cancellation\index{cancellation!pointer/reference}\index{pointer!cancellation} works to arbitrary depth. |
808 | | |
809 | | Fundamentally, pointer and reference objects are functionally interchangeable because both contain addresses. |
810 | | \begin{cfa} |
811 | | int x, *p1 = &x, **p2 = &p1, ***p3 = &p2, |
812 | | &r1 = x, &&r2 = r1, &&&r3 = r2; |
813 | | ***p3 = 3; §\C{// change x}§ |
814 | | r3 = 3; §\C{// change x, ***r3}§ |
815 | | **p3 = ...; §\C{// change p1}§ |
816 | | &r3 = ...; §\C{// change r1, (\&*)**r3, 1 cancellation}§ |
817 | | *p3 = ...; §\C{// change p2}§ |
818 | | &&r3 = ...; §\C{// change r2, (\&(\&*)*)*r3, 2 cancellations}§ |
819 | | &&&r3 = p3; §\C{// change r3 to p3, (\&(\&(\&*)*)*)r3, 3 cancellations}§ |
820 | | \end{cfa} |
821 | | Furthermore, both types are equally performant, as the same amount of dereferencing occurs for both types. |
822 | | Therefore, the choice between them is based solely on whether the address is dereferenced frequently or infrequently, which dictates the amount of implicit dereferencing aid from the compiler. |
823 | | |
824 | | As for a pointer type, a reference type may have qualifiers: |
825 | | \begin{cfa} |
826 | | const int cx = 5; §\C{// cannot change cx;}§ |
827 | | const int & cr = cx; §\C{// cannot change what cr points to}§ |
828 | | ®&®cr = &cx; §\C{// can change cr}§ |
829 | | cr = 7; §\C{// error, cannot change cx}§ |
830 | | int & const rc = x; §\C{// must be initialized}§ |
831 | | ®&®rc = &x; §\C{// error, cannot change rc}§ |
832 | | const int & const crc = cx; §\C{// must be initialized}§ |
833 | | crc = 7; §\C{// error, cannot change cx}§ |
834 | | ®&®crc = &cx; §\C{// error, cannot change crc}§ |
835 | | \end{cfa} |
836 | | Hence, for type ©& const©, there is no pointer assignment, so ©&rc = &x© is disallowed, and \emph{the address value cannot be the null pointer unless an arbitrary pointer is coerced\index{coercion} into the reference}: |
837 | | \begin{cfa} |
838 | | int & const cr = *0; §\C{// where 0 is the int * zero}§ |
839 | | \end{cfa} |
840 | | Note, constant reference-types do not prevent \Index{addressing errors} because of explicit storage-management: |
841 | | \begin{cfa} |
842 | | int & const cr = *malloc(); |
843 | | cr = 5; |
844 | | free( &cr ); |
845 | | cr = 7; §\C{// unsound pointer dereference}§ |
846 | | \end{cfa} |
847 | | |
848 | | The position of the ©const© qualifier \emph{after} the pointer/reference qualifier causes confuse for C programmers. |
849 | | The ©const© qualifier cannot be moved before the pointer/reference qualifier for C style-declarations; |
850 | | \CFA-style declarations (see \VRef{s:Declarations}) attempt to address this issue: |
851 | | \begin{quote2} |
852 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
853 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
854 | | \begin{cfa} |
855 | | ®const® * ®const® * const int ccp; |
856 | | ®const® & ®const® & const int ccr; |
857 | | \end{cfa} |
858 | | & |
859 | | \begin{cfa} |
860 | | const int * ®const® * ®const® ccp; |
861 | | |
862 | | \end{cfa} |
863 | | \end{tabular} |
864 | | \end{quote2} |
865 | | where the \CFA declaration is read left-to-right. |
866 | | |
867 | | Finally, like pointers, references are usable and composable with other type operators and generators. |
868 | | \begin{cfa} |
869 | | int w, x, y, z, & ar[3] = { x, y, z }; §\C{// initialize array of references}§ |
870 | | &ar[1] = &w; §\C{// change reference array element}§ |
871 | | typeof( ar[1] ) p; §\C{// (gcc) is int, i.e., the type of referenced object}§ |
872 | | typeof( &ar[1] ) q; §\C{// (gcc) is int \&, i.e., the type of reference}§ |
873 | | sizeof( ar[1] ) == sizeof( int ); §\C{// is true, i.e., the size of referenced object}§ |
874 | | sizeof( &ar[1] ) == sizeof( int *) §\C{// is true, i.e., the size of a reference}§ |
875 | | \end{cfa} |
876 | | |
877 | | In contrast to \CFA reference types, \Index*[C++]{\CC{}}'s reference types are all ©const© references, preventing changes to the reference address, so only value assignment is possible, which eliminates half of the \Index{address duality}. |
878 | | Also, \CC does not allow \Index{array}s\index{array!reference} of reference\footnote{ |
879 | | The reason for disallowing arrays of reference is unknown, but possibly comes from references being ethereal (like a textual macro), and hence, replaceable by the referant object.} |
880 | | \Index*{Java}'s reference types to objects (all Java objects are on the heap) are like C pointers, which always manipulate the address, and there is no (bit-wise) object assignment, so objects are explicitly cloned by shallow or deep copying, which eliminates half of the address duality. |
881 | | |
882 | | |
883 | | \subsection{Initialization} |
884 | | |
885 | | \Index{Initialization} is different than \Index{assignment} because initialization occurs on the empty (uninitialized) storage on an object, while assignment occurs on possibly initialized storage of an object. |
886 | | There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, return/temporary binding. |
887 | | Because the object being initialized has no value, there is only one meaningful semantics with respect to address duality: it must mean address as there is no pointed-to value. |
888 | | In contrast, the left-hand side of assignment has an address that has a duality. |
889 | | Therefore, for pointer/reference initialization, the initializing value must be an address not a value. |
890 | | \begin{cfa} |
891 | | int * p = &x; §\C{// assign address of x}§ |
892 | | ®int * p = x;® §\C{// assign value of x}§ |
893 | | int & r = x; §\C{// must have address of x}§ |
894 | | \end{cfa} |
895 | | Like the previous example with C pointer-arithmetic, it is unlikely assigning the value of ©x© into a pointer is meaningful (again, a warning is usually given). |
896 | | Therefore, for safety, this context requires an address, so it is superfluous to require explicitly taking the address of the initialization object, even though the type is incorrect. |
897 | | Note, this is strictly a convenience and safety feature for a programmer. |
898 | | Hence, \CFA allows ©r© to be assigned ©x© because it infers a reference for ©x©, by implicitly inserting a address-of operator, ©&©, and it is an error to put an ©&© because the types no longer match due to the implicit dereference. |
899 | | Unfortunately, C allows ©p© to be assigned with ©&x© (address) or ©x© (value), but most compilers warn about the latter assignment as being potentially incorrect. |
900 | | Similarly, when a reference type is used for a parameter/return type, the call-site argument does not require a reference operator for the same reason. |
901 | | \begin{cfa} |
902 | | int & f( int & r ); §\C{// reference parameter and return}§ |
903 | | z = f( x ) + f( y ); §\C{// reference operator added, temporaries needed for call results}§ |
904 | | \end{cfa} |
905 | | Within routine ©f©, it is possible to change the argument by changing the corresponding parameter, and parameter ©r© can be locally reassigned within ©f©. |
906 | | Since operator routine ©?+?© takes its arguments by value, the references returned from ©f© are used to initialize compiler generated temporaries with value semantics that copy from the references. |
907 | | \begin{cfa} |
908 | | int temp1 = f( x ), temp2 = f( y ); |
909 | | z = temp1 + temp2; |
910 | | \end{cfa} |
911 | | This \Index{implicit referencing} is crucial for reducing the syntactic burden for programmers when using references; |
912 | | otherwise references have the same syntactic burden as pointers in these contexts. |
913 | | |
914 | | When a pointer/reference parameter has a ©const© value (immutable), it is possible to pass literals and expressions. |
915 | | \begin{cfa} |
916 | | void f( ®const® int & cr ); |
917 | | void g( ®const® int * cp ); |
918 | | f( 3 ); g( ®&®3 ); |
919 | | f( x + y ); g( ®&®(x + y) ); |
920 | | \end{cfa} |
921 | | Here, the compiler passes the address to the literal 3 or the temporary for the expression ©x + y©, knowing the argument cannot be changed through the parameter. |
922 | | The ©&© before the constant/expression for the pointer-type parameter (©g©) is a \CFA extension necessary to type match and is a common requirement before a variable in C (\eg ©scanf©). |
923 | | Importantly, ©&3© may not be equal to ©&3©, where the references occur across calls because the temporaries maybe different on each call. |
924 | | |
925 | | \CFA \emph{extends} this semantics to a mutable pointer/reference parameter, and the compiler implicitly creates the necessary temporary (copying the argument), which is subsequently pointed-to by the reference parameter and can be changed.\footnote{ |
926 | | If whole program analysis is possible, and shows the parameter is not assigned, \ie it is ©const©, the temporary is unnecessary.} |
927 | | \begin{cfa} |
928 | | void f( int & r ); |
929 | | void g( int * p ); |
930 | | f( 3 ); g( ®&®3 ); §\C{// compiler implicit generates temporaries}§ |
931 | | f( x + y ); g( ®&®(x + y) ); §\C{// compiler implicit generates temporaries}§ |
932 | | \end{cfa} |
933 | | Essentially, there is an implicit \Index{rvalue} to \Index{lvalue} conversion in this case.\footnote{ |
934 | | This conversion attempts to address the \newterm{const hell} problem, when the innocent addition of a ©const© qualifier causes a cascade of type failures, requiring an unknown number of additional ©const© qualifiers, until it is discovered a ©const© qualifier cannot be added and all the ©const© qualifiers must be removed.} |
935 | | The implicit conversion allows seamless calls to any routine without having to explicitly name/copy the literal/expression to allow the call. |
936 | | |
937 | | %\CFA attempts to handle pointers and references in a uniform, symmetric manner. |
938 | | Finally, C handles \Index{routine object}s in an inconsistent way. |
939 | | A routine object is both a pointer and a reference (\Index{particle and wave}). |
940 | | \begin{cfa} |
941 | | void f( int i ); |
942 | | void (*fp)( int ); §\C{// routine pointer}§ |
943 | | fp = f; §\C{// reference initialization}§ |
944 | | fp = &f; §\C{// pointer initialization}§ |
945 | | fp = *f; §\C{// reference initialization}§ |
946 | | fp(3); §\C{// reference invocation}§ |
947 | | (*fp)(3); §\C{// pointer invocation}§ |
948 | | \end{cfa} |
949 | | While C's treatment of routine objects has similarity to inferring a reference type in initialization contexts, the examples are assignment not initialization, and all possible forms of assignment are possible (©f©, ©&f©, ©*f©) without regard for type. |
950 | | Instead, a routine object should be referenced by a ©const© reference: |
951 | | \begin{cfa} |
952 | | ®const® void (®&® fr)( int ) = f; §\C{// routine reference}§ |
953 | | fr = ... §\C{// error, cannot change code}§ |
954 | | &fr = ...; §\C{// changing routine reference}§ |
955 | | fr( 3 ); §\C{// reference call to f}§ |
956 | | (*fr)(3); §\C{// error, incorrect type}§ |
957 | | \end{cfa} |
958 | | because the value of the routine object is a routine literal, \ie the routine code is normally immutable during execution.\footnote{ |
959 | | Dynamic code rewriting is possible but only in special circumstances.} |
960 | | \CFA allows this additional use of references for routine objects in an attempt to give a more consistent meaning for them. |
961 | | |
962 | | |
963 | | \subsection{Address-of Semantics} |
964 | | |
965 | | In C, ©&E© is an rvalue for any expression ©E©. |
966 | | \CFA extends the ©&© (address-of) operator as follows: |
967 | | \begin{itemize} |
968 | | \item |
969 | | if ©R© is an \Index{rvalue} of type ©T &$_1$...&$_r$© where $r \ge 1$ references (©&© symbols) than ©&R© has type ©T ®*®&$_{\color{red}2}$...&$_{\color{red}r}$©, \ie ©T© pointer with $r-1$ references (©&© symbols). |
970 | | |
971 | | \item |
972 | | if ©L© is an \Index{lvalue} of type ©T &$_1$...&$_l$© where $l \ge 0$ references (©&© symbols) then ©&L© has type ©T ®*®&$_{\color{red}1}$...&$_{\color{red}l}$©, \ie ©T© pointer with $l$ references (©&© symbols). |
973 | | \end{itemize} |
974 | | The following example shows the first rule applied to different \Index{rvalue} contexts: |
975 | | \begin{cfa} |
976 | | int x, * px, ** ppx, *** pppx, **** ppppx; |
977 | | int & rx = x, && rrx = rx, &&& rrrx = rrx ; |
978 | | x = rrrx; // rrrx is an lvalue with type int &&& (equivalent to x) |
979 | | px = &rrrx; // starting from rrrx, &rrrx is an rvalue with type int *&&& (&x) |
980 | | ppx = &&rrrx; // starting from &rrrx, &&rrrx is an rvalue with type int **&& (&rx) |
981 | | pppx = &&&rrrx; // starting from &&rrrx, &&&rrrx is an rvalue with type int ***& (&rrx) |
982 | | ppppx = &&&&rrrx; // starting from &&&rrrx, &&&&rrrx is an rvalue with type int **** (&rrrx) |
983 | | \end{cfa} |
984 | | The following example shows the second rule applied to different \Index{lvalue} contexts: |
985 | | \begin{cfa} |
986 | | int x, * px, ** ppx, *** pppx; |
987 | | int & rx = x, && rrx = rx, &&& rrrx = rrx ; |
988 | | rrrx = 2; // rrrx is an lvalue with type int &&& (equivalent to x) |
989 | | &rrrx = px; // starting from rrrx, &rrrx is an rvalue with type int *&&& (rx) |
990 | | &&rrrx = ppx; // starting from &rrrx, &&rrrx is an rvalue with type int **&& (rrx) |
991 | | &&&rrrx = pppx; // starting from &&rrrx, &&&rrrx is an rvalue with type int ***& (rrrx) |
992 | | \end{cfa} |
993 | | |
994 | | |
995 | | \subsection{Conversions} |
996 | | |
997 | | C provides a basic implicit conversion to simplify variable usage: |
998 | | \begin{enumerate} |
999 | | \setcounter{enumi}{-1} |
1000 | | \item |
1001 | | lvalue to rvalue conversion: ©cv T© converts to ©T©, which allows implicit variable dereferencing. |
1002 | | \begin{cfa} |
1003 | | int x; |
1004 | | x + 1; // lvalue variable (int) converts to rvalue for expression |
1005 | | \end{cfa} |
1006 | | An rvalue has no type qualifiers (©cv©), so the lvalue qualifiers are dropped. |
1007 | | \end{enumerate} |
1008 | | \CFA provides three new implicit conversion for reference types to simplify reference usage. |
1009 | | \begin{enumerate} |
1010 | | \item |
1011 | | reference to rvalue conversion: ©cv T &© converts to ©T©, which allows implicit reference dereferencing. |
1012 | | \begin{cfa} |
1013 | | int x, &r = x, f( int p ); |
1014 | | x = ®r® + f( ®r® ); // lvalue reference converts to rvalue |
1015 | | \end{cfa} |
1016 | | An rvalue has no type qualifiers (©cv©), so the reference qualifiers are dropped. |
1017 | | |
1018 | | \item |
1019 | | lvalue to reference conversion: \lstinline[deletekeywords={lvalue}]@lvalue-type cv1 T@ converts to ©cv2 T &©, which allows implicitly converting variables to references. |
1020 | | \begin{cfa} |
1021 | | int x, &r = ®x®, f( int & p ); // lvalue variable (int) convert to reference (int &) |
1022 | | f( ®x® ); // lvalue variable (int) convert to reference (int &) |
1023 | | \end{cfa} |
1024 | | Conversion can restrict a type, where ©cv1© $\le$ ©cv2©, \eg passing an ©int© to a ©const volatile int &©, which has low cost. |
1025 | | Conversion can expand a type, where ©cv1© $>$ ©cv2©, \eg passing a ©const volatile int© to an ©int &©, which has high cost (\Index{warning}); |
1026 | | furthermore, if ©cv1© has ©const© but not ©cv2©, a temporary variable is created to preserve the immutable lvalue. |
1027 | | |
1028 | | \item |
1029 | | rvalue to reference conversion: ©T© converts to ©cv T &©, which allows binding references to temporaries. |
1030 | | \begin{cfa} |
1031 | | int x, & f( int & p ); |
1032 | | f( ®x + 3® ); // rvalue parameter (int) implicitly converts to lvalue temporary reference (int &) |
1033 | | ®&f®(...) = &x; // rvalue result (int &) implicitly converts to lvalue temporary reference (int &) |
1034 | | \end{cfa} |
1035 | | In both case, modifications to the temporary are inaccessible (\Index{warning}). |
1036 | | Conversion expands the temporary-type with ©cv©, which is low cost since the temporary is inaccessible. |
1037 | | \end{enumerate} |
1038 | | |
1039 | | |
1040 | | \begin{comment} |
1041 | | From: Richard Bilson <rcbilson@gmail.com> |
1042 | | Date: Wed, 13 Jul 2016 01:58:58 +0000 |
1043 | | Subject: Re: pointers / references |
1044 | | To: "Peter A. Buhr" <pabuhr@plg2.cs.uwaterloo.ca> |
1045 | | |
1046 | | As a general comment I would say that I found the section confusing, as you move back and forth |
1047 | | between various real and imagined programming languages. If it were me I would rewrite into two |
1048 | | subsections, one that specifies precisely the syntax and semantics of reference variables and |
1049 | | another that provides the rationale. |
1050 | | |
1051 | | I don't see any obvious problems with the syntax or semantics so far as I understand them. It's not |
1052 | | obvious that the description you're giving is complete, but I'm sure you'll find the special cases |
1053 | | as you do the implementation. |
1054 | | |
1055 | | My big gripes are mostly that you're not being as precise as you need to be in your terminology, and |
1056 | | that you say a few things that aren't actually true even though I generally know what you mean. |
1057 | | |
1058 | | 20 C provides a pointer type; CFA adds a reference type. Both types contain an address, which is normally a |
1059 | | 21 location in memory. |
1060 | | |
1061 | | An address is not a location in memory; an address refers to a location in memory. Furthermore it |
1062 | | seems weird to me to say that a type "contains" an address; rather, objects of that type do. |
1063 | | |
1064 | | 21 Special addresses are used to denote certain states or access co-processor memory. By |
1065 | | 22 convention, no variable is placed at address 0, so addresses like 0, 1, 2, 3 are often used to denote no-value |
1066 | | 23 or other special states. |
1067 | | |
1068 | | This isn't standard C at all. There has to be one null pointer representation, but it doesn't have |
1069 | | to be a literal zero representation and there doesn't have to be more than one such representation. |
1070 | | |
1071 | | 23 Often dereferencing a special state causes a memory fault, so checking is necessary |
1072 | | 24 during execution. |
1073 | | |
1074 | | I don't see the connection between the two clauses here. I feel like if a bad pointer will not cause |
1075 | | a memory fault then I need to do more checking, not less. |
1076 | | |
1077 | | 24 If the programming language assigns addresses, a program's execution is sound, \ie all |
1078 | | 25 addresses are to valid memory locations. |
1079 | | |
1080 | | You haven't said what it means to "assign" an address, but if I use my intuitive understanding of |
1081 | | the term I don't see how this can be true unless you're assuming automatic storage management. |
1082 | | |
1083 | | 1 Program variables are implicit pointers to memory locations generated by the compiler and automatically |
1084 | | 2 dereferenced, as in: |
1085 | | |
1086 | | There is no reason why a variable needs to have a location in memory, and indeed in a typical |
1087 | | program many variables will not. In standard terminology an object identifier refers to data in the |
1088 | | execution environment, but not necessarily in memory. |
1089 | | |
1090 | | 13 A pointer/reference is a generalization of a variable name, \ie a mutable address that can point to more |
1091 | | 14 than one memory location during its lifetime. |
1092 | | |
1093 | | I feel like you're off the reservation here. In my world there are objects of pointer type, which |
1094 | | seem to be what you're describing here, but also pointer values, which can be stored in an object of |
1095 | | pointer type but don't necessarily have to be. For example, how would you describe the value denoted |
1096 | | by "&main" in a C program? I would call it a (function) pointer, but that doesn't satisfy your |
1097 | | definition. |
1098 | | |
1099 | | 16 not occupy storage as the literal is embedded directly into instructions.) Hence, a pointer occupies memory |
1100 | | 17 to store its current address, and the pointer's value is loaded by dereferencing, e.g.: |
1101 | | |
1102 | | As with my general objection regarding your definition of variables, there is no reason why a |
1103 | | pointer variable (object of pointer type) needs to occupy memory. |
1104 | | |
1105 | | 21 p2 = p1 + x; // compiler infers *p2 = *p1 + x; |
1106 | | |
1107 | | What language are we in now? |
1108 | | |
1109 | | 24 pointer usage. However, in C, the following cases are ambiguous, especially with pointer arithmetic: |
1110 | | 25 p1 = p2; // p1 = p2 or *p1 = *p2 |
1111 | | |
1112 | | This isn't ambiguous. it's defined to be the first option. |
1113 | | |
1114 | | 26 p1 = p1 + 1; // p1 = p1 + 1 or *p1 = *p1 + 1 |
1115 | | |
1116 | | Again, this statement is not ambiguous. |
1117 | | |
1118 | | 13 example. Hence, a reference behaves like the variable name for the current variable it is pointing-to. The |
1119 | | 14 simplest way to understand a reference is to imagine the compiler inserting a dereference operator before |
1120 | | 15 the reference variable for each reference qualifier in a declaration, e.g.: |
1121 | | |
1122 | | It's hard for me to understand who the audience for this part is. I think a practical programmer is |
1123 | | likely to be satisfied with "a reference behaves like the variable name for the current variable it |
1124 | | is pointing-to," maybe with some examples. Your "simplest way" doesn't strike me as simpler than |
1125 | | that. It feels like you're trying to provide a more precise definition for the semantics of |
1126 | | references, but it isn't actually precise enough to be a formal specification. If you want to |
1127 | | express the semantics of references using rewrite rules that's a great way to do it, but lay the |
1128 | | rules out clearly, and when you're showing an example of rewriting keep your |
1129 | | references/pointers/values separate (right now, you use \eg "r3" to mean a reference, a pointer, |
1130 | | and a value). |
1131 | | |
1132 | | 24 Cancellation works to arbitrary depth, and pointer and reference values are interchangeable because both |
1133 | | 25 contain addresses. |
1134 | | |
1135 | | Except they're not interchangeable, because they have different and incompatible types. |
1136 | | |
1137 | | 40 Interestingly, C++ deals with the address duality by making the pointed-to value the default, and prevent- |
1138 | | 41 ing changes to the reference address, which eliminates half of the duality. Java deals with the address duality |
1139 | | 42 by making address assignment the default and requiring field assignment (direct or indirect via methods), |
1140 | | 43 \ie there is no builtin bit-wise or method-wise assignment, which eliminates half of the duality. |
1141 | | |
1142 | | I can follow this but I think that's mostly because I already understand what you're trying to |
1143 | | say. I don't think I've ever heard the term "method-wise assignment" and I don't see you defining |
1144 | | it. Furthermore Java does have value assignment of basic (non-class) types, so your summary here |
1145 | | feels incomplete. (If it were me I'd drop this paragraph rather than try to save it.) |
1146 | | |
1147 | | 11 Hence, for type & const, there is no pointer assignment, so &rc = &x is disallowed, and the address value |
1148 | | 12 cannot be 0 unless an arbitrary pointer is assigned to the reference. |
1149 | | |
1150 | | Given the pains you've taken to motivate every little bit of the semantics up until now, this last |
1151 | | clause ("the address value cannot be 0") comes out of the blue. It seems like you could have |
1152 | | perfectly reasonable semantics that allowed the initialization of null references. |
1153 | | |
1154 | | 12 In effect, the compiler is managing the |
1155 | | 13 addresses for type & const not the programmer, and by a programming discipline of only using references |
1156 | | 14 with references, address errors can be prevented. |
1157 | | |
1158 | | Again, is this assuming automatic storage management? |
1159 | | |
1160 | | 18 rary binding. For reference initialization (like pointer), the initializing value must be an address (lvalue) not |
1161 | | 19 a value (rvalue). |
1162 | | |
1163 | | This sentence appears to suggest that an address and an lvalue are the same thing. |
1164 | | |
1165 | | 20 int * p = &x; // both &x and x are possible interpretations |
1166 | | |
1167 | | Are you saying that we should be considering "x" as a possible interpretation of the initializer |
1168 | | "&x"? It seems to me that this expression has only one legitimate interpretation in context. |
1169 | | |
1170 | | 21 int & r = x; // x unlikely interpretation, because of auto-dereferencing |
1171 | | |
1172 | | You mean, we can initialize a reference using an integer value? Surely we would need some sort of |
1173 | | cast to induce that interpretation, no? |
1174 | | |
1175 | | 22 Hence, the compiler implicitly inserts a reference operator, &, before the initialization expression. |
1176 | | |
1177 | | But then the expression would have pointer type, which wouldn't be compatible with the type of r. |
1178 | | |
1179 | | 22 Similarly, |
1180 | | 23 when a reference is used for a parameter/return type, the call-site argument does not require a reference |
1181 | | 24 operator. |
1182 | | |
1183 | | Furthermore, it would not be correct to use a reference operator. |
1184 | | |
1185 | | 45 The implicit conversion allows |
1186 | | 1 seamless calls to any routine without having to explicitly name/copy the literal/expression to allow the call. |
1187 | | 2 While C' attempts to handle pointers and references in a uniform, symmetric manner, C handles routine |
1188 | | 3 variables in an inconsistent way: a routine variable is both a pointer and a reference (particle and wave). |
1189 | | |
1190 | | After all this talk of how expressions can have both pointer and value interpretations, you're |
1191 | | disparaging C because it has expressions that have both pointer and value interpretations? |
1192 | | |
1193 | | On Sat, Jul 9, 2016 at 4:18 PM Peter A. Buhr <pabuhr@plg.uwaterloo.ca> wrote: |
1194 | | > Aaron discovered a few places where "&"s are missing and where there are too many "&", which are |
1195 | | > corrected in the attached updated. None of the text has changed, if you have started reading |
1196 | | > already. |
1197 | | \end{comment} |
1198 | | |
1199 | | |
1200 | | \section{Routine Definition} |
1201 | | |
1202 | | \CFA also supports a new syntax for routine definition, as well as \Celeven and K\&R routine syntax. |
1203 | | The point of the new syntax is to allow returning multiple values from a routine~\cite{Galletly96,CLU}, \eg: |
1204 | | \begin{cfa} |
1205 | | ®[ int o1, int o2, char o3 ]® f( int i1, char i2, char i3 ) { |
1206 | | §\emph{routine body}§ |
1207 | | } |
1208 | | \end{cfa} |
1209 | | where routine ©f© has three output (return values) and three input parameters. |
1210 | | Existing C syntax cannot be extended with multiple return types because it is impossible to embed a single routine name within multiple return type specifications. |
1211 | | |
1212 | | In detail, the brackets, ©[]©, enclose the result type, where each return value is named and that name is a local variable of the particular return type.\footnote{ |
1213 | | \Index*{Michael Tiemann}, with help from \Index*{Doug Lea}, provided named return values in g++, circa 1989.} |
1214 | | The value of each local return variable is automatically returned at routine termination. |
1215 | | Declaration qualifiers can only appear at the start of a routine definition, \eg: |
1216 | | \begin{cfa} |
1217 | | ®extern® [ int x ] g( int y ) {§\,§} |
1218 | | \end{cfa} |
1219 | | Lastly, if there are no output parameters or input parameters, the brackets and/or parentheses must still be specified; |
1220 | | in both cases the type is assumed to be void as opposed to old style C defaults of int return type and unknown parameter types, respectively, as in: |
1221 | | \begin{cfa} |
1222 | | [§\,§] g(); §\C{// no input or output parameters}§ |
1223 | | [ void ] g( void ); §\C{// no input or output parameters}§ |
1224 | | \end{cfa} |
1225 | | |
1226 | | Routine f is called as follows: |
1227 | | \begin{cfa} |
1228 | | [ i, j, ch ] = f( 3, 'a', ch ); |
1229 | | \end{cfa} |
1230 | | The list of return values from f and the grouping on the left-hand side of the assignment is called a \newterm{return list} and discussed in Section 12. |
1231 | | |
1232 | | \CFA style declarations cannot be used to declare parameters for K\&R style routine definitions because of the following ambiguity: |
1233 | | \begin{cfa} |
1234 | | int (*f(x))[ 5 ] int x; {} |
1235 | | \end{cfa} |
1236 | | The string ``©int (*f(x))[ 5 ]©'' declares a K\&R style routine of type returning a pointer to an array of 5 integers, while the string ``©[ 5 ] int x©'' declares a \CFA style parameter x of type array of 5 integers. |
1237 | | Since the strings overlap starting with the open bracket, ©[©, there is an ambiguous interpretation for the string. |
1238 | | As well, \CFA-style declarations cannot be used to declare parameters for C-style routine-definitions because of the following ambiguity: |
1239 | | \begin{cfa} |
1240 | | typedef int foo; |
1241 | | int f( int (* foo) ); §\C{// foo is redefined as a parameter name}§ |
1242 | | \end{cfa} |
1243 | | The string ``©int (* foo)©'' declares a C-style named-parameter of type pointer to an integer (the parenthesis are superfluous), while the same string declares a \CFA style unnamed parameter of type routine returning integer with unnamed parameter of type pointer to foo. |
1244 | | The redefinition of a type name in a parameter list is the only context in C where the character ©*© can appear to the left of a type name, and \CFA relies on all type qualifier characters appearing to the right of the type name. |
1245 | | The inability to use \CFA declarations in these two contexts is probably a blessing because it precludes programmers from arbitrarily switching between declarations forms within a declaration contexts. |
1246 | | |
1247 | | C-style declarations can be used to declare parameters for \CFA style routine definitions, \eg: |
1248 | | \begin{cfa} |
1249 | | [ int ] f( * int, int * ); §\C{// returns an integer, accepts 2 pointers to integers}§ |
1250 | | [ * int, int * ] f( int ); §\C{// returns 2 pointers to integers, accepts an integer}§ |
1251 | | \end{cfa} |
1252 | | The reason for allowing both declaration styles in the new context is for backwards compatibility with existing preprocessor macros that generate C-style declaration-syntax, as in: |
1253 | | \begin{cfa} |
1254 | | #define ptoa( n, d ) int (*n)[ d ] |
1255 | | int f( ptoa( p, 5 ) ) ... §\C{// expands to int f( int (*p)[ 5 ] )}§ |
1256 | | [ int ] f( ptoa( p, 5 ) ) ... §\C{// expands to [ int ] f( int (*p)[ 5 ] )}§ |
1257 | | \end{cfa} |
1258 | | Again, programmers are highly encouraged to use one declaration form or the other, rather than mixing the forms. |
1259 | | |
1260 | | |
1261 | | \subsection{Named Return Values} |
1262 | | |
1263 | | \Index{Named return values} handle the case where it is necessary to define a local variable whose value is then returned in a ©return© statement, as in: |
1264 | | \begin{cfa} |
1265 | | int f() { |
1266 | | int x; |
1267 | | ... x = 0; ... x = y; ... |
1268 | | return x; |
1269 | | } |
1270 | | \end{cfa} |
1271 | | Because the value in the return variable is automatically returned when a \CFA routine terminates, the ©return© statement \emph{does not} contain an expression, as in: |
1272 | | \newline |
1273 | | \begin{minipage}{\linewidth} |
1274 | | \begin{cfa} |
1275 | | ®[ int x, int y ]® f() { |
1276 | | int z; |
1277 | | ... x = 0; ... y = z; ... |
1278 | | ®return;® §\C{// implicitly return x, y}§ |
1279 | | } |
1280 | | \end{cfa} |
1281 | | \end{minipage} |
1282 | | \newline |
1283 | | When the return is encountered, the current values of ©x© and ©y© are returned to the calling routine. |
1284 | | As well, ``falling off the end'' of a routine without a ©return© statement is permitted, as in: |
1285 | | \begin{cfa} |
1286 | | [ int x, int y ] f() { |
1287 | | ... |
1288 | | } §\C{// implicitly return x, y}§ |
1289 | | \end{cfa} |
1290 | | In this case, the current values of ©x© and ©y© are returned to the calling routine just as if a ©return© had been encountered. |
1291 | | |
1292 | | Named return values may be used in conjunction with named parameter values; |
1293 | | specifically, a return and parameter can have the same name. |
1294 | | \begin{cfa} |
1295 | | [ int x, int y ] f( int, x, int y ) { |
1296 | | ... |
1297 | | } §\C{// implicitly return x, y}§ |
1298 | | \end{cfa} |
1299 | | This notation allows the compiler to eliminate temporary variables in nested routine calls. |
1300 | | \begin{cfa} |
1301 | | [ int x, int y ] f( int, x, int y ); §\C{// prototype declaration}§ |
1302 | | int a, b; |
1303 | | [a, b] = f( f( f( a, b ) ) ); |
1304 | | \end{cfa} |
1305 | | While the compiler normally ignores parameters names in prototype declarations, here they are used to eliminate temporary return-values by inferring that the results of each call are the inputs of the next call, and ultimately, the left-hand side of the assignment. |
1306 | | Hence, even without the body of routine ©f© (separate compilation), it is possible to perform a global optimization across routine calls. |
1307 | | The compiler warns about naming inconsistencies between routine prototype and definition in this case, and behaviour is \Index{undefined} if the programmer is inconsistent. |
1308 | | |
1309 | | |
1310 | | \subsection{Routine Prototype} |
1311 | | |
1312 | | The syntax of the new routine prototype declaration follows directly from the new routine definition syntax; |
1313 | | as well, parameter names are optional, \eg: |
1314 | | \begin{cfa} |
1315 | | [ int x ] f (); §\C{// returning int with no parameters}§ |
1316 | | [ * int ] g (int y); §\C{// returning pointer to int with int parameter}§ |
1317 | | [ ] h ( int, char ); §\C{// returning no result with int and char parameters}§ |
1318 | | [ * int, int ] j ( int ); §\C{// returning pointer to int and int, with int parameter}§ |
1319 | | \end{cfa} |
1320 | | This syntax allows a prototype declaration to be created by cutting and pasting source text from the routine definition header (or vice versa). |
1321 | | It is possible to declare multiple routine-prototypes in a single declaration, but the entire type specification is distributed across \emph{all} routine names in the declaration list (see~\VRef{s:Declarations}), \eg: |
1322 | | \begin{quote2} |
1323 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
1324 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
1325 | | \begin{cfa} |
1326 | | [ int ] f( int ), g; |
1327 | | \end{cfa} |
1328 | | & |
1329 | | \begin{cfa} |
1330 | | int f( int ), g( int ); |
1331 | | \end{cfa} |
1332 | | \end{tabular} |
1333 | | \end{quote2} |
1334 | | Declaration qualifiers can only appear at the start of a \CFA routine declaration,\footref{StorageClassSpecifier} \eg: |
1335 | | \begin{cfa} |
1336 | | extern [ int ] f ( int ); |
1337 | | static [ int ] g ( int ); |
1338 | | \end{cfa} |
1339 | | |
1340 | | |
1341 | | \section{Routine Pointers} |
1342 | | |
1343 | | The syntax for pointers to \CFA routines specifies the pointer name on the right, \eg: |
1344 | | \begin{cfa} |
1345 | | * [ int x ] () fp; §\C{// pointer to routine returning int with no parameters}§ |
1346 | | * [ * int ] (int y) gp; §\C{// pointer to routine returning pointer to int with int parameter}§ |
1347 | | * [ ] (int,char) hp; §\C{// pointer to routine returning no result with int and char parameters}§ |
1348 | | * [ * int,int ] ( int ) jp; §\C{// pointer to routine returning pointer to int and int, with int parameter}§ |
1349 | | \end{cfa} |
1350 | | While parameter names are optional, \emph{a routine name cannot be specified}; |
1351 | | for example, the following is incorrect: |
1352 | | \begin{cfa} |
1353 | | * [ int x ] f () fp; §\C{// routine name "f" is not allowed}§ |
1354 | | \end{cfa} |
1355 | | |
1356 | | |
1357 | | \section{Named and Default Arguments} |
1358 | | |
1359 | | Named\index{named arguments}\index{arguments!named} and default\index{default arguments}\index{arguments!default} arguments~\cite{Hardgrave76}\footnote{ |
1360 | | Francez~\cite{Francez77} proposed a further extension to the named-parameter passing style, which specifies what type of communication (by value, by reference, by name) the argument is passed to the routine.} |
1361 | | are two mechanisms to simplify routine call. |
1362 | | Both mechanisms are discussed with respect to \CFA. |
1363 | | \begin{description} |
1364 | | \item[Named (or Keyword) Arguments:] |
1365 | | provide the ability to specify an argument to a routine call using the parameter name rather than the position of the parameter. |
1366 | | For example, given the routine: |
1367 | | \begin{cfa} |
1368 | | void p( int x, int y, int z ) {...} |
1369 | | \end{cfa} |
1370 | | a positional call is: |
1371 | | \begin{cfa} |
1372 | | p( 4, 7, 3 ); |
1373 | | \end{cfa} |
1374 | | whereas a named (keyword) call may be: |
1375 | | \begin{cfa} |
1376 | | p( z : 3, x : 4, y : 7 ); §\C{// rewrite $\Rightarrow$ p( 4, 7, 3 )}§ |
1377 | | \end{cfa} |
1378 | | Here the order of the arguments is unimportant, and the names of the parameters are used to associate argument values with the corresponding parameters. |
1379 | | The compiler rewrites a named call into a positional call. |
1380 | | The advantages of named parameters are: |
1381 | | \begin{itemize} |
1382 | | \item |
1383 | | Remembering the names of the parameters may be easier than the order in the routine definition. |
1384 | | \item |
1385 | | Parameter names provide documentation at the call site (assuming the names are descriptive). |
1386 | | \item |
1387 | | Changes can be made to the order or number of parameters without affecting the call (although the call must still be recompiled). |
1388 | | \end{itemize} |
1389 | | |
1390 | | Unfortunately, named arguments do not work in C-style programming-languages because a routine prototype is not required to specify parameter names, nor do the names in the prototype have to match with the actual definition. |
1391 | | For example, the following routine prototypes and definition are all valid. |
1392 | | \begin{cfa} |
1393 | | void p( int, int, int ); §\C{// equivalent prototypes}§ |
1394 | | void p( int x, int y, int z ); |
1395 | | void p( int y, int x, int z ); |
1396 | | void p( int z, int y, int x ); |
1397 | | void p( int q, int r, int s ) {} §\C{// match with this definition}§ |
1398 | | \end{cfa} |
1399 | | Forcing matching parameter names in routine prototypes with corresponding routine definitions is possible, but goes against a strong tradition in C programming. |
1400 | | Alternatively, prototype definitions can be eliminated by using a two-pass compilation, and implicitly creating header files for exports. |
1401 | | The former is easy to do, while the latter is more complex. |
1402 | | |
1403 | | Furthermore, named arguments do not work well in a \CFA-style programming-languages because they potentially introduces a new criteria for type matching. |
1404 | | For example, it is technically possible to disambiguate between these two overloaded definitions of ©f© based on named arguments at the call site: |
1405 | | \begin{cfa} |
1406 | | int f( int i, int j ); |
1407 | | int f( int x, double y ); |
1408 | | |
1409 | | f( j : 3, i : 4 ); §\C{// 1st f}§ |
1410 | | f( x : 7, y : 8.1 ); §\C{// 2nd f}§ |
1411 | | f( 4, 5 ); §\C{// ambiguous call}§ |
1412 | | \end{cfa} |
1413 | | However, named arguments compound routine resolution in conjunction with conversions: |
1414 | | \begin{cfa} |
1415 | | f( i : 3, 5.7 ); §\C{// ambiguous call ?}§ |
1416 | | \end{cfa} |
1417 | | Depending on the cost associated with named arguments, this call could be resolvable or ambiguous. |
1418 | | Adding named argument into the routine resolution algorithm does not seem worth the complexity. |
1419 | | Therefore, \CFA does \emph{not} attempt to support named arguments. |
1420 | | |
1421 | | \item[Default Arguments] |
1422 | | provide the ability to associate a default value with a parameter so it can be optionally specified in the argument list. |
1423 | | For example, given the routine: |
1424 | | \begin{cfa} |
1425 | | void p( int x = 1, int y = 2, int z = 3 ) {...} |
1426 | | \end{cfa} |
1427 | | the allowable positional calls are: |
1428 | | \begin{cfa} |
1429 | | p(); §\C{// rewrite $\Rightarrow$ p( 1, 2, 3 )}§ |
1430 | | p( 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 2, 3 )}§ |
1431 | | p( 4, 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 4, 3 )}§ |
1432 | | p( 4, 4, 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 4, 4 )}§ |
1433 | | // empty arguments |
1434 | | p( , 4, 4 ); §\C{// rewrite $\Rightarrow$ p( 1, 4, 4 )}§ |
1435 | | p( 4, , 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 2, 4 )}§ |
1436 | | p( 4, 4, ); §\C{// rewrite $\Rightarrow$ p( 4, 4, 3 )}§ |
1437 | | p( 4, , ); §\C{// rewrite $\Rightarrow$ p( 4, 2, 3 )}§ |
1438 | | p( , 4, ); §\C{// rewrite $\Rightarrow$ p( 1, 4, 3 )}§ |
1439 | | p( , , 4 ); §\C{// rewrite $\Rightarrow$ p( 1, 2, 4 )}§ |
1440 | | p( , , ); §\C{// rewrite $\Rightarrow$ p( 1, 2, 3 )}§ |
1441 | | \end{cfa} |
1442 | | Here the missing arguments are inserted from the default values in the parameter list. |
1443 | | The compiler rewrites missing default values into explicit positional arguments. |
1444 | | The advantages of default values are: |
1445 | | \begin{itemize} |
1446 | | \item |
1447 | | Routines with a large number of parameters are often very generalized, giving a programmer a number of different options on how a computation is performed. |
1448 | | For many of these kinds of routines, there are standard or default settings that work for the majority of computations. |
1449 | | Without default values for parameters, a programmer is forced to specify these common values all the time, resulting in long argument lists that are error prone. |
1450 | | \item |
1451 | | When a routine's interface is augmented with new parameters, it extends the interface providing generalizability\footnote{ |
1452 | | ``It should be possible for the implementor of an abstraction to increase its generality. |
1453 | | So long as the modified abstraction is a generalization of the original, existing uses of the abstraction will not require change. |
1454 | | It might be possible to modify an abstraction in a manner which is not a generalization without affecting existing uses, but, without inspecting the modules in which the uses occur, this possibility cannot be determined. |
1455 | | This criterion precludes the addition of parameters, unless these parameters have default or inferred values that are valid for all possible existing applications.''~\cite[p.~128]{Cormack90}} |
1456 | | (somewhat like the generalization provided by inheritance for classes). |
1457 | | That is, all existing calls are still valid, although the call must still be recompiled. |
1458 | | \end{itemize} |
1459 | | The only disadvantage of default arguments is that unintentional omission of an argument may not result in a compiler-time error. |
1460 | | Instead, a default value is used, which may not be the programmer's intent. |
1461 | | |
1462 | | Default values may only appear in a prototype versus definition context: |
1463 | | \begin{cfa} |
1464 | | void p( int x, int y = 2, int z = 3 ); §\C{// prototype: allowed}§ |
1465 | | void p( int, int = 2, int = 3 ); §\C{// prototype: allowed}§ |
1466 | | void p( int x, int y = 2, int z = 3 ) {} §\C{// definition: not allowed}§ |
1467 | | \end{cfa} |
1468 | | The reason for this restriction is to allow separate compilation. |
1469 | | Multiple prototypes with different default values is an error. |
1470 | | \end{description} |
1471 | | |
1472 | | Ellipse (``...'') arguments present problems when used with default arguments. |
1473 | | The conflict occurs because both named and ellipse arguments must appear after positional arguments, giving two possibilities: |
1474 | | \begin{cfa} |
1475 | | p( /* positional */, ... , /* named */ ); |
1476 | | p( /* positional */, /* named */, ... ); |
1477 | | \end{cfa} |
1478 | | While it is possible to implement both approaches, the first possibly is more complex than the second, \eg: |
1479 | | \begin{cfa} |
1480 | | p( int x, int y, int z, ... ); |
1481 | | p( 1, 4, 5, 6, z : 3, y : 2 ); §\C{// assume p( /* positional */, ... , /* named */ );}§ |
1482 | | p( 1, z : 3, y : 2, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§ |
1483 | | \end{cfa} |
1484 | | In the first call, it is necessary for the programmer to conceptually rewrite the call, changing named arguments into positional, before knowing where the ellipse arguments begin. |
1485 | | Hence, this approach seems significantly more difficult, and hence, confusing and error prone. |
1486 | | In the second call, the named arguments separate the positional and ellipse arguments, making it trivial to read the call. |
1487 | | |
1488 | | The problem is exacerbated with default arguments, \eg: |
1489 | | \begin{cfa} |
1490 | | void p( int x, int y = 2, int z = 3... ); |
1491 | | p( 1, 4, 5, 6, z : 3 ); §\C{// assume p( /* positional */, ... , /* named */ );}§ |
1492 | | p( 1, z : 3, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§ |
1493 | | \end{cfa} |
1494 | | The first call is an error because arguments 4 and 5 are actually positional not ellipse arguments; |
1495 | | therefore, argument 5 subsequently conflicts with the named argument z : 3. |
1496 | | In the second call, the default value for y is implicitly inserted after argument 1 and the named arguments separate the positional and ellipse arguments, making it trivial to read the call. |
1497 | | For these reasons, \CFA requires named arguments before ellipse arguments. |
1498 | | Finally, while ellipse arguments are needed for a small set of existing C routines, like printf, the extended \CFA type system largely eliminates the need for ellipse arguments (see Section 24), making much of this discussion moot. |
1499 | | |
1500 | | Default arguments and overloading (see Section 24) are complementary. |
1501 | | While in theory default arguments can be simulated with overloading, as in: |
1502 | | \begin{quote2} |
1503 | | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
1504 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{default arguments}} & \multicolumn{1}{c}{\textbf{overloading}} \\ |
1505 | | \begin{cfa} |
1506 | | void p( int x, int y = 2, int z = 3 ) {...} |
1507 | | |
1508 | | |
1509 | | \end{cfa} |
1510 | | & |
1511 | | \begin{cfa} |
1512 | | void p( int x, int y, int z ) {...} |
1513 | | void p( int x ) { p( x, 2, 3 ); } |
1514 | | void p( int x, int y ) { p( x, y, 3 ); } |
1515 | | \end{cfa} |
1516 | | \end{tabular} |
1517 | | \end{quote2} |
1518 | | the number of required overloaded routines is linear in the number of default values, which is unacceptable growth. |
1519 | | In general, overloading should only be used over default arguments if the body of the routine is significantly different. |
1520 | | Furthermore, overloading cannot handle accessing default arguments in the middle of a positional list, via a missing argument, such as: |
1521 | | \begin{cfa} |
1522 | | p( 1, /* default */, 5 ); §\C{// rewrite $\Rightarrow$ p( 1, 2, 5 )}§ |
1523 | | \end{cfa} |
1524 | | |
1525 | | Given the \CFA restrictions above, both named and default arguments are backwards compatible. |
1526 | | \Index*[C++]{\CC{}} only supports default arguments; |
1527 | | \Index*{Ada} supports both named and default arguments. |
1528 | | |
1529 | | |
1530 | | \section{Unnamed Structure Fields} |
1531 | | |
1532 | | C requires each field of a structure to have a name, except for a bit field associated with a basic type, \eg: |
1533 | | \begin{cfa} |
1534 | | struct { |
1535 | | int f1; §\C{// named field}§ |
1536 | | int f2 : 4; §\C{// named field with bit field size}§ |
1537 | | int : 3; §\C{// unnamed field for basic type with bit field size}§ |
1538 | | int ; §\C{// disallowed, unnamed field}§ |
1539 | | int *; §\C{// disallowed, unnamed field}§ |
1540 | | int (*)( int ); §\C{// disallowed, unnamed field}§ |
1541 | | }; |
1542 | | \end{cfa} |
1543 | | This requirement is relaxed by making the field name optional for all field declarations; therefore, all the field declarations in the example are allowed. |
1544 | | As for unnamed bit fields, an unnamed field is used for padding a structure to a particular size. |
1545 | | A list of unnamed fields is also supported, \eg: |
1546 | | \begin{cfa} |
1547 | | struct { |
1548 | | int , , ; §\C{// 3 unnamed fields}§ |
1549 | | } |
1550 | | \end{cfa} |
1551 | | |
1552 | | |
1553 | | \section{Nesting} |
1554 | | |
1555 | | Nesting of types and routines is useful for controlling name visibility (\newterm{name hiding}). |
1556 | | |
1557 | | |
1558 | | \subsection{Type Nesting} |
1559 | | |
1560 | | \CFA allows \Index{type nesting}, and type qualification of the nested types (see \VRef[Figure]{f:TypeNestingQualification}), where as C hoists\index{type hoisting} (refactors) nested types into the enclosing scope and has no type qualification. |
1561 | | \begin{figure} |
1562 | | \centering |
1563 | | \begin{tabular}{@{}l@{\hspace{3em}}l|l@{}} |
1564 | | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{C Type Nesting}} & \multicolumn{1}{c}{\textbf{C Implicit Hoisting}} & \multicolumn{1}{|c}{\textbf{\CFA}} \\ |
1565 | | \hline |
1566 | | \begin{cfa} |
1567 | | struct S { |
1568 | | enum C { R, G, B }; |
1569 | | struct T { |
1570 | | union U { int i, j; }; |
1571 | | enum C c; |
1572 | | short int i, j; |
1573 | | }; |
1574 | | struct T t; |
1575 | | } s; |
1576 | | |
1577 | | int fred() { |
1578 | | s.t.c = R; |
1579 | | struct T t = { R, 1, 2 }; |
1580 | | enum C c; |
1581 | | union U u; |
1582 | | } |
1583 | | \end{cfa} |
1584 | | & |
1585 | | \begin{cfa} |
1586 | | enum C { R, G, B }; |
1587 | | union U { int i, j; }; |
1588 | | struct T { |
1589 | | enum C c; |
1590 | | short int i, j; |
1591 | | }; |
1592 | | struct S { |
1593 | | struct T t; |
1594 | | } s; |
1595 | | |
1596 | | |
1597 | | |
1598 | | |
1599 | | |
1600 | | |
1601 | | |
1602 | | \end{cfa} |
1603 | | & |
1604 | | \begin{cfa} |
1605 | | struct S { |
1606 | | enum C { R, G, B }; |
1607 | | struct T { |
1608 | | union U { int i, j; }; |
1609 | | enum C c; |
1610 | | short int i, j; |
1611 | | }; |
1612 | | struct T t; |
1613 | | } s; |
1614 | | |
1615 | | int fred() { |
1616 | | s.t.c = ®S.®R; // type qualification |
1617 | | struct ®S.®T t = { ®S.®R, 1, 2 }; |
1618 | | enum ®S.®C c; |
1619 | | union ®S.T.®U u; |
1620 | | } |
1621 | | \end{cfa} |
1622 | | \end{tabular} |
1623 | | \caption{Type Nesting / Qualification} |
1624 | | \label{f:TypeNestingQualification} |
1625 | | \end{figure} |
1626 | | In the left example in C, types ©C©, ©U© and ©T© are implicitly hoisted outside of type ©S© into the containing block scope. |
1627 | | In the right example in \CFA, the types are not hoisted and accessed using the field-selection operator ``©.©'' for type qualification, as does \Index*{Java}, rather than the \CC type-selection operator ``©::©''. |
1628 | | |
1629 | | |
1630 | | \subsection{Routine Nesting} |
1631 | | |
1632 | | While \CFA does not provide object programming by putting routines into structures, it does rely heavily on locally nested routines to redefine operations at or close to a call site. |
1633 | | For example, the C quick-sort is wrapped into the following polymorphic \CFA routine: |
1634 | | \begin{cfa} |
1635 | | forall( otype T | { int ?<?( T, T ); } ) |
1636 | | void qsort( const T * arr, size_t dimension ); |
1637 | | \end{cfa} |
1638 | | which can be used to sort in ascending and descending order by locally redefining the less-than operator into greater-than. |
1639 | | \begin{cfa} |
1640 | | const unsigned int size = 5; |
1641 | | int ia[size]; |
1642 | | ... §\C{// assign values to array ia}§ |
1643 | | qsort( ia, size ); §\C{// sort ascending order using builtin ?<?}§ |
1644 | | { |
1645 | | ®int ?<?( int x, int y ) { return x > y; }® §\C{// nested routine}§ |
1646 | | qsort( ia, size ); §\C{// sort descending order by local redefinition}§ |
1647 | | } |
1648 | | \end{cfa} |
1649 | | |
1650 | | Nested routines are not first-class, meaning a nested routine cannot be returned if it has references to variables in its enclosing blocks; |
1651 | | the only exception is references to the external block of the translation unit, as these variables persist for the duration of the program. |
1652 | | The following program in undefined in \CFA (and Indexc{gcc}) |
1653 | | \begin{cfa} |
1654 | | [* [int]( int )] foo() { §\C{// int (*foo())( int )}§ |
1655 | | int ®i® = 7; |
1656 | | int bar( int p ) { |
1657 | | ®i® += 1; §\C{// dependent on local variable}§ |
1658 | | sout | ®i® | endl; |
1659 | | } |
1660 | | return bar; §\C{// undefined because of local dependence}§ |
1661 | | } |
1662 | | int main() { |
1663 | | * [int]( int ) fp = foo(); §\C{// int (*fp)( int )}§ |
1664 | | sout | fp( 3 ) | endl; |
1665 | | } |
1666 | | \end{cfa} |
1667 | | because |
1668 | | |
1669 | | Currently, there are no \Index{lambda} expressions, \ie unnamed routines because routine names are very important to properly select the correct routine. |
1670 | | |
1671 | | |
1672 | | \section{Tuples} |
1673 | | |
1674 | | In C and \CFA, lists of elements appear in several contexts, such as the parameter list for a routine call. |
1675 | | (More contexts are added shortly.) |
1676 | | A list of such elements is called a \newterm{lexical list}. |
1677 | | The general syntax of a lexical list is: |
1678 | | \begin{cfa} |
1679 | | [ §\emph{exprlist}§ ] |
1680 | | \end{cfa} |
1681 | | where ©$\emph{exprlist}$© is a list of one or more expressions separated by commas. |
1682 | | The brackets, ©[]©, allow differentiating between lexical lists and expressions containing the C comma operator. |
1683 | | The following are examples of lexical lists: |
1684 | | \begin{cfa} |
1685 | | [ x, y, z ] |
1686 | | [ 2 ] |
1687 | | [ v+w, x*y, 3.14159, f() ] |
1688 | | \end{cfa} |
1689 | | Tuples are permitted to contain sub-tuples (\ie nesting), such as ©[ [ 14, 21 ], 9 ]©, which is a 2-element tuple whose first element is itself a tuple. |
1690 | | Note, a tuple is not a record (structure); |
1691 | | a record denotes a single value with substructure, whereas a tuple is multiple values with no substructure (see flattening coercion in Section 12.1). |
1692 | | In essence, tuples are largely a compile time phenomenon, having little or no runtime presence. |
1693 | | |
1694 | | Tuples can be organized into compile-time tuple variables; |
1695 | | these variables are of \newterm{tuple type}. |
1696 | | Tuple variables and types can be used anywhere lists of conventional variables and types can be used. |
1697 | | The general syntax of a tuple type is: |
1698 | | \begin{cfa} |
1699 | | [ §\emph{typelist}§ ] |
1700 | | \end{cfa} |
1701 | | where ©$\emph{typelist}$© is a list of one or more legal \CFA or C type specifications separated by commas, which may include other tuple type specifications. |
1702 | | Examples of tuple types include: |
1703 | | \begin{cfa} |
1704 | | [ unsigned int, char ] |
1705 | | [ double, double, double ] |
1706 | | [ * int, int * ] §\C{// mix of CFA and ANSI}§ |
1707 | | [ * [ 5 ] int, * * char, * [ [ int, int ] ] (int, int) ] |
1708 | | \end{cfa} |
1709 | | Like tuples, tuple types may be nested, such as ©[ [ int, int ], int ]©, which is a 2-element tuple type whose first element is itself a tuple type. |
1710 | | |
1711 | | Examples of declarations using tuple types are: |
1712 | | \begin{cfa} |
1713 | | [ int, int ] x; §\C{// 2 element tuple, each element of type int}§ |
1714 | | * [ char, char ] y; §\C{// pointer to a 2 element tuple}§ |
1715 | | [ [ int, int ] ] z ([ int, int ]); |
1716 | | \end{cfa} |
1717 | | The last example declares an external routine that expects a 2 element tuple as an input parameter and returns a 2 element tuple as its result. |
1718 | | |
1719 | | As mentioned, tuples can appear in contexts requiring a list of value, such as an argument list of a routine call. |
1720 | | In unambiguous situations, the tuple brackets may be omitted, \eg a tuple that appears as an argument may have its |
1721 | | square brackets omitted for convenience; therefore, the following routine invocations are equivalent: |
1722 | | \begin{cfa} |
1723 | | f( [ 1, x+2, fred() ] ); |
1724 | | f( 1, x+2, fred() ); |
1725 | | \end{cfa} |
1726 | | Also, a tuple or a tuple variable may be used to supply all or part of an argument list for a routine expecting multiple input parameters or for a routine expecting a tuple as an input parameter. |
1727 | | For example, the following are all legal: |
1728 | | \begin{cfa} |
1729 | | [ int, int ] w1; |
1730 | | [ int, int, int ] w2; |
1731 | | [ void ] f (int, int, int); /* three input parameters of type int */ |
1732 | | [ void ] g ([ int, int, int ]); /* 3 element tuple as input */ |
1733 | | f( [ 1, 2, 3 ] ); |
1734 | | f( w1, 3 ); |
1735 | | f( 1, w1 ); |
1736 | | f( w2 ); |
1737 | | g( [ 1, 2, 3 ] ); |
1738 | | g( w1, 3 ); |
1739 | | g( 1, w1 ); |
1740 | | g( w2 ); |
1741 | | \end{cfa} |
1742 | | Note, in all cases 3 arguments are supplied even though the syntax may appear to supply less than 3. As mentioned, a |
1743 | | tuple does not have structure like a record; a tuple is simply converted into a list of components. |
1744 | | \begin{rationale} |
1745 | | The present implementation of \CFA does not support nested routine calls when the inner routine returns multiple values; \ie a statement such as ©g( f() )© is not supported. |
1746 | | Using a temporary variable to store the results of the inner routine and then passing this variable to the outer routine works, however. |
1747 | | \end{rationale} |
1748 | | |
1749 | | A tuple can contain a C comma expression, provided the expression containing the comma operator is enclosed in parentheses. |
1750 | | For instance, the following tuples are equivalent: |
1751 | | \begin{cfa} |
1752 | | [ 1, 3, 5 ] |
1753 | | [ 1, (2, 3), 5 ] |
1754 | | \end{cfa} |
1755 | | The second element of the second tuple is the expression (2, 3), which yields the result 3. |
1756 | | This requirement is the same as for comma expressions in argument lists. |
1757 | | |
1758 | | Type qualifiers, \ie const and volatile, may modify a tuple type. |
1759 | | The meaning is the same as for a type qualifier modifying an aggregate type [Int99, x 6.5.2.3(7),x 6.7.3(11)], \ie the qualifier is distributed across all of the types in the tuple, \eg: |
1760 | | \begin{cfa} |
1761 | | const volatile [ int, float, const int ] x; |
1762 | | \end{cfa} |
1763 | | is equivalent to: |
1764 | | \begin{cfa} |
1765 | | [ const volatile int, const volatile float, const volatile int ] x; |
1766 | | \end{cfa} |
1767 | | Declaration qualifiers can only appear at the start of a \CFA tuple declaration4, \eg: |
1768 | | \begin{cfa} |
1769 | | extern [ int, int ] w1; |
1770 | | static [ int, int, int ] w2; |
1771 | | \end{cfa} |
1772 | | \begin{rationale} |
1773 | | Unfortunately, C's syntax for subscripts precluded treating them as tuples. |
1774 | | The C subscript list has the form ©[i][j]...© and not ©[i, j, ...]©. |
1775 | | Therefore, there is no syntactic way for a routine returning multiple values to specify the different subscript values, \eg ©f[g()]© always means a single subscript value because there is only one set of brackets. |
1776 | | Fixing this requires a major change to C because the syntactic form ©M[i, j, k]© already has a particular meaning: ©i, j, k© is a comma expression. |
1777 | | \end{rationale} |
1778 | | |
1779 | | |
1780 | | \subsection{Tuple Coercions} |
1781 | | |
1782 | | There are four coercions that can be performed on tuples and tuple variables: closing, opening, flattening and structuring. |
1783 | | In addition, the coercion of dereferencing can be performed on a tuple variable to yield its value(s), as for other variables. |
1784 | | A \newterm{closing coercion} takes a set of values and converts it into a tuple value, which is a contiguous set of values, as in: |
1785 | | \begin{cfa} |
1786 | | [ int, int, int, int ] w; |
1787 | | w = [ 1, 2, 3, 4 ]; |
1788 | | \end{cfa} |
1789 | | First the right-hand tuple is closed into a tuple value and then the tuple value is assigned. |
1790 | | |
1791 | | An \newterm{opening coercion} is the opposite of closing; a tuple value is converted into a tuple of values, as in: |
1792 | | \begin{cfa} |
1793 | | [ a, b, c, d ] = w |
1794 | | \end{cfa} |
1795 | | ©w© is implicitly opened to yield a tuple of four values, which are then assigned individually. |
1796 | | |
1797 | | A \newterm{flattening coercion} coerces a nested tuple, \ie a tuple with one or more components, which are themselves tuples, into a flattened tuple, which is a tuple whose components are not tuples, as in: |
1798 | | \begin{cfa} |
1799 | | [ a, b, c, d ] = [ 1, [ 2, 3 ], 4 ]; |
1800 | | \end{cfa} |
1801 | | First the right-hand tuple is flattened and then the values are assigned individually. |
1802 | | Flattening is also performed on tuple types. |
1803 | | For example, the type ©[ int, [ int, int ], int ]© can be coerced, using flattening, into the type ©[ int, int, int, int ]©. |
1804 | | |
1805 | | A \newterm{structuring coercion} is the opposite of flattening; |
1806 | | a tuple is structured into a more complex nested tuple. |
1807 | | For example, structuring the tuple ©[ 1, 2, 3, 4 ]© into the tuple ©[ 1, [ 2, 3 ], 4 ]© or the tuple type ©[ int, int, int, int ]© into the tuple type ©[ int, [ int, int ], int ]©. |
1808 | | In the following example, the last assignment illustrates all the tuple coercions: |
1809 | | \begin{cfa} |
1810 | | [ int, int, int, int ] w = [ 1, 2, 3, 4 ]; |
1811 | | int x = 5; |
1812 | | [ x, w ] = [ w, x ]; §\C{// all four tuple coercions}§ |
1813 | | \end{cfa} |
1814 | | Starting on the right-hand tuple in the last assignment statement, w is opened, producing a tuple of four values; |
1815 | | therefore, the right-hand tuple is now the tuple ©[ [ 1, 2, 3, 4 ], 5 ]©. |
1816 | | This tuple is then flattened, yielding ©[ 1, 2, 3, 4, 5 ]©, which is structured into ©[ 1, [ 2, 3, 4, 5 ] ]© to match the tuple type of the left-hand side. |
1817 | | The tuple ©[ 2, 3, 4, 5 ]© is then closed to create a tuple value. |
1818 | | Finally, ©x© is assigned ©1© and ©w© is assigned the tuple value using multiple assignment (see Section 14). |
1819 | | \begin{rationale} |
1820 | | A possible additional language extension is to use the structuring coercion for tuples to initialize a complex record with a tuple. |
1821 | | \end{rationale} |
1822 | | |
1823 | | |
1824 | | \section{Mass Assignment} |
1825 | | |
1826 | | \CFA permits assignment to several variables at once using mass assignment~\cite{CLU}. |
1827 | | Mass assignment has the following form: |
1828 | | \begin{cfa} |
1829 | | [ §\emph{lvalue}§, ... , §\emph{lvalue}§ ] = §\emph{expr}§; |
1830 | | \end{cfa} |
1831 | | \index{lvalue} |
1832 | | The left-hand side is a tuple of \emph{lvalues}, which is a list of expressions each yielding an address, \ie any data object that can appear on the left-hand side of a conventional assignment statement. |
1833 | | ©$\emph{expr}$© is any standard arithmetic expression. |
1834 | | Clearly, the types of the entities being assigned must be type compatible with the value of the expression. |
1835 | | |
1836 | | Mass assignment has parallel semantics, \eg the statement: |
1837 | | \begin{cfa} |
1838 | | [ x, y, z ] = 1.5; |
1839 | | \end{cfa} |
1840 | | is equivalent to: |
1841 | | \begin{cfa} |
1842 | | x = 1.5; y = 1.5; z = 1.5; |
1843 | | \end{cfa} |
1844 | | This semantics is not the same as the following in C: |
1845 | | \begin{cfa} |
1846 | | x = y = z = 1.5; |
1847 | | \end{cfa} |
1848 | | as conversions between intermediate assignments may lose information. |
1849 | | A more complex example is: |
1850 | | \begin{cfa} |
1851 | | [ i, y[i], z ] = a + b; |
1852 | | \end{cfa} |
1853 | | which is equivalent to: |
1854 | | \begin{cfa} |
1855 | | t = a + b; |
1856 | | a1 = &i; a2 = &y[i]; a3 = &z; |
1857 | | *a1 = t; *a2 = t; *a3 = t; |
1858 | | \end{cfa} |
1859 | | The temporary ©t© is necessary to store the value of the expression to eliminate conversion issues. |
1860 | | The temporaries for the addresses are needed so that locations on the left-hand side do not change as the values are assigned. |
1861 | | In this case, ©y[i]© uses the previous value of ©i© and not the new value set at the beginning of the mass assignment. |
1862 | | |
1863 | | |
1864 | | \section{Multiple Assignment} |
1865 | | |
1866 | | \CFA also supports the assignment of several values at once, known as multiple assignment~\cite{CLU,Galletly96}. |
1867 | | Multiple assignment has the following form: |
1868 | | \begin{cfa} |
1869 | | [ §\emph{lvalue}§, ... , §\emph{lvalue}§ ] = [ §\emph{expr}§, ... , §\emph{expr}§ ]; |
1870 | | \end{cfa} |
1871 | | \index{lvalue} |
1872 | | The left-hand side is a tuple of \emph{lvalues}, and the right-hand side is a tuple of \emph{expr}s. |
1873 | | Each \emph{expr} appearing on the right-hand side of a multiple assignment statement is assigned to the corresponding \emph{lvalues} on the left-hand side of the statement using parallel semantics for each assignment. |
1874 | | An example of multiple assignment is: |
1875 | | \begin{cfa} |
1876 | | [ x, y, z ] = [ 1, 2, 3 ]; |
1877 | | \end{cfa} |
1878 | | Here, the values ©1©, ©2© and ©3© are assigned, respectively, to the variables ©x©, ©y© and ©z©. |
1879 | | A more complex example is: |
1880 | | \begin{cfa} |
1881 | | [ i, y[ i ], z ] = [ 1, i, a + b ]; |
1882 | | \end{cfa} |
1883 | | Here, the values ©1©, ©i© and ©a + b© are assigned to the variables ©i©, ©y[i]© and ©z©, respectively. |
1884 | | Note, the parallel semantics of |
1885 | | multiple assignment ensures: |
1886 | | \begin{cfa} |
1887 | | [ x, y ] = [ y, x ]; |
1888 | | \end{cfa} |
1889 | | correctly interchanges (swaps) the values stored in ©x© and ©y©. |
1890 | | The following cases are errors: |
1891 | | \begin{cfa} |
1892 | | [ a, b, c ] = [ 1, 2, 3, 4 ]; |
1893 | | [ a, b, c ] = [ 1, 2 ]; |
1894 | | \end{cfa} |
1895 | | because the number of entities in the left-hand tuple is unequal with the right-hand tuple. |
1896 | | |
1897 | | As for all tuple contexts in C, side effects should not be used because C does not define an ordering for the evaluation of the elements of a tuple; |
1898 | | both these examples produce indeterminate results: |
1899 | | \begin{cfa} |
1900 | | f( x++, x++ ); §\C{// C routine call with side effects in arguments}§ |
1901 | | [ v1, v2 ] = [ x++, x++ ]; §\C{// side effects in righthand side of multiple assignment}§ |
1902 | | \end{cfa} |
1903 | | |
1904 | | |
1905 | | \section{Cascade Assignment} |
1906 | | |
1907 | | As in C, \CFA mass and multiple assignments can be cascaded, producing cascade assignment. |
1908 | | Cascade assignment has the following form: |
1909 | | \begin{cfa} |
1910 | | §\emph{tuple}§ = §\emph{tuple}§ = ... = §\emph{tuple}§; |
1911 | | \end{cfa} |
1912 | | and it has the same parallel semantics as for mass and multiple assignment. |
1913 | | Some examples of cascade assignment are: |
1914 | | \begin{cfa} |
1915 | | x1 = y1 = x2 = y2 = 0; |
1916 | | [ x1, y1 ] = [ x2, y2 ] = [ x3, y3 ]; |
1917 | | [ x1, y1 ] = [ x2, y2 ] = 0; |
1918 | | [ x1, y1 ] = z = 0; |
1919 | | \end{cfa} |
1920 | | As in C, the rightmost assignment is performed first, \ie assignment parses right to left. |
1921 | | |
1922 | | |
1923 | | \section{Field Tuples} |
1924 | | |
1925 | | Tuples may be used to select multiple fields of a record by field name. |
1926 | | Its general form is: |
1927 | | \begin{cfa} |
1928 | | §\emph{expr}§ . [ §\emph{fieldlist}§ ] |
1929 | | §\emph{expr}§ -> [ §\emph{fieldlist}§ ] |
1930 | | \end{cfa} |
1931 | | \emph{expr} is any expression yielding a value of type record, \eg ©struct©, ©union©. |
1932 | | Each element of \emph{ fieldlist} is an element of the record specified by \emph{expr}. |
1933 | | A record-field tuple may be used anywhere a tuple can be used. An example of the use of a record-field tuple is |
1934 | | the following: |
1935 | | \begin{cfa} |
1936 | | struct s { |
1937 | | int f1, f2; |
1938 | | char f3; |
1939 | | double f4; |
1940 | | } v; |
1941 | | v.[ f3, f1, f2 ] = ['x', 11, 17 ]; §\C{// equivalent to v.f3 = 'x', v.f1 = 11, v.f2 = 17}§ |
1942 | | f( v.[ f3, f1, f2 ] ); §\C{// equivalent to f( v.f3, v.f1, v.f2 )}§ |
1943 | | \end{cfa} |
1944 | | Note, the fields appearing in a record-field tuple may be specified in any order; |
1945 | | also, it is unnecessary to specify all the fields of a struct in a multiple record-field tuple. |
1946 | | |
1947 | | If a field of a ©struct© is itself another ©struct©, multiple fields of this subrecord can be specified using a nested record-field tuple, as in the following example: |
1948 | | \begin{cfa} |
1949 | | struct inner { |
1950 | | int f2, f3; |
1951 | | }; |
1952 | | struct outer { |
1953 | | int f1; |
1954 | | struct inner i; |
1955 | | double f4; |
1956 | | } o; |
1957 | | |
1958 | | o.[ f1, i.[ f2, f3 ], f4 ] = [ 11, 12, 13, 3.14159 ]; |
1959 | | \end{cfa} |
| 930 | \subsection{Exception Hierarchy} |
| 931 | |
| 932 | An exception type can be derived from another exception type, just like deriving a subclass from a class, providing a kind of polymorphism among exception types. |
| 933 | The exception-type hierarchy that is created is used to organize exception types, similar to a class hierarchy in object-oriented languages, \eg: |
| 934 | \begin{center} |
| 935 | \input{EHMHierarchy} |
| 936 | \end{center} |
| 937 | A programmer can then choose to handle an exception at different degrees of specificity along the hierarchy; |
| 938 | derived exception-types support a more flexible programming style. |
| 939 | For example, higher-level code should catch general exceptions to reduce coupling to the specific implementation at the lower levels; |
| 940 | unnecessary coupling may force changes in higher-level code when low-level code changes. |
| 941 | A consequence of derived exception-types is that multiple exceptions may match, \eg: |
| 942 | \begin{cfa} |
| 943 | catch( Arithmetic ) |
| 944 | \end{cfa} |
| 945 | matches all three derived exception-types: ©DivideByZero©, ©Overflow©, and ©Underflow©. |
| 946 | Because the propagation mechanisms perform a simple linear search of the handler clause for a guarded block, and selects the first matching handler, the order of catch clauses in the handler clause becomes important, \eg: |
| 947 | \begin{cfa} |
| 948 | try { |
| 949 | ... |
| 950 | } catch( Overflow ) { // must appear first |
| 951 | // handle overflow |
| 952 | } catch( Arithmetic ) |
| 953 | // handle other arithmetic issues |
| 954 | } |
| 955 | \end{cfa} |
| 956 | \newterm{Multiple derivation} among exception is not supported. |
| 957 | |
| 958 | |
| 959 | \section{Declarations} |
| 960 | \label{s:Declarations} |
| 961 | |
| 962 | C declaration syntax is notoriously confusing and error prone. |
| 963 | For example, many C programmers are confused by a declaration as simple as: |
| 964 | \begin{quote2} |
| 965 | \begin{tabular}{@{}ll@{}} |
| 966 | \begin{cfa} |
| 967 | int * x[5] |
| 968 | \end{cfa} |
| 969 | & |
| 970 | \raisebox{-0.75\totalheight}{\input{Cdecl}} |
| 971 | \end{tabular} |
| 972 | \end{quote2} |
| 973 | Is this an array of 5 pointers to integers or a \Index{pointer} to an array of 5 integers? |
| 974 | The fact this declaration is unclear to many C programmers means there are \Index{productivity} and \Index{safety} issues even for basic programs. |
| 975 | Another example of confusion results from the fact that a routine name and its parameters are embedded within the return type, mimicking the way the return value is used at the routine's call site. |
| 976 | For example, a routine returning a \Index{pointer} to an array of integers is defined and used in the following way: |
| 977 | \begin{cfa} |
| 978 | int ®(*®f®())[®5®]® {...}; §\C{definition}§ |
| 979 | ... ®(*®f®())[®3®]® += 1; §\C{usage}§ |
| 980 | \end{cfa} |
| 981 | Essentially, the return type is wrapped around the routine name in successive layers (like an \Index{onion}). |
| 982 | While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice. |
| 983 | |
| 984 | \CFA provides its own type, variable and routine declarations, using a different syntax. |
| 985 | The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type. |
| 986 | In the following example, \R{red} is the base type and \B{blue} is qualifiers. |
| 987 | The \CFA declarations move the qualifiers to the left of the base type, \ie move the blue to the left of the red, while the qualifiers have the same meaning but are ordered left to right to specify a variable's type. |
| 988 | \begin{quote2} |
| 989 | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
| 990 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
| 991 | \begin{cfa} |
| 992 | ß[5] *ß ®int® x1; |
| 993 | ß* [5]ß ®int® x2; |
| 994 | ß[* [5] int]ß f®( int p )®; |
| 995 | \end{cfa} |
| 996 | & |
| 997 | \begin{cfa} |
| 998 | ®int® ß*ß x1 ß[5]ß; |
| 999 | ®int® ß(*ßx2ß)[5]ß; |
| 1000 | ßint (*ßf®( int p )®ß)[5]ß; |
| 1001 | \end{cfa} |
| 1002 | \end{tabular} |
| 1003 | \end{quote2} |
| 1004 | The only exception is \Index{bit field} specification, which always appear to the right of the base type. |
| 1005 | % Specifically, the character ©*© is used to indicate a pointer, square brackets ©[©\,©]© are used to represent an array or function return value, and parentheses ©()© are used to indicate a routine parameter. |
| 1006 | However, unlike C, \CFA type declaration tokens are distributed across all variables in the declaration list. |
| 1007 | For instance, variables ©x© and ©y© of type \Index{pointer} to integer are defined in \CFA as follows: |
| 1008 | \begin{quote2} |
| 1009 | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
| 1010 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
| 1011 | \begin{cfa} |
| 1012 | ®*® int x, y; |
| 1013 | \end{cfa} |
| 1014 | & |
| 1015 | \begin{cfa} |
| 1016 | int ®*®x, ®*®y; |
| 1017 | \end{cfa} |
| 1018 | \end{tabular} |
| 1019 | \end{quote2} |
| 1020 | The downside of this semantics is the need to separate regular and \Index{pointer} declarations: |
| 1021 | \begin{quote2} |
| 1022 | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
| 1023 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
| 1024 | \begin{cfa} |
| 1025 | ®*® int x; |
| 1026 | int y; |
| 1027 | \end{cfa} |
| 1028 | & |
| 1029 | \begin{cfa} |
| 1030 | int ®*®x, y; |
| 1031 | |
| 1032 | \end{cfa} |
| 1033 | \end{tabular} |
| 1034 | \end{quote2} |
| 1035 | which is \Index{prescribing} a safety benefit. |
| 1036 | Other examples are: |
| 1037 | \begin{quote2} |
| 1038 | \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}} |
| 1039 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\ |
| 1040 | \begin{cfa} |
| 1041 | [ 5 ] int z; |
| 1042 | [ 5 ] * char w; |
| 1043 | * [ 5 ] double v; |
| 1044 | struct s { |
| 1045 | int f0:3; |
| 1046 | * int f1; |
| 1047 | [ 5 ] * int f2; |
| 1048 | }; |
| 1049 | \end{cfa} |
| 1050 | & |
| 1051 | \begin{cfa} |
| 1052 | int z[ 5 ]; |
| 1053 | char * w[ 5 ]; |
| 1054 | double (* v)[ 5 ]; |
| 1055 | struct s { |
| 1056 | int f0:3; |
| 1057 | int * f1; |
| 1058 | int * f2[ 5 ] |
| 1059 | }; |
| 1060 | \end{cfa} |
| 1061 | & |
| 1062 | \begin{cfa} |
| 1063 | // array of 5 integers |
| 1064 | // array of 5 pointers to char |
| 1065 | // pointer to array of 5 doubles |
| 1066 | |
| 1067 | // common bit field syntax |
| 1068 | |
| 1069 | |
| 1070 | |
| 1071 | \end{cfa} |
| 1072 | \end{tabular} |
| 1073 | \end{quote2} |
| 1074 | |
| 1075 | All type qualifiers, \eg ©const©, ©volatile©, etc., are used in the normal way with the new declarations and also appear left to right, \eg: |
| 1076 | \begin{quote2} |
| 1077 | \begin{tabular}{@{}l@{\hspace{1em}}l@{\hspace{1em}}l@{}} |
| 1078 | \multicolumn{1}{c@{\hspace{1em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{1em}}}{\textbf{C}} \\ |
| 1079 | \begin{cfa} |
| 1080 | const * const int x; |
| 1081 | const * [ 5 ] const int y; |
| 1082 | \end{cfa} |
| 1083 | & |
| 1084 | \begin{cfa} |
| 1085 | int const * const x; |
| 1086 | const int (* const y)[ 5 ] |
| 1087 | \end{cfa} |
| 1088 | & |
| 1089 | \begin{cfa} |
| 1090 | // const pointer to const integer |
| 1091 | // const pointer to array of 5 const integers |
| 1092 | \end{cfa} |
| 1093 | \end{tabular} |
| 1094 | \end{quote2} |
| 1095 | All declaration qualifiers, \eg ©extern©, ©static©, etc., are used in the normal way with the new declarations but can only appear at the start of a \CFA routine declaration,\footnote{\label{StorageClassSpecifier} |
| 1096 | The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature.~\cite[\S~6.11.5(1)]{C11}} \eg: |
| 1097 | \begin{quote2} |
| 1098 | \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}} |
| 1099 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\ |
| 1100 | \begin{cfa} |
| 1101 | extern [ 5 ] int x; |
| 1102 | static * const int y; |
| 1103 | \end{cfa} |
| 1104 | & |
| 1105 | \begin{cfa} |
| 1106 | int extern x[ 5 ]; |
| 1107 | const int static * y; |
| 1108 | \end{cfa} |
| 1109 | & |
| 1110 | \begin{cfa} |
| 1111 | // externally visible array of 5 integers |
| 1112 | // internally visible pointer to constant int |
| 1113 | \end{cfa} |
| 1114 | \end{tabular} |
| 1115 | \end{quote2} |
| 1116 | |
| 1117 | The new declaration syntax can be used in other contexts where types are required, \eg casts and the pseudo-routine ©sizeof©: |
| 1118 | \begin{quote2} |
| 1119 | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
| 1120 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
| 1121 | \begin{cfa} |
| 1122 | y = (®* int®)x; |
| 1123 | i = sizeof(®[ 5 ] * int®); |
| 1124 | \end{cfa} |
| 1125 | & |
| 1126 | \begin{cfa} |
| 1127 | y = (®int *®)x; |
| 1128 | i = sizeof(®int * [ 5 ]®); |
| 1129 | \end{cfa} |
| 1130 | \end{tabular} |
| 1131 | \end{quote2} |
| 1132 | |
| 1133 | Finally, new \CFA declarations may appear together with C declarations in the same program block, but cannot be mixed within a specific declaration. |
| 1134 | Therefore, a programmer has the option of either continuing to use traditional C declarations or take advantage of the new style. |
| 1135 | Clearly, both styles need to be supported for some time due to existing C-style header-files, particularly for UNIX systems. |
| 1136 | |
| 1137 | |
| 1138 | \section{Pointer/Reference} |
| 1139 | |
| 1140 | C provides a \newterm{pointer type}; |
| 1141 | \CFA adds a \newterm{reference type}. |
| 1142 | These types may be derived from an object or routine type, called the \newterm{referenced type}. |
| 1143 | Objects of these types contain an \newterm{address}, which is normally a location in memory, but may also address memory-mapped registers in hardware devices. |
| 1144 | An integer constant expression with the value 0, or such an expression cast to type ©void *©, is called a \newterm{null-pointer constant}.\footnote{ |
| 1145 | One way to conceptualize the null pointer is that no variable is placed at this address, so the null-pointer address can be used to denote an uninitialized pointer/reference object; |
| 1146 | \ie the null pointer is guaranteed to compare unequal to a pointer to any object or routine.} |
| 1147 | An address is \newterm{sound}, if it points to a valid memory location in scope, \ie within the program's execution-environment and has not been freed. |
| 1148 | Dereferencing an \newterm{unsound} address, including the null pointer, is \Index{undefined}, often resulting in a \Index{memory fault}. |
| 1149 | |
| 1150 | A program \newterm{object} is a region of data storage in the execution environment, the contents of which can represent values. |
| 1151 | In most cases, objects are located in memory at an address, and the variable name for an object is an implicit address to the object generated by the compiler and automatically dereferenced, as in: |
| 1152 | \begin{quote2} |
| 1153 | \begin{tabular}{@{}ll@{\hspace{2em}}l@{}} |
| 1154 | \begin{cfa} |
| 1155 | int x; |
| 1156 | x = 3; |
| 1157 | int y; |
| 1158 | y = x; |
| 1159 | \end{cfa} |
| 1160 | & |
| 1161 | \raisebox{-0.45\totalheight}{\input{pointer1}} |
| 1162 | & |
| 1163 | \begin{cfa} |
| 1164 | int * ®const® x = (int *)100 |
| 1165 | *x = 3; // implicit dereference |
| 1166 | int * ®const® y = (int *)104; |
| 1167 | *y = *x; // implicit dereference |
| 1168 | \end{cfa} |
| 1169 | \end{tabular} |
| 1170 | \end{quote2} |
| 1171 | where the right example is how the compiler logically interprets the variables in the left example. |
| 1172 | Since a variable name only points to one address during its lifetime, it is an \Index{immutable} \Index{pointer}; |
| 1173 | hence, the implicit type of pointer variables ©x© and ©y© are constant pointers in the compiler interpretation. |
| 1174 | In general, variable addresses are stored in instructions instead of loaded from memory, and hence may not occupy storage. |
| 1175 | These approaches are contrasted in the following: |
| 1176 | \begin{quote2} |
| 1177 | \begin{tabular}{@{}l|l@{}} |
| 1178 | \multicolumn{1}{c|}{explicit variable address} & \multicolumn{1}{c}{implicit variable address} \\ |
| 1179 | \hline |
| 1180 | \begin{cfa} |
| 1181 | lda r1,100 // load address of x |
| 1182 | ld r2,(r1) // load value of x |
| 1183 | lda r3,104 // load address of y |
| 1184 | st r2,(r3) // store x into y |
| 1185 | \end{cfa} |
| 1186 | & |
| 1187 | \begin{cfa} |
| 1188 | |
| 1189 | ld r2,(100) // load value of x |
| 1190 | |
| 1191 | st r2,(104) // store x into y |
| 1192 | \end{cfa} |
| 1193 | \end{tabular} |
| 1194 | \end{quote2} |
| 1195 | Finally, the immutable nature of a variable's address and the fact that there is no storage for the variable pointer means pointer assignment\index{pointer!assignment}\index{assignment!pointer} is impossible. |
| 1196 | Therefore, the expression ©x = y© has only one meaning, ©*x = *y©, \ie manipulate values, which is why explicitly writing the dereferences is unnecessary even though it occurs implicitly as part of \Index{instruction decoding}. |
| 1197 | |
| 1198 | A \Index{pointer}/\Index{reference} object is a generalization of an object variable-name, \ie a mutable address that can point to more than one memory location during its lifetime. |
| 1199 | (Similarly, an integer variable can contain multiple integer literals during its lifetime versus an integer constant representing a single literal during its lifetime, and like a variable name, may not occupy storage if the literal is embedded directly into instructions.) |
| 1200 | Hence, a pointer occupies memory to store its current address, and the pointer's value is loaded by dereferencing, \eg: |
| 1201 | \begin{quote2} |
| 1202 | \begin{tabular}{@{}l@{\hspace{2em}}l@{}} |
| 1203 | \begin{cfa} |
| 1204 | int x, y, ®*® p1, ®*® p2, ®**® p3; |
| 1205 | p1 = ®&®x; // p1 points to x |
| 1206 | p2 = p1; // p2 points to x |
| 1207 | p1 = ®&®y; // p1 points to y |
| 1208 | p3 = &p2; // p3 points to p2 |
| 1209 | \end{cfa} |
| 1210 | & |
| 1211 | \raisebox{-0.5\totalheight}{\input{pointer2.pstex_t}} |
| 1212 | \end{tabular} |
| 1213 | \end{quote2} |
| 1214 | |
| 1215 | Notice, an address has a \Index{duality}\index{address!duality}: a location in memory or the value at that location. |
| 1216 | In many cases, a compiler might be able to infer the best meaning for these two cases. |
| 1217 | For example, \Index*{Algol68}~\cite{Algol68} infers pointer dereferencing to select the best meaning for each pointer usage |
| 1218 | \begin{cfa} |
| 1219 | p2 = p1 + x; §\C{// compiler infers *p2 = *p1 + x;}§ |
| 1220 | \end{cfa} |
| 1221 | Algol68 infers the following dereferencing ©*p2 = *p1 + x©, because adding the arbitrary integer value in ©x© to the address of ©p1© and storing the resulting address into ©p2© is an unlikely operation. |
| 1222 | Unfortunately, automatic dereferencing does not work in all cases, and so some mechanism is necessary to fix incorrect choices. |
| 1223 | |
| 1224 | Rather than inferring dereference, most programming languages pick one implicit dereferencing semantics, and the programmer explicitly indicates the other to resolve address-duality. |
| 1225 | In C, objects of pointer type always manipulate the pointer object's address: |
| 1226 | \begin{cfa} |
| 1227 | p1 = p2; §\C{// p1 = p2\ \ rather than\ \ *p1 = *p2}§ |
| 1228 | p2 = p1 + x; §\C{// p2 = p1 + x\ \ rather than\ \ *p2 = *p1 + x}§ |
| 1229 | \end{cfa} |
| 1230 | even though the assignment to ©p2© is likely incorrect, and the programmer probably meant: |
| 1231 | \begin{cfa} |
| 1232 | p1 = p2; §\C{// pointer address assignment}§ |
| 1233 | ®*®p2 = ®*®p1 + x; §\C{// pointed-to value assignment / operation}§ |
| 1234 | \end{cfa} |
| 1235 | The C semantics work well for situations where manipulation of addresses is the primary meaning and data is rarely accessed, such as storage management (©malloc©/©free©). |
| 1236 | |
| 1237 | However, in most other situations, the pointed-to value is requested more often than the pointer address. |
| 1238 | \begin{cfa} |
| 1239 | *p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15); |
| 1240 | \end{cfa} |
| 1241 | In this case, it is tedious to explicitly write the dereferencing, and error prone when pointer arithmetic is allowed. |
| 1242 | It is better to have the compiler generate the dereferencing and have no implicit pointer arithmetic: |
| 1243 | \begin{cfa} |
| 1244 | p2 = ((p1 + p2) * (p3 - p1)) / (p3 - 15); |
| 1245 | \end{cfa} |
| 1246 | |
| 1247 | To support this common case, a reference type is introduced in \CFA, denoted by ©&©, which is the opposite dereference semantics to a pointer type, making the value at the pointed-to location the implicit semantics for dereferencing (similar but not the same as \CC \Index{reference type}s). |
| 1248 | \begin{cfa} |
| 1249 | int x, y, ®&® r1, ®&® r2, ®&&® r3; |
| 1250 | ®&®r1 = &x; §\C{// r1 points to x}§ |
| 1251 | ®&®r2 = &r1; §\C{// r2 points to x}§ |
| 1252 | ®&®r1 = &y; §\C{// r1 points to y}§ |
| 1253 | ®&&®r3 = ®&®&r2; §\C{// r3 points to r2}§ |
| 1254 | r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15); §\C{// implicit dereferencing}§ |
| 1255 | \end{cfa} |
| 1256 | Except for auto-dereferencing by the compiler, this reference example is the same as the previous pointer example. |
| 1257 | Hence, a reference behaves like the variable name for the current variable it is pointing-to. |
| 1258 | One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in a declaration, so the previous example becomes: |
| 1259 | \begin{cfa} |
| 1260 | ®*®r2 = ((®*®r1 + ®*®r2) ®*® (®**®r3 - ®*®r1)) / (®**®r3 - 15); |
| 1261 | \end{cfa} |
| 1262 | When a reference operation appears beside a dereference operation, \eg ©&*©, they cancel out. |
| 1263 | However, in C, the cancellation always yields a value (\Index{rvalue}).\footnote{ |
| 1264 | The unary ©&© operator yields the address of its operand. |
| 1265 | If the operand has type ``type'', the result has type ``pointer to type''. |
| 1266 | If the operand is the result of a unary ©*© operator, neither that operator nor the ©&© operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.~\cite[\S~6.5.3.2--3]{C11}} |
| 1267 | For a \CFA reference type, the cancellation on the left-hand side of assignment leaves the reference as an address (\Index{lvalue}): |
| 1268 | \begin{cfa} |
| 1269 | (&®*®)r1 = &x; §\C{// (\&*) cancel giving address in r1 not variable pointed-to by r1}§ |
| 1270 | \end{cfa} |
| 1271 | Similarly, the address of a reference can be obtained for assignment or computation (\Index{rvalue}): |
| 1272 | \begin{cfa} |
| 1273 | (&(&®*®)®*®)r3 = &(&®*®)r2; §\C{// (\&*) cancel giving address in r2, (\&(\&*)*) cancel giving address in r3}§ |
| 1274 | \end{cfa} |
| 1275 | Cancellation\index{cancellation!pointer/reference}\index{pointer!cancellation} works to arbitrary depth. |
| 1276 | |
| 1277 | Fundamentally, pointer and reference objects are functionally interchangeable because both contain addresses. |
| 1278 | \begin{cfa} |
| 1279 | int x, *p1 = &x, **p2 = &p1, ***p3 = &p2, |
| 1280 | &r1 = x, &&r2 = r1, &&&r3 = r2; |
| 1281 | ***p3 = 3; §\C{// change x}§ |
| 1282 | r3 = 3; §\C{// change x, ***r3}§ |
| 1283 | **p3 = ...; §\C{// change p1}§ |
| 1284 | &r3 = ...; §\C{// change r1, (\&*)**r3, 1 cancellation}§ |
| 1285 | *p3 = ...; §\C{// change p2}§ |
| 1286 | &&r3 = ...; §\C{// change r2, (\&(\&*)*)*r3, 2 cancellations}§ |
| 1287 | &&&r3 = p3; §\C{// change r3 to p3, (\&(\&(\&*)*)*)r3, 3 cancellations}§ |
| 1288 | \end{cfa} |
| 1289 | Furthermore, both types are equally performant, as the same amount of dereferencing occurs for both types. |
| 1290 | Therefore, the choice between them is based solely on whether the address is dereferenced frequently or infrequently, which dictates the amount of implicit dereferencing aid from the compiler. |
| 1291 | |
| 1292 | As for a pointer type, a reference type may have qualifiers: |
| 1293 | \begin{cfa} |
| 1294 | const int cx = 5; §\C{// cannot change cx;}§ |
| 1295 | const int & cr = cx; §\C{// cannot change what cr points to}§ |
| 1296 | ®&®cr = &cx; §\C{// can change cr}§ |
| 1297 | cr = 7; §\C{// error, cannot change cx}§ |
| 1298 | int & const rc = x; §\C{// must be initialized}§ |
| 1299 | ®&®rc = &x; §\C{// error, cannot change rc}§ |
| 1300 | const int & const crc = cx; §\C{// must be initialized}§ |
| 1301 | crc = 7; §\C{// error, cannot change cx}§ |
| 1302 | ®&®crc = &cx; §\C{// error, cannot change crc}§ |
| 1303 | \end{cfa} |
| 1304 | Hence, for type ©& const©, there is no pointer assignment, so ©&rc = &x© is disallowed, and \emph{the address value cannot be the null pointer unless an arbitrary pointer is coerced\index{coercion} into the reference}: |
| 1305 | \begin{cfa} |
| 1306 | int & const cr = *0; §\C{// where 0 is the int * zero}§ |
| 1307 | \end{cfa} |
| 1308 | Note, constant reference-types do not prevent \Index{addressing errors} because of explicit storage-management: |
| 1309 | \begin{cfa} |
| 1310 | int & const cr = *malloc(); |
| 1311 | cr = 5; |
| 1312 | free( &cr ); |
| 1313 | cr = 7; §\C{// unsound pointer dereference}§ |
| 1314 | \end{cfa} |
| 1315 | |
| 1316 | The position of the ©const© qualifier \emph{after} the pointer/reference qualifier causes confuse for C programmers. |
| 1317 | The ©const© qualifier cannot be moved before the pointer/reference qualifier for C style-declarations; |
| 1318 | \CFA-style declarations (see \VRef{s:Declarations}) attempt to address this issue: |
| 1319 | \begin{quote2} |
| 1320 | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
| 1321 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
| 1322 | \begin{cfa} |
| 1323 | ®const® * ®const® * const int ccp; |
| 1324 | ®const® & ®const® & const int ccr; |
| 1325 | \end{cfa} |
| 1326 | & |
| 1327 | \begin{cfa} |
| 1328 | const int * ®const® * ®const® ccp; |
| 1329 | |
| 1330 | \end{cfa} |
| 1331 | \end{tabular} |
| 1332 | \end{quote2} |
| 1333 | where the \CFA declaration is read left-to-right. |
| 1334 | |
| 1335 | Finally, like pointers, references are usable and composable with other type operators and generators. |
| 1336 | \begin{cfa} |
| 1337 | int w, x, y, z, & ar[3] = { x, y, z }; §\C{// initialize array of references}§ |
| 1338 | &ar[1] = &w; §\C{// change reference array element}§ |
| 1339 | typeof( ar[1] ) p; §\C{// (gcc) is int, i.e., the type of referenced object}§ |
| 1340 | typeof( &ar[1] ) q; §\C{// (gcc) is int \&, i.e., the type of reference}§ |
| 1341 | sizeof( ar[1] ) == sizeof( int ); §\C{// is true, i.e., the size of referenced object}§ |
| 1342 | sizeof( &ar[1] ) == sizeof( int *) §\C{// is true, i.e., the size of a reference}§ |
| 1343 | \end{cfa} |
| 1344 | |
| 1345 | In contrast to \CFA reference types, \Index*[C++]{\CC{}}'s reference types are all ©const© references, preventing changes to the reference address, so only value assignment is possible, which eliminates half of the \Index{address duality}. |
| 1346 | Also, \CC does not allow \Index{array}s\index{array!reference} of reference\footnote{ |
| 1347 | The reason for disallowing arrays of reference is unknown, but possibly comes from references being ethereal (like a textual macro), and hence, replaceable by the referant object.} |
| 1348 | \Index*{Java}'s reference types to objects (all Java objects are on the heap) are like C pointers, which always manipulate the address, and there is no (bit-wise) object assignment, so objects are explicitly cloned by shallow or deep copying, which eliminates half of the address duality. |
| 1349 | |
| 1350 | |
| 1351 | \subsection{Initialization} |
| 1352 | |
| 1353 | \Index{Initialization} is different than \Index{assignment} because initialization occurs on the empty (uninitialized) storage on an object, while assignment occurs on possibly initialized storage of an object. |
| 1354 | There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, return/temporary binding. |
| 1355 | Because the object being initialized has no value, there is only one meaningful semantics with respect to address duality: it must mean address as there is no pointed-to value. |
| 1356 | In contrast, the left-hand side of assignment has an address that has a duality. |
| 1357 | Therefore, for pointer/reference initialization, the initializing value must be an address not a value. |
| 1358 | \begin{cfa} |
| 1359 | int * p = &x; §\C{// assign address of x}§ |
| 1360 | ®int * p = x;® §\C{// assign value of x}§ |
| 1361 | int & r = x; §\C{// must have address of x}§ |
| 1362 | \end{cfa} |
| 1363 | Like the previous example with C pointer-arithmetic, it is unlikely assigning the value of ©x© into a pointer is meaningful (again, a warning is usually given). |
| 1364 | Therefore, for safety, this context requires an address, so it is superfluous to require explicitly taking the address of the initialization object, even though the type is incorrect. |
| 1365 | Note, this is strictly a convenience and safety feature for a programmer. |
| 1366 | Hence, \CFA allows ©r© to be assigned ©x© because it infers a reference for ©x©, by implicitly inserting a address-of operator, ©&©, and it is an error to put an ©&© because the types no longer match due to the implicit dereference. |
| 1367 | Unfortunately, C allows ©p© to be assigned with ©&x© (address) or ©x© (value), but most compilers warn about the latter assignment as being potentially incorrect. |
| 1368 | Similarly, when a reference type is used for a parameter/return type, the call-site argument does not require a reference operator for the same reason. |
| 1369 | \begin{cfa} |
| 1370 | int & f( int & r ); §\C{// reference parameter and return}§ |
| 1371 | z = f( x ) + f( y ); §\C{// reference operator added, temporaries needed for call results}§ |
| 1372 | \end{cfa} |
| 1373 | Within routine ©f©, it is possible to change the argument by changing the corresponding parameter, and parameter ©r© can be locally reassigned within ©f©. |
| 1374 | Since operator routine ©?+?© takes its arguments by value, the references returned from ©f© are used to initialize compiler generated temporaries with value semantics that copy from the references. |
| 1375 | \begin{cfa} |
| 1376 | int temp1 = f( x ), temp2 = f( y ); |
| 1377 | z = temp1 + temp2; |
| 1378 | \end{cfa} |
| 1379 | This \Index{implicit referencing} is crucial for reducing the syntactic burden for programmers when using references; |
| 1380 | otherwise references have the same syntactic burden as pointers in these contexts. |
| 1381 | |
| 1382 | When a pointer/reference parameter has a ©const© value (immutable), it is possible to pass literals and expressions. |
| 1383 | \begin{cfa} |
| 1384 | void f( ®const® int & cr ); |
| 1385 | void g( ®const® int * cp ); |
| 1386 | f( 3 ); g( ®&®3 ); |
| 1387 | f( x + y ); g( ®&®(x + y) ); |
| 1388 | \end{cfa} |
| 1389 | Here, the compiler passes the address to the literal 3 or the temporary for the expression ©x + y©, knowing the argument cannot be changed through the parameter. |
| 1390 | The ©&© before the constant/expression for the pointer-type parameter (©g©) is a \CFA extension necessary to type match and is a common requirement before a variable in C (\eg ©scanf©). |
| 1391 | Importantly, ©&3© may not be equal to ©&3©, where the references occur across calls because the temporaries maybe different on each call. |
| 1392 | |
| 1393 | \CFA \emph{extends} this semantics to a mutable pointer/reference parameter, and the compiler implicitly creates the necessary temporary (copying the argument), which is subsequently pointed-to by the reference parameter and can be changed.\footnote{ |
| 1394 | If whole program analysis is possible, and shows the parameter is not assigned, \ie it is ©const©, the temporary is unnecessary.} |
| 1395 | \begin{cfa} |
| 1396 | void f( int & r ); |
| 1397 | void g( int * p ); |
| 1398 | f( 3 ); g( ®&®3 ); §\C{// compiler implicit generates temporaries}§ |
| 1399 | f( x + y ); g( ®&®(x + y) ); §\C{// compiler implicit generates temporaries}§ |
| 1400 | \end{cfa} |
| 1401 | Essentially, there is an implicit \Index{rvalue} to \Index{lvalue} conversion in this case.\footnote{ |
| 1402 | This conversion attempts to address the \newterm{const hell} problem, when the innocent addition of a ©const© qualifier causes a cascade of type failures, requiring an unknown number of additional ©const© qualifiers, until it is discovered a ©const© qualifier cannot be added and all the ©const© qualifiers must be removed.} |
| 1403 | The implicit conversion allows seamless calls to any routine without having to explicitly name/copy the literal/expression to allow the call. |
| 1404 | |
| 1405 | %\CFA attempts to handle pointers and references in a uniform, symmetric manner. |
| 1406 | Finally, C handles \Index{routine object}s in an inconsistent way. |
| 1407 | A routine object is both a pointer and a reference (\Index{particle and wave}). |
| 1408 | \begin{cfa} |
| 1409 | void f( int i ); |
| 1410 | void (*fp)( int ); §\C{// routine pointer}§ |
| 1411 | fp = f; §\C{// reference initialization}§ |
| 1412 | fp = &f; §\C{// pointer initialization}§ |
| 1413 | fp = *f; §\C{// reference initialization}§ |
| 1414 | fp(3); §\C{// reference invocation}§ |
| 1415 | (*fp)(3); §\C{// pointer invocation}§ |
| 1416 | \end{cfa} |
| 1417 | While C's treatment of routine objects has similarity to inferring a reference type in initialization contexts, the examples are assignment not initialization, and all possible forms of assignment are possible (©f©, ©&f©, ©*f©) without regard for type. |
| 1418 | Instead, a routine object should be referenced by a ©const© reference: |
| 1419 | \begin{cfa} |
| 1420 | ®const® void (®&® fr)( int ) = f; §\C{// routine reference}§ |
| 1421 | fr = ... §\C{// error, cannot change code}§ |
| 1422 | &fr = ...; §\C{// changing routine reference}§ |
| 1423 | fr( 3 ); §\C{// reference call to f}§ |
| 1424 | (*fr)(3); §\C{// error, incorrect type}§ |
| 1425 | \end{cfa} |
| 1426 | because the value of the routine object is a routine literal, \ie the routine code is normally immutable during execution.\footnote{ |
| 1427 | Dynamic code rewriting is possible but only in special circumstances.} |
| 1428 | \CFA allows this additional use of references for routine objects in an attempt to give a more consistent meaning for them. |
| 1429 | |
| 1430 | |
| 1431 | \subsection{Address-of Semantics} |
| 1432 | |
| 1433 | In C, ©&E© is an rvalue for any expression ©E©. |
| 1434 | \CFA extends the ©&© (address-of) operator as follows: |
| 1435 | \begin{itemize} |
| 1436 | \item |
| 1437 | if ©R© is an \Index{rvalue} of type ©T &$_1$...&$_r$© where $r \ge 1$ references (©&© symbols) than ©&R© has type ©T ®*®&$_{\color{red}2}$...&$_{\color{red}r}$©, \ie ©T© pointer with $r-1$ references (©&© symbols). |
| 1438 | |
| 1439 | \item |
| 1440 | if ©L© is an \Index{lvalue} of type ©T &$_1$...&$_l$© where $l \ge 0$ references (©&© symbols) then ©&L© has type ©T ®*®&$_{\color{red}1}$...&$_{\color{red}l}$©, \ie ©T© pointer with $l$ references (©&© symbols). |
| 1441 | \end{itemize} |
| 1442 | The following example shows the first rule applied to different \Index{rvalue} contexts: |
| 1443 | \begin{cfa} |
| 1444 | int x, * px, ** ppx, *** pppx, **** ppppx; |
| 1445 | int & rx = x, && rrx = rx, &&& rrrx = rrx ; |
| 1446 | x = rrrx; // rrrx is an lvalue with type int &&& (equivalent to x) |
| 1447 | px = &rrrx; // starting from rrrx, &rrrx is an rvalue with type int *&&& (&x) |
| 1448 | ppx = &&rrrx; // starting from &rrrx, &&rrrx is an rvalue with type int **&& (&rx) |
| 1449 | pppx = &&&rrrx; // starting from &&rrrx, &&&rrrx is an rvalue with type int ***& (&rrx) |
| 1450 | ppppx = &&&&rrrx; // starting from &&&rrrx, &&&&rrrx is an rvalue with type int **** (&rrrx) |
| 1451 | \end{cfa} |
| 1452 | The following example shows the second rule applied to different \Index{lvalue} contexts: |
| 1453 | \begin{cfa} |
| 1454 | int x, * px, ** ppx, *** pppx; |
| 1455 | int & rx = x, && rrx = rx, &&& rrrx = rrx ; |
| 1456 | rrrx = 2; // rrrx is an lvalue with type int &&& (equivalent to x) |
| 1457 | &rrrx = px; // starting from rrrx, &rrrx is an rvalue with type int *&&& (rx) |
| 1458 | &&rrrx = ppx; // starting from &rrrx, &&rrrx is an rvalue with type int **&& (rrx) |
| 1459 | &&&rrrx = pppx; // starting from &&rrrx, &&&rrrx is an rvalue with type int ***& (rrrx) |
| 1460 | \end{cfa} |
| 1461 | |
| 1462 | |
| 1463 | \subsection{Conversions} |
| 1464 | |
| 1465 | C provides a basic implicit conversion to simplify variable usage: |
| 1466 | \begin{enumerate} |
| 1467 | \setcounter{enumi}{-1} |
| 1468 | \item |
| 1469 | lvalue to rvalue conversion: ©cv T© converts to ©T©, which allows implicit variable dereferencing. |
| 1470 | \begin{cfa} |
| 1471 | int x; |
| 1472 | x + 1; // lvalue variable (int) converts to rvalue for expression |
| 1473 | \end{cfa} |
| 1474 | An rvalue has no type qualifiers (©cv©), so the lvalue qualifiers are dropped. |
| 1475 | \end{enumerate} |
| 1476 | \CFA provides three new implicit conversion for reference types to simplify reference usage. |
| 1477 | \begin{enumerate} |
| 1478 | \item |
| 1479 | reference to rvalue conversion: ©cv T &© converts to ©T©, which allows implicit reference dereferencing. |
| 1480 | \begin{cfa} |
| 1481 | int x, &r = x, f( int p ); |
| 1482 | x = ®r® + f( ®r® ); // lvalue reference converts to rvalue |
| 1483 | \end{cfa} |
| 1484 | An rvalue has no type qualifiers (©cv©), so the reference qualifiers are dropped. |
| 1485 | |
| 1486 | \item |
| 1487 | lvalue to reference conversion: \lstinline[deletekeywords={lvalue}]@lvalue-type cv1 T@ converts to ©cv2 T &©, which allows implicitly converting variables to references. |
| 1488 | \begin{cfa} |
| 1489 | int x, &r = ®x®, f( int & p ); // lvalue variable (int) convert to reference (int &) |
| 1490 | f( ®x® ); // lvalue variable (int) convert to reference (int &) |
| 1491 | \end{cfa} |
| 1492 | Conversion can restrict a type, where ©cv1© $\le$ ©cv2©, \eg passing an ©int© to a ©const volatile int &©, which has low cost. |
| 1493 | Conversion can expand a type, where ©cv1© $>$ ©cv2©, \eg passing a ©const volatile int© to an ©int &©, which has high cost (\Index{warning}); |
| 1494 | furthermore, if ©cv1© has ©const© but not ©cv2©, a temporary variable is created to preserve the immutable lvalue. |
| 1495 | |
| 1496 | \item |
| 1497 | rvalue to reference conversion: ©T© converts to ©cv T &©, which allows binding references to temporaries. |
| 1498 | \begin{cfa} |
| 1499 | int x, & f( int & p ); |
| 1500 | f( ®x + 3® ); // rvalue parameter (int) implicitly converts to lvalue temporary reference (int &) |
| 1501 | ®&f®(...) = &x; // rvalue result (int &) implicitly converts to lvalue temporary reference (int &) |
| 1502 | \end{cfa} |
| 1503 | In both case, modifications to the temporary are inaccessible (\Index{warning}). |
| 1504 | Conversion expands the temporary-type with ©cv©, which is low cost since the temporary is inaccessible. |
| 1505 | \end{enumerate} |
| 1506 | |
| 1507 | |
| 1508 | \begin{comment} |
| 1509 | From: Richard Bilson <rcbilson@gmail.com> |
| 1510 | Date: Wed, 13 Jul 2016 01:58:58 +0000 |
| 1511 | Subject: Re: pointers / references |
| 1512 | To: "Peter A. Buhr" <pabuhr@plg2.cs.uwaterloo.ca> |
| 1513 | |
| 1514 | As a general comment I would say that I found the section confusing, as you move back and forth |
| 1515 | between various real and imagined programming languages. If it were me I would rewrite into two |
| 1516 | subsections, one that specifies precisely the syntax and semantics of reference variables and |
| 1517 | another that provides the rationale. |
| 1518 | |
| 1519 | I don't see any obvious problems with the syntax or semantics so far as I understand them. It's not |
| 1520 | obvious that the description you're giving is complete, but I'm sure you'll find the special cases |
| 1521 | as you do the implementation. |
| 1522 | |
| 1523 | My big gripes are mostly that you're not being as precise as you need to be in your terminology, and |
| 1524 | that you say a few things that aren't actually true even though I generally know what you mean. |
| 1525 | |
| 1526 | 20 C provides a pointer type; CFA adds a reference type. Both types contain an address, which is normally a |
| 1527 | 21 location in memory. |
| 1528 | |
| 1529 | An address is not a location in memory; an address refers to a location in memory. Furthermore it |
| 1530 | seems weird to me to say that a type "contains" an address; rather, objects of that type do. |
| 1531 | |
| 1532 | 21 Special addresses are used to denote certain states or access co-processor memory. By |
| 1533 | 22 convention, no variable is placed at address 0, so addresses like 0, 1, 2, 3 are often used to denote no-value |
| 1534 | 23 or other special states. |
| 1535 | |
| 1536 | This isn't standard C at all. There has to be one null pointer representation, but it doesn't have |
| 1537 | to be a literal zero representation and there doesn't have to be more than one such representation. |
| 1538 | |
| 1539 | 23 Often dereferencing a special state causes a memory fault, so checking is necessary |
| 1540 | 24 during execution. |
| 1541 | |
| 1542 | I don't see the connection between the two clauses here. I feel like if a bad pointer will not cause |
| 1543 | a memory fault then I need to do more checking, not less. |
| 1544 | |
| 1545 | 24 If the programming language assigns addresses, a program's execution is sound, \ie all |
| 1546 | 25 addresses are to valid memory locations. |
| 1547 | |
| 1548 | You haven't said what it means to "assign" an address, but if I use my intuitive understanding of |
| 1549 | the term I don't see how this can be true unless you're assuming automatic storage management. |
| 1550 | |
| 1551 | 1 Program variables are implicit pointers to memory locations generated by the compiler and automatically |
| 1552 | 2 dereferenced, as in: |
| 1553 | |
| 1554 | There is no reason why a variable needs to have a location in memory, and indeed in a typical |
| 1555 | program many variables will not. In standard terminology an object identifier refers to data in the |
| 1556 | execution environment, but not necessarily in memory. |
| 1557 | |
| 1558 | 13 A pointer/reference is a generalization of a variable name, \ie a mutable address that can point to more |
| 1559 | 14 than one memory location during its lifetime. |
| 1560 | |
| 1561 | I feel like you're off the reservation here. In my world there are objects of pointer type, which |
| 1562 | seem to be what you're describing here, but also pointer values, which can be stored in an object of |
| 1563 | pointer type but don't necessarily have to be. For example, how would you describe the value denoted |
| 1564 | by "&main" in a C program? I would call it a (function) pointer, but that doesn't satisfy your |
| 1565 | definition. |
| 1566 | |
| 1567 | 16 not occupy storage as the literal is embedded directly into instructions.) Hence, a pointer occupies memory |
| 1568 | 17 to store its current address, and the pointer's value is loaded by dereferencing, \eg: |
| 1569 | |
| 1570 | As with my general objection regarding your definition of variables, there is no reason why a |
| 1571 | pointer variable (object of pointer type) needs to occupy memory. |
| 1572 | |
| 1573 | 21 p2 = p1 + x; // compiler infers *p2 = *p1 + x; |
| 1574 | |
| 1575 | What language are we in now? |
| 1576 | |
| 1577 | 24 pointer usage. However, in C, the following cases are ambiguous, especially with pointer arithmetic: |
| 1578 | 25 p1 = p2; // p1 = p2 or *p1 = *p2 |
| 1579 | |
| 1580 | This isn't ambiguous. it's defined to be the first option. |
| 1581 | |
| 1582 | 26 p1 = p1 + 1; // p1 = p1 + 1 or *p1 = *p1 + 1 |
| 1583 | |
| 1584 | Again, this statement is not ambiguous. |
| 1585 | |
| 1586 | 13 example. Hence, a reference behaves like the variable name for the current variable it is pointing-to. The |
| 1587 | 14 simplest way to understand a reference is to imagine the compiler inserting a dereference operator before |
| 1588 | 15 the reference variable for each reference qualifier in a declaration, \eg: |
| 1589 | |
| 1590 | It's hard for me to understand who the audience for this part is. I think a practical programmer is |
| 1591 | likely to be satisfied with "a reference behaves like the variable name for the current variable it |
| 1592 | is pointing-to," maybe with some examples. Your "simplest way" doesn't strike me as simpler than |
| 1593 | that. It feels like you're trying to provide a more precise definition for the semantics of |
| 1594 | references, but it isn't actually precise enough to be a formal specification. If you want to |
| 1595 | express the semantics of references using rewrite rules that's a great way to do it, but lay the |
| 1596 | rules out clearly, and when you're showing an example of rewriting keep your |
| 1597 | references/pointers/values separate (right now, you use \eg "r3" to mean a reference, a pointer, |
| 1598 | and a value). |
| 1599 | |
| 1600 | 24 Cancellation works to arbitrary depth, and pointer and reference values are interchangeable because both |
| 1601 | 25 contain addresses. |
| 1602 | |
| 1603 | Except they're not interchangeable, because they have different and incompatible types. |
| 1604 | |
| 1605 | 40 Interestingly, C++ deals with the address duality by making the pointed-to value the default, and prevent- |
| 1606 | 41 ing changes to the reference address, which eliminates half of the duality. Java deals with the address duality |
| 1607 | 42 by making address assignment the default and requiring field assignment (direct or indirect via methods), |
| 1608 | 43 \ie there is no builtin bit-wise or method-wise assignment, which eliminates half of the duality. |
| 1609 | |
| 1610 | I can follow this but I think that's mostly because I already understand what you're trying to |
| 1611 | say. I don't think I've ever heard the term "method-wise assignment" and I don't see you defining |
| 1612 | it. Furthermore Java does have value assignment of basic (non-class) types, so your summary here |
| 1613 | feels incomplete. (If it were me I'd drop this paragraph rather than try to save it.) |
| 1614 | |
| 1615 | 11 Hence, for type & const, there is no pointer assignment, so &rc = &x is disallowed, and the address value |
| 1616 | 12 cannot be 0 unless an arbitrary pointer is assigned to the reference. |
| 1617 | |
| 1618 | Given the pains you've taken to motivate every little bit of the semantics up until now, this last |
| 1619 | clause ("the address value cannot be 0") comes out of the blue. It seems like you could have |
| 1620 | perfectly reasonable semantics that allowed the initialization of null references. |
| 1621 | |
| 1622 | 12 In effect, the compiler is managing the |
| 1623 | 13 addresses for type & const not the programmer, and by a programming discipline of only using references |
| 1624 | 14 with references, address errors can be prevented. |
| 1625 | |
| 1626 | Again, is this assuming automatic storage management? |
| 1627 | |
| 1628 | 18 rary binding. For reference initialization (like pointer), the initializing value must be an address (lvalue) not |
| 1629 | 19 a value (rvalue). |
| 1630 | |
| 1631 | This sentence appears to suggest that an address and an lvalue are the same thing. |
| 1632 | |
| 1633 | 20 int * p = &x; // both &x and x are possible interpretations |
| 1634 | |
| 1635 | Are you saying that we should be considering "x" as a possible interpretation of the initializer |
| 1636 | "&x"? It seems to me that this expression has only one legitimate interpretation in context. |
| 1637 | |
| 1638 | 21 int & r = x; // x unlikely interpretation, because of auto-dereferencing |
| 1639 | |
| 1640 | You mean, we can initialize a reference using an integer value? Surely we would need some sort of |
| 1641 | cast to induce that interpretation, no? |
| 1642 | |
| 1643 | 22 Hence, the compiler implicitly inserts a reference operator, &, before the initialization expression. |
| 1644 | |
| 1645 | But then the expression would have pointer type, which wouldn't be compatible with the type of r. |
| 1646 | |
| 1647 | 22 Similarly, |
| 1648 | 23 when a reference is used for a parameter/return type, the call-site argument does not require a reference |
| 1649 | 24 operator. |
| 1650 | |
| 1651 | Furthermore, it would not be correct to use a reference operator. |
| 1652 | |
| 1653 | 45 The implicit conversion allows |
| 1654 | 1 seamless calls to any routine without having to explicitly name/copy the literal/expression to allow the call. |
| 1655 | 2 While C' attempts to handle pointers and references in a uniform, symmetric manner, C handles routine |
| 1656 | 3 variables in an inconsistent way: a routine variable is both a pointer and a reference (particle and wave). |
| 1657 | |
| 1658 | After all this talk of how expressions can have both pointer and value interpretations, you're |
| 1659 | disparaging C because it has expressions that have both pointer and value interpretations? |
| 1660 | |
| 1661 | On Sat, Jul 9, 2016 at 4:18 PM Peter A. Buhr <pabuhr@plg.uwaterloo.ca> wrote: |
| 1662 | > Aaron discovered a few places where "&"s are missing and where there are too many "&", which are |
| 1663 | > corrected in the attached updated. None of the text has changed, if you have started reading |
| 1664 | > already. |
| 1665 | \end{comment} |
| 1666 | |
| 1667 | |
| 1668 | \section{Routine Definition} |
| 1669 | |
| 1670 | \CFA also supports a new syntax for routine definition, as well as \Celeven and K\&R routine syntax. |
| 1671 | The point of the new syntax is to allow returning multiple values from a routine~\cite{Galletly96,CLU}, \eg: |
| 1672 | \begin{cfa} |
| 1673 | ®[ int o1, int o2, char o3 ]® f( int i1, char i2, char i3 ) { |
| 1674 | §\emph{routine body}§ |
| 1675 | } |
| 1676 | \end{cfa} |
| 1677 | where routine ©f© has three output (return values) and three input parameters. |
| 1678 | Existing C syntax cannot be extended with multiple return types because it is impossible to embed a single routine name within multiple return type specifications. |
| 1679 | |
| 1680 | In detail, the brackets, ©[]©, enclose the result type, where each return value is named and that name is a local variable of the particular return type.\footnote{ |
| 1681 | \Index*{Michael Tiemann}, with help from \Index*{Doug Lea}, provided named return values in g++, circa 1989.} |
| 1682 | The value of each local return variable is automatically returned at routine termination. |
| 1683 | Declaration qualifiers can only appear at the start of a routine definition, \eg: |
| 1684 | \begin{cfa} |
| 1685 | ®extern® [ int x ] g( int y ) {§\,§} |
| 1686 | \end{cfa} |
| 1687 | Lastly, if there are no output parameters or input parameters, the brackets and/or parentheses must still be specified; |
| 1688 | in both cases the type is assumed to be void as opposed to old style C defaults of int return type and unknown parameter types, respectively, as in: |
| 1689 | \begin{cfa} |
| 1690 | [§\,§] g(); §\C{// no input or output parameters}§ |
| 1691 | [ void ] g( void ); §\C{// no input or output parameters}§ |
| 1692 | \end{cfa} |
| 1693 | |
| 1694 | Routine f is called as follows: |
| 1695 | \begin{cfa} |
| 1696 | [ i, j, ch ] = f( 3, 'a', ch ); |
| 1697 | \end{cfa} |
| 1698 | The list of return values from f and the grouping on the left-hand side of the assignment is called a \newterm{return list} and discussed in Section 12. |
| 1699 | |
| 1700 | \CFA style declarations cannot be used to declare parameters for K\&R style routine definitions because of the following ambiguity: |
| 1701 | \begin{cfa} |
| 1702 | int (*f(x))[ 5 ] int x; {} |
| 1703 | \end{cfa} |
| 1704 | The string ``©int (*f(x))[ 5 ]©'' declares a K\&R style routine of type returning a pointer to an array of 5 integers, while the string ``©[ 5 ] int x©'' declares a \CFA style parameter x of type array of 5 integers. |
| 1705 | Since the strings overlap starting with the open bracket, ©[©, there is an ambiguous interpretation for the string. |
| 1706 | As well, \CFA-style declarations cannot be used to declare parameters for C-style routine-definitions because of the following ambiguity: |
| 1707 | \begin{cfa} |
| 1708 | typedef int foo; |
| 1709 | int f( int (* foo) ); §\C{// foo is redefined as a parameter name}§ |
| 1710 | \end{cfa} |
| 1711 | The string ``©int (* foo)©'' declares a C-style named-parameter of type pointer to an integer (the parenthesis are superfluous), while the same string declares a \CFA style unnamed parameter of type routine returning integer with unnamed parameter of type pointer to foo. |
| 1712 | The redefinition of a type name in a parameter list is the only context in C where the character ©*© can appear to the left of a type name, and \CFA relies on all type qualifier characters appearing to the right of the type name. |
| 1713 | The inability to use \CFA declarations in these two contexts is probably a blessing because it precludes programmers from arbitrarily switching between declarations forms within a declaration contexts. |
| 1714 | |
| 1715 | C-style declarations can be used to declare parameters for \CFA style routine definitions, \eg: |
| 1716 | \begin{cfa} |
| 1717 | [ int ] f( * int, int * ); §\C{// returns an integer, accepts 2 pointers to integers}§ |
| 1718 | [ * int, int * ] f( int ); §\C{// returns 2 pointers to integers, accepts an integer}§ |
| 1719 | \end{cfa} |
| 1720 | The reason for allowing both declaration styles in the new context is for backwards compatibility with existing preprocessor macros that generate C-style declaration-syntax, as in: |
| 1721 | \begin{cfa} |
| 1722 | #define ptoa( n, d ) int (*n)[ d ] |
| 1723 | int f( ptoa( p, 5 ) ) ... §\C{// expands to int f( int (*p)[ 5 ] )}§ |
| 1724 | [ int ] f( ptoa( p, 5 ) ) ... §\C{// expands to [ int ] f( int (*p)[ 5 ] )}§ |
| 1725 | \end{cfa} |
| 1726 | Again, programmers are highly encouraged to use one declaration form or the other, rather than mixing the forms. |
| 1727 | |
| 1728 | |
| 1729 | \subsection{Named Return Values} |
| 1730 | |
| 1731 | \Index{Named return values} handle the case where it is necessary to define a local variable whose value is then returned in a ©return© statement, as in: |
| 1732 | \begin{cfa} |
| 1733 | int f() { |
| 1734 | int x; |
| 1735 | ... x = 0; ... x = y; ... |
| 1736 | return x; |
| 1737 | } |
| 1738 | \end{cfa} |
| 1739 | Because the value in the return variable is automatically returned when a \CFA routine terminates, the ©return© statement \emph{does not} contain an expression, as in: |
| 1740 | \newline |
| 1741 | \begin{minipage}{\linewidth} |
| 1742 | \begin{cfa} |
| 1743 | ®[ int x, int y ]® f() { |
| 1744 | int z; |
| 1745 | ... x = 0; ... y = z; ... |
| 1746 | ®return;® §\C{// implicitly return x, y}§ |
| 1747 | } |
| 1748 | \end{cfa} |
| 1749 | \end{minipage} |
| 1750 | \newline |
| 1751 | When the return is encountered, the current values of ©x© and ©y© are returned to the calling routine. |
| 1752 | As well, ``falling off the end'' of a routine without a ©return© statement is permitted, as in: |
| 1753 | \begin{cfa} |
| 1754 | [ int x, int y ] f() { |
| 1755 | ... |
| 1756 | } §\C{// implicitly return x, y}§ |
| 1757 | \end{cfa} |
| 1758 | In this case, the current values of ©x© and ©y© are returned to the calling routine just as if a ©return© had been encountered. |
| 1759 | |
| 1760 | Named return values may be used in conjunction with named parameter values; |
| 1761 | specifically, a return and parameter can have the same name. |
| 1762 | \begin{cfa} |
| 1763 | [ int x, int y ] f( int, x, int y ) { |
| 1764 | ... |
| 1765 | } §\C{// implicitly return x, y}§ |
| 1766 | \end{cfa} |
| 1767 | This notation allows the compiler to eliminate temporary variables in nested routine calls. |
| 1768 | \begin{cfa} |
| 1769 | [ int x, int y ] f( int, x, int y ); §\C{// prototype declaration}§ |
| 1770 | int a, b; |
| 1771 | [a, b] = f( f( f( a, b ) ) ); |
| 1772 | \end{cfa} |
| 1773 | While the compiler normally ignores parameters names in prototype declarations, here they are used to eliminate temporary return-values by inferring that the results of each call are the inputs of the next call, and ultimately, the left-hand side of the assignment. |
| 1774 | Hence, even without the body of routine ©f© (separate compilation), it is possible to perform a global optimization across routine calls. |
| 1775 | The compiler warns about naming inconsistencies between routine prototype and definition in this case, and behaviour is \Index{undefined} if the programmer is inconsistent. |
| 1776 | |
| 1777 | |
| 1778 | \subsection{Routine Prototype} |
| 1779 | |
| 1780 | The syntax of the new routine prototype declaration follows directly from the new routine definition syntax; |
| 1781 | as well, parameter names are optional, \eg: |
| 1782 | \begin{cfa} |
| 1783 | [ int x ] f (); §\C{// returning int with no parameters}§ |
| 1784 | [ * int ] g (int y); §\C{// returning pointer to int with int parameter}§ |
| 1785 | [ ] h ( int, char ); §\C{// returning no result with int and char parameters}§ |
| 1786 | [ * int, int ] j ( int ); §\C{// returning pointer to int and int, with int parameter}§ |
| 1787 | \end{cfa} |
| 1788 | This syntax allows a prototype declaration to be created by cutting and pasting source text from the routine definition header (or vice versa). |
| 1789 | It is possible to declare multiple routine-prototypes in a single declaration, but the entire type specification is distributed across \emph{all} routine names in the declaration list (see~\VRef{s:Declarations}), \eg: |
| 1790 | \begin{quote2} |
| 1791 | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
| 1792 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ |
| 1793 | \begin{cfa} |
| 1794 | [ int ] f( int ), g; |
| 1795 | \end{cfa} |
| 1796 | & |
| 1797 | \begin{cfa} |
| 1798 | int f( int ), g( int ); |
| 1799 | \end{cfa} |
| 1800 | \end{tabular} |
| 1801 | \end{quote2} |
| 1802 | Declaration qualifiers can only appear at the start of a \CFA routine declaration,\footref{StorageClassSpecifier} \eg: |
| 1803 | \begin{cfa} |
| 1804 | extern [ int ] f ( int ); |
| 1805 | static [ int ] g ( int ); |
| 1806 | \end{cfa} |
| 1807 | |
| 1808 | |
| 1809 | \section{Routine Pointers} |
| 1810 | |
| 1811 | The syntax for pointers to \CFA routines specifies the pointer name on the right, \eg: |
| 1812 | \begin{cfa} |
| 1813 | * [ int x ] () fp; §\C{// pointer to routine returning int with no parameters}§ |
| 1814 | * [ * int ] (int y) gp; §\C{// pointer to routine returning pointer to int with int parameter}§ |
| 1815 | * [ ] (int,char) hp; §\C{// pointer to routine returning no result with int and char parameters}§ |
| 1816 | * [ * int,int ] ( int ) jp; §\C{// pointer to routine returning pointer to int and int, with int parameter}§ |
| 1817 | \end{cfa} |
| 1818 | While parameter names are optional, \emph{a routine name cannot be specified}; |
| 1819 | for example, the following is incorrect: |
| 1820 | \begin{cfa} |
| 1821 | * [ int x ] f () fp; §\C{// routine name "f" is not allowed}§ |
| 1822 | \end{cfa} |
| 1823 | |
| 1824 | |
| 1825 | \section{Named and Default Arguments} |
| 1826 | |
| 1827 | Named\index{named arguments}\index{arguments!named} and default\index{default arguments}\index{arguments!default} arguments~\cite{Hardgrave76}\footnote{ |
| 1828 | Francez~\cite{Francez77} proposed a further extension to the named-parameter passing style, which specifies what type of communication (by value, by reference, by name) the argument is passed to the routine.} |
| 1829 | are two mechanisms to simplify routine call. |
| 1830 | Both mechanisms are discussed with respect to \CFA. |
| 1831 | \begin{description} |
| 1832 | \item[Named (or Keyword) Arguments:] |
| 1833 | provide the ability to specify an argument to a routine call using the parameter name rather than the position of the parameter. |
| 1834 | For example, given the routine: |
| 1835 | \begin{cfa} |
| 1836 | void p( int x, int y, int z ) {...} |
| 1837 | \end{cfa} |
| 1838 | a positional call is: |
| 1839 | \begin{cfa} |
| 1840 | p( 4, 7, 3 ); |
| 1841 | \end{cfa} |
| 1842 | whereas a named (keyword) call may be: |
| 1843 | \begin{cfa} |
| 1844 | p( z : 3, x : 4, y : 7 ); §\C{// rewrite $\Rightarrow$ p( 4, 7, 3 )}§ |
| 1845 | \end{cfa} |
| 1846 | Here the order of the arguments is unimportant, and the names of the parameters are used to associate argument values with the corresponding parameters. |
| 1847 | The compiler rewrites a named call into a positional call. |
| 1848 | The advantages of named parameters are: |
| 1849 | \begin{itemize} |
| 1850 | \item |
| 1851 | Remembering the names of the parameters may be easier than the order in the routine definition. |
| 1852 | \item |
| 1853 | Parameter names provide documentation at the call site (assuming the names are descriptive). |
| 1854 | \item |
| 1855 | Changes can be made to the order or number of parameters without affecting the call (although the call must still be recompiled). |
| 1856 | \end{itemize} |
| 1857 | |
| 1858 | Unfortunately, named arguments do not work in C-style programming-languages because a routine prototype is not required to specify parameter names, nor do the names in the prototype have to match with the actual definition. |
| 1859 | For example, the following routine prototypes and definition are all valid. |
| 1860 | \begin{cfa} |
| 1861 | void p( int, int, int ); §\C{// equivalent prototypes}§ |
| 1862 | void p( int x, int y, int z ); |
| 1863 | void p( int y, int x, int z ); |
| 1864 | void p( int z, int y, int x ); |
| 1865 | void p( int q, int r, int s ) {} §\C{// match with this definition}§ |
| 1866 | \end{cfa} |
| 1867 | Forcing matching parameter names in routine prototypes with corresponding routine definitions is possible, but goes against a strong tradition in C programming. |
| 1868 | Alternatively, prototype definitions can be eliminated by using a two-pass compilation, and implicitly creating header files for exports. |
| 1869 | The former is easy to do, while the latter is more complex. |
| 1870 | |
| 1871 | Furthermore, named arguments do not work well in a \CFA-style programming-languages because they potentially introduces a new criteria for type matching. |
| 1872 | For example, it is technically possible to disambiguate between these two overloaded definitions of ©f© based on named arguments at the call site: |
| 1873 | \begin{cfa} |
| 1874 | int f( int i, int j ); |
| 1875 | int f( int x, double y ); |
| 1876 | |
| 1877 | f( j : 3, i : 4 ); §\C{// 1st f}§ |
| 1878 | f( x : 7, y : 8.1 ); §\C{// 2nd f}§ |
| 1879 | f( 4, 5 ); §\C{// ambiguous call}§ |
| 1880 | \end{cfa} |
| 1881 | However, named arguments compound routine resolution in conjunction with conversions: |
| 1882 | \begin{cfa} |
| 1883 | f( i : 3, 5.7 ); §\C{// ambiguous call ?}§ |
| 1884 | \end{cfa} |
| 1885 | Depending on the cost associated with named arguments, this call could be resolvable or ambiguous. |
| 1886 | Adding named argument into the routine resolution algorithm does not seem worth the complexity. |
| 1887 | Therefore, \CFA does \emph{not} attempt to support named arguments. |
| 1888 | |
| 1889 | \item[Default Arguments] |
| 1890 | provide the ability to associate a default value with a parameter so it can be optionally specified in the argument list. |
| 1891 | For example, given the routine: |
| 1892 | \begin{cfa} |
| 1893 | void p( int x = 1, int y = 2, int z = 3 ) {...} |
| 1894 | \end{cfa} |
| 1895 | the allowable positional calls are: |
| 1896 | \begin{cfa} |
| 1897 | p(); §\C{// rewrite $\Rightarrow$ p( 1, 2, 3 )}§ |
| 1898 | p( 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 2, 3 )}§ |
| 1899 | p( 4, 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 4, 3 )}§ |
| 1900 | p( 4, 4, 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 4, 4 )}§ |
| 1901 | // empty arguments |
| 1902 | p( , 4, 4 ); §\C{// rewrite $\Rightarrow$ p( 1, 4, 4 )}§ |
| 1903 | p( 4, , 4 ); §\C{// rewrite $\Rightarrow$ p( 4, 2, 4 )}§ |
| 1904 | p( 4, 4, ); §\C{// rewrite $\Rightarrow$ p( 4, 4, 3 )}§ |
| 1905 | p( 4, , ); §\C{// rewrite $\Rightarrow$ p( 4, 2, 3 )}§ |
| 1906 | p( , 4, ); §\C{// rewrite $\Rightarrow$ p( 1, 4, 3 )}§ |
| 1907 | p( , , 4 ); §\C{// rewrite $\Rightarrow$ p( 1, 2, 4 )}§ |
| 1908 | p( , , ); §\C{// rewrite $\Rightarrow$ p( 1, 2, 3 )}§ |
| 1909 | \end{cfa} |
| 1910 | Here the missing arguments are inserted from the default values in the parameter list. |
| 1911 | The compiler rewrites missing default values into explicit positional arguments. |
| 1912 | The advantages of default values are: |
| 1913 | \begin{itemize} |
| 1914 | \item |
| 1915 | Routines with a large number of parameters are often very generalized, giving a programmer a number of different options on how a computation is performed. |
| 1916 | For many of these kinds of routines, there are standard or default settings that work for the majority of computations. |
| 1917 | Without default values for parameters, a programmer is forced to specify these common values all the time, resulting in long argument lists that are error prone. |
| 1918 | \item |
| 1919 | When a routine's interface is augmented with new parameters, it extends the interface providing generalizability\footnote{ |
| 1920 | ``It should be possible for the implementor of an abstraction to increase its generality. |
| 1921 | So long as the modified abstraction is a generalization of the original, existing uses of the abstraction will not require change. |
| 1922 | It might be possible to modify an abstraction in a manner which is not a generalization without affecting existing uses, but, without inspecting the modules in which the uses occur, this possibility cannot be determined. |
| 1923 | This criterion precludes the addition of parameters, unless these parameters have default or inferred values that are valid for all possible existing applications.''~\cite[p.~128]{Cormack90}} |
| 1924 | (somewhat like the generalization provided by inheritance for classes). |
| 1925 | That is, all existing calls are still valid, although the call must still be recompiled. |
| 1926 | \end{itemize} |
| 1927 | The only disadvantage of default arguments is that unintentional omission of an argument may not result in a compiler-time error. |
| 1928 | Instead, a default value is used, which may not be the programmer's intent. |
| 1929 | |
| 1930 | Default values may only appear in a prototype versus definition context: |
| 1931 | \begin{cfa} |
| 1932 | void p( int x, int y = 2, int z = 3 ); §\C{// prototype: allowed}§ |
| 1933 | void p( int, int = 2, int = 3 ); §\C{// prototype: allowed}§ |
| 1934 | void p( int x, int y = 2, int z = 3 ) {} §\C{// definition: not allowed}§ |
| 1935 | \end{cfa} |
| 1936 | The reason for this restriction is to allow separate compilation. |
| 1937 | Multiple prototypes with different default values is an error. |
| 1938 | \end{description} |
| 1939 | |
| 1940 | Ellipse (``...'') arguments present problems when used with default arguments. |
| 1941 | The conflict occurs because both named and ellipse arguments must appear after positional arguments, giving two possibilities: |
| 1942 | \begin{cfa} |
| 1943 | p( /* positional */, ... , /* named */ ); |
| 1944 | p( /* positional */, /* named */, ... ); |
| 1945 | \end{cfa} |
| 1946 | While it is possible to implement both approaches, the first possibly is more complex than the second, \eg: |
| 1947 | \begin{cfa} |
| 1948 | p( int x, int y, int z, ... ); |
| 1949 | p( 1, 4, 5, 6, z : 3, y : 2 ); §\C{// assume p( /* positional */, ... , /* named */ );}§ |
| 1950 | p( 1, z : 3, y : 2, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§ |
| 1951 | \end{cfa} |
| 1952 | In the first call, it is necessary for the programmer to conceptually rewrite the call, changing named arguments into positional, before knowing where the ellipse arguments begin. |
| 1953 | Hence, this approach seems significantly more difficult, and hence, confusing and error prone. |
| 1954 | In the second call, the named arguments separate the positional and ellipse arguments, making it trivial to read the call. |
| 1955 | |
| 1956 | The problem is exacerbated with default arguments, \eg: |
| 1957 | \begin{cfa} |
| 1958 | void p( int x, int y = 2, int z = 3... ); |
| 1959 | p( 1, 4, 5, 6, z : 3 ); §\C{// assume p( /* positional */, ... , /* named */ );}§ |
| 1960 | p( 1, z : 3, 4, 5, 6 ); §\C{// assume p( /* positional */, /* named */, ... );}§ |
| 1961 | \end{cfa} |
| 1962 | The first call is an error because arguments 4 and 5 are actually positional not ellipse arguments; |
| 1963 | therefore, argument 5 subsequently conflicts with the named argument z : 3. |
| 1964 | In the second call, the default value for y is implicitly inserted after argument 1 and the named arguments separate the positional and ellipse arguments, making it trivial to read the call. |
| 1965 | For these reasons, \CFA requires named arguments before ellipse arguments. |
| 1966 | Finally, while ellipse arguments are needed for a small set of existing C routines, like printf, the extended \CFA type system largely eliminates the need for ellipse arguments (see Section 24), making much of this discussion moot. |
| 1967 | |
| 1968 | Default arguments and overloading (see Section 24) are complementary. |
| 1969 | While in theory default arguments can be simulated with overloading, as in: |
| 1970 | \begin{quote2} |
| 1971 | \begin{tabular}{@{}l@{\hspace{3em}}l@{}} |
| 1972 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{default arguments}} & \multicolumn{1}{c}{\textbf{overloading}} \\ |
| 1973 | \begin{cfa} |
| 1974 | void p( int x, int y = 2, int z = 3 ) {...} |
| 1975 | |
| 1976 | |
| 1977 | \end{cfa} |
| 1978 | & |
| 1979 | \begin{cfa} |
| 1980 | void p( int x, int y, int z ) {...} |
| 1981 | void p( int x ) { p( x, 2, 3 ); } |
| 1982 | void p( int x, int y ) { p( x, y, 3 ); } |
| 1983 | \end{cfa} |
| 1984 | \end{tabular} |
| 1985 | \end{quote2} |
| 1986 | the number of required overloaded routines is linear in the number of default values, which is unacceptable growth. |
| 1987 | In general, overloading should only be used over default arguments if the body of the routine is significantly different. |
| 1988 | Furthermore, overloading cannot handle accessing default arguments in the middle of a positional list, via a missing argument, such as: |
| 1989 | \begin{cfa} |
| 1990 | p( 1, /* default */, 5 ); §\C{// rewrite $\Rightarrow$ p( 1, 2, 5 )}§ |
| 1991 | \end{cfa} |
| 1992 | |
| 1993 | Given the \CFA restrictions above, both named and default arguments are backwards compatible. |
| 1994 | \Index*[C++]{\CC{}} only supports default arguments; |
| 1995 | \Index*{Ada} supports both named and default arguments. |
| 1996 | |
| 1997 | |
| 1998 | \section{Unnamed Structure Fields} |
| 1999 | |
| 2000 | C requires each field of a structure to have a name, except for a bit field associated with a basic type, \eg: |
| 2001 | \begin{cfa} |
| 2002 | struct { |
| 2003 | int f1; §\C{// named field}§ |
| 2004 | int f2 : 4; §\C{// named field with bit field size}§ |
| 2005 | int : 3; §\C{// unnamed field for basic type with bit field size}§ |
| 2006 | int ; §\C{// disallowed, unnamed field}§ |
| 2007 | int *; §\C{// disallowed, unnamed field}§ |
| 2008 | int (*)( int ); §\C{// disallowed, unnamed field}§ |
| 2009 | }; |
| 2010 | \end{cfa} |
| 2011 | This requirement is relaxed by making the field name optional for all field declarations; therefore, all the field declarations in the example are allowed. |
| 2012 | As for unnamed bit fields, an unnamed field is used for padding a structure to a particular size. |
| 2013 | A list of unnamed fields is also supported, \eg: |
| 2014 | \begin{cfa} |
| 2015 | struct { |
| 2016 | int , , ; §\C{// 3 unnamed fields}§ |
| 2017 | } |
| 2018 | \end{cfa} |
| 2019 | |
| 2020 | |
| 2021 | \section{Nesting} |
| 2022 | |
| 2023 | Nesting of types and routines is useful for controlling name visibility (\newterm{name hiding}). |
| 2024 | |
| 2025 | |
| 2026 | \subsection{Type Nesting} |
| 2027 | |
| 2028 | \CFA allows \Index{type nesting}, and type qualification of the nested types (see \VRef[Figure]{f:TypeNestingQualification}), where as C hoists\index{type hoisting} (refactors) nested types into the enclosing scope and has no type qualification. |
| 2029 | \begin{figure} |
| 2030 | \centering |
| 2031 | \begin{tabular}{@{}l@{\hspace{3em}}l|l@{}} |
| 2032 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{C Type Nesting}} & \multicolumn{1}{c}{\textbf{C Implicit Hoisting}} & \multicolumn{1}{|c}{\textbf{\CFA}} \\ |
| 2033 | \hline |
| 2034 | \begin{cfa} |
| 2035 | struct S { |
| 2036 | enum C { R, G, B }; |
| 2037 | struct T { |
| 2038 | union U { int i, j; }; |
| 2039 | enum C c; |
| 2040 | short int i, j; |
| 2041 | }; |
| 2042 | struct T t; |
| 2043 | } s; |
| 2044 | |
| 2045 | int fred() { |
| 2046 | s.t.c = R; |
| 2047 | struct T t = { R, 1, 2 }; |
| 2048 | enum C c; |
| 2049 | union U u; |
| 2050 | } |
| 2051 | \end{cfa} |
| 2052 | & |
| 2053 | \begin{cfa} |
| 2054 | enum C { R, G, B }; |
| 2055 | union U { int i, j; }; |
| 2056 | struct T { |
| 2057 | enum C c; |
| 2058 | short int i, j; |
| 2059 | }; |
| 2060 | struct S { |
| 2061 | struct T t; |
| 2062 | } s; |
| 2063 | |
| 2064 | |
| 2065 | |
| 2066 | |
| 2067 | |
| 2068 | |
| 2069 | |
| 2070 | \end{cfa} |
| 2071 | & |
| 2072 | \begin{cfa} |
| 2073 | struct S { |
| 2074 | enum C { R, G, B }; |
| 2075 | struct T { |
| 2076 | union U { int i, j; }; |
| 2077 | enum C c; |
| 2078 | short int i, j; |
| 2079 | }; |
| 2080 | struct T t; |
| 2081 | } s; |
| 2082 | |
| 2083 | int fred() { |
| 2084 | s.t.c = ®S.®R; // type qualification |
| 2085 | struct ®S.®T t = { ®S.®R, 1, 2 }; |
| 2086 | enum ®S.®C c; |
| 2087 | union ®S.T.®U u; |
| 2088 | } |
| 2089 | \end{cfa} |
| 2090 | \end{tabular} |
| 2091 | \caption{Type Nesting / Qualification} |
| 2092 | \label{f:TypeNestingQualification} |
| 2093 | \end{figure} |
| 2094 | In the left example in C, types ©C©, ©U© and ©T© are implicitly hoisted outside of type ©S© into the containing block scope. |
| 2095 | In the right example in \CFA, the types are not hoisted and accessed using the field-selection operator ``©.©'' for type qualification, as does \Index*{Java}, rather than the \CC type-selection operator ``©::©''. |
| 2096 | |
| 2097 | |
| 2098 | \subsection{Routine Nesting} |
| 2099 | |
| 2100 | While \CFA does not provide object programming by putting routines into structures, it does rely heavily on locally nested routines to redefine operations at or close to a call site. |
| 2101 | For example, the C quick-sort is wrapped into the following polymorphic \CFA routine: |
| 2102 | \begin{cfa} |
| 2103 | forall( otype T | { int ?<?( T, T ); } ) |
| 2104 | void qsort( const T * arr, size_t dimension ); |
| 2105 | \end{cfa} |
| 2106 | which can be used to sort in ascending and descending order by locally redefining the less-than operator into greater-than. |
| 2107 | \begin{cfa} |
| 2108 | const unsigned int size = 5; |
| 2109 | int ia[size]; |
| 2110 | ... §\C{// assign values to array ia}§ |
| 2111 | qsort( ia, size ); §\C{// sort ascending order using builtin ?<?}§ |
| 2112 | { |
| 2113 | ®int ?<?( int x, int y ) { return x > y; }® §\C{// nested routine}§ |
| 2114 | qsort( ia, size ); §\C{// sort descending order by local redefinition}§ |
| 2115 | } |
| 2116 | \end{cfa} |
| 2117 | |
| 2118 | Nested routines are not first-class, meaning a nested routine cannot be returned if it has references to variables in its enclosing blocks; |
| 2119 | the only exception is references to the external block of the translation unit, as these variables persist for the duration of the program. |
| 2120 | The following program in undefined in \CFA (and Indexc{gcc}) |
| 2121 | \begin{cfa} |
| 2122 | [* [int]( int )] foo() { §\C{// int (*foo())( int )}§ |
| 2123 | int ®i® = 7; |
| 2124 | int bar( int p ) { |
| 2125 | ®i® += 1; §\C{// dependent on local variable}§ |
| 2126 | sout | ®i® | endl; |
| 2127 | } |
| 2128 | return bar; §\C{// undefined because of local dependence}§ |
| 2129 | } |
| 2130 | int main() { |
| 2131 | * [int]( int ) fp = foo(); §\C{// int (*fp)( int )}§ |
| 2132 | sout | fp( 3 ) | endl; |
| 2133 | } |
| 2134 | \end{cfa} |
| 2135 | because |
| 2136 | |
| 2137 | Currently, there are no \Index{lambda} expressions, \ie unnamed routines because routine names are very important to properly select the correct routine. |
| 2138 | |
| 2139 | |
| 2140 | \section{Tuples} |
| 2141 | |
| 2142 | In C and \CFA, lists of elements appear in several contexts, such as the parameter list for a routine call. |
| 2143 | (More contexts are added shortly.) |
| 2144 | A list of such elements is called a \newterm{lexical list}. |
| 2145 | The general syntax of a lexical list is: |
| 2146 | \begin{cfa} |
| 2147 | [ §\emph{exprlist}§ ] |
| 2148 | \end{cfa} |
| 2149 | where ©$\emph{exprlist}$© is a list of one or more expressions separated by commas. |
| 2150 | The brackets, ©[]©, allow differentiating between lexical lists and expressions containing the C comma operator. |
| 2151 | The following are examples of lexical lists: |
| 2152 | \begin{cfa} |
| 2153 | [ x, y, z ] |
| 2154 | [ 2 ] |
| 2155 | [ v+w, x*y, 3.14159, f() ] |
| 2156 | \end{cfa} |
| 2157 | Tuples are permitted to contain sub-tuples (\ie nesting), such as ©[ [ 14, 21 ], 9 ]©, which is a 2-element tuple whose first element is itself a tuple. |
| 2158 | Note, a tuple is not a record (structure); |
| 2159 | a record denotes a single value with substructure, whereas a tuple is multiple values with no substructure (see flattening coercion in Section 12.1). |
| 2160 | In essence, tuples are largely a compile time phenomenon, having little or no runtime presence. |
| 2161 | |
| 2162 | Tuples can be organized into compile-time tuple variables; |
| 2163 | these variables are of \newterm{tuple type}. |
| 2164 | Tuple variables and types can be used anywhere lists of conventional variables and types can be used. |
| 2165 | The general syntax of a tuple type is: |
| 2166 | \begin{cfa} |
| 2167 | [ §\emph{typelist}§ ] |
| 2168 | \end{cfa} |
| 2169 | where ©$\emph{typelist}$© is a list of one or more legal \CFA or C type specifications separated by commas, which may include other tuple type specifications. |
| 2170 | Examples of tuple types include: |
| 2171 | \begin{cfa} |
| 2172 | [ unsigned int, char ] |
| 2173 | [ double, double, double ] |
| 2174 | [ * int, int * ] §\C{// mix of CFA and ANSI}§ |
| 2175 | [ * [ 5 ] int, * * char, * [ [ int, int ] ] (int, int) ] |
| 2176 | \end{cfa} |
| 2177 | Like tuples, tuple types may be nested, such as ©[ [ int, int ], int ]©, which is a 2-element tuple type whose first element is itself a tuple type. |
| 2178 | |
| 2179 | Examples of declarations using tuple types are: |
| 2180 | \begin{cfa} |
| 2181 | [ int, int ] x; §\C{// 2 element tuple, each element of type int}§ |
| 2182 | * [ char, char ] y; §\C{// pointer to a 2 element tuple}§ |
| 2183 | [ [ int, int ] ] z ([ int, int ]); |
| 2184 | \end{cfa} |
| 2185 | The last example declares an external routine that expects a 2 element tuple as an input parameter and returns a 2 element tuple as its result. |
| 2186 | |
| 2187 | As mentioned, tuples can appear in contexts requiring a list of value, such as an argument list of a routine call. |
| 2188 | In unambiguous situations, the tuple brackets may be omitted, \eg a tuple that appears as an argument may have its |
| 2189 | square brackets omitted for convenience; therefore, the following routine invocations are equivalent: |
| 2190 | \begin{cfa} |
| 2191 | f( [ 1, x+2, fred() ] ); |
| 2192 | f( 1, x+2, fred() ); |
| 2193 | \end{cfa} |
| 2194 | Also, a tuple or a tuple variable may be used to supply all or part of an argument list for a routine expecting multiple input parameters or for a routine expecting a tuple as an input parameter. |
| 2195 | For example, the following are all legal: |
| 2196 | \begin{cfa} |
| 2197 | [ int, int ] w1; |
| 2198 | [ int, int, int ] w2; |
| 2199 | [ void ] f (int, int, int); /* three input parameters of type int */ |
| 2200 | [ void ] g ([ int, int, int ]); /* 3 element tuple as input */ |
| 2201 | f( [ 1, 2, 3 ] ); |
| 2202 | f( w1, 3 ); |
| 2203 | f( 1, w1 ); |
| 2204 | f( w2 ); |
| 2205 | g( [ 1, 2, 3 ] ); |
| 2206 | g( w1, 3 ); |
| 2207 | g( 1, w1 ); |
| 2208 | g( w2 ); |
| 2209 | \end{cfa} |
| 2210 | Note, in all cases 3 arguments are supplied even though the syntax may appear to supply less than 3. As mentioned, a |
| 2211 | tuple does not have structure like a record; a tuple is simply converted into a list of components. |
| 2212 | \begin{rationale} |
| 2213 | The present implementation of \CFA does not support nested routine calls when the inner routine returns multiple values; \ie a statement such as ©g( f() )© is not supported. |
| 2214 | Using a temporary variable to store the results of the inner routine and then passing this variable to the outer routine works, however. |
| 2215 | \end{rationale} |
| 2216 | |
| 2217 | A tuple can contain a C comma expression, provided the expression containing the comma operator is enclosed in parentheses. |
| 2218 | For instance, the following tuples are equivalent: |
| 2219 | \begin{cfa} |
| 2220 | [ 1, 3, 5 ] |
| 2221 | [ 1, (2, 3), 5 ] |
| 2222 | \end{cfa} |
| 2223 | The second element of the second tuple is the expression (2, 3), which yields the result 3. |
| 2224 | This requirement is the same as for comma expressions in argument lists. |
| 2225 | |
| 2226 | Type qualifiers, \ie const and volatile, may modify a tuple type. |
| 2227 | The meaning is the same as for a type qualifier modifying an aggregate type [Int99, x 6.5.2.3(7),x 6.7.3(11)], \ie the qualifier is distributed across all of the types in the tuple, \eg: |
| 2228 | \begin{cfa} |
| 2229 | const volatile [ int, float, const int ] x; |
| 2230 | \end{cfa} |
| 2231 | is equivalent to: |
| 2232 | \begin{cfa} |
| 2233 | [ const volatile int, const volatile float, const volatile int ] x; |
| 2234 | \end{cfa} |
| 2235 | Declaration qualifiers can only appear at the start of a \CFA tuple declaration4, \eg: |
| 2236 | \begin{cfa} |
| 2237 | extern [ int, int ] w1; |
| 2238 | static [ int, int, int ] w2; |
| 2239 | \end{cfa} |
| 2240 | \begin{rationale} |
| 2241 | Unfortunately, C's syntax for subscripts precluded treating them as tuples. |
| 2242 | The C subscript list has the form ©[i][j]...© and not ©[i, j, ...]©. |
| 2243 | Therefore, there is no syntactic way for a routine returning multiple values to specify the different subscript values, \eg ©f[g()]© always means a single subscript value because there is only one set of brackets. |
| 2244 | Fixing this requires a major change to C because the syntactic form ©M[i, j, k]© already has a particular meaning: ©i, j, k© is a comma expression. |
| 2245 | \end{rationale} |
| 2246 | |
| 2247 | |
| 2248 | \subsection{Tuple Coercions} |
| 2249 | |
| 2250 | There are four coercions that can be performed on tuples and tuple variables: closing, opening, flattening and structuring. |
| 2251 | In addition, the coercion of dereferencing can be performed on a tuple variable to yield its value(s), as for other variables. |
| 2252 | A \newterm{closing coercion} takes a set of values and converts it into a tuple value, which is a contiguous set of values, as in: |
| 2253 | \begin{cfa} |
| 2254 | [ int, int, int, int ] w; |
| 2255 | w = [ 1, 2, 3, 4 ]; |
| 2256 | \end{cfa} |
| 2257 | First the right-hand tuple is closed into a tuple value and then the tuple value is assigned. |
| 2258 | |
| 2259 | An \newterm{opening coercion} is the opposite of closing; a tuple value is converted into a tuple of values, as in: |
| 2260 | \begin{cfa} |
| 2261 | [ a, b, c, d ] = w |
| 2262 | \end{cfa} |
| 2263 | ©w© is implicitly opened to yield a tuple of four values, which are then assigned individually. |
| 2264 | |
| 2265 | A \newterm{flattening coercion} coerces a nested tuple, \ie a tuple with one or more components, which are themselves tuples, into a flattened tuple, which is a tuple whose components are not tuples, as in: |
| 2266 | \begin{cfa} |
| 2267 | [ a, b, c, d ] = [ 1, [ 2, 3 ], 4 ]; |
| 2268 | \end{cfa} |
| 2269 | First the right-hand tuple is flattened and then the values are assigned individually. |
| 2270 | Flattening is also performed on tuple types. |
| 2271 | For example, the type ©[ int, [ int, int ], int ]© can be coerced, using flattening, into the type ©[ int, int, int, int ]©. |
| 2272 | |
| 2273 | A \newterm{structuring coercion} is the opposite of flattening; |
| 2274 | a tuple is structured into a more complex nested tuple. |
| 2275 | For example, structuring the tuple ©[ 1, 2, 3, 4 ]© into the tuple ©[ 1, [ 2, 3 ], 4 ]© or the tuple type ©[ int, int, int, int ]© into the tuple type ©[ int, [ int, int ], int ]©. |
| 2276 | In the following example, the last assignment illustrates all the tuple coercions: |
| 2277 | \begin{cfa} |
| 2278 | [ int, int, int, int ] w = [ 1, 2, 3, 4 ]; |
| 2279 | int x = 5; |
| 2280 | [ x, w ] = [ w, x ]; §\C{// all four tuple coercions}§ |
| 2281 | \end{cfa} |
| 2282 | Starting on the right-hand tuple in the last assignment statement, w is opened, producing a tuple of four values; |
| 2283 | therefore, the right-hand tuple is now the tuple ©[ [ 1, 2, 3, 4 ], 5 ]©. |
| 2284 | This tuple is then flattened, yielding ©[ 1, 2, 3, 4, 5 ]©, which is structured into ©[ 1, [ 2, 3, 4, 5 ] ]© to match the tuple type of the left-hand side. |
| 2285 | The tuple ©[ 2, 3, 4, 5 ]© is then closed to create a tuple value. |
| 2286 | Finally, ©x© is assigned ©1© and ©w© is assigned the tuple value using multiple assignment (see Section 14). |
| 2287 | \begin{rationale} |
| 2288 | A possible additional language extension is to use the structuring coercion for tuples to initialize a complex record with a tuple. |
| 2289 | \end{rationale} |
| 2290 | |
| 2291 | |
| 2292 | \section{Mass Assignment} |
| 2293 | |
| 2294 | \CFA permits assignment to several variables at once using mass assignment~\cite{CLU}. |
| 2295 | Mass assignment has the following form: |
| 2296 | \begin{cfa} |
| 2297 | [ §\emph{lvalue}§, ... , §\emph{lvalue}§ ] = §\emph{expr}§; |
| 2298 | \end{cfa} |
| 2299 | \index{lvalue} |
| 2300 | The left-hand side is a tuple of \emph{lvalues}, which is a list of expressions each yielding an address, \ie any data object that can appear on the left-hand side of a conventional assignment statement. |
| 2301 | ©$\emph{expr}$© is any standard arithmetic expression. |
| 2302 | Clearly, the types of the entities being assigned must be type compatible with the value of the expression. |
| 2303 | |
| 2304 | Mass assignment has parallel semantics, \eg the statement: |
| 2305 | \begin{cfa} |
| 2306 | [ x, y, z ] = 1.5; |
| 2307 | \end{cfa} |
| 2308 | is equivalent to: |
| 2309 | \begin{cfa} |
| 2310 | x = 1.5; y = 1.5; z = 1.5; |
| 2311 | \end{cfa} |
| 2312 | This semantics is not the same as the following in C: |
| 2313 | \begin{cfa} |
| 2314 | x = y = z = 1.5; |
| 2315 | \end{cfa} |
| 2316 | as conversions between intermediate assignments may lose information. |
| 2317 | A more complex example is: |
| 2318 | \begin{cfa} |
| 2319 | [ i, y[i], z ] = a + b; |
| 2320 | \end{cfa} |
| 2321 | which is equivalent to: |
| 2322 | \begin{cfa} |
| 2323 | t = a + b; |
| 2324 | a1 = &i; a2 = &y[i]; a3 = &z; |
| 2325 | *a1 = t; *a2 = t; *a3 = t; |
| 2326 | \end{cfa} |
| 2327 | The temporary ©t© is necessary to store the value of the expression to eliminate conversion issues. |
| 2328 | The temporaries for the addresses are needed so that locations on the left-hand side do not change as the values are assigned. |
| 2329 | In this case, ©y[i]© uses the previous value of ©i© and not the new value set at the beginning of the mass assignment. |
| 2330 | |
| 2331 | |
| 2332 | \section{Multiple Assignment} |
| 2333 | |
| 2334 | \CFA also supports the assignment of several values at once, known as multiple assignment~\cite{CLU,Galletly96}. |
| 2335 | Multiple assignment has the following form: |
| 2336 | \begin{cfa} |
| 2337 | [ §\emph{lvalue}§, ... , §\emph{lvalue}§ ] = [ §\emph{expr}§, ... , §\emph{expr}§ ]; |
| 2338 | \end{cfa} |
| 2339 | \index{lvalue} |
| 2340 | The left-hand side is a tuple of \emph{lvalues}, and the right-hand side is a tuple of \emph{expr}s. |
| 2341 | Each \emph{expr} appearing on the right-hand side of a multiple assignment statement is assigned to the corresponding \emph{lvalues} on the left-hand side of the statement using parallel semantics for each assignment. |
| 2342 | An example of multiple assignment is: |
| 2343 | \begin{cfa} |
| 2344 | [ x, y, z ] = [ 1, 2, 3 ]; |
| 2345 | \end{cfa} |
| 2346 | Here, the values ©1©, ©2© and ©3© are assigned, respectively, to the variables ©x©, ©y© and ©z©. |
| 2347 | A more complex example is: |
| 2348 | \begin{cfa} |
| 2349 | [ i, y[ i ], z ] = [ 1, i, a + b ]; |
| 2350 | \end{cfa} |
| 2351 | Here, the values ©1©, ©i© and ©a + b© are assigned to the variables ©i©, ©y[i]© and ©z©, respectively. |
| 2352 | Note, the parallel semantics of |
| 2353 | multiple assignment ensures: |
| 2354 | \begin{cfa} |
| 2355 | [ x, y ] = [ y, x ]; |
| 2356 | \end{cfa} |
| 2357 | correctly interchanges (swaps) the values stored in ©x© and ©y©. |
| 2358 | The following cases are errors: |
| 2359 | \begin{cfa} |
| 2360 | [ a, b, c ] = [ 1, 2, 3, 4 ]; |
| 2361 | [ a, b, c ] = [ 1, 2 ]; |
| 2362 | \end{cfa} |
| 2363 | because the number of entities in the left-hand tuple is unequal with the right-hand tuple. |
| 2364 | |
| 2365 | As for all tuple contexts in C, side effects should not be used because C does not define an ordering for the evaluation of the elements of a tuple; |
| 2366 | both these examples produce indeterminate results: |
| 2367 | \begin{cfa} |
| 2368 | f( x++, x++ ); §\C{// C routine call with side effects in arguments}§ |
| 2369 | [ v1, v2 ] = [ x++, x++ ]; §\C{// side effects in righthand side of multiple assignment}§ |
| 2370 | \end{cfa} |
| 2371 | |
| 2372 | |
| 2373 | \section{Cascade Assignment} |
| 2374 | |
| 2375 | As in C, \CFA mass and multiple assignments can be cascaded, producing cascade assignment. |
| 2376 | Cascade assignment has the following form: |
| 2377 | \begin{cfa} |
| 2378 | §\emph{tuple}§ = §\emph{tuple}§ = ... = §\emph{tuple}§; |
| 2379 | \end{cfa} |
| 2380 | and it has the same parallel semantics as for mass and multiple assignment. |
| 2381 | Some examples of cascade assignment are: |
| 2382 | \begin{cfa} |
| 2383 | x1 = y1 = x2 = y2 = 0; |
| 2384 | [ x1, y1 ] = [ x2, y2 ] = [ x3, y3 ]; |
| 2385 | [ x1, y1 ] = [ x2, y2 ] = 0; |
| 2386 | [ x1, y1 ] = z = 0; |
| 2387 | \end{cfa} |
| 2388 | As in C, the rightmost assignment is performed first, \ie assignment parses right to left. |
| 2389 | |
| 2390 | |
| 2391 | \section{Field Tuples} |
| 2392 | |
| 2393 | Tuples may be used to select multiple fields of a record by field name. |
| 2394 | Its general form is: |
| 2395 | \begin{cfa} |
| 2396 | §\emph{expr}§ . [ §\emph{fieldlist}§ ] |
| 2397 | §\emph{expr}§ -> [ §\emph{fieldlist}§ ] |
| 2398 | \end{cfa} |
| 2399 | \emph{expr} is any expression yielding a value of type record, \eg ©struct©, ©union©. |
| 2400 | Each element of \emph{ fieldlist} is an element of the record specified by \emph{expr}. |
| 2401 | A record-field tuple may be used anywhere a tuple can be used. An example of the use of a record-field tuple is |
| 2402 | the following: |
| 2403 | \begin{cfa} |
| 2404 | struct s { |
| 2405 | int f1, f2; |
| 2406 | char f3; |
| 2407 | double f4; |
| 2408 | } v; |
| 2409 | v.[ f3, f1, f2 ] = ['x', 11, 17 ]; §\C{// equivalent to v.f3 = 'x', v.f1 = 11, v.f2 = 17}§ |
| 2410 | f( v.[ f3, f1, f2 ] ); §\C{// equivalent to f( v.f3, v.f1, v.f2 )}§ |
| 2411 | \end{cfa} |
| 2412 | Note, the fields appearing in a record-field tuple may be specified in any order; |
| 2413 | also, it is unnecessary to specify all the fields of a struct in a multiple record-field tuple. |
| 2414 | |
| 2415 | If a field of a ©struct© is itself another ©struct©, multiple fields of this subrecord can be specified using a nested record-field tuple, as in the following example: |
| 2416 | \begin{cfa} |
| 2417 | struct inner { |
| 2418 | int f2, f3; |
| 2419 | }; |
| 2420 | struct outer { |
| 2421 | int f1; |
| 2422 | struct inner i; |
| 2423 | double f4; |
| 2424 | } o; |
| 2425 | |
| 2426 | o.[ f1, i.[ f2, f3 ], f4 ] = [ 11, 12, 13, 3.14159 ]; |
| 2427 | \end{cfa} |
| 2428 | |
| 2429 | |