Changeset 52931c5

Timestamp:

Mar 28, 2025, 4:58:25 AM (3 days ago)

Author:

Fangren Yu <f37yu@…>

Branches:

master

Parents:

185cd94

Message:

add some discussion regarding type environment

File:

1 edited

doc/theses/fangren_yu_MMath/content2.tex (modified) (5 diffs)


doc/theses/fangren_yu_MMath/content2.tex

-                      r185cd94
+                      r52931c5
+}
 \end{cfa}
 This helper function is used for performance logging as part of computing a geometric;
+This helper function is used for performance logging as part of computing a geometric mean;
 it is called during summing of logarithmic values.
 However, the function name @log2@ is overloaded in \CFA for both integer and floating point types.
 …
 When asked, the developer expected the floating-point overload because of return-type overloading.
 This mistake went unnoticed because the truncated component was insignificant in the performance logging.
+\PAB{Not sure I understand this: The conclusion is that matching the return type higher up in the expression tree is better, in cases where the total expression cost is equal.}
+Investigation of this example led to the decision that, when the total global cost of two candidate resolutions is equal, the resolution algorithm favors the candidate with the lower conversion cost higher up the expression tree.
 Another change addresses the issue that C arithmetic expressions have unique meanings governed by its arithmetic conversion rules.
 …
 \CFA does not attempt to do any type \textit{inference} \see{\VRef{s:IntoTypeInferencing}}: it has no anonymous functions (\ie lambdas, commonly found in functional programming and also used in \CC and Java), and the variable types must all be explicitly defined (no auto typing).
 This restriction makes the unification problem more tractable in \CFA, as the argument types at each call site are usually all specified.
+There is a single exception: a function return type may contain a free type variable that does not occur in any of the argument types and is subsequently passed into the parent expression. One such example is the \CFA wrapper for @malloc@, which also infers the size argument from the deduced return type.
+\begin{cfa}
+forall (T*) T* malloc() {
+        return (T*) malloc (sizeof(T)); // calls C malloc with the size inferred from context
+}
 \end{cfa}
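For comparison, a rough C++ analogue of deducing an allocation size from the surrounding context is sketched below. It is only an illustration under stated assumptions: @AllocProxy@, @alloc@ and @last_request@ are hypothetical names introduced here, and the conversion-operator technique is a C++ idiom, not how the \CFA translator resolves the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical bookkeeping so the requested size can be observed.
inline std::size_t& last_request() {
    static std::size_t n = 0;
    return n;
}

// Hypothetical proxy: the allocation is deferred until the target
// pointer type is known from the context of the call.
struct AllocProxy {
    template <typename T>
    operator T*() const {
        last_request() = sizeof(T);  // size comes from the deduced T
        return static_cast<T*>(std::malloc(sizeof(T)));
    }
};

// No argument mentions T; T only appears once the result is used.
inline AllocProxy alloc() { return AllocProxy{}; }
```

Here `double* d = alloc();` selects `T = double` purely from the declared type of `d`, which mirrors a type variable that occurs only in the return type.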
 A top-level expression whose type still contains an unbound type variable is considered ill-formed, as such an expression is inherently ambiguous.
 …
 max (42, 3.14); // OK, T=double; requires explicit type annotation in C++ such as max<double>(42, 3.14);
 \end{cfa}
+The current \CFA documentation does not include a formal set of rules for type unification.
+In practice, the algorithm implemented in the \CFA translator can be summarized as follows. Given a function signature forall$(T_1, \ldots, T_n)\ f(p_1, \ldots, p_m)$ and argument types $(a_1, \ldots, a_m)$, the unification algorithm performs the following steps:\footnote{This assumes argument tuples are already expanded into their individual components.}
+\begin{enumerate}
+\item The type environment is initialized as the union of all type environments of the arguments, plus $(T_1,...,T_n)$ as free variables.
+The argument environments are included so that polymorphic return types that need to be deduced can be resolved.
+\item Initially, all type variables
+\end{enumerate}
+From a theoretical point of view, the simplified implementation of the type environment has shortcomings: some cases do not work well with it, and compromises have to be made. A more detailed discussion follows at the end of this chapter.
 …
 \section{Compiler Implementation Considerations}
+\CFA is still an experimental language and there is no formal specification of expression resolution rules yet.
+Presently, there is also only one reference implementation, namely the cfa-cc translator, which is under active development, so the specific behavior of the implementation can change frequently as new features are added.
+Ideally, the goal of expression resolution involving polymorphic functions would be to find the set of type variable assignments such that the global conversion cost is minimal and all assertion variables can be satisfied.
+Unfortunately, given the complications involving implicit conversions and assertion variables, fully achieving this goal is not realistic. The \CFA compiler is also not designed to accommodate all edge cases.
+Instead, it imposes a few restrictions to simplify the algorithm so that most expressions encountered in actual code still pass type checking within a reasonable amount of time.
+As previously mentioned, \CFA polymorphic type resolution is based on a modified version of the unification algorithm, where both equivalence (exact) and subtyping (inexact) relations are considered. However, the implementation of the type environment is simplified; it only stores a tentative type binding with a flag indicating whether \textit{widening} is possible for an equivalence class of type variables.
+Formally speaking, this means the type environment used in the \CFA compiler is only capable of representing \textit{lower bound} constraints.
+This simplification can still work well most of the time, given the following properties of the existing \CFA type system and the resolution algorithms in use:
+\begin{enumerate}
+        \item Type resolution almost exclusively proceeds in bottom-up order, which naturally produces lower bound constraints. Since all identifiers can be overloaded in \CFA, little definite information can be gained from top-down resolution. In principle, it would be possible to detect non-overloaded function names and perform top-down resolution for those; however, prototype experiments have shown that such an optimization does not give a meaningful performance benefit, and therefore it is not implemented.
+        \item Few nontrivial subtyping relationships are present in \CFA, namely the arithmetic types presented in \VRef[Figure]{f:CFACurrArithmeticConversions}, and qualified pointer/reference types. In particular, \CFA lacks the nominal inheritance subtyping present in object-oriented languages, and the generic types do not support covariance on type parameters. As a result, named types such as structs are always matched by strict equivalence, including any type parameters should they exist.
+        \item Unlike in functional programming where subtyping between arrow types exists, \ie if $T_2 <: T_1$ and $U_1 <: U_2$ then $T_1 \rightarrow U_1 <: T_2 \rightarrow U_2$, \CFA uses C function pointer types and the parameter/return types must match exactly to be compatible.
+\end{enumerate}
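The last property can be illustrated with C++ function pointers, which behave like C function pointers in this respect. The sketch below is an analogy only; the alias names are hypothetical, and it treats @int@ as convertible to @long@ in the role of the subtyping relation.

```cpp
#include <cassert>
#include <type_traits>

using IntToLong      = long (*)(int);
using IntToLongAgain = long (*)(int);
using LongToInt      = int (*)(long);

// Identical parameter and return types: the pointer types are compatible.
constexpr bool same_signature_ok =
    std::is_convertible<IntToLong, IntToLongAgain>::value;

// Under arrow subtyping, long -> int could stand in for int -> long
// (wider parameter, narrower result), but function pointers do not convert.
constexpr bool variant_signature_ok =
    std::is_convertible<LongToInt, IntToLong>::value;
```

Because no variance exists on function pointer types, unification of such types only ever needs the equivalence relation.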
+\CFA does attempt to incorporate type information propagated from upstream in the case of a variable declaration with an initializer, since the type of the variable being initialized is definitely known. The current type environment representation is known to be flawed in handling such type deduction when the return type in the initializer is polymorphic, and an inefficient workaround has to be performed in certain cases. An annotated example is included in the \CFA compiler source code:
+\begin{cfa}
+// If resolution is unsuccessful with a target type, try again without, since it
+// will sometimes succeed when it wouldn't with a target type binding.
+// For example:
+forall( otype T ) T & ?[]( T *, ptrdiff_t );
+const char * x = "hello world";
+int ch = x[0];
+// Fails with simple return type binding (xxx -- check this!) as follows:
+// * T is bound to int
+// * (x: const char *) is unified with int *, which fails
+\end{cfa}
+The problem here is that the type environment can only represent the constraints $T = int$ and $int <: T$, but since the type information flows in the opposite direction, the proper constraint for this case is $T <: int$, which cannot be represented in the simplified type environment. Currently, an attempt is still made to resolve with the equality constraint generated from the initialized variable, since it is often the correct type binding (especially when the initialized variable is a struct); when that attempt fails, the resolution algorithm is rerun without the initialization context.
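The retry behavior described above can be sketched with a toy model in C++. This is a minimal illustration under strong assumptions, not the cfa-cc implementation: the names @Ty@, @Binding@, @unify_lower@, @unify_exact@, @resolve_subscript@ and @resolve_with_retry@ are hypothetical, types are reduced to four arithmetic types ordered by conversion rank, and qualifiers such as @const@ are ignored.

```cpp
#include <cassert>
#include <optional>

// Toy arithmetic types, ordered by (simplified) conversion rank.
enum class Ty { Char, Int, Long, Double };

// One tentative binding plus a flag saying whether it may still widen.
struct Binding {
    std::optional<Ty> bound;
    bool widenable = true;
};

// Record the lower-bound constraint t <: T.
bool unify_lower(Binding& b, Ty t) {
    if (!b.bound || (*b.bound < t && b.widenable)) { b.bound = t; return true; }
    return t <= *b.bound;  // t already converts to the current binding
}

// Record the equality constraint T = t; the binding can no longer widen.
bool unify_exact(Binding& b, Ty t) {
    if (b.bound && *b.bound != t && !(b.widenable && *b.bound < t)) return false;
    b.bound = t;
    b.widenable = false;
    return true;
}

// Models T & ?[]( T *, ptrdiff_t ) applied to a pointer to `pointee`,
// optionally with a target type taken from the initialized variable.
std::optional<Ty> resolve_subscript(Ty pointee, std::optional<Ty> target) {
    Binding T;
    if (target && !unify_exact(T, *target)) return std::nullopt;  // T = target
    if (!unify_exact(T, pointee)) return std::nullopt;  // pointer position is exact
    return T.bound;
}

// First try with the target binding, then rerun without it.
std::optional<Ty> resolve_with_retry(Ty pointee, Ty target) {
    if (auto r = resolve_subscript(pointee, target)) return r;
    return resolve_subscript(pointee, std::nullopt);
}
```

In this model, `resolve_subscript(Ty::Char, Ty::Int)` fails because `T` is first fixed to `Int`, while the rerun binds `T` to `Char` and leaves the conversion to `Int` to the initialization itself. The constraint $T <: int$ has no representation in `Binding`, which is why the second pass is needed.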
+One additional remark to make here is that \CFA does not provide a mechanism to explicitly specify values for polymorphic type parameters. In \CC for example, users may specify template arguments in angle brackets, which could be useful when automatic deduction fails, \eg @max<double>(42, 3.14)@.
+There are some partial workarounds such as adding casts to the arguments, but they are not guaranteed to work in all cases. If a type parameter appears in the function return type, however, using the ascription (return) cast will force the desired type binding, since the cast only compiles when the expression type matches exactly with the target.
+\section{Related Work}
