source: doc/theses/fangren_yu_MMath/background.tex@ 1e28e05

Last change on this file since 1e28e05 was 57c7e6c4, checked in by Fangren Yu <f37yu@…>, 7 months ago

proofreading fix as suggested by Ondrej

  • Property mode set to 100644
File size: 11.4 KB
Line 
1\chapter{Related Work}
2
3\section{Polymorphism}
4
5ML~\cite{ML} was the first language to support parametric polymorphism.
6Like \CFA, it supports universal type parameters, but not the use of assertions and traits to constrain type arguments.
7Haskell~\cite{Haskell10} combines ML-style polymorphism, polymorphic data types, and type inference with the notion of type classes, collections of overloadable methods that correspond in intent to traits in \CFA.
8Unlike \CFA, Haskell requires an explicit association between types and their classes that specifies the implementation of operations.
9These associations determine the functions that are assertion arguments for particular combinations of class and type, in contrast to \CFA where the assertion arguments are selected at function call sites based upon the set of operations in scope at that point.
10Haskell also severely restricts the use of overloading: an overloaded name can only be associated with a single class, and methods with overloaded names can only be defined as part of instance declarations.
11
12\CC provides three disjoint polymorphic extensions to C: overloading, inheritance, and templates.
13The overloading is restricted because resolution does not use the return type, inheritance requires learning object-oriented programming and coping with a restricted nominal-inheritance hierarchy, templates cannot be separately compiled resulting in compilation/code bloat and poor error messages, and determining how these mechanisms interact and which to use is confusing.
14In contrast, \CFA has a single facility for polymorphic code supporting type-safe separate compilation of polymorphic functions and generic (opaque) types, which uniformly leverage the C procedural paradigm.
15The key mechanism to support separate compilation is \CFA's \emph{explicit} use of assumed type properties.
16Until \CC concepts~\cite{C++Concepts} are standardized (anticipated for \CCtwenty), \CC provides no way of specifying the requirements of a generic function beyond compilation errors during template expansion;
17furthermore, \CC concepts are restricted to template polymorphism.
18
19Cyclone~\cite{Grossman06} also provides capabilities for polymorphic functions and existential types, similar to \CFA's @forall@ functions and generic types.
20Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is a tedious and potentially error-prone process.
21Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as @void *@, \ie only pointer types and @int@.
22In \CFA terms, all Cyclone polymorphism must be dtype-static.
23While the Cyclone design provides the efficiency benefits discussed in~\VRef{s:GenericImplementation} for dtype-static polymorphism, it is more restrictive than \CFA's general model.
24Smith and Volpano~\cite{Smith98} present Polymorphic C, an ML dialect with polymorphic functions, C-like syntax, and pointer types;
25it lacks many of C's features, most notably structure types, and hence, is not a practical C replacement.
26
27Objective-C~\cite{obj-c-book} is an industrially successful extension to C.
28However, Objective-C is a radical departure from C, using an object-oriented model with message passing.
29Objective-C did not support type-checked generics until recently \cite{xcode7}, historically using less-efficient runtime checking of object types.
30The GObject~\cite{GObject} framework also adds object-oriented programming with runtime type-checking and reference-counting garbage collection to C;
31these features are more intrusive additions than those provided by \CFA, in addition to the runtime overhead of reference counting.
32Vala~\cite{Vala} compiles to GObject-based C, adding the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code bases.
33Java~\cite{Java8} included generic types in Java~5, which are type checked at compilation and type erased at runtime, similar to \CFA's.
34However, in Java, each object carries its own table of method pointers, whereas \CFA passes the method pointers separately to maintain a C-compatible layout.
35Java is also a garbage-collected, object-oriented language, with the associated resource usage and C-interoperability burdens.
36
37D~\cite{D}, Go, and Rust~\cite{Rust} are modern compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go, and \emph{traits} in Rust.
38However, each language represents a significant departure from C in terms of language model, and none has the same level of compatibility with C as \CFA.
39D and Go are garbage-collected languages, imposing the associated runtime overhead.
40The necessity of accounting for data transfer between managed runtimes and the unmanaged C runtime complicates foreign-function interfaces to C.
41Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more.
42D restricts garbage collection to its own heap by default, whereas Rust is not garbage collected and, thus, has a lighter-weight runtime more interoperable with C.
43Rust also possesses much more powerful abstraction capabilities for writing generic code than Go.
44On the other hand, Rust's borrow checker provides strong safety guarantees but is complex and difficult to learn and imposes a distinctly idiomatic programming style.
45\CFA, with its more modest safety features, allows direct ports of C code while maintaining the idiomatic style of the original source.
46
47
48\section{Tuples/variadics}
49
50\vspace*{-5pt}
51Many programming languages have some form of tuple construct and/or variadic functions, \eg SETL, C, KW-C, \CC, D, Go, Java, ML, and Scala.
52SETL~\cite{SETL} is a high-level mathematical programming language, with tuples being one of the primary data types.
53Tuples in SETL allow subscripting, dynamic expansion, and multiple assignment.
54C provides variadic functions through @va_list@ objects, but the programmer is responsible for managing the number of arguments and their types;
55thus, the mechanism is type unsafe.
56KW-C~\cite{Buhr94a}, a predecessor of \CFA, introduced tuples to C as an extension of the C syntax, taking much of its inspiration from SETL.
57The main contributions of that work were adding MRVF, tuple mass and multiple assignment, and record-member access.
58\CCeleven introduced @std::tuple@ as a library variadic-template structure.
59Tuples are a generalization of @std::pair@, in that they allow for arbitrary length, fixed-size aggregation of heterogeneous values.
60Operations include @std::get<N>@ to extract values, @std::tie@ to create a tuple of references used for assignment, and lexicographic comparisons.
61\CCseventeen proposes \emph{structured bindings}~\cite{Sutter15} to eliminate predeclaring variables and the use of @std::tie@ for binding the results.
62This extension requires the use of @auto@ to infer the types of the new variables; hence, complicated expressions with a nonobvious type must be documented with some other mechanism.
63Furthermore, structured bindings are not a full replacement for @std::tie@, as it always declares new variables.
64Like \CC, D provides tuples through a library variadic-template structure.
65Go does not have tuples but supports MRVF.
66Java's variadic functions appear similar to C's but are type safe using homogeneous arrays, which are less useful than \CFA's heterogeneously typed variadic functions.
67Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML~\cite{sml}, Haskell, and Scala~\cite{Scala}, which decompose tuples using pattern matching.
68
69
70\section{Type Resolution}
71
72\CFA expression resolution must deal with extensive overloading and inference of polymorphic types with assertions.
73The goal is to keep the base algorithm used in type unification simple enough so resolving a complicated expression can still be done reasonably fast.
74The following is work that handles the aforementioned bidirectional subtyping relations concisely.
75
76Melo~\etal~\cite{Melo17} developed PsycheC, a tool built for inferencing missing type and variable declarations of incomplete C programs, which can also be viewed as a dialect of C with type inferencing.
77As PsycheC is built for analyzing standard C programs, it does not have any kind of overloading or polymorphism.
78Instead, all top-level variables and function parameters may have indeterminate types.
79The scope of the problem PsycheC deals with is quite the opposite of that in \CFA, but the techniques used to solve for the unknown types are similar.
80
81In PsycheC, a unification variable is assigned to every identifier with unknown type, and each use of a variable generates a constraint on its type.
82Just like in \CFA, the most common kinds of expression contexts PsycheC deals with are assignments and function calls, which creates inequality constraints: the right hand side of the assignment is a subtype of the left hand side;
83function argument passing is treated the same as assignment.
84Since both sides of the inequality can have unknown types, type resolution needs to handle both upper bound and lower bound constraints.
85Instead of directly solving with lower and upper bounds, the authors developed a two-phase unification algorithm where the equivalences are first resolved and substituted into the inequalities, and showed that by ordering the inequalities using the partial order of concrete types involved, a second pass of unification treating inequalities as equivalences is sufficient to resolve all type variables.
86However, PsycheC does not consider the hierarchy of arithmetic types, as they are all mutually convertible in C, and therefore, considered \emph{equivalent} in unification.
87It also lacks structured types except simple pointers, meaning their strategy of unifying subtypes as equivalences does not work in a more general setting such as in \CFA.
88
89Pierce and Turner~\cite{Pierce00} presented a \emph{local} type-inference method that restricts the propagation of types and use of unification variables within adjacent syntax-tree nodes.
90The technique provides sufficient type checking and inference for the ML family of languages.
91Odersky, Zenger, and Zenger~\cite{Odersky01} refined this method by introducing a coloured type-system that distinguishes inherited types (given directly by context) and synthesized types (deducted from the term itself), to handle bidirectional type propagation in a more concise manner.
92Their algorithm is designed for functional programming, where a first-class (untyped) lambda expression is common and type inference must also deduce the argument types of lambdas requiring bidirectional inference.
93Given that \CFA does not have lambda expressions, but the rest of the type inference cases are quite similar, implementing this type-inference algorithm for \CFA expression resolution might be promising.
94Moreover, the current \CFA implementation also keeps unification variables \emph{almost} local, except when a type variable does not appear in the parameter list.
95Hence, it cannot be given a value from the argument types;
96rather, it is passed upwards as a free variable for matching.
97In all other cases, the substituted minimal types are passed upwards as the result types.
98This process means the current implementation of \CFA expression resolution semantically agrees with Pierce's type-inference system in most of the cases.
99A question remains whether these methods can be adapted to handle \CFA assertion resolution.
Note: See TracBrowser for help on using the repository browser.