Changeset 7ea4073
- Timestamp:
- Aug 19, 2025, 6:01:12 PM (5 weeks ago)
- Branches:
- master
- Children:
- 01e6bd9, a863261
- Parents:
- 8a5aeac
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/proposals/modules-alvin/proposal.md
r8a5aeac r7ea4073 1 1 <!-- 2 I fleshed out some problems with strong module theory after working through some examples and writing some code, and got a better idea of how to actually implement my vision. The cynic in me says that I've simply solved many hard problems by slapping "compile-time reflection" on it, but my formalism shows that works. I'm quite happy with the result - its focus on making it possible to export individual top-level declarations provides a really nice logical interface, and many of its concepts are used in other programming languages, so the strategy seems at least reasonable. 3 There are still some aspects that need to be added: initialization order, handling forward declarations of symbols defined outside of modules, handling the full C spec instead of restricting top-level statements, and addressing the additional features that Cforall provides on top of C. In my modules proposal v5 I'll rearrange some of the sections and rewrite some parts in order to make it easier to follow (some sections are outdated, as my ideas changed slightly as I wrote this proposal). This will allow me to better address some of these missing aspects. 2 An iteration/rewrite of the previous module proposal. Moved walkthrough and formalism before concepts. Rewrote major parts in every section (especially intro) to make it easier to approach. Added sections to discuss important topics. 4 3 5 4 Motivation 6 Goal 7 Strategy (high-level) 8 Strategy (implementation details) 9 Handling unboxed types 10 C macros are not exportable 11 Import ordering does not matter 12 Forward declarations are unnecessary 13 Explicit transitive dependencies 14 Exporting inline functions 15 Generating multiple module interfaces 16 Modules use `import` instead of `#include` 17 Forall polymorphism 5 Objectives 6 Overview 18 7 Walkthrough 19 8 Symbols of a module … … 25 14 Providing additional information for importers 26 15 Resolving names for compilation 16 Concepts 17 Handling unboxed types 18 C macros are not exportable 19 Import ordering does not matter 20 Forward declarations are unnecessary 21 Explicit transitive module dependencies 22 Exporting inline functions 23 Generating multiple module interfaces 24 Modules use `import` instead of `#include` 25 Cyclic modules 26 Other kinds of symbols 27 Implementing module namespaces 28 Porting existing C code 29 Handling initialization order 30 Why C++20 modules failed (and why we will succeed) 27 31 --> 28 32 29 # Modules proposal (v 4, strong modules)33 # Modules proposal (v5, reorganize) 30 34 31 35 ## Motivation 32 36 33 Let's walk through the design process of making a C module system. 34 35 First things first: **Why do we need modules?** From a theoretical standpoint, it's superfluous - a module system restricts your creative freedom because it makes you think that organizing code and organizing data is different, even though they use the same concepts. However, in practical scenarios it's useful to have a module system because we care about building things fast, reliably, and spread across multiple teams with varying skill levels. **We want modules because it helps us focus on only the parts of the program we're working on.** 36 37 But **isn't a C translation unit a module?** *(a c translation unit is a `.c` file after preprocessing)* It has `static` and `extern`, but there isn't a good way to analyze which symbols are used where in the program. A symbol defined in `foo.c` can be declared in `foo.h`, but that symbol can also be declared in `unrelated.c`. Calling a C translation unit a module is like saying a struct is well-defined, even if it can only be used if some arbitrary set of functions is called. We don't consider it a module because it doesn't provide a way to control which modules depend on its contents. **We argue that C translation units, with some access control, can be considered a module.** 38 39 ## Goal 40 41 Let's first clarify what we're trying to achieve with our modules. The term "module" is as abused as "polymorphism" in programming languages, which makes it challenging to determine what we're working towards with a module system for C. Let's start: 42 43 * **Make C translation units into modules** (needs some access control on `extern`) 44 * **Generate module interfaces** (so .h files are not necessary) 45 * **Each feature is incremental** (each part of the module system requires minimal changes to existing C code - features should feel optional, functioning more like syntactic sugar) 46 47 Put simply, we want to require minimal changes to the existing C workflow (source code, makefile, programmer thought process, compiler implementation) while providing something similar to Rust modules. There are other aspects of modules that we'll want to cover - initialization order, modules defined across multiple files, modules nested within a file, module friendship - which have their various solutions, but we'll get to them after working with these main 3 goals. 48 49 ## Strategy (high-level) 50 51 From [this article](https://thunderseethe.dev/posts/whats-in-a-module/) (more specifically, [Backpack in Haskell](https://plv.mpi-sws.org/backpack/)), we are introduced to this concept of **weak and strong modules**. A weak module is tied to its dependencies, meaning that swapping out a dependent module for another requires recompilation. Contrast this with a strong module, where we can provide the dependent modules after the module has been compiled. 52 53 For our purposes, the difference is thus: **weak modules are dependent on other modules, while strong modules are dependent on module interfaces.** In order to achieve this, we need modules to control what is exported to other modules, which includes limiting transitive dependencies between modules. This will allow modules to only expose what they own, allowing us to avoid depending on module details. By having modules generate their interfaces, we can separate the parts of C that force modules to be "weak" into the interface generation step, so that the compilation step can work with "strong" modules. 54 55 From a logical perspective, modules can only export symbols that they own, and only use symbols that they import. Some concessions need to be made to support the complexities surrounding unboxed types, but this is limited to what is absolutely necessary. Note that C macros cannot be exported, since they run before module processing. The order of imports does not affect compilation - they can be reordered without changing module behaviour. In fact, forward declarations are unnecessary in these modules - we scan all top-level declarations while determining what to export, so the symbols are already in memory by the time we resolve definitions. Explicit transitive dependencies can be created with the use of `export import` - from the perspective of the importing module, it has the same behaviour as if the exported module were imported separately. To support inline functions, we introduce the distinction between a module being visible to the compiler (in order to expand function definitions) and being visible to an importing module. Since we can generate our own module interfaces, we can generate multiple interfaces, which helps solve the module friendship problem. Modules are imported using `import` instead of `#include`, and a translation unit need not be a module to import modules. The forall keyword is an addition from Cforall to support polymorphism, with polymorphic functions using vtables and a single implementation. This has a straightforward implementation in modules (the module that owns the polymorphic function holds the implementation), though if we wish to support specialization for polymorphic functions (ie. multiple implementations), the module system would need to be updated to support this. More details in the next section. 56 57 ## Strategy (implementation details) 58 59 A module is a collection of symbol definitions, which it can export to other modules. It may also import other modules' symbols, which it uses to define its symbols. There are 3 kinds of symbols in C to consider: variables, functions and types. 60 61 ``` 62 -------------------------------------- 63 | module B; | 64 imports | import A; | exports 65 A::a --------> | export int b = a; | --------> B::b 66 A::aa | export struct aa foobar(); | B::foobar (+ A::aa?) 67 | export struct bb {struct aa foo;}; | B::bb (+ A::aa?) 68 -------------------------------------- 69 ``` 70 71 ### Handling unboxed types 72 73 Above, we note a problem: whereas functions and variables can export only their symbol declarations, we need the definitions of types in order to use them fully. Languages that use boxed types (eg. Java, Go) do not have this restriction, since the caller only needs to reserve space for a pointer (eg. when calling `B.foobar`). In contrast, C and other systems programming languages (eg. C++, Rust) require handling unboxed types, since it demands control over pointer indirection (also, there may not be a heap available to support boxed types). This means we need to know the definitions of types in order to use them, even if it's just to allocate enough space on the stack. In Rust, this is done by compiling an entire crate at a time; in C/C++, this is done by including transitive dependencies in headers. 74 75 We make the following insight: **if we have an oracle that tells us the size and alignment of any type, we can use unboxed types without needing to expose type definitions.** For example, a caller of `B.foobar` need only know that the structure returned has size 32 bytes, aligned to 8 bytes, in order to use it. For type safety, the caller should also know the type returned is `A.aa`, but the caller does not need to know the inner fields of `A.aa`. This means, if `B` does not change, any module that uses `B` does not need to be recompiled unless `A.aa` exceeds its size or alignment allocations. The extra condition that any module that uses `B` needs to be recompiled if `A.aa` exceeding its size or alignment allocations does break the "strong module guarantee" (since `A` is not re-exported, it would ideally not be considered a dependency), but the scope of this has been limited to only what is absolutely necessary to support unboxed types. If the module that uses `B` wants to access the inner fields of `A.aa`, it needs to import `A`. 76 77 How do we actually implement this "oracle"? After analyzing the alternatives, option 2 feels the most "natural", despite its inherent complexity. 78 1. We can perform whole-program analysis to analyze all type-related information, similar to how Rust does it. This allows for maximum expressiveness since this gives us full visibility into the entire program, and can rely on the analyzer to automatically resolve circularly dependent modules (eg. module `A` imports module `B`, which in turn imports module `A`, but in a way that is still resolvable). However, this breaks the principle of separate compilation by accessing the entire program, and raises questions such as "what gets reanalyzed if `A` changes?" 79 2. We can also extract type information into a separate generated module interface. This aligns closer to the principle of separate compilation, though it still requires special analysis to resolve circularly dependent modules (eg. module `A` imports module `B`, which in turn imports module `A`, but in a way that is still resolvable. In order to avoid a circular import error, this requires only importing the symbols from `B` that `A` needs). A criticism is that this does not really resolve the transitive dependency issue; in a certain sense, it's offloading the problem to a compile-time reflection mechanism. This level of compile-time reflection is also non-trivial, poentially requiring significant re-engineering and validation of an existing C compiler in order to implement. Despite these concerns, this strategy seems to align best with general intuition when analyzing an existing codebase. 80 81 ### C macros are not exportable 82 83 **C preprocessing runs first, then the module system generates module interfaces for exporting symbols, then the translation unit is compiled using the generated module interfaces.** The module system runs after preprocessing because we may want to use C macros to generate export definitions. However, this means the module system cannot export C macros, because they have already been evaluated at that point. 84 85 Supporing C macro exports may be possible if we made the preprocessor module-aware. This could be done by augmenting the C preprocessor to check for a `module;` statement or having all module-related statements be `#pragma` directives instead. However, this overcomplicates the preprocessor, making it both hard to develop and demonstrate how to use it. Additionally, certain functionalities such as compile-time reflection are not compatible with the "one pass" nature of the C preprocessor. Put simply, the set of functionalities we wish to support with our module system is complex enough that it warrants being a separate system. 86 87 As it currently stands, one can achieve "exported macros" by simply putting them in a header file, then using `#include`. It could also be argued that the trait system in Cforall makes this kind of metaprogramming unnecessary for most use-cases. That being said, this proposal permits future extensions which provide some form of metaprogramming that executes after the module system. Some ideas for inspiration include: string mixins from D, procedural macros from Rust, and staged functions from Zig. 88 89 ### Import ordering does not matter 90 91 A key problem with `#include` is that textual inclusion is order-dependent - including `a.h` before `b.h` may result in different behaviour than `b.h` before `a.h`. Not only does this cause confusion among developers, it also affects compilation speed - while headers can be precompiled, they must always account for the possibility of a C macro completely changing the meaning of the header file. 92 93 The fact that each import is independent from each other assures developers that reformatting the import list will not break functionality. Additionally, since modules can only export symbols that they own (with the caveat on types), it is clear to the developer what a module is getting from another. 94 95 Since the module system runs after the C preprocessor (and requires a certain formatting of the file), the generated module interfaces can be optimized for maximum compiler efficiency. The potential gains are significant: when the standard library became available to import in C++23 (ie. `import std;`), the time to compile a "Hello World" program essentially halved. This significant reduction in build times likely translates to faster iterations and more productive developers. 96 97 As a point for future development, the generated module interfaces can also be analyzed by a language server in order to provide accurate suggestions to the developer. This may be augmented with AI in order to provide robust code-generation capabilities. 98 99 ### Forward declarations are unnecessary 100 101 Since we scan all top-level declarations while generating module interfaces, forward declarations are unnecessary. In fact, the module system can resolve what would otherwise be incomplete types in C. In the following example, even if we added `struct Other;` after `module;`, `struct Other details;` would still break. However, with our module system, we can resolve this. 102 103 ``` 104 module; 105 export struct Node { 106 int id; 107 struct Other details; 108 }; 109 struct Other { 110 int symbols[max_symbols_supported]; 111 }; 112 const int max_symbols_supported = 100; 113 ``` 114 115 This property of being able to "look ahead" in a file echoes some parallels with classes in object-oriented languages. In those languages, a method's definition can call a method defined later, but still within the same class definition. In fact, modules and objects share many features (eg. abstraction, encapsulation). The main difference is that a module behaves more like a singleton class (as they are actually sections of code), wheras objects can be instantiated. 116 117 ### Explicit transitive dependencies 118 119 There are many cases where development of a module is broken up into parts, yet is often used together. In other cases, some modules are meant to be used in conjunction with another module's symbols. 120 121 ``` 122 // module std 123 module; 124 export import std/vector; 125 export import std/iostream; 126 127 // module wrapper 128 module; 129 export import std/map; 130 export import std/string; 131 export typedef map[string, string] stringmap; 132 133 // module client 134 module; 135 import std; // this is equivalent to import std/vector; 136 // import std/iostream; 137 // import std; // superfluous in this case 138 import wrapper; // this is equivalent to import std/map; 139 // import std/string; 140 // import wrapper; // for stringmap 141 ``` 142 143 In the above case, `std` allows developers to import a common set of functionality without needing to concern themselves with the explicit naming of modules. `wrapper` handles a special case with exporting `typedef`: Since `wrapper` does not own `map` or `string`, it should export the modules that contain them if `client` expects to use the fields of `map[string, string]` (as opposed to simply knowing that `stringmap` is a `map[string, string]`). 144 145 ### Exporting inline functions 146 147 Inline functions require the definition of a function to be available. Similar to the section on "handling unboxed types", we would wish to avoid unwanted transitive dependencies if possible. 148 149 ``` 150 // module inlined 151 module; 152 import supporting; 153 export inline int power(int a, int b) { 154 int ret = 1; 155 for (int i=0; i<b; ++i) ret = multiply(ret, a); 156 return ret; 157 } 158 // module supporting 159 module; 160 import deeper; 161 export int multiply(int a, int b) { 162 int ret = 0; 163 for (int i=0; i<b; ++i) ret = add(ret, a); 164 return ret; 165 } 166 // module deeper 167 module; 168 export int add (int a, int b) {return a + b;} 169 // module client 170 module; 171 export int foobar() {return power(10, 5);} 172 ``` 173 174 In this scenario, to compile `client`, the compiler needs the exported symbols of `inlined` and `supporting`. Note that the compiler does not need `deeper` because `multiply()` is not an inline function. However, `client` cannot access symbols from `supporting` unless it imports it directly. 175 176 If `inlined` has not generated its module interfaces by the time `client` is being compiled, then the presence of a single inlined function would cause all modules imported by `inlined` to need to be traversed, since the module system cannot ascertain which module contains `multiply()`. After the module interfaces are generated (and the modules have not been edited in the meantime), we can use those instead of traversing imports. 177 178 There is another optimization that can be made here: have all exported inline functions generate actual function definitions, so that if importers choose not to expand the function definition, they do not need to regenerate the function definition. It may be tricky to implement if the functionality is not already supported, though this feature is optional. 179 180 ### Generating multiple module interfaces 181 182 A common problem with encapsulation is that certain modules may need certain functionalities from others that should otherwise be private. In object-oriented languages, this is accomplished by designating "friend classes", which get full access to a class' internals. However, this "all or nothing" approach lacks fine-grained control. Another alternative is to present multiple interfaces for importers to choose from, facilitated by the fact that we already generate module interfaces. 183 184 ``` 185 // module multiple 186 module; 187 export int public_function() {return 100;} 188 export(internal) int internal_function() {return 10;} 189 int private_function() {return 1;} 190 // module kernel 191 module; 192 import multiple(internal); // can use public_function, internal_function 193 // module client 194 module; 195 import multiple; // can use public_function 196 ``` 197 198 Here, `internal` is a user-defined tag. We could use `export(aaa)` and `import multiple(aaa)` instead. Additionally, we can attach multiple tags to an export (eg. `export(aaa, internal)`) and import with multiple tags (eg. `import multiple(aaa, internal)`). Any symbol that is exported without a tag is always imported, and a symbol that is exported with tags is imported if the import statement includes one of its tags. 199 200 Another details is that `private_function()` is technically still accessible by accessing its mangled C name directly. If we truly wanted it to be private, we could use `static`. An additional functionality that the module system could provide is to automatically append `static` to these functions. 201 202 ### Modules use `import` instead of `#include` 203 204 A module is defined by having `module;` be the first statement in a source file (somewhat similar to C++20 modules). Internally, modules work like namespaces, implemented by prepending the module name in front of all declared symbols. There are multiple alternatives to determine the module name - we use option 2 for its brevity: 205 1. Have the user define the module names (eg. `module A;`). This is similar to how Java and C++ require specifying packages and namespaces, respectively. This gives the developer some flexibility on naming, as it is not tied to the file system. However, it raises some questions surrounding how module discovery works (if a module imports `A`, where is `A`?). 206 2. Have the module names be defined from a "root directory" (eg. `module;` is module `A` because it is located at `src/A.cfa`, and `src/` is defined as the root directory). This creates import paths that look similar to include paths, allowing us to align more closely with existing C programmers. When searching for an appropriate module, a search is conducted first from the current directory, then we look for an appropriate library (similar to the include path in C). A downside is that this precludes adding nested modules (ie. module definitions within a module file), though nested modules are arguably not that important. 207 208 Another design choice that was made was to have files with the same name as a folder exist outside their folder. For example, module `graph` exists at `src/graph.cfa`, while module `graph/node` exists at `src/graph/node.cfa`. The alternative is to have module `graph` at `src/graph/mod.cfa` - this may be more familiar to some developers, but this complicates module discovery (eg. if there exists a module at `src/graph.cfa` at the same time, which takes precedence? Does `graph` need to `import ../analysis` in order to import the module at `src/analysis`?). Taking insights from Rust's move from `mod.rs` to files with the same name as the folder, we opt to use the more straightforward strategy. 209 210 Since modules prepend their module name in front of all declared symbols, we use `import` instead of `#include` when importing modules (this is also necessary to support some of our compile-time reflection mechanisms, which would be challenging to implement in the preprocessor). This automatically removes the module name from the front of the symbol names. If this causes symbol clashes, they can be resolved using attributes (eg. `A::a` means "symbol `a` within module `A`"). 211 212 This prepending of the module name in front of all symbols within a module can result in undesirable behaviour if we use `#include` within a module, as all of its contents will be prepended with the module name. To resolve this, we introduce extern blocks, which escape the module prefixing (eg. `extern { #include <stdio.h> }`, though with a newline after the `>`). 213 214 By having modules prepend their names to their symbols, we can add functionality to perform a "unity build", as every symbol can be disambiguated. This would allow us to balance a "development configuration" with the benefits of modularization, alongside a "release configuration" with maximum optimizations. 215 216 ### Forall polymorphism 217 218 The forall keyword is an addition to Cforall to support polymorphism, with polymorphic functions using vtables and a single implementation. If a module exports a forall statement, the module owns the polymorphic function implementations, while the polymorphic function declarations are exported (if these were declared inline, the definition could be exported, similar to C++20 modules). Polymorphic types are instantiated from the caller's side, so their definitions are exported. This may present problems, but currently I am not familiar enough with Cforall to judge. 219 220 An interesting case is to consider if Cforall could be updated to perform specialization (multiple implementations for a single function) in addition to the single implementation strategy. This would be similar to Rust's `impl` vs `dyn` traits. To support this, the module system would need to be updated, as we would want the multiple implementations to exist within the module that owns the forall statement. 37 This proposal adds a module system to Cforall, an extension to the C programming language. However, most of the information discussed can be used to create a module system in C or another system programming language. 38 39 **Why do we need modules?** It's worth noting that not all programming languages have modules (C being the obvious example here). In truth, modules don't tend to increase a language's expressive power, as its features can usually be emulated by other code structures. Where modules become really useful is when managing a codebase with multiple collaborators. Modules have the practical benefit of helping us focus only on the parts of the program we're working on. They also help us avoid accidentally relying on implementation details by restricting what can be accessed by other code, thereby keeping a codebase easier to extend and maintain. 40 41 But **isn't a C translation unit a module?** (a c translation unit is a `.c` file after preprocessing) It has `static` and `extern`, but there isn't a good way to analyze which symbols are used where in the program. A symbol defined in `foo.c` can be declared in `foo.h`, but that symbol can also be declared in `unrelated.c`. Calling a C translation unit a module is like saying a struct is well-defined, even if it can only be used if some arbitrary set of functions is called. We don't consider it a module because it doesn't provide a way to control which modules use its symbols. **We argue that C translation units, with some access control, can be considered a module.** 42 43 *Some paragraphs of this proposal are italicized - this is to indicate it is specific to Cforall, or my working thoughts. The rest of this proposal is applicable to systems programming languages in general.* 44 45 ## Objectives 46 47 Let's clarify what we're trying to achieve with our modules. The term "module" is as abused as "polymorphism" in programming languages, which makes it challenging to determine what we're working towards with a module system for C. Here are our 3 main objectives: 48 49 1. **Make C translation units into modules** 50 2. **Remove the need for .h files** 51 3. **Modules should "own" their contents** 52 53 First, to make C translation units into modules, our proposed modules should not require fundamental architectural changes to an existing C project in order to use it. Crucially, there needs to be a way to represent forward declarations of other modules' contents. 54 55 Second, most modern languages don't require an additioanl file just to link symbols between files (Object Oriented languages have interfaces, but they are not required). We would like developers to only need to declare a symbol once, and leave symbol discovery to the module system. 56 57 Third, it should not be confusing as to which module's symbols are visible at a given time, because a module's symbols should only be visible if said module allows them to be. Ideally, such symbols are only visible if said module exports them and a module imports said module (we will find there are special cases where we need to leak some information). 58 59 Our key guiding principle which will help us achieve these goals is to **decouple symbol exports from each other**. This means that it should be possible to export a symbol's implementation details without leaking the implementation details of any other symbol. This allows our module system to have full control over what symbols are visible within a module, which enables us to perform our advanced analysis. 60 61 Other topics that will be dicussed include module friendship, backwards compatibility and interactions with Cforall features. Some topics are highly complex and are pulled into separate sections, such as handling initialization order and importing existing standard libraries. 62 63 ## Overview 64 65 A key guiding principle behind this proposal is the idea that symbol exports are decoupled. This means that exporting a symbol should only provide enough functionality for the importer to use it, and nothing more. We try to take this even further: **only the statement in which a symbol is defined is allowed to export its implementation details,** as opposed to having one export implicitly exporting another's details. This means importers only get what they need (and nothing more) and exporters have full control over who has access to their details. To explain by example: 66 67 ``` 68 // module A 69 module; 70 export struct Inner {int a; int b;}; 71 72 // module B 73 module; 74 import A; 75 export struct Outer {int i; struct Inner f1;}; 76 export struct Outer constructOuter() { /* implementation omitted */ } 77 78 // module C 79 module; 80 import B; 81 export struct Outer o = constructOuter(); 82 int x1 = o.i; 83 struct Inner x2 = o.f1; 84 int x3 = o.f1.a; // error, must import definition of struct Inner to use 85 ``` 86 87 Module C can access the fields of `struct Outer`, but it cannot access the fields of `struct Inner` because it did not import module A. Note that module C can access the name `struct Inner`, because the type system requires the type of `x2` to be written out. Additionally, any module that imports module C does not receive the implementation details of `struct Outer` or `struct Inner`. 88 89 The fact that importing an individual symbol is well-defined here forms the basis of our module system. This is because it allows us to reason about individual symbols, without having to consider everything within a module. This greatly simplifies our proofs of correctness. This also makes it trivial to get order-agnostic imports, enables us to make modules cyclic and allows us to remove the need for forward declarations. In our setup, a module file is simply a convenient way to manage symbol scope and package symbols to be exported together. 90 91 This module system currently allows 3 kinds of symbols to be exported: 92 * Types (ie. `struct` and `typedef` statements) 93 * Functions 94 * Variables (ie. global variables) 95 96 As shown above, sometimes type names need to be visible even if they are not explicitly imported, in order to enable full functionality of other symbols. However, the type must be imported in order to access a type's fields. As for functions and variables, they must be imported in order to be visible or used. Importing an inline function should appear no different than a non-inline function - the only difference being more compilation dependencies and more information available to the optimizer. 97 98 Note that there is a distinction between "visible to the module" and "visible to the optimizer" that is different from regular C: Just because something is provided to the optimizer, doesn't mean the code in a module can access it. In contrast, regular C code would be like putting `export` on every exportable symbol and import statement (ie. transitive module dependency). In this proposal, "visible" refers to "visible to the module". 99 100 *This module system was in part inspired by the concept of weak and strong modules as described by [this article](https://thunderseethe.dev/posts/whats-in-a-module/) and the paper [Backpack: Retrofitting Haskell with Interfaces](https://plv.mpi-sws.org/backpack/). Put roughly, weak modules are dependent on other modules, while strong modules are dependent on module interfaces. Unfortunately, the concept of what is an interface and what is not is unclear in C (where header-only libraries exist), and certain aspects of systems programming require exposing certain details that would otherwise be hidden in other languages (eg. unboxed types). After multiple attempts of trying to figure out the key principle behind the distinction (as well as figure out if it can be implemented), I eventually reached the key insight of decoupling symbol exports.* 221 101 222 102 ## Walkthrough 223 103 224 A module holds a collection of symbols to be exported. What we're concerned with is how the importing module can use these symbols. In order to answer this, we will analyze:104 A module holds a collection of symbols to be exported. A module also uses imports to control what symbols are visible and how those symbols can be used. We will analyze: 225 105 * What are the kinds of symbols and what does the importing module see? 226 106 * How does the importing module disambiguate between symbols? 227 107 228 Our guiding principle behind our modules is that, as much as is possible, **every symbol definition should have full control over who can see it.** This means that you cannot declare a function or struct that is defined in another module unless that module exports it and you import said module (of course, it may be accessible using `extern "C"`). We will see certain cases where types and inline functions break this abstraction somewhat, but we try to limit this "information leakage" as much as we can. A benefit of this is that our inductive proof usually only needs to reason using the previous step, as every symbol is logically isolated from each other.108 As stated in the overview, we wish to enforce that **only the statement in which a symbol is defined is allowed to export its implementation details.** This means that you cannot use a function or struct that is defined in another module unless that module exports it and you import said module (of course, it may be accessible using `extern "C"`). We will see certain cases where C forces this encapsulation to be somewhat broken (eg. types and inline functions), but we try to limit this "information leakage" as much as we can. 229 109 230 110 ### Symbols of a module 231 111 232 Modules in our modules are either types, functions or variables. While types can be defined in the same line as a variable (eg. `struct {int i;} name = {5};`), we will make the simplifying assumption right now that this does not happen (in the example, both `name` and the anonymous struct would be exported together).112 Symbol definitions in our modules (ie. the symbols the module can export) are either types, functions or variables. While types can be defined in the same line as a variable (eg. `export struct {int i;} name = {5};`), we will make the simplifying assumption right now that this does not happen (in the example, both `name` and the anonymous struct would be exported together). 233 113 234 114 #### Types 235 115 236 There are 3 kinds of types: builtins (eg. `int`), user-defined (ie. `struct`) and aliases (ie. `typedef`). We defer discussion on pointers, references and arrays until afterwards.237 238 Built-ins are provided by the compiler to every module, so we don't need to consider them. We can simplify aliases as a `struct` with one field, which has an implicit field access.239 240 So `struct` is really the only type we need to concern ourselves with. Based on our guiding principle, if we export a `struct` that has a `struct` field which is not itself, then it should not export any information related to that field's type. The importer will be able to access all of its fields, but the `struct` field is essentially an opaque type (if a module also wants access to that, it needs to import that `struct` too).241 242 Unfortunately, this isn't technically possible. First, we have unboxed types (boxed types are implemented as a pointer to the actual object). This means we need to know the size and alignment of every field, so we know how large the struct is and the offsets of each field. Second, when a user initializes a variable with a field, they need to be able to write down the type name (eg. `struct Exported e; struct Field f = e.f1;`). So we need to know the size, alignment and name of a field (in the previous example, move/copy constructors may also need to be considered, but in C we simply copy bytes). A similar thing happens when returning types from functions (eg. `struct Unexported foo();`), which we will discuss later.243 244 How do we obtain this size/alignment/name information? We use compile-time reflection to crawl import modules as needed . Note that this needs to handle circular imports, as it should only fail if we encounter a type name ambiguity (eg. we import module `A` and `B` which both export `struct S`, and we need a `struct S`) or the type has infinite size (eg. `struct A {struct B b;}; struct B {struct A a; int i;};`). This means the number of modules that need to be analyzed in order to determine the size/alignment of a `struct` is only bounded by the size of the codebase, but in practice this should not happen often.245 246 Pointers and references are boxed values, so size/alignment information of the underlying type is not gathered. However, we still require modules to import the symbol in order to handle name disambiguation. Arrays may be parameterized by an integer whose value must be immediately available. This is key, otherwise we can end up with cases such as `struct S {struct T t[sizeof(struct U)];};`, which may require fixpoint analysis. In our case, we can simply leverage compile-time reflection in the same manner as we do with field structs.116 We'll consider 3 kinds of types: builtins (eg. `int`), user-defined `struct` types, and aliases (ie. `typedef`). We defer discussion on pointers, references and arrays until the end of this section. We'll consider other user-defined types (eg. `union`) in Other kinds of symbols section. 117 118 Built-ins are provided by the compiler to every module, so we don't need to consider them. We treat aliases like a `struct` with one field, which is implicitly accessed, allowing us to only consider the `struct` case (note that exporting a `typedef` without exporting the underlying type produces different behaviour than what one may be used to). 119 120 Taking the most idealized version of our guiding principles, if we export `struct A` that has a `struct B` field (ie. not itself), then we should not export any information related to `struct B`. The importer will be able to access all fields in any `struct A` variable, but the `struct B` field is essentially an opaque type (if a module also wants to access its fields, it needs to import `struct B` too). 121 122 Unfortunately, this runs into problems. First, we have unboxed types (boxed types are implemented as a pointer to the actual object). This means we need to know the size and alignment of every field, so we know how large the struct is and the offsets of each field. Second, a very common pattern (so for practical purposes, we cannot ban it) is to initialize a variable with a field, which requires writing down the type name (eg. `struct Exported e; struct Field f = e.f1;`). So importers need to know the size, alignment and name of a field (we also need to consider move/copy constructors for Cforall types, but in C we simply copy bytes). A similar problem occurs when returning types from functions (eg. `export struct Unexported foo();`), which we will discuss in the Functions section. 123 124 How do we obtain this size/alignment/name information? We use compile-time reflection to crawl import modules as needed (see Formalism section for details and a proof of correctness). Our technique handles circular imports, as it should only fail if we encounter a type name ambiguity (eg. we import module `A` and `B` which both export `struct S`, and we need a `struct S`) or the type is infinitely recursive (eg. `struct A {struct B b;}; struct B {struct A a;};`). This means we could end up analyzing the entire codebase to determine a the size/alignment of a `struct`, but in practice this should not happen often (regular C also requires the type definitions of every field, so we are arguably not much worse). 125 126 Pointers and references are boxed values, so we know its size/alignment (so no need to analyze the underlying type). However, we still require modules to import the symbol in order to handle name disambiguation. Arrays may be parameterized by an integer whose value must be immediately available. This is important, as otherwise we can end up with cases such as `struct S {struct T t[sizeof(struct U)];};`, which may be difficult to resolve. In our case, we can simply leverage compile-time reflection to obtain the integer value, similar to the way we resolve field types. 247 127 248 128 #### Functions … … 250 130 Functions have the form `returnType name(parameterType, ...numberOfParameters) {...body}`. In order for importers to use this function, we need to provide the size, alignment and name of any types in its declaration. We use compile-time reflection to resolve the details (see Types section for details). 251 131 252 If we are exporting an inline function, from the compiler's perspective the function body is also exported. This may also usecompile-time reflection in order to resolve the symbols within the function body (which is done within the context of the containing module, not the importing module).253 254 Cforall introduces a keyword called `forall`, which can be used to create polymorphic functions. At the moment, these functions do not specialize - there is only one instance of this function, and callers need to pass a vtable alongside any polymorphic type parameters, similar to how the `dyn` trait in Rust works. This means that trait information must be provided to the importing module, which unfortunately breaks our guiding principle somewhat. Within a `forall` block we can also define polymorphic types, which can be used as the return value of a polymorphic function. In that case we need size/align/name information, and the first two are currently handled by calling a "layout function" at runtime (this could possibly be moved to happening at compile-time, but isn't at the moment). As such, we need provide the layout function to the importing module, which unfortunately breaks our guiding principle too. So polymorphic functions can be made to work with our system, but they break our guiding principles to some extent. 132 If we are exporting an inline function, we also need to provide the function body to the compiler/optimizer (though from the perspective of the module, symbol visibility is the same whether inline or non-inline). This also uses compile-time reflection in order to resolve the symbols within the function body (which is done within the context of the containing module, not the importing module). 133 134 *Cforall introduces a keyword called `forall`, which can be used to create polymorphic functions. This is currently implemented using a technique called "dictionary passing", similar to Haskell or Ocaml. This means there is only one instance of the polymorphic function, and callers need to pass any trait information as additional (hidden) parameters. This means we must provide trait information to the importer. Within a `forall` block we can also define polymorphic types, which can be used as the return value of a polymorphic function. In that case we need size/align/name information, and the first two are currently handled by calling a "layout function" at runtime (this could possibly be moved to happening at compile-time, but I don't believe it is). As such, we also need to provide the layout function to the importing module. So polymorphic functions can be made to work with our system, but we may need to relax some aspects of our guiding principles.* 255 135 256 136 #### Variables 257 137 258 Variables have the form `variableType name = expression;`. We make the simplifying assumption that only one name is defined at a time (ie. no `int a, b, c;`). An importing module needs to provide the size, alignment and name of `variableType` in its declaration to any importing modules. We use compile-time reflection to resolve the details(see Types section for details).259 260 If the variable is actually a polymorphic type (see Function section for details), I'm actually not sure how this works. The layout function can only be called at runtime, but you need the information in order to allocate the appropriate amount of space in the `.data` section. More analysis is needed here. 138 Variables have the form `variableType name = expression;`. We make the simplifying assumption that only one name is defined at a time (ie. no `int a, b, c;`). Similar to types and functions, the exporting module needs to also provide the size, alignment and name of `variableType` to any importing modules, which is handled by compile-time reflection (see Types section for details). 139 140 *If the variable is actually a polymorphic type (see Function section for details), I'm actually not sure how this works. The layout function can only be called at runtime, but you need the information in order to allocate the appropriate amount of space in the `.data` section. More analysis is needed here.* 261 141 262 142 ### How importing modules use symbols 263 143 264 The previous section notes compile-time reflection as the solution to many of our challenges with modules. Here we will explain how symbol names are resolved to the correct modules, allowing us to define how our compile-time reflection crawls through modules in order to determine the information it requires.144 The previous section notes compile-time reflection as the solution to many of our challenges with modules. Here we will explain how symbol names are resolved to the correct modules, helping us define how our compile-time reflection crawls through modules. See Formalism section for a more rigorous proof of correctness. 265 145 266 146 #### Gathering symbols … … 302 182 Continuing with our example, we do the same top-level analysis with the imported modules. 303 183 304 If `data/minheap` has `export import minheap/functions;` defined, then it is as if `data/graph/node` has `import ../minheap/functions;` defined. This can keep going - if `data/minheap/functions` has `export import`, it is also added to `data/graph/node`. After all `export import` have been resolved, we can look at the top-level declarations of every imported module in order to determine what symbols are visible to `data/graph/node`. 305 306 Now, we can bind each symbol within `data/graph/node` to the correct module. The details of how multiple symbols are disambiguated are described in the Overload resolution section. 307 308 In addition to name disambiguation, some symbols need additional information in order to be useable by importers. For example, size/alignment information of types, function bodies for inline functions, and trait information for polymorphic functions. This information is obtained by performing the above steps, but on the importing module instead of `data/graph/node`. 309 310 This recursive task raises the problem of circular imports: What if we recurse back to `data/graph/node` (or any module that creates a cycle)? As long as we are analyzing different symbols inside the circularly imported module, this should not pose a problem. This leaves us with handling the problem where we circle back to the same symbol. For size/alignment analysis, coming back to the same type means that said type contains itself, which for our purposes is not resolvable/infinite size. If an inline function calls other inline functions that mutually recurse with itself, we can either produce a forward declaration of the function within the underlying C implementation (Cforall compiles down to C), or make mutual recursion be a regular function call. The first option is ideal, but if this is not supported in the C spec, we can fall back to the second option. For trait information, we could emit a warning, but if a trait includes itself, that can be safely ignored (a trait is a collection of conditions, it contains itself by definition). Since we can handle all of our circular problems, our system is well-defined here. 311 312 #### Overload resolution 313 314 A key aspect about our modules is that imported symbols are used in the same way as they are defined - no namespacing needed (eg. `struct Edge` in the provided example). While this raises the potential for name collisions, we argue that with the right compiler tools and the ability to disambiguate between different modules (eg. `A::a` means "symbol `a` within module `A`"), this can be a reasonable tradeoff (unfortunately, much of the tooling here is out of scope for this proposal). 315 316 The reason behind this decision comes from 2 places: First, if we add namespacing to imports, then when we have an imported function that returns a type defined in another module, we would be forced to write the module name (the alternatives don't fare much better in terms of brevity and readability). Second, a core feature of Cforall is its advanced overloading system (eg. allowing multiple variables with the same name but different types to coexist). Adding namespaces would likely limit our ability to research and take advantage of this feature. 317 318 This means we have both symbols defined within our module and imported symbols participating in overload resolution. This requires extending our existing overload resolution algorithm to handle multiple types with the same name (currently we handle multiple functions and variables with the same name, but not types). This also requires refactoring the overload resolution algorithm to hold the full name of a symbol (symbol name + module name). Finally, if overload resolution favours two options equally, it should favour symbols defined within the given module (eg. `struct MinHeap` in the provided example). Note that this will only happen if the alternative is strictly better, otherwise it is ambiguous. 184 If `data/minheap` has `export import minheap/functions;` defined, then would be as if `data/graph/node` has `import ../minheap/functions;` defined. This can keep going - if `data/minheap/functions` has `export import`, it is also added to `data/graph/node`. After we run out of `export import` statements to resolve, we can look at the top-level declarations of every imported module in order to determine what symbols are visible to `data/graph/node`. 185 186 After gathering the symbols in the imported modules, we can proceed to bind each symbol within `data/graph/node` to the correct module. The details of how multiple symbols are disambiguated are described in the Resolving symbols section. 187 188 In addition to name disambiguation, some symbols need additional information in order to be useable by importers. For example, size/alignment information of types, function bodies for inline functions, and trait information for polymorphic functions. This information is obtained by resolving the symbols on any imported module, and so on, as necessary. 189 190 This task is recursive, which raises the problem of circular imports: What if we recurse back to `data/graph/node` (or any module that creates a cycle)? Since we reason at the level of symbol definitions, as long as we are analyzing different symbols inside the circularly imported module, we don't actually have a cycle. This leaves us with handling the problem where we circle back to the same symbol. For size/alignment analysis, coming back to the same type means that said type contains itself, which for our purposes is not resolvable (emit error and stop). If an inline function calls other inline function that mutually recurses with itself, we produce a forward declaration of the inline function within the underlying C code (Cforall compiles down to C). For trait information, a trait works like a collection of conditions, which means it includes itself, which means we can safely ignore circular references (we may want to emit a warning though). Since we can handle all of our circular problems, our system is well-defined here. 191 192 #### Resolving symbols 193 194 A key aspect about our modules is that imported symbols are used in the same way as they are defined - no namespacing needed (eg. `struct Edge` in the provided example). While this raises the potential for name collisions, we argue that the potential benefits outweigh the limitations (and many of the limitations can be mitigated with additional features). 195 196 *To provide additional context, a core feature of Cforall is its advanced overloading system (eg. allowing multiple variables with the same name but different types to coexist). Adding namespaces would limit our ability to research and take advantage of this feature.* 197 198 When resolving symbols, symbols defined within our module have precedence over imported symbols. If we want to resolve to a specific imported symbol, we use the `::` operator (eg. `struct a::S` or `struct "a"::S`). The module name on the left side of the `::` operator is looked up using the same mechanism as import statements (note that the module being looked up must have been imported, either directly or through `export import`). 199 200 *In Cforall, the addition of the overload system means that symbols defined within our module system only have preference over imported symbols, instead of completely shadowing them. Here, our module system provides a list of candidates for each symbol to the overload resolution system. This will require refactoring the overload resolution algorithm to hold the full name of a symbol (symbol name + module name), and favour symbols defined within the given module (eg. `struct MinHeap` in the provided example).* 319 201 320 202 ## Formalism … … 322 204 Taking the Walkthrough section a step further, we will provide a rough sketch of a formal argument that our module system is well-defined. 323 205 324 We'll first describe the layout of our module system by providing a rough description of the file structure and grammar we are working with. Then we describe what we can grab from each module file separately. By combining this information with those from other modules, we can determine which module's symbol is being referred to in each exported declaration. Note that for some symbols to be properly used by importers, additional information like size/alignment information will be necessary, which we will show is possible to obtain. Finally, with this information, we can resolve every symbol within a function body or expression, thereby completing the compilation process.206 We'll first describe the layout of our module system by providing a rough description of the file structure and grammar we are working with. Then we describe what we can grab from each module file separately. By combining this information with those from other modules, we can resolve which module's symbol is being used in each exported declaration. Note that for some symbols to be properly used by importers, additional information such as size/alignment information will be necessary, which we will show is possible to obtain. Finally, we can resolve every symbol within a function body or expression, thereby completing the compilation process. 325 207 326 208 ### Layout of the codebase 327 209 328 In order to provide a formal argument that our module system is sound, we need to first describe the format of the codebase we are trying to analyze. 210 In order to provide a formal argument that our module system is sound, we need to first describe the format of the codebase we are trying to analyze. We restrict the syntax of our module files significantly in order to simplify our proof. 329 211 330 212 The file system is set up so that imports are relative to the current directory, or found in a library. For example, a module file `data/graph.cfa` trying to access `data/graph/node.cfa` would write `import graph/node;`. If such a file doesn't exist, we will look for `graph/node` within the provided libraries (starting from the root directory of each library). 331 213 332 Module files have `module;` as their first statement. Import statements are written as `import filePath;`, where the file path can be escaped with double-quotes if necessary. Only top-level statements can be exported, which is done by placing `export` at the beginning (eg. `export const int i = 3;`). The available top-level statements are: 333 * Types: `struct` and `typedef` statements. Only one `struct` can be defined per top-level statement (eg. no `typedef struct {int i;} MyStruct;`). 214 Module files have `module;` as their first statement. Import statements are written as `import filePath;`, where the file path can be escaped with double-quotes if necessary. Only top-level statements can be exported, done by having `export` as the first keyword (eg. `export const int i = 3;`). The available top-level statements are: 215 * Imports: Format is `import path/to/file;` or `import "path to/file";` (double quotes for escaping characters). 216 * Types: `struct` and `typedef` statements. Only one `struct`/`typedef` can be defined per top-level statement (eg. no `typedef struct {int i;} MyStruct;`). If we parameterize an array with an integer, it must either be a constant or a single variable. 334 217 * Functions: Format is `returnType name(parameterType ...numberOfParameters) {...functionBody}`. 335 218 * Variables: Format is `variableType name = initializerExpression;`. Initializer may be omitted. Only one variable can be declared per top-level statement (eg. no `int a, b;`). 336 219 337 The syntax of function bodies and initializer expressions can mostly remain unrestricted, as compilation is outside of the module system's purview (we provide the symbols and sufficient information on how to use them to importers). Some aspects we need to note are: 338 * No user-defined copy/move constructors: In order to handle `func(s.f1)` (where `void func(struct Field);`), we need to be able to create a copy of `struct Field` without knowing its fields (in the example, the importer knows `func` and the type of `s`, but does not need to know the details of `struct Field`). Having user-defined copy/move constructors can be done, but we opt to avoid it here. 339 * `::` operator: Sometimes we need to disambiguate which module a symbol belongs to. `struct InBothModules foo();` can either use `struct InBothModules;` from `import a;` or `import b;`. Trying to resolve this by analyzing the function body of `foo()` may be possible, but would likely be very challenging to implement. To resolve this, we can write `struct a::inBothModules foo();` instead. 220 The syntax of function bodies and initializer expressions can mostly remain unrestricted, as the module system provides name matches to them, rather than the module system being affected by anything within. One notable change is the `::` operator, which allows us to disambiguate which module a symbol belongs to. For example, if `struct S` is exported from module `a` and `b` and we import from both, we can write `struct a::S` or `struct "a"::S` to disambiguate. 340 221 341 222 ### Individual module information … … 343 224 For every module file, we can collect the import list and the top-level symbols, without needing to consult any other files. The top-level symbols include types, functions and variables. 344 225 345 Top-level statements, at least the way we have it set up here, avoid any context-sensitive sections (eg. `MyType * a;` c an be multiplication or a variable). This means we have no issues parsing all symbols at the top-level (function bodies and initializer expressions are context-sensitive, but we don't need to deal with them here). This means that for each module file, we can grab an import list, a type list, a function list and a (global) variable list, as well as determine which are exported.226 Top-level statements, at least the way we have it set up here, avoid any context-sensitive sections (eg. `MyType * a;` could be interpreted as an expression, but we don't allow those as a top-level statement). This means we have no issues parsing all symbols at the top-level (function bodies and initializer expressions are context-sensitive, but at this stage we don't need to look inside them). This means that for each module file, we can grab an import list, a type list, a function list and a (global) variable list, as well as determine which are exported. 346 227 347 228 ### Resolving names of exported declarations 348 229 349 Now we wish to resolve every symbol in any module's top-level declaration (this does not include the function bodies or initializer expressions). In order to do this, we first collect a module's full import list, then all of its imported symbols, before finally resolving our symbols. 350 351 In order to determine a module's full import list, we first look at the individual module information of each imported module. We initially look for the imported module relative to the current directory of the module that the import statement is in - if it does not exist, it will look starting at the root directory of any provided libraries (if we can't find the module, we error out). If any imported module has `export import moduleName;` (or some other module name), then we look at those too. If we end up importing any modules we have already imported (or the module we are getting the full import list of), we can ignore those. Since there are a finite number of modules in a codebase (plus any provided libraries) and we don't analyze a module twice, this search will terminate with a full import list. 352 353 To collect all the imported symbols, we take every module in the full import list and we grab all of the top-level statements that have `export` as their first keyword. This gives us a imported type list, a imported function list and a full variable list. 354 355 Now that we have a list of imported symbols as well as the module's own, we can start resolving symbols in the top-level declarations. Since our grammar is not context-sensitive outside of function bodies and initializer expressions, there is no ambiguity determine whether a type or variable is needed. To resolve names, we first check to see if it uses the `::` operator - if so, then we look using the same strategy when looking for imported modules (if the module isn't in the full import list, we error out). Then we look in the module's own top-level symbols to see if we have a type/variable that matches the symbol's name (this means a module's symbols shadow its imported symbols). If we don't find a match, then we look in the list of imported symbols. If the list of imported symbols has more than one match, we error out due to ambiguity (it may be possible to use overload resolution here, but for now we'll prompt the user to use a `::` operator). 230 Now we wish to resolve every symbol in any module's top-level declaration (this does not include the function bodies or initializer expressions). We achieve this with the following steps: 231 1. Collect a module's full import list 232 2. Use the full import list to collect all imported symbols 233 3. Use all imported symbols to resolve our symbols 234 235 In order to determine a module's full import list, we look at the individual module information of each imported module. To find the imported module file, we initially look for it relative to the current directory of the module that the import statement is in - if the module file does not exist, we look starting at the root directory of any provided libraries (if we can't find the module, we error out). If any imported module has `export import moduleName;` (or a name other than `moduleName`), then we recursively examine those too. If we end up importing any modules we have already imported (or the module we are getting the full import list of), we can ignore those. Since there are a finite number of modules in a codebase (plus any provided libraries) and we don't analyze a module twice, this search will terminate with a full import list. 236 237 To collect all the imported symbols, we take every module in the full import list and we grab all of the top-level statements that have `export` as their first keyword. This gives us an imported type list, an imported function list and an imported variable list. 238 239 Now that we have a list of imported symbols as well as the module's own, we can start resolving symbols in the top-level declarations. Since we are not considering function bodies or initializer expressions (and our top-level statements are simple), we do not have ambiguity on whether a type, function or variable is needed. To resolve names, we first check to see if it uses the `::` operator - if so, then we use the same strategy as when resolving import statements (also, we error out if the module isn't in the full import list). Otherwise we look in the module's own top-level symbols to see if we have a match (this means a module's symbols shadow its imported symbols). If we don't find a match, then we look in the list of imported symbols. If the list of imported symbols has more than one match, we error out due to ambiguity (print diagnostic telling user the options and suggesting the use of `::`). 356 240 357 241 Something to note about this setup is that forward declarations are neither necessary nor allowed, because all visible symbols are known before attempting to resolve names. … … 359 243 ### Providing additional information for importers 360 244 361 Unfortunately, in order to use some of these top-level symbols, simply resolving the names within top-level declarations (ie. not including function bodies or initializer expressions) is not enough for an importer to be able to use it. For example, if a function returns a `struct`, we need to at least know the size/ alignment of the `struct` in order to use it. 362 363 Type names are found within the definitions of other types, the return and parameters of a function, and the type of a variable. In order to fully specify a type, we must at least know the size/alignment of its fields. To use a function, we also need to know size/alignment of return and parameter types. To use a variable, the size/alignment of its type is not technically required, but to keep its semantics consistent with that of field access, we require it (eg. `struct Unimported x = s.f1; struct Unimported x1 = importedVariable;`). 364 365 For functions and variables, we first resolve the type (for types, we move to the next step). Then we look at the fields of that type and resolve those types (pointers don't need to be resolved because pointers have a defined size/alignment). We keep recursing until we resolve everything or find a cycle (if we have a cycle, we error out), and this terminates because there are a finite number of symbols. Additionally, types may be arrays, which are parameterized by an integer (if we resolve to something that is not an integer, we error out). The C spec requires that this integer's value must be constant and immediately calculable, which means `char padding[64-sizeof(int)]` is fine but not `char padding[64-sizeof(struct U)];` (this functionality can be supported and may be useful for cache-aware programming, but we will not support this for now). This means we can fully resolve everything within a well-defined type, and therefore collect its size/alignment. 366 367 While the importing module doesn't change its behaviour whether a function is inline or not, the compiler must see the function body, which means it must be included in the file sent to the compiler. In order to resolve the inline function body, then from the perspective of the module that contains the inline function, we must collect the full import list, then all of the imported symbols. If any of those imported symbols are inline functions, then we need to keep recursing. If we find a mututally recursive inline function, we would emit a forward declaration (eg. `inline float a();`). Note that the module that owns the inline function must use `extern` (eg. `extern inline float a() {return 5;}`) in order to ensure a definition is emitted. Since there are a finite number of modules, this will terminate. Note that if an imported module contains an inline function, we do not provide the function definitions of its non-inline functions - this means the compiler cannot inadvertently expand the definition of a function it is not meant to have access to. 368 369 Cforall introduces the `forall` keyword, which allows creating polymorphic functions and types. I am not completely familiar with the internal implementation of these (hence it isn't actually a part of this formalism), but I know they use a single implementation, with callers passing in a vtable for each concrete instance of a polymorphic type (similar to Rust's `dyn` trait). This means that we need to provide the traits of each polymorphic type in a polymorphic function in order for an importer to use it. This analysis follows a similar logic to the size/alignment of types, except that we're working to define a trait rather than a type. As such, if a trait mutually includes another trait, we ignore the circular reference. I am unsure if resolving traits requires resolving more than the symbol names of each type - if necessary we use the same technique as size/alignment analysis. 245 Unfortunately, in order to use some of these top-level symbols, the importer sometimes needs more than the names within top-level declarations (ie. not including function bodies or initializer expressions) in order for the compiler to process it. For example, if a function returns a `struct`, we need to know the size/alignment of the `struct` in order to use it. 246 247 Type names are found within the definitions of other types, the return and parameters of a function, and the type of a variable. In order to use a type, we must know the size/alignment of its fields. Similar problem with functions and their return and parameter types. With variables, size/alignment information is technically not needed, but to keep its semantics consistent with everything else, we require it (eg. `struct Unimported x = s.f1; struct Unimported x1 = importedVariable;`). 248 249 In order to figure out size/alignment of types, we first resolve its name. Pointers, references and built-in types have a defined size/alignment, and don't need to be analyzed further. However, an array type needs its element type analyzed further, as well as its integer parameter (our formalism only allows a simple number, but we could extend this in the future). That leaves `struct` or `typedef`, which is determined by looking at the types of its fields or aliased type. This process keeps recursing until we resolve everything or find a cycle (if we have a cycle, we error out). This terminates because there are a finite number of symbols. This means we can obtain the size/alignment of any well-formed type. 250 251 While the importing module doesn't change its behaviour whether a function is inline or not, the compiler/optimizer must see the function body. This means the module system must resolve all symbols that the inline function body can access. For example, if we import a module that contains an inline function, we need to resolve all symbols that that module can see. And if any of its imported symbols are inline functions, then we need to keep recursing. If we find a mututally recursive inline function, we would emit a forward declaration (eg. `inline float a();`). Note that the module that owns the inline function must use `extern` when compiling it (eg. `extern inline float a() {return 5;}`), to ensure a definition is emitted in the assembly. This terminates because there are a finite number of modules. 252 253 *While the technique used for importing inline functions is sound, it can be optimized (we can resolve symbols within the inline function body first, instead of immediately expanding every import). Here, a single imported inline function forces the containing module to expose its contents to the optimizer. It's as if every import statement within the containing module was converted into `export import`. While our module system ensures these `export import` changes are not visible to the module importing the inline function, it is visible to the optimizer. My major question is: if I add an unused exported inline function to any module, will the additional information provided to the optimizer affect code performance significantly? My intuition says no, since none of it should be accessed by the module that imported the inline function, though I would need to double check.* 254 255 *Cforall introduces the `forall` keyword, which allows creating polymorphic functions and types. The polymorphic functions are currently implemented using a technique called "dictionary passing", similar to Haskell or Ocaml. This means there is only one instance of the polymorphic function, and callers need to pass any trait information as additional (hidden) parameters to it. This means that we need to provide the trait information of each polymorphic argument to the importer. In order to obtain this trait information, we could follow a similar technique to obtaining the size/alignment of types, except that we're working to define a trait rather than a type. As such, if a trait mutually includes another trait, we can ignore the circular reference. I have also heard that there are "layout functions" that need to be provided for polymorphic types (since they can be of different sizes/alignments), so we would need to find those functions too.* 370 256 371 257 ### Resolving names for compilation 372 258 373 Now that we have every top-level symbol visible to a module ready to be used, we can proceed to resolve every symbol in each function body or initializer expression. This functionality is actually the domain of the overload resolution system rather than the module system, but the module system adds additional features which need to be handled by the overload resolution system. 374 375 First, name resolution needs to be augmented to also store the containing module name. We also need to support the `::` operator, used when specifying a specific module name. This doesn't change how the overload resolution algorithm would work, but we need the containing module name in order to generate the full symbol name in the generated C code. 376 377 Additionally, we have the fact that the imports' symbols are now available to participate in overload resolution. In this formalism, our module's symbols have priority over the imports' if they are deemed equal by the overload resolver, and we error out if we have to choose between two imported types with the same name. This allows us to handle the 2 special cases that our module system introduces to the overload resolver. 259 Now that all symbols visible to a module have been collected, we can proceed to resolve all symbols within function bodies or initializer expressions. Technically, the additional information needed to use them isn't required, though it simplifies our proof and implementation. 260 261 We resolve names in the same manner as described in the section Resolving names of exported declarations. This means that symbols in the current module (types and variables/functions are different) shadow those that are imported, though the imported symbols can still be accessed using the `::` operator. 262 263 *In Cforall, the addition of the overload system means that symbols defined within our module system only have preference over imported symbols, instead of completely shadowing them. This works by having our module system provide a list of candidates for each symbol to the overload resolution system. The overload resolution algorithm will need to be refactored to hold the full name of a symbol (symbol name + module name), and favour symbols defined within the given module (eg. `struct MinHeap` in the provided example). Whether favour means "break ties" or "always use current module's symbols if possible" is a design choice (I prefer the latter).* 264 265 ## Concepts 266 267 A module is a collection of symbol definitions, which it can export to other modules. In order to define said symbol definitions, it may import other modules' symbols. We consider 3 kinds of symbols: variables, functions and types. 268 269 *Cforall introduces other kinds of symbols, such as traits. The C language as well as GCC extensions also support a significantly more complex grammar than what is used in this proposal. I discuss some more about this in Other kinds of symbols section.* 270 271 Our proposed module system is one instance within the space of all viable module systems, created from a number of design choices and solutions to challenging problems. We will highlight a number of them, and discuss how our module system handles them. We will also describe a number of functionalities in our system 272 273 ### Handling unboxed types 274 275 In the example below, we note a problem: in order to use `B::foobar` and `B:bb` in C, we also need to provide the definition of `A::a`. This breaks our principle of decoupled symbol exports! 276 277 ``` 278 Module B 279 -------------------------------------- 280 | module; | 281 imports | import A; | exports 282 A::a --------> | export int b = a; | --------> B::b 283 A::aa | export struct aa foobar(); | B::foobar (+ A::aa) 284 | export struct bb {struct aa foo;}; | B::bb (+ A::aa) 285 -------------------------------------- 286 ``` 287 288 This problem arises because the importer needs to know how much space to allocate on the stack for the type. Languages that use boxed types (eg. Java, Go) do not have this restriction, since the caller only needs to store a pointer on the stack (eg. when calling `B::foobar`). In contrast, C and other systems programming languages (eg. C++, Rust) require handling unboxed types, since it demands control over pointer indirection (also, there may not be a heap available to support boxed types). This means we need to know the definitions of types in order to use them, even if it's just to allocate enough space on the stack. In C/C++, this is done by including transitive dependencies in headers; in Rust, this is done by compiling an entire crate at a time. 289 290 It turns out that the importer need not know the definition of `A::aa`, however. **We only need to provide the size and alignment of a type in order to store it (in addition to the name).** This means that if we knew that `A::aa` is 32 bytes in size, aligned to 8 bytes, the caller of `B::foobar` has enough information to output working assembly. This allows any importer of `B::foobar` to assign the output of `B::foobar` to a variable, then pass it by value into a function. However, the importer cannot access the fields of `B::foobar` or change them unless it imports `A::aa`, giving us exactly the symbol decoupling we want. 291 292 An interesting property of this is that, depending on the ABI (Application Binary Interface, determines exactly how things like functions are called in assembly), if `A::a` changes but does not exceed its size or alignment allocations, any module that imports `B::foobar` but not `A::a` does not need to be recompiled! This could lead to libraries where user-facing `struct` types are set to be larger than they currently are, so that even if the `struct` needs its definition changed, the user's code may not even need to be recompiled! 293 294 How do we provide the size and alignment of a type, though? A couple strategies come to mind: 295 1. We provide the entire type definition to the compiler, but anything within the module is not allowed to access its fields unless it is imported. Much simpler to implement, but we might need to crawl the codebase and a lot of additional code needs to be provided to the compiler. 296 2. We extract size/alignment information into a separate generated module interface, which is used in place of the actual type. This provides much more succinct code to the compiler. We still need to crawl the codebase to get the information, but it is cached afterwards. One way to calculate size/alignment information is to generate a separate program with all the types, compile it, and inspect the result. Another way is to collect the sizes of the built-in types, then calculate the size/alignments of types from the idea that `struct` types take the maximum alignment of their fields. 297 3. We store all size/alignment information in some central file. This is the same as the previous idea, except that we store all size/alignment in a central file instead of generating a module interface file for every module. This could help reduce file IO cost and simplify updating size/alignment information when types change. 298 299 Option 3 sounds like the most ideal option, but option 1 is the easiest to implement, and is the configuration that is used in the Formalism section. There is also another "gotcha" to consider with options 2 and 3: using type-punning to implement them seems to break strict aliasing rules in C99. A common way to avoid type-punning is to use `union` types, but 300 301 *Cforall has the sized trait, which may prove useful for me to use when implementing things (it will likely help with handling polymorphic functions and types). I was also made aware of lifetime operations in Cforall, which I am not that familiar with. I assume they are handled the same way as move/copy constructors - they also need to be provided alongside the size/alignment information.* 302 303 ### C macros are not exportable 304 305 **In our proposal, the C preprocessor runs before the module system.** This allows us to write macros that auto-generate symbols to export, which is likely an important C pattern. However, this makes it very challenging to export C macros. 306 307 It is worth pointing out that C++20 modules also do not support exporting C macros. A problem with C macros is that they are too powerful - they can make arbitrary changes to your code, forcing you to reparse, and they operate using a separate language that ignores C code structure. With modern C++ and Cforall features, the need to have modules export C macros is usually not needed outside of porting existing C code. 308 309 What if you are dead set on making modules export C macros? You could make the C preprocessor module-aware by using directives such as `#pragma module` and `#pragma export MY_VALUE 3`. Unfortunately, due to having to work within the confines of the C preprocessor, you'd likely be stuck with having to settle with a simple module system. A better alternative would be to have the C preprocessor to output all C macros that it encountered while preprocessing a file, then collect all import statements within the file. We could then reparse the file using all of the C macros from the other files. This configuration is still somewhat unideal, but it does satisfy some common use-cases, and could be used as a starting point. Note that if you're really, really dead set on having the exact same behaviour, just use precompiled headers - they're an optimization that ensure the same behaviour. 310 311 The other problem with supporting C macros is that they introduce a kind of ordering on `#include` statements, which we are trying to avoid with imports. We also don't wish to promote the use of C macros when other modern language features are often a much better fit. As such, we elect not to support exporting C macros out of modules. 312 313 But what about porting existing C code? With our current formulation, we don't provide a way to use C headers within our modules. This problem is complex enough that we will pull it out into a dedicated section (see Porting existing code section). 314 315 ### Import ordering does not matter 316 317 A key problem with `#include` is that textual inclusion is order-dependent - including `a.h` before `b.h` may result in different behaviour than `b.h` before `a.h`. Not only does this cause confusion among developers, it also affects compilation speed - while headers can be precompiled, they must always account for the possibility of a C macro changing the meaning of the header file. 318 319 Since our module system runs after the C preprocessor, our generated module interfaces can be optimized for maximum compiler efficiency. The potential gains are significant: when the standard library became available to import in C++23 (ie. `import std;`), the time to compile a "Hello World" program essentially halved. This significant reduction in build times likely translates to faster iterations and more productive developers. 320 321 In terms of developer clarity, it's worth noting that in most programming patterns, this ability for `a.h` to influence `b.h` is undesirable. so when a user imports our modules, they receive the same symbols regardless of import order. 322 323 Part of the inspiration for order-agnostic imports comes from functional programming, with its focus on limiting side effects. But what if you want to pass macros into a module? Here, we can take inspiration from the concept of a Functor from the ML programming language. While not in our current implementation, we can create a special kind of module which takes in a set of macros to be used when parsing the module. Otherwise, we use the same set of macros when parsing every module (eg. `#define ARCH amd64`, which we can define in some module configuration file). 324 325 In the future, we can also support imports within functions. This takes some inspiration from Python, except that they are executed at compile-time instead of run time. These behave like a normal import, except they only take effect within the code block that they are imported in, and only for the statements after the import statement. 326 327 Another area for future development is to use the generated module interfaces in a language server. The explicit visibility control provided by our module system over regular C allows us to provide cleaner information to AI in order to provide robust code-generation capabilities. 328 329 ### Forward declarations are unnecessary 330 331 Since we scan all top-level declarations while generating module interfaces, forward declarations are unnecessary. In fact, the module system can resolve what would otherwise be incomplete types in C. In the following example, even if we added `struct Other;` after `module;`, `struct Other details;` would still break. However, with our module system, we can resolve this. 332 333 ``` 334 module; 335 export struct Node { 336 int id; 337 struct Other details; 338 }; 339 struct Other { 340 int symbols[max_symbols_supported]; 341 }; 342 const int max_symbols_supported = 100; 343 ``` 344 345 This property of being able to "look ahead" in a file echoes some parallels with classes in object-oriented languages. In those languages, a method's definition can call a method defined later, but still within the same class definition. In fact, modules and objects share many features (eg. abstraction, encapsulation). The main difference is that a module behaves more like a singleton class (as they are actually sections of code), wheras objects can be instantiated. 346 347 A common pattern in C is to have mutually recursive types defined in different files (eg. `node.h` defines `struct Node {struct Edge **edges;};` while `edge.h` defines `struct Edge {struct Node **nodes;};`). In regular C, this pattern is resolved by inserting forward declarations of the other struct before defining the given struct. In order to make this work in our module system, we would import the other module. This highlights a potential limitation with our module system - it doesn't allow us to only export a forward declaration of a type. We argue this problem isn't particularly concerning, but there are ways to extend our system to support this (we can consider reserving certain export tag names, see Generating multiple module interfaces for details). 348 349 The key feature that allows this is that our specification allows parsing top-level declarations with a context-free grammar (expressions in C are context-sensitive, but top-level declarations don't require parsing the expression part). As we extend this proposal with additional features (eg. nested types), care should be taken to ensure we don't introduce ambiguities in naming or context-sensitive parts. I am unsure if C++ top-level declarations are context-sensitive or not - if they are, then they would need to be disambiguated in order to apply this proposal to them. 350 351 ### Explicit transitive module dependencies 352 353 A problem with C headers is that including them can cause other headers to be included. This can lead to difficulties refactoring the code to use a different library, as the existing code inadvertently makes use of these implicitly included headers. C's design requires this because types must be completely defined in order to use them, but our system avoids this problem by having symbol exports be decoupled from each other. As such, we can produce much cleaner module interfaces. 354 355 However, we do not wish to ban transitive module dependencies, as there are many cases where we want to bundle multiple modules together. Due to this, we allow explicit transitive module dependencies through the use of `export import`. Let's take a look at two possible use-cases: 356 357 ``` 358 // module std 359 module; 360 export import std/vector; 361 export import std/iostream; 362 363 // module wrapper 364 module; 365 export import std/map; 366 export import std/string; 367 export typedef map[string, string] stringmap; 368 369 // module client 370 module; 371 import std; // this is equivalent to import std/vector; 372 // import std/iostream; 373 // import std; 374 import wrapper; // this is equivalent to import std/map; 375 // import std/string; 376 // import wrapper; 377 ``` 378 379 To recap, if a module imports a module that contains `export import ABC;`, then it is as if the importing module had `import ABC;` as one of its import statements. 380 381 In the above case, `std` allows developers to import a common set of functionality without having to remember the names of its constituent modules. `wrapper` handles a special case with exporting `typedef`: Since `wrapper` does not own `map` or `string`, we should `export import std/map; export import std/string;` if `client` expects to use the fields of `map[string, string]` (otherwise, we would only know that `stringmap` is a `map[string, string]`). 382 383 ### Exporting inline functions 384 385 Inline functions require the definition of a function to be available to the compiler/optimizer. However, we want the importing module to behave the same way, whether an imported function is inline or not. Let's look at an example: 386 387 ``` 388 // module inlined 389 module; 390 import supporting; 391 export inline int power(int a, int b) { 392 int ret = 1; 393 for (int i=0; i<b; ++i) ret = multiply(ret, a); 394 return ret; 395 } 396 // module supporting 397 module; 398 import deeper; 399 export int multiply(int a, int b) { 400 int ret = 0; 401 for (int i=0; i<b; ++i) ret = add(ret, a); 402 return ret; 403 } 404 // module deeper 405 module; 406 export int add (int a, int b) {return a + b;} 407 // module client 408 module; 409 export int foobar() {return power(10, 5);} 410 ``` 411 412 In this scenario, to compile `client`, the compiler needs the exported symbols of `inlined` and `supporting`. Note that the compiler does not need `deeper` because `multiply()` is not an inline function. However, `client` cannot access symbols from `supporting` unless it imports it directly. 413 414 If `inlined` has not generated its module interfaces by the time `client` is being compiled, then the presence of a single inlined function would cause all modules imported by `inlined` to need to be traversed, since the module system cannot ascertain which module contains `multiply()`. After the module interfaces are generated (and the modules have not been edited in the meantime), we can use those instead of traversing imports. 415 416 There are some optimizations that can be made here: if `supporting` has an inline function that is not used by `inlined`, then it does not need to be processed. In order to support this, we would need to handle partially defined module interfaces (as we would avoid parsing the body of the inline function in `supporting` while the module system processes `client`). There are also optimizations to be made in terms of how to store the body of an inline function in a module interface so that updates don't require reanalyzing the entire function body. 417 418 ### Generating multiple module interfaces 419 420 A common problem with encapsulation is that certain modules may need certain functionalities from others that should otherwise be private. In object-oriented languages, this is accomplished by designating "friend classes", which get full access to a class' internals. However, this "all or nothing" approach lacks fine-grained control. Another alternative is to present multiple interfaces for importers to choose from, facilitated by the fact that we already generate module interfaces. 421 422 ``` 423 // module multiple 424 module; 425 export int public_function() {return 100;} 426 export(internal) int internal_function() {return 10;} 427 int private_function() {return 1;} 428 // module kernel 429 module; 430 import multiple(internal); // can use public_function, internal_function 431 // module client 432 module; 433 import multiple; // can use public_function 434 ``` 435 436 Here, `internal` is a user-defined tag. We could use `export(aaa)` and `import multiple(aaa)` instead. Additionally, we can attach multiple tags to an export (eg. `export(aaa, internal)`) and import with multiple tags (eg. `import multiple(aaa, internal)`). A symbol that is exported with tags is imported if the import statement includes one of its tags, and a symbol that is exported without a tag is imported as long as the import statement does not have `-` as its first argument (eg. `import multiple(-, internal)` would only make `internal_function` visible). 437 438 This can be implemented by producing one module interface for every unique export tag in a module (+ export without a tag). Since symbol exports are decoupled from each other, we would only need to handle idempotent symbol exports (ignore duplicate symbol exports). Additional optimizations can be made to this process, such as storing all module interfaces in one file, processing common tag combinations, and handling updates to the module file. 439 440 Note that `private_function()` is technically still accessible by accessing its mangled C name directly. An extension to this module system could be to automatically insert `static` to these functions if a flag is set. 441 442 ### Modules use `import` instead of `#include` 443 444 Symbols defined in modules behave differently to those in regular C. In regular C, the defined symbols can be found in the outputted assembly using the same name that they were defined with. This is what allows forward declarations in C headers to work (eg. `void func();` produces a label `func` in the assembly). Unfortunately, this raises the possibility of having symbol names clash with each other. We avoid this by having every symbol defined within a module essentially have their name prepended with the module's file path. This difference in functionality means we cannot use `#include` to define module interfaces. 445 446 Taking inspiration from C++20 modules, we use `import` to make a module's exported symbols be visible to another module. We also take some inspiration from C++20 modules with the use of `module;` as the first statement in a module file. However, contrary to C++20 modules, we do not require modules to be compiled fully in order to refer to any of its symbols, as part of an effort to avoid imposing any architectural constraints that aren't already in regular C. 447 448 If the set of imported symbols still cause name clashes, we take inspiration from C++ namespaces and use the `::` operator to specify which module we are referring to. This uses the same syntax and semantics as `import` statements when resolving the module name. 449 450 More discussion on the details of how modules are implemented is found in section Implementing module namespaces. 451 452 [[THE REST OF THIS DOCUMENT IS A WORK IN PROGRESS]] 453 454 ### Cyclic modules 455 ### Other kinds of symbols 456 [[union types]] 457 The forall keyword is an addition to Cforall to support polymorphism, with polymorphic functions using dictionary passing and a single implementation. If a module exports a forall statement, the module owns the polymorphic function implementations, while the polymorphic function declarations are exported (if these were declared inline, the definition could be exported, similar to C++20 modules). Polymorphic types are instantiated from the caller's side, so their definitions are exported. This may present problems, but currently I am not familiar enough with Cforall to judge. 458 459 An interesting case is to consider if Cforall could be updated to perform specialization (multiple implementations for a single function) in addition to the single implementation strategy. An example of this being done in a production language is with Rust's `impl` vs `dyn` traits. To support this, the module system would need to be updated, as we would want the multiple implementations to exist within the module that owns the forall statement. 460 ### Implementing module namespaces 461 A module is defined by having `module;` be the first statement in a source file (somewhat similar to C++20 modules). Internally, modules work like namespaces, implemented by prepending the module name in front of all declared symbols. There are multiple alternatives to determine the module name - we use option 2 for its brevity: 462 1. Have the user define the module names (eg. `module A;`). This is similar to how Java and C++ require specifying packages and namespaces, respectively. This gives the developer some flexibility on naming, as it is not tied to the file system. However, it raises some questions surrounding how module discovery works (if a module imports `A`, where is `A`?). 463 2. Have the module names be defined from a "root directory" (eg. `module;` is module `A` because it is located at `src/A.cfa`, and `src/` is defined as the root directory). This creates import paths that look similar to include paths, allowing us to align more closely with existing C programmers. When searching for an appropriate module, a search is conducted first from the current directory, then we look for an appropriate library (similar to the include path in C). A downside is that this precludes adding nested modules (ie. module definitions within a module file), though nested modules are arguably not that important. 464 465 Another design choice that was made was to have files with the same name as a folder exist outside their folder. For example, module `graph` exists at `src/graph.cfa`, while module `graph/node` exists at `src/graph/node.cfa`. The alternative is to have module `graph` at `src/graph/mod.cfa` - this may be more familiar to some developers, but this complicates module discovery (eg. if there exists a module at `src/graph.cfa` at the same time, which takes precedence? Does `graph` need to `import ../analysis` in order to import the module at `src/analysis`?). Taking insights from Rust's move from `mod.rs` to files with the same name as the folder, we opt to use the more straightforward strategy. 466 467 This prepending of the module name in front of all symbols within a module can result in undesirable behaviour if we use `#include` within a module, as all of its contents will be prepended with the module name. To resolve this, we introduce extern blocks, which escape the module prefixing (eg. `extern { #include <stdio.h> }`, though with a newline after the `>`). 468 469 This configuration allows for a special kind of optimization to be performed: since modules prepend their names to their symbols, every symobl can be disambiguated. This allows us to add functionality to perform a "unity build", where the entire codebase can be compiled within a single translation unit, allowing the compiler to inline functions as its discretion. This would allow us to balance a "development configuration" with the benefits of modularization, alongside a "release configuration" with maximum optimizations. 470 ## Porting existing C code 471 [[See C macros are not exportable]] 472 [[See forward declarations are not necessary. An interesting idea is to allow naked `#include` within the module file, and try and handle it. It works up until type definitions... so can't work]] 473 [[So have import statements, just like C++20 imports]] 474 ## Handling initialization order 475 ## Why C++20 modules failed (and why we will succeed) 476 [[I will die on the hill of cyclic modules]]
Note:
See TracChangeset
for help on using the changeset viewer.