Changeset dde0236
doc/proposals/modules-alvin/1_stitched_modules/stitched_modules.md
r710623a rdde0236

<span style="color:red">PAB</span> Modules are a software engineering mechanism providing information control (hiding), separate compilation, and initialization.

<span style="color:yellow">AZ</span> Separate compilation and initialization are more specific to C/C++, but these are features we'd like to support. Initialization isn't handled in this document, but it can be added later.

C doesn't have modules

<span style="color:red">PAB</span> C provides a complex form of module through forward declarations and definitions, `#include`, and translation units with `extern` and `static` visibility control.

<span style="color:yellow">AZ</span> Yes, it does satisfy the basic definition of a module. That being said, people usually consider it "too weak" to be considered a "real module".

-- instead, C relies on the programmer (+ preprocessor) to insert the references to any symbols that are defined in other files. This is because the C compiler only processes one translation unit from top to bottom, so you have to give it everything, and in the correct order.

…

In theory, these types are infinitely large without some other semantic meaning.

<span style="color:yellow">AZ</span> I think I led you astray here. I talked about the limitations of the C compiler as a way to explain why header files are used. That limitation being that the C compiler was never designed to have the ability to peek into other files, so it relies on the preprocessor. More modern languages use `import` as a note to the compiler to look in another file, which is what we're trying to do here. That being said, since we need to output C code, the output of my stitched modules ends up looking very similar to preprocessed code. Ideally we'd output machine code, saving on IO cost and having to order text to fit Definition Before Use.
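To make the Definition Before Use constraint concrete, here is a minimal sketch in C. It is not taken from the proposal: `struct a`, `struct b`, and `sum_pair` are hypothetical names chosen for illustration. It shows why the text handed to the compiler has to be ordered, and why two types that contained each other by value would be infinitely large.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough for the compiler to form a pointer to struct b. */
struct b;

/* Hypothetical type that a module might export. */
struct a {
    int value;
    struct b *peer;   /* pointer: its size is known without struct b's body */
};

/* struct a must be complete before this point, because it is embedded by value. */
struct b {
    int value;
    struct a inner;   /* unboxed: the compiler needs sizeof(struct a) here */
};

/* struct a could not also embed a struct b by value: the two types would
   contain each other and have infinite size. */

/* Adds the values reachable from b, following the peer pointer if it is set. */
int sum_pair(const struct b *b) {
    assert(b != NULL);
    int total = b->value + b->inner.value;
    if (b->inner.peer != NULL)
        total += b->inner.peer->value;
    return total;
}
```

Swapping the two struct definitions would fail to compile, since `struct b` would then embed an incomplete type. That ordering work is what the programmer currently does by hand with headers.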

So our C module system simply analyzes other modules and extracts any imported information to give to the compiler, right? Yes, but it's a bit tricky because C is a systems programming language, which means we care a lot about how our code is compiled.

…

Dealing with DBU is a separate problem.

<span style="color:yellow">AZ</span> See previous comment. I meant this as more of a prologue into why many existing module systems don't work for systems languages and their unboxed types -- if you can't hide implementation details behind a pointer, then the compiler needs to see the details.

An object-oriented approach hides a class' implementation behind a pointer and a vtable (also a pointer), so importers can use a class without even knowing its size.

<span style="color:red">PAB</span> The issue here is garbage collection (GC) not OO.
Languages *without* GC can place objects on the stack or heap, where objects on the stack are not hidden behind a pointer.

<span style="color:yellow">AZ</span> I would say it's both GC and performance. C programmers expect no indirection, so the compiler has to ensure that.

This doesn't work for C because it has unboxed types, so we need to expose information to the compiler.

… ```

<span style="color:yellow">AZ</span> Interesting, thanks for pointing this out. This detail doesn't seem to interfere with my modules, though it is something to note.

### A previous attempt

…

<span style="color:red">PAB</span> Your "stubs" are how Cforall handles polymorphic types.

<span style="color:yellow">AZ</span> Makes sense, it's a nice way to handle arbitrary sizes.

Any function that returned the exported type would return the "stub" type instead, so importers wouldn't need to know any implementation details unless they imported the actual type (which would use type-punning to convert between the two).
This approach unfortunately doesn't work in C because type-punning breaks strict aliasing rules, and the C spec allows small structs to be unpacked when used as arguments into functions. Additionally, extracting size and alignment information can require analyzing the entire codebase -- if we have to do all that to make just unboxed types work, perhaps there are better options.

…

Otherwise, the compiler must have sudo to access source libraries that are not publicly accessible.

<span style="color:yellow">AZ</span> I think you hit the nail on the head -- "type stubs" was an attempt at information hiding. Given an oracle that could tell you the size/alignment of any type, if a function returned an unimported type, then it wouldn't matter if that unimported type was boxed or unboxed (i.e. you never need to investigate the implementation). In essence, it's a weaker form of PIMPL because you need to expose size/alignment info. I abandoned this because of a number of fundamental issues: 1. implementing the oracle would be almost as much work as this "stitched modules" setup; 2. as you said, it changes the calling convention to legacy code. In other words: it's a lot of work, it doesn't work, and what you get out of it isn't that useful.

### Other languages

…

<span style="color:red">PAB</span> What do you mean by acyclic here?

<span style="color:yellow">AZ</span> Suppose you drew out a graph of modules, where a directed edge meant "module A references a symbol in module B". From what I've seen of C++20 modules, they do not allow cycles because you cannot reference any symbol from another module until the other module is compiled. While this is useful for library-level organization, at the file level this is too restrictive. It means that if you have any recursive data structure, all of its components must exist in the same C++20 module.
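The graph described in the comment above can be sketched in C as follows. This is only an illustration and not part of the prototype: `struct module_graph`, `MAX_MODULES`, and `has_cycle` are hypothetical names, and the adjacency matrix is an assumed representation. An edge from A to B means "module A references a symbol in module B", and a depth-first search reports whether the module graph is acyclic.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MODULES 8   /* hypothetical upper bound, for illustration only */

/* refs[a][b] is true when module a references a symbol defined in module b. */
struct module_graph {
    int count;
    bool refs[MAX_MODULES][MAX_MODULES];
};

enum visit_state { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first search from node; returns true if a back edge (a cycle) is found. */
static bool visit(const struct module_graph *g, int node, enum visit_state *state) {
    state[node] = IN_PROGRESS;
    for (int next = 0; next < g->count; next++) {
        if (!g->refs[node][next])
            continue;
        if (state[next] == IN_PROGRESS)
            return true;   /* next is still on the DFS stack: cycle */
        if (state[next] == UNVISITED && visit(g, next, state))
            return true;
    }
    state[node] = DONE;
    return false;
}

/* Returns true if any chain of references leads from a module back to itself. */
bool has_cycle(const struct module_graph *g) {
    assert(g->count >= 0 && g->count <= MAX_MODULES);
    enum visit_state state[MAX_MODULES] = { UNVISITED };
    for (int i = 0; i < g->count; i++)
        if (state[i] == UNVISITED && visit(g, i, state))
            return true;
    return false;
}
```

Under this reading, an "acyclic" module system accepts only graphs for which `has_cycle` returns false, while a "cyclic" one also accepts the rest, as long as every individual type still has finite size.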

Rust compiles an entire crate all at once, so modules are essentially namespaces, and modules can import freely within a crate. Zig modules are implicitly structs, and are used by assigning the import to a name.

<span style="color:red">PAB</span> Modules can correspond one-to-one to files. This is the approach used in Python, Ruby, JavaScript, and others. Modules can correspond one-to-one to directories. This is the approach that Go uses. Modules can correspond to a combination of files and directories as well; this is the case with Rust, Java, and others. `https://denisdefreyne.com/notes/zlc9l-nrkfw-wztwz`

<span style="color:yellow">AZ</span> When I looked around at what other languages consider a "module", or "crate", or "library", I found the term to be completely misused. Your link makes a valiant effort to set up the terminology, but as you can tell from the red highlighting and the TODOs, they're not really successful. They started off with strong definitions of "package" and "module" (these are what I would call "acyclic" and "cyclic" modules), but their taxonomy falls apart once they get into the details. As an example, I believe Go modules are more like "packages" than "modules" due to their acyclic nature.

C and C++ header files lead to a lot of manual module management, where the programmer has to ensure the .h and its corresponding .c file stay in sync. It should be possible to condense this workflow into a single file without any practical loss of functionality.

<span style="color:red">PAB</span> `.h` files cannot be removed. They are fundamental to C and its development ecosystem. What you are trying to do is auto-generate them.

<span style="color:yellow">AZ</span> I mean, sure. I can create a tool where we run a modified version of "stitched modules", then extract the "header stuff" to generate the .h file. See section "Further work" for details.
That being said, "stitched modules" is kind of like a dynamic preprocessor, so header files would no longer be a requirement -- just do the module work in the compilation step.

C++20 modules are acyclic, forcing any mutually recursive structures into the same module -- this doesn't work with the granularity of .c/.h files, which frequently share declarations with each other. Rust modules rely on whole-crate compilation, which clashes with the C philosophy of separate compilation. Zig modules share many similarities with this prototype, and present some interesting avenues for further development. As such, we will discuss Zig modules after presenting the prototype.

…

<span style="color:red">PAB</span> The concern about recursive references is a DBU issue, which is orthogonal to modules, i.e., languages exist with DBU and modules.
Removing DBU requires a multi-pass compiler and often *whole-program* compilation to see the unboxed types.

<span style="color:yellow">AZ</span> See my note on what acyclic means. It's more than just DBU. Also note that Rust does whole-crate compilation, which is why it doesn't run into problems with DBU and unboxed types (in other words, it's a valid approach to systems programming, just not one we can take).

## The prototype

…

Second question is information hiding during whole-program compilation.

<span style="color:yellow">AZ</span> Since header files are not dynamically generated (they're a piece of code that doesn't change, while "stitched modules" dynamically generates the topological ordering), you would make this work with header files by generating two header files for `a.cmod` -- one to hold `struct s1` and one to hold `struct s3`.
In theory, you might need to generate as many header files as there are symbol definitions, but you can always make this work because a struct definition will never depend on itself (if it does, the program is ill-formed because the struct is of infinite size).

### Design choices

… ```

<span style="color:yellow">AZ</span> See above: "[`module;`] would be used to distinguish a regular C file from a C module file". This is necessary because my modules are implicit namespaces, so you need some way to tell the compiler/programmer that these are treated differently (there's also the logic of exporting symbols). What you've written out is another way to set up modules -- it can work, but it's more verbose than it needs to be, and my modules can support everything you're doing here (or can be extended to support it). This section is not meant to be read standalone -- it's discussing minute details that I've considered when designing my modules, which are points I refer back to in sections "Comparison with Zig" and "Ideas for future direction". For information on what my "stitched modules" are doing, you can refer to section "The prototype". But to give a high-level understanding of what's going on, "stitched modules" is essentially a dynamic preprocessor that scans other files and extracts the necessary symbol information (e.g. struct definitions), which is then put into a translation unit for the C compiler to use.

### Comparison with Zig