source: doc/proposals/modules.md @ bf64de3

Last change on this file since bf64de3 was bf64de3, checked in by Andrew Beach <ajbeach@…>, 3 months ago

Update to the module system, folding in feedback and some PAB content.

AJB |
----'

Module System Proposal
======================

In this proposal we will be discussing modules. Although their exact nature changes between programming languages, modules are the smallest unit of code reuse between programs, or the base unit in separate compilation. Modules, and the extended module system, are tied up in various stages of compilation and execution, with a particular focus on visibility between different parts of the program.

Note that terminology is not fixed across languages. For instance, some languages use the word package or library instead. Module was chosen as the generic term because it seems to have the fewest other uses (for example, a package is sometimes a group of modules).

In C there is no formal definition of module, but informally a module is a pair of files, the body file (.c) and the header (.h). The header provides the interface and the body file gives the implementation. (A translation unit is a source file, usually a .c file, and all the recursively included files.) Some modules, like the main module,
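
As a minimal sketch of this informal convention, the two halves of a hypothetical `counter` module are shown here in one listing. The names `counter.h`, `counter.c`, `counter_next` and `counter_reset` are invented for illustration; in a real project the two parts would be separate files and the body would include its header.

```c
/* ---- counter.h: the header provides the interface ---- */
#ifndef COUNTER_H
#define COUNTER_H

int counter_next(void);    /* returns 1, 2, 3, ... on successive calls */
void counter_reset(void);  /* restarts the sequence */

#endif /* COUNTER_H */

/* ---- counter.c: the body file gives the implementation ---- */
/* A real body file would start with: #include "counter.h" */

static int count = 0;      /* internal linkage: not visible to other translation units */

int counter_next(void) {
    return ++count;
}

void counter_reset(void) {
    count = 0;
}
```

Only the two prototypes are shared with other translation units; `count` stays hidden because of `static`, which is about as much visibility control as plain C offers.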

Uses of Modules
---------------
This section covers the features of a module system that allow for the separation of code across modules and why

Modules are often, but not always, the means by which a language views source files. There is almost always some kind of parity between modules and source files, with modules being mapped onto one (or a few) source files. Sometimes modules are used to find the appropriate source files, requiring this parity to be enforced in the language. Other times the parity is just a convention or is enforced for other reasons.

[]

If there is a universal feature of modules, it is information visibility. Modules decide what information within them is visible to other modules. Here visibility is used in the coarse-grained sense of "visible in another module for any purpose".

[]

Accessibility is the more fine-grained partner to visibility, allowing for information to be visible, but only usable for certain purposes. This includes privacy and friendship (only usable in certain parts of the program) or inlining information (only usable by the optimizer).

In languages that have namespacing/qualified-names, modules will often interact with namespaces. For example, each module may place its declarations in a namespace of the same name.

C Comparisons
.............
To be 99% compatible with C, Cforall pretty much has to use the C-preprocessor (or replace it with a Cforall-preprocessor, which is in turn backwards compatible). To this end, how well does the C-preprocessor operate in these areas?

C is very good at separate compilation. Parallel compilation is completely unhindered and recompilation is good, although sometimes a bit preemptive. Information sharing is a bit weaker: C has a tendency to overshare because its copy-and-paste rule gets the entire file. This is also why its recompilation can be preemptive. It is on the user to follow conventions and figure out what information needs to/should be shared. (On a personal note, I have spent a lot of time working to remove extra includes from the Cforall compiler.)

C doesn't use modules to implement any behaviour. Except for preserved source location information used in error messages, modules are completely erased by the preprocessor.

Module Linkage Specification
----------------------------
A proposed solution is to keep track of code and whether or not it is in the module currently being compiled. This "is_in_module" linkage* is used in the compiler (and perhaps the preprocessor) to mark different declarations. Usually, only the original source file (the `.cfa` file) and its header (a `.hfa` file) are considered to be in the module.

Prelude definitions are never considered to be inside the current module, except when compiling the prelude itself.

* That is, linkage in the sense of linkage specifier (like mangled, or overridable), not external/internal linkage (part of storage classes).

How to Specify the Module
-------------------------
Perhaps the trickiest issue is figuring out where the module is after the C-preprocessor has finished its work.

If we don't include the preprocessor in this (which has the distinct advantage of not needing to update the C-preprocessor), then the module needs to be blocked out in C code. This is fairly trivial in the source file; marking the end of the include statements is usually good enough.

Headers are harder because they are almost always mixed in with other includes, both in other files and their own. I have been able to think of two solutions that do not get caught up in these problems:
1.  Mark out the header include in the source file (in addition to the source file body) and have the header escape all of its includes. This gives us start and stop points for the module.
2.  Have the header mark its body in a way that mentions the source file. Most includes may have these blocks, but the non-matching ones can be discarded.

Using the preprocessor (or at least relying on the line marks/processed line directives) opens things up a bit more. With accurate knowledge of what original file a declaration came from, all that needs to be done is to map files onto modules. This is less flexible, but it covers the standard layout of headers, and even many of the unconventional layouts I have seen.

As for which files are part of the module, a source file is always part of its own module. The paired header (same path and name, except for the extension) could automatically be included in the module, but this might take away some needed flexibility. Allowing intermediate extensions (see the AST/Pass files for an example) would allow for slightly more flexibility. The other way would be to specify it in the source files themselves. Headers could say which modules they are a part of, but I think the more natural solution may be to have a file already in the module say what other files in the module it is including.
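
The automatic pairing rule could be sketched roughly as follows, assuming GCC-style line markers (`# 12 "file" 1`) in the preprocessed output. The function names and the file names in the usage note are hypothetical, and treating everything after the first `.` in the file name as extension is only one possible reading of "intermediate extensions".

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Length of the path up to the first '.' in its final component. */
static size_t stem_length(const char *path) {
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;
    const char *dot = strchr(name, '.');
    return dot ? (size_t)(dot - path) : strlen(path);
}

/* A file is in the module if it shares path and base name with the source file. */
bool file_in_module(const char *source, const char *file) {
    size_t n = stem_length(source);
    return n == stem_length(file) && strncmp(source, file, n) == 0;
}

/* Parse a GCC-style line marker such as:  # 12 "AST/Pass.hpp" 1
   On success, store the file name and return true. */
bool parse_line_marker(const char *line, char file[256]) {
    int lineno;
    return sscanf(line, "# %d \"%255[^\"]\"", &lineno, file) == 2;
}

/* Is the text following this line marker inside the module of `source`? */
bool marker_in_module(const char *source, const char *line) {
    char file[256];
    return parse_line_marker(line, file) && file_in_module(source, file);
}
```

With this rule, `marker_in_module("AST/Pass.cpp", ...)` accepts markers for `AST/Pass.hpp` and `AST/Pass.impl.hpp` but rejects markers for `AST/Node.hpp` or a system header. Any layout the rule does not cover would still need one of the explicit forms.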

Within that, the declaration could go with the include, be part of the include, or be a list of files in the source file. Any of these options should work.
>   // With the include:
>   #pragma module "filename.hfa"
>   #include "filename.hfa"
>
>   // Part of the include:
>   #include_module "filename.hfa"
>
>   // Listed Source Files:
>   #pragma module "filename.hfa" "included-from-filename.hfa"
>   #include "filename.hfa"
>   // In the previous examples, the include in filename.hfa would be updated.

Uses of Module Linkage
----------------------
After we know what sections are in the module and which are not, how do we use this to actually support coding?

In the preprocessor, the simplest use is a conditional macro. It takes two arguments and expands to one of them depending on whether the tokens were found in the module or not. This would require an implementation directly in the preprocessor.
>   __MODULE__(if_inside_module, if_outside_module)
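
The selection behaviour can be approximated in today's C with an ordinary macro, which may help to picture how such a macro would be used. This is only a rough stand-in: `IN_MODULE`, `MODULE_SELECT` and `visibility_level` are made-up names, and the flag is set by hand per translation unit.

```c
/* IN_MODULE is a hypothetical flag; it is defined by hand here to stand in
   for the module-linkage information the preprocessor would track. */
#define IN_MODULE 1

#if IN_MODULE
#define MODULE_SELECT(if_inside_module, if_outside_module) if_inside_module
#else
#define MODULE_SELECT(if_inside_module, if_outside_module) if_outside_module
#endif

/* Expands to the first argument because IN_MODULE is non-zero. */
int visibility_level(void) {
    return MODULE_SELECT(1, 0);
}
```

A per-file flag like this cannot tell whether the tokens came from the module's own header or from some other included header, which is the part that needs support inside the preprocessor.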

In the compiler proper, the linkage can be checked on declarations to decide how they are handled. A simple example is a function specifier that takes the module status into account. Say "module_inline", which becomes "inline" (if anything) in the module and "extern inline" elsewhere. This (using some GCC behaviour) allows every file to see the function definition and inline it, but only the module will keep a non-inlined copy. This ensures that there is only one translation unit with a copy without involving the linker.
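
A sketch of how "module_inline" might be emulated by hand, assuming the GCC behaviour in question is the GNU89 inline semantics selected by the `gnu_inline` function attribute (GCC and Clang). `IN_MODULE`, `MODULE_INLINE`, `twice` and `quadruple` are hypothetical names.

```c
#define IN_MODULE 1  /* hypothetical flag, set by hand in this sketch */

#if IN_MODULE
/* In the module: inlined where possible, and a standalone copy is emitted here. */
#define MODULE_INLINE __attribute__((gnu_inline)) inline
#else
/* Elsewhere: the definition is used only for inlining; no copy is emitted. */
#define MODULE_INLINE __attribute__((gnu_inline)) extern inline
#endif

/* This definition would live in the header, visible to every includer. */
MODULE_INLINE int twice(int x) {
    return 2 * x;
}

int quadruple(int x) {
    return twice(twice(x));
}
```

With `IN_MODULE` set to 0 in every other translation unit, calls that are not inlined there resolve to the single standalone copy emitted by the module.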

This may also help solve other memory-allocated-in-header problems, as this memory can then only be allocated in the module.

It may also be used to help implement visibility. The level of granularity is still module level, but private information can be included in the header and used by the compiler while being hidden from direct use in other modules. For example, you could mark the fields of a structure as private: while the layout is known to the compiler, other modules cannot perform field access and would have to use other provided functions to manipulate and read the type. (There are a few containers in the library that do this by convention.)
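
Plain C has no such check, but the effect can be imitated with a name-mangling macro, which may make the idea more concrete. Everything here (`IN_MODULE`, `PRIVATE`, `struct buffer` and its functions) is hypothetical and is not how the compiler would implement it.

```c
#include <stddef.h>

/* IN_MODULE is a hypothetical flag standing in for module linkage. */
#ifdef IN_MODULE
#define PRIVATE(name) name              /* the module sees the real field name */
#else
#define PRIVATE(name) private_##name    /* other code sees a mangled name */
#endif

/* Header part: the full layout is visible, so every includer knows the size. */
struct buffer {
    size_t PRIVATE(length);
    char PRIVATE(data)[32];
};

/* Body part: written with PRIVATE() so it compiles with or without the flag. */
size_t buffer_length(const struct buffer *b) {
    return b->PRIVATE(length);
}

/* Returns 1 on success, 0 if the buffer is full. */
int buffer_append(struct buffer *b, char c) {
    if (b->PRIVATE(length) >= sizeof b->PRIVATE(data)) return 0;
    b->PRIVATE(data)[b->PRIVATE(length)++] = c;
    return 1;
}
```

Code outside the module can still declare a `struct buffer` on the stack, because the layout is known, but writing `b.length` fails to compile there, so it has to go through `buffer_length` and `buffer_append`.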

Remaining Issues
----------------
Not all of these have to be solved, but there are still some areas that could really use an improvement.

First, using modules as the visibility tool does lead to a major shortcoming. Because there is only "in-module" and "out-of-module", multiple things in the same header don't know that they are in the same module, which could prevent adding inline functions in the header.

Second, this does nothing to solve the oversized header issue. It does not reduce any requirements on what includes need to be used.

Alternate Solutions
-------------------
There are other ways C's modules could be improved in Cforall.

Explicit Module Blocks
......................
Instead of trying to map files to modules, modules could instead be declared explicitly, marking out the beginning and the end of a section of code as a module. If built on top of the body/header and include system, it might look like this.

>   extern module NAME {
>       BODY
>   }
>
>   module NAME {
>       BODY
>   }

The extern module goes in the header, the other module goes in the body. The basic usage is that the header module contains the forward declarations and the body module contains the definitions. It can be used to check that the two sets match, but on its own it only replicates the current header/body divide with a bit more explicit syntax. However, it can be used as the base for a lot of features of the module linkage system. It does solve the "knowing two declarations came from the same other module" problem (and could work with namespaces) but is otherwise very similar, at the cost of a heavier syntax.

Compiled Headers
................
Most programming languages do not share source code between modules. Instead each module is compiled without looking at the source code in other modules. The result of compilation includes all the information required for later stages of compilation and information for compiling other modules.

This is a more popular pattern in more recent programming languages. It does have some advantages, such as reducing the number of times that a file will need to be processed, and it can cut out unneeded transitive information. Its downsides include adding dependences between modules and preventing any circular dependences between modules.

There is one other notable downside, and that is retrofitting this pattern on top of C. The problems with GCC precompiled headers and C++ modules give some indication of how tricky the situation is. The problem is the C-preprocessor: not only is this the tool by which modules are implemented, but headers also contain information for the preprocessor itself, such as macros. Macro definitions must also be applied to the text of source files and so must be preserved. This might be possible in cases with strict dependences from the included file, but there are more unusual uses where macros depend on their context (previous includes or a define before the include) in their definition and these would be almost impossible to translate over.

##########################################################################################

PAB |
----'

Programming languages are divided into those embedded in an IDE, think Smalltalk and Racket, largely manipulating a symbol-table/abstract-syntax-tree, and those where the IDE is an external program largely manipulating program text.
Separate compilation in programming languages without an embedded IDE is the process of giving a compiler command a series of files that are read and processed as a whole.
The compiler output is placed in another set of files for execution loading or further processing.
Therefore, in languages without an embedded IDE, the translation unit is some combination of files, where files are defined by the underlying operating system.
I am unaware of a programming language where it is possible to say: within the following F files, only compile the following C components without compiling anything else.
I'm sure such a language exists somewhere, but I don't know of it.
For languages with non-embedded IDEs, there exist separate program configuration and management tools, like Make, Maven, etc.

Since C, and therefore CFA, is in the non-embedded IDE category, separate compilation is reading multiple translation units that are embedded in operating-system files.
In a file system where file-links can be embedded in data creating a tree, duplicate source code can be eliminated by generating a complex linking structure among the source files.
Without embedded file-links, dynamic embedding using #include/import is necessary to compose all the program components necessary for a compilation.

I see two separate issues with respect to program structuring for controlling visibility and initializing a program.

Information hiding can occur locally and globally.

Local information hiding leverages lexical scoping to control visibility, such as public/private.

    struct S {
       private:
          ...
       public:
          ...
    }

In a non-OO language, like CFA, this might be accomplished with friendship.

    struct S {
       friend void foo( ... );
       friend void bar( ... );
       ...
       private:
          ... // friends only
       public:
          ...
    }

I'm assuming this might work with polymorphic routines, too, like friend templates.
I appreciate this is not 100% secure, as with C++ friendship.

Global information hiding is controlling imports/exports from a translation unit (file).
C++ namespace provides control of names but not information hiding (I think).
Modules provide name and information hiding.

     module M using M1, M2 { // extra scope level => qualification
         private:
            ...
         public:
            ...
            ?( M & ){ ... } // module constructor
     }

The "using" is defining module dependences, i.e., what include files have to be brought in.
The purpose of modules is to organize a collection of program components, like the link-list and string stuff, within the same translation unit, versus multiple separate TUs.
Hence, all of Mike's stuff is in the same translation unit, but nicely subdivided into multiple independent sections within that unit.
The module constructor runs any global initialization required to ensure its contents are in a sound state, like zeroing global state or running code.

At the linker level, an extra step is necessary to perform a transitive closure across module dependences, i.e., build a "using" graph to know what order to run the module constructors.
For example, the heap has to be initialized before any other code that uses it.
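
One way that ordering step might look is a depth-first topological sort over the "using" graph. This is a sketch only: the `module_graph` representation, the fixed `MAX_MODULES` limit and the function names are assumptions, and a real linker step would work from whatever dependence records the compiler emits.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MODULES 8

/* uses[a][b] is true when module a has "using b", so b must be initialized first. */
typedef struct {
    int count;
    bool uses[MAX_MODULES][MAX_MODULES];
} module_graph;

enum { UNVISITED, IN_PROGRESS, DONE };

static bool visit(const module_graph *g, int m, int state[], int order[], int *n) {
    if (state[m] == DONE) return true;
    if (state[m] == IN_PROGRESS) return false;  /* circular dependence */
    state[m] = IN_PROGRESS;
    for (int dep = 0; dep < g->count; dep++) {
        if (g->uses[m][dep] && !visit(g, dep, state, order, n)) return false;
    }
    state[m] = DONE;
    order[(*n)++] = m;  /* every dependence of m is already in the order */
    return true;
}

/* Fill order[] with module indices, dependences first.
   Returns false if the "using" graph contains a cycle. */
bool constructor_order(const module_graph *g, int order[]) {
    assert(g->count <= MAX_MODULES);
    int state[MAX_MODULES] = { UNVISITED };
    int n = 0;
    for (int m = 0; m < g->count; m++) {
        if (!visit(g, m, state, order, &n)) return false;
    }
    return true;
}
```

If module 0 is the string code, 1 the list code and 2 the heap, with both 0 and 1 using 2 and 0 also using 1, the resulting order is 2, 1, 0, so the heap constructor runs first. A cycle in the graph is reported instead of ordered.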