Changeset 91b9e10


Timestamp:
May 29, 2024, 10:17:08 AM (2 weeks ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
822332e
Parents:
96c04e4
Message:

added some ideas to the module proposal

File:
1 edited

  • doc/proposals/modules.md

    r96c04e4 r91b9e10  
     1AJB |
     2----'
     3
    14Module System Proposal
    25======================
     
    710---------------
    811The most straightforward purpose of modules is to enable separate compilation.
    9 This inturn reduces recompilation, by isolating changes, and parallel compilation, but making modules independent.
     12This in turn reduces recompilation, by isolating changes, and enables parallel compilation, by making modules independent.
    1013
    1114A related feature is sharing information between modules. Information needed by other modules must be shared. However, avoiding sharing extra information can further isolate changes, and can also reduce the work of compiling a single module.
     
    1518C Comparisons
    1619.............
    17 To be 99% compatable with C, Cforall pretty much has to use the C-preprocessor (or replace it with a Cforall-preprocessor, that is in turn backwards compatable). To this end, how well does the C-preprocessor operate in these areas?
     20To be 99% compatible with C, Cforall pretty much has to use the C-preprocessor (or replace it with a Cforall-preprocessor, that is in turn backwards compatible). To this end, how well does the C-preprocessor operate in these areas?
    1821
    19 C is very good at separate compilation. Parallel computation is completely unhindered and recompilation is good, although sometimes a bit premptive. Information sharing is a bit weaker, C has a tendency to overshare because its copy-and-paste rule gets the entire file. This is also why its recompilation can be premptive. It is on the user to follow conventions and figure out what information needs to/should be shared. (On a personal note, I have spent a lot of time working to remove extra includes from the Cforall compiler.)
     22C is very good at separate compilation. Parallel compilation is completely unhindered and recompilation is good, although sometimes a bit preemptive. Information sharing is a bit weaker: C has a tendency to overshare because its copy-and-paste rule gets the entire file. This is also why its recompilation can be preemptive. It is on the user to follow conventions and figure out what information needs to/should be shared. (On a personal note, I have spent a lot of time working to remove extra includes from the Cforall compiler.)
    2023
    2124C doesn't use modules to implement any behaviour. Except for preserved source location information used in error messages, they are completely erased by the preprocessor.
     
    2326Module Linkage Specification
    2427----------------------------
    25 A proposed solution keep track of code and wheither or not we are in the module we are currently being compiled. This "is_in_module" linkage* is used in the compiler (and perhaps the preprocessor) to mark different declarations. Usually, only the original source file (the `.cfa` file) and its header (a `.hfa` file) are considered to be in the module.
     28A proposed solution keeps track of whether or not a piece of code is in the module currently being compiled. This "is_in_module" linkage* is used in the compiler (and perhaps the preprocessor) to mark different declarations. Usually, only the original source file (the `.cfa` file) and its header (a `.hfa` file) are considered to be in the module.
    2629
    2730Prelude definitions are never considered to be inside the current module, except when compiling the prelude itself.
     
    39422.  Have the header mark its body in a way that mentions the source file. Most includes may have these blocks, but the non-matching ones can be discarded.
    4043
    41 Using the preprocessor (or at least relying on the line marks/processed line directives) opens things up a bit more. With accurate knowledge of what original file a declaration came from, all that needs to be done it map files onto modules. This is less flexable, but it covers the standard layout of headers, and even many of the unconventional layouts I have seen.
     44Using the preprocessor (or at least relying on the line marks/processed line directives) opens things up a bit more. With accurate knowledge of what original file a declaration came from, all that needs to be done is map files onto modules. This is less flexible, but it covers the standard layout of headers, and even many of the unconventional layouts I have seen.
    4245
    43 Given which files are part of the module, a source file is always part of its own module. The paired header (same path and name, except for the extension) could automatically be included in the module, but this might take away some needed flexablity. Allowing intermediate extensions (see the AST/Pass files for an example) would allow for slight more flexability. The other way would be to specify in the source files theselves. Headers could say which modules they are a part of, but I think the more natural solution may be to have a file already in the module say what other files in the module it is including.
     46Given which files are part of the module, a source file is always part of its own module. The paired header (same path and name, except for the extension) could automatically be included in the module, but this might take away some needed flexibility. Allowing intermediate extensions (see the AST/Pass files for an example) would allow for slightly more flexibility. The other way would be to specify in the source files themselves. Headers could say which modules they are a part of, but I think the more natural solution may be to have a file already in the module say what other files in the module it is including.
    4447
    4548Within that, it could always go with the include, part of the include or a list of files in the source files. Any of these options should work.
     
    6063After we know what sections are in the module and which are not, how do we use this to actually support coding?
    6164
    62 In the preprocessor, the simplest use is a conditional macro. Takes two arguments, and expands to one of them depending on if the tokens were found in the module or not. This would require an implemenation directly in the preprocessor.
     65In the preprocessor, the simplest use is a conditional macro. It takes two arguments and expands to one of them, depending on whether the tokens were found in the module or not. This would require an implementation directly in the preprocessor.
    6366>   __MODULE__(if_inside_module, if_outside_module)
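
As a rough approximation in today's C preprocessor, the effect of the proposed macro can be imitated with an ordinary flag that the module's own source file defines before including its header. This is only a sketch: `IN_MODULE`, `MODULE_SELECT`, `request_count` and `bump_request_count` are hypothetical names, not part of Cforall, and the real proposal would have the preprocessor work out module membership itself rather than rely on a hand-written flag.

```c
/* Hypothetical stand-in for __MODULE__(if_inside_module, if_outside_module).
 * Assumption: the source file that owns the header defines IN_MODULE before
 * the include; every other includer leaves it undefined. */
#define IN_MODULE 1

#ifdef IN_MODULE
#define MODULE_SELECT(inside, outside) inside
#else
#define MODULE_SELECT(inside, outside) outside
#endif

/* Storage is allocated once, inside the owning module;
 * other includers would only see the extern declaration. */
MODULE_SELECT(int request_count = 0;, extern int request_count;)

/* Ordinary code can then use the object the same way on both sides. */
int bump_request_count(void) {
    return ++request_count;
}
```

With the flag removed, the same line expands to the `extern` declaration, which is the behaviour a header needs when included from outside the module.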
    6467
     
    6770This may also help solve other memory-allocated-in-header problems, as this memory can then only be allocated in the module.
    6871
    69 It may also be used to help implement visibility. The level of granularity is still module level, but private information can be included in the header, used by the compiler, but it will be hidden from direct use in other modules. For example, you could make the fields of a structure as private, while the layout is known for the compiler, other modules cannot preform field access and would have to use other provided functions to manipulate and read the type. (There are a few containers that do this by convention by in the library.)
     72It may also be used to help implement visibility. The level of granularity is still module level, but private information can be included in the header, used by the compiler, but it will be hidden from direct use in other modules. For example, you could make the fields of a structure private: while the layout is known to the compiler, other modules cannot perform field access and would have to use other provided functions to manipulate and read the type. (There are a few containers in the library that do this by convention.)
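
One way to picture the private-field idea in plain C is to mangle field names for every includer except the owning module, so the layout stays visible while direct field access stops compiling elsewhere. This is a sketch by convention only, not the proposed compiler mechanism; `IN_COUNTER_MODULE`, `PRIVATE`, and the `counter_*` functions are hypothetical names introduced for illustration.

```c
/* Assumption: only the module's own source file defines this flag. */
#define IN_COUNTER_MODULE 1

#ifdef IN_COUNTER_MODULE
#define PRIVATE(name) name                 /* real name inside the module */
#else
#define PRIVATE(name) private_##name##_    /* mangled name everywhere else */
#endif

/* The full layout is in the header, so any includer can allocate a counter. */
struct counter {
    int PRIVATE(value);
};

/* Functions provided by the module for other modules to use. */
void counter_init(struct counter *c)         { c->value = 0; }
void counter_add(struct counter *c, int n)   { c->value += n; }
int  counter_get(const struct counter *c)    { return c->value; }
```

Outside the module, `c->value` would fail to compile because the field is spelled differently there, yet `sizeof(struct counter)` is unchanged. A determined user could still spell out the mangled name, which is why compiler support would be the stronger solution.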
    7073
    7174Remaining Issues
     
    7679
    7780Second, this does nothing to solve the oversized header issue. It does not reduce any requirements on what includes need to be used.
     81
     82
     83##########################################################################################
     84
     85PAB |
     86----'
     87
     88Programming languages are divided into those embedded in an IDE, think Smalltalk and Racket, largely manipulating a symbol-table/abstract-syntax-tree, and those where the IDE is an external program largely manipulating program text.
     89Separate compilation in programming languages without an embedded IDE is the process of giving a compiler command a series of files that are read and processed as a whole.
     90The compiler output is placed in another set of files for execution loading or further processing.
     91Therefore, in languages without an embedded IDE, the translation unit is some combination of files, where files are defined by the underlying operating system.
     92I am unaware of a programming language where it is possible to say: within the following F files, only compile the following C components without compiling anything else.
     93I'm sure such a language exists somewhere, but I don't know of it.
     94For languages with non-embedded IDEs, there exist separate program configuration and management tools, like Make, Maven, etc.
     95
     96Since C, and therefore CFA, is in the non-embedded IDE category, separate compilation is reading multiple translation units that are embedded in operating-system files.
     97In a file system where file-links can be embedded in data creating a tree, duplicate source code can be eliminated by generating a complex linking structure among the source files.
     98Without embedded file-links, dynamic embedding using #include/import is necessary to compose all the program components necessary for a compilation.
     99
     100I see two separate issues with respect to program structuring: controlling visibility and initializing a program.
     101
     102Information hiding can occur locally and globally.
     103
     104Local information hiding leverages lexical scoping to control visibility, such as public/private.
     105
     106    struct S {
     107       private:
     108          ...
     109       public:
     110          ...
     111    }
     112
     113In a non-OO language, like CFA, this might be accomplished with friendship.
     114
     115    struct S {
     116       friend void foo( ... );
     117       friend void bar( ... );
     118       ...
     119       private:
     120          ... // friends only
     121       public:
     122          ...
     123    }
     124
     125
     126I'm assuming this might work with polymorphic routines, too, like friend templates.
     127I appreciate this is not 100% secure, as with C++ friendship.
     128
     129Global information hiding is controlling imports/exports from a translation unit (file).
     130C++ namespace provides control of names but not information hiding (I think).
     131Modules provide name and information hiding.
     132
     133     module M using M1, M2 { // extra scope level => qualification
     134         private:
     135            ...
     136         public:
     137            ...
     138            ?( M & ){ ... } // module constructor
     139     }
     140
     141The "using" is defining module dependences, i.e., what include files have to be brought in.
     142The purpose of modules is to organize a collection of program components, like the linked-list and string stuff, within the same translation unit, versus multiple separate TUs.
     143Hence, all of Mike's stuff is in the same translation unit, but nicely subdivided into multiple independent sections within that unit.
     144The module constructor runs any global initialization required to ensure its contents are in a sound state, like zeroing global state or running code.
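
For comparison, plain C has no module constructor, but GCC and Clang offer a `constructor` function attribute that runs a function before `main`. The sketch below uses it to put some global state into a sound state; it relies on that compiler extension rather than standard C, and `table`, `table_module_init` and `table_lookup` are hypothetical names.

```c
#include <stddef.h>

/* Global state owned by this "module". */
static int table[4];
static int table_ready = 0;

/* GCC/Clang extension (not standard C): called automatically before main(). */
__attribute__((constructor))
static void table_module_init(void) {
    for (size_t i = 0; i < 4; i++) {
        table[i] = (int)i * 10;
    }
    table_ready = 1;
}

/* Returns the entry, or -1 if the module was never initialized. */
int table_lookup(size_t i) {
    return table_ready ? table[i] : -1;
}
```

The attribute alone does not express dependences between modules, which is the gap the "using" clause is meant to fill.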
     145
     146At the linker level, an extra step is necessary to perform a transitive closure across module dependences, i.e., build a "using" graph to know what order to run the module constructors.
     147For example, the heap has to be initialized before any other code that uses it.
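
The ordering step can be sketched as a depth-first walk over the "using" graph that emits each module only after everything it uses. This is a minimal illustration under assumed data structures, not the linker's actual mechanism; `struct module`, `MAX_MODULES`, `visit` and `constructor_order` are hypothetical names.

```c
#define MAX_MODULES 8

struct module {
    const char *name;
    int uses[MAX_MODULES];   /* indices of the modules named in "using" */
    int use_count;
};

/* Depth-first post-order visit: dependencies are placed before the module.
 * state: 0 = unvisited, 1 = in progress, 2 = placed. Returns -1 on a cycle. */
static int visit(const struct module *mods, int m, int *state,
                 int *order, int *placed) {
    if (state[m] == 2) return 0;
    if (state[m] == 1) return -1;   /* circular "using" chain */
    state[m] = 1;
    for (int i = 0; i < mods[m].use_count; i++) {
        if (visit(mods, mods[m].uses[i], state, order, placed) != 0) return -1;
    }
    state[m] = 2;
    order[(*placed)++] = m;
    return 0;
}

/* Fills order[] with module indices in constructor-run order.
 * Returns the number of modules placed, or -1 on a cycle or overflow. */
int constructor_order(const struct module *mods, int count, int *order) {
    int state[MAX_MODULES] = {0};
    int placed = 0;
    if (count > MAX_MODULES) return -1;
    for (int m = 0; m < count; m++) {
        if (visit(mods, m, state, order, &placed) != 0) return -1;
    }
    return placed;
}
```

Given an application that uses a string module and the heap, with the string module also using the heap, the heap's constructor is ordered first, matching the example above.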