
#10679: Generalize hi-boot/hi for signatures, to manage intermediate merged interfaces -------------------------------------+------------------------------------- Reporter: ezyang | Owner: ezyang Type: task | Status: new Priority: normal | Milestone: Component: Package | Version: 7.11 system | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Revisions: | -------------------------------------+------------------------------------- == The problem == In some situations, we need to output multiple interface files for what is morally the same module name. === Example 1: Merging external and home signatures === {{{ unit a-sig where signature A unit p where include a-sig signature A }}} Compiling `p/A.hsig` produces an interface file which contains just the definitions declared in `p`. However, someone including `p` should see the merge of the interface of `p/A.hsig` AND `a-sig/A.hsig` (which was included.) === Example 2: Merging two home signatures === {{{ unit p where signature A signature B where import A ... signature A where import B ... }}} What should we do if a signature is specified multiple times in the same unit? The compilation of each produces a distinct interface, and the public interface we want to expose is the merge of the two. (And by the way, what's the source file name of `A`, if we are not using the inline syntax?) === Example 3: Merging a signature and a module === {{{ unit p where signature A module B where import A ... module A where import B ... }}} `A` and `B` are mutually recursive, and we want to use a signature file to break the gap. The signature produces an interface file, only to be overwritten when we actually define the module proper. But wait! We have a solution for this example already: the first interface file for `A` is not saved to `A.hi`, but `A.hi-boot`... == The proposal == I want to take the `A.hi-boot` versus `A.hi` distinction and generalize it: we should be able to name intermediate interface files A.1.hi, A.2.hi, ... and finally A.hi (which is publically visible outside the unit.) This naming convention applies to Haskell files too. === User-visible consequences === Every signature file is numbered, and every import of a signature file refers to a specific number. This number is unique among all other modules in a unit which share the same name. For backwards compatibility, some number/file name extensions are treated specially: 1. `.hs` files compile to `.hi` (implicitly numbered 0) 2. `.hs-boot` files compile to `.hi-boot` (implicitly numbered 1) 3. `.hsig` files compile to `.hi-boot` (implicitly numbered 1) 4. `.n.hsig` files compile to `.n.hi-boot` (numbered n, where n is greater than 1) **Flex point:** We could give `.hsig` files their own file extension for interface files; just would require some more work to distinguish between `hs-boot` and `hsig` as well as record the numbering. To import, the `{-# SOURCE n #-}` pragma can be used (with `{-# SOURCE #-}` being equivalent `{-# SOURCE 1 #-}`.) Inline Backpack files can omit numbering, since we can figure it out based on the ordering of declarations (numbering in REVERSE order of occurrence). Example 2 can be numbered as follows: {{{ signature {-# SOURCE 2 #-} A signature {-# SOURCE 1 #-} B where import {-# SOURCE 2 #-} A ... signature {-# SOURCE 1 #-} A where import {-# SOURCE 1 #-} B ... }}} === Internal consequences === In many places in the code today, we record a boolean indicating if we depended on the boot interface `hi-boot` or the normal interface `hi`. We now replace this marker with an integer which records the numbering. The primary affected components are dependency recording in interfaces, interface loading code in GHC, and the implementation of `--make`. === Interaction with signature merging === Unlike `hs-boot` files, `hsig` files can be included from external units, in which case the semantics are that all signatures in scope are merged together. The key rule is that we **generate an hi file for each partial merge**; this means that whenever we want to typecheck a module, there is exactly one interface file per module we import. Consider this example: {{{ unit a-sig where signature A unit a-sig2 where signature A unit p where include a-sig module B include a-sig2 module C signature A module D }}} When compiling this, we generate four interface files for `A`: {{{ unit p where include a-sig -- Produces A.3.hi-boot (a-sig) module B -- uses A.3.hi-boot include a-sig2 -- Produces A.2.hi-boot (a-sig + a-sig2) module C -- uses A.2.hi-boot signature A -- Produces A.hi-boot (everything) module D -- uses A.hi-boot -- At the end, A.hi-boot copied to A.hi to be publically visible }}} == Can we do anything simpler? == There are a few barriers to doing something simpler: 1. We can avoid generating extra interface files if we instead merge them on-the-fly when we use them. However, this forces later instances of GHC to do repeated work remerging interface files, so it seems desirable from a performance perspective to merge before writing. Another scheme is that we could merge on use for signatures in the home package, and then write out a unified file at the very end, trading off performance for less written interface files. 2. The Backpack language is defined in a way that allows modules, signatures and includes to be ordered in a semantically meaningful way. For example: {{{ unit q where signature M signature A where f :: Int -> Int ... unit p where signature A where data T module M where import A -- should get T but not f ... include q -- fill in M module S where import A -- should get T and f }}} This means that even within a unit, the interface of a signature file may differ. We could rule this out, but we would have to work out how to explain this limitation to users. (For example, we could solve the example above by saying that units which define modules do not bring their signatures into scope for a package which imports them; but this is a pretty ad hoc rule! And you still have to deal with repeated signatures, or a signature importing a module importing a signature. There are a lot of cases.) 3. This problem cannot be avoided at all if you are truly doing recursive modules, since you need the intermediate interface file to do compilation at all prior to getting the real implementation. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10679 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler