
Haskell-98 specifies that module import cycles work automatically with cross-module type inference. It has some weird interactions with defaulting and the monomorphism restriction. In Haskell-prime we're planning on removing artificial monomorphism, but defaulting will still be necessary (and can still be set differently per module). Only JHC fully implements the recursive module imports of Haskell-98. GHC and NYhc each have their own proprietary "boot-files" with slightly odd semantics to allow this to work (albeit the syntax is simple enough) Hugs doesn't support it at all. I propose we simplify things and lay down some rules, without having to invent explicit module-interface signatures. Then I wouldn't complain(:-)) that GHC doesn't have reasonable support for cyclic modules [1][2]. (Compiler writers will have to give feedback how plausible this is :-) -- I think GHC and NYhc "should" be able to adapt their boot-interface-file mechanisms to the scheme I'm proposing.. (This is really more of a sketch than a complete proposal at this stage.) In particular, I propose an amount of annotation in a module that *shall* make it compile. Compilers are free to accept code for other reasons (e.g. .hs-boot files, or some official module interfaces). These first proposals are clean-ups that reflect how ridiculous people think the current standard's module interface semantics are compared to most languages. Also they make cross-module type inference unnecessary, eliminating the defaulting problem. namespace level: Haskell98 says that what a module exports is determined by the smallest fix-point of what is possible. I can't see a practical use for this behavior, which is easily confusing. I think that exports that depend on the result of a fix-point should be rejected. It can be useful in module A to import a few types/functions explicitly from a module B that then goes on to export the whole of module A though. type level: Inside any given SCC (loop) of modules, any function imported from another member of the SCC normally shall have an explicit type signature in the module that exports it. (This doesn't seem a great burden, since type-signature for top-level functions/values are considered good practice anyway. Can anyone think of a use-case where cross-module type inference would be particularly useful?) Exception: imports may be given the {-# SOURCE #-} pragma. This fulfills two purposes: (1) It is a hint to a compiler that compiles modules separately that the current module should be compiled before the module being imported with {-# SOURCE #-}. Obviously, this can make optimization worse, since it's likely that SOURCE-imported functions won't be strictness-analyzed or inlined or anything; but that's the .hs-boot situation already. (And in principle even a compiler that likes separate compilation could break individual functions down into dependency order to compile them, adding another tradeoff point...) (2) If SOURCE pragmas "break the loop", then only functions that are actually imported with SOURCE must be given type signatures, even if module B then goes on to import module A wholesale: example: module A where {import {-#SOURCE#-} B (bf); ...} module B (module A, module B) where {import A; bf :: ...; ...} Since defining data types in logical places is an important use of cyclic imports, I propose not to require any extra annotation for them; the compiler will have to chase them down and understand them in loops (how else to do it?). However, there are some particular things to keep in mind regarding potential recompilation: (with a bit of a GHC bias) Changing any orphan instances in an SCC will force the whole thing to recompile (but what pluckiness, putting orphan instances *there*!) If a data type or newtype is imported without its constructors, then the RHS changing doesn't really force a recompile. I imagine this could work in GHC by, for each SOURCE import, storing the MD5 of the imported interface. Then when checking if you seriously have to recompile module A, you don't have to if none of those MD5s have changed and none of the non-SOURCE-imported modules' interface MD5s have either. In module cycles that aren't explicitly broken by SOURCEs, GHC (or any compiler) should just insert an implicit SOURCE for *all* cyclic imports (and possibly emit a warning) (unless the compiler wants to guess which SOURCES are better for optimization?). Presumably compilers that can do separate as well as non-separate compilation could take an optimization flag that tells them to compile cycles together as one piece rather than obeying the SOURCES for recompilation efficiency. so what does the compiler have to look at in a SOURCE-imported modules? In the case of the proposed SOURCE imports without hs-boot files, GHC would move from calculating one interface(md5) per module (or two interfaces in the case of .hs-boots), to one-per-import. I think this is, in principle, an advantage, although it does require more re-scanning when files are changed (only lexer/parser/renamer/module-chaser work). For example, I've found myself adding to .hs-boot files for the purpose of one module that SOURCE-imports the .hs-boot, which forces the recompile of another module that happens to depend on the .hs-boot too. To replicate the current GHC .hs-boot behavior (in which the hash-recalculation is shared among SOURCE-importers), one could replace a X.hs-boot file with an X_boot.hs file that contains: module X_boot (module X) where import {-# SOURCE #-} X (list of things exported by the old .hs-boot file) , and in other modules, replace import {-# SOURCE #-} X (....) with import X_boot (....) Taking .hs-boot docs as a guide [2], the compiler must look in SOURCE-imported modules for: - if an import list is given explicitly, `B (....)` not `B hiding (....)` or `B`, the export list only needs to be *checked* to make sure it exports the requested things, not remembered. Exception: data or class imported with `Name(..)` must remember exactly which constructors/members were exported. It's recommended to specify exactly what you're importing. - function type signatures - imports of functions, types, etc. If it's imported from outside the SCC, it doesn't need a type signature/whatever. If it's defined somewhere within the SCC, it generally does need a type signature. - fixity declarations, which only have to be imported in conjunction with the corresponding functions/constructors/whatever - data type / newtype declarations. When no constructor is imported, only the *kind* of the data type needs to be recorded, which might have to involve inference on the RHS (possibly involving more import chasing) if there aren't explicit kind annotations for *every* type parameter. - type synonym declarations. The whole thing has to be imported, including RHS. - classes. Including superclasses, class-method signatures, and default methods? Is there some way that GHC manages to allow not declaring all of these in .hs-boots? - instances, whether generated by 'deriving', 'deriving instance', or ordinary 'instance'; everything before the "where" clause of 'instance's is relevant. But an instance is only relevant if it's orphan, or if goes with a data or class that's also being imported. - the compiler-specific RULES pragmas probably follow similar mandates as above for instances and for the functions referenced in the RULE. [1] my official "complaint": http://hackage.haskell.org/trac/ghc/ticket/1409 [2] the GHC .hs-boot docs: http://www.haskell.org/ghc/docs/latest/html/users_guide/separate-compilation...

On Fri, Aug 15, 2008 at 09:27:16AM -0400, Isaac Dupree wrote:
Haskell-98 specifies that module import cycles work automatically with cross-module type inference.
It has some weird interactions with defaulting and the monomorphism restriction. In Haskell-prime we're planning on removing artificial monomorphism, but defaulting will still be necessary (and can still be set differently per module).
I'm not sure if defaulting actually makes this worse, but regardless, I think we should seriously consider removing defaulting anyway: http://hackage.haskell.org/trac/haskell-prime/wiki/Defaulting#Proposal4-remo... Thanks Ian

Ian Lynagh wrote:
I'm not sure if defaulting actually makes this worse, but regardless, I think we should seriously consider removing defaulting anyway:
http://hackage.haskell.org/trac/haskell-prime/wiki/Defaulting#Proposal4-remo...
Oh, actually, I agree with that proposal to remove defaulting. Maybe we should try implementing that and see how much things break. I imagine most uses can be solved by, if nothing else, adding local functions with more-constrained types, a bit similar to the (^) change. I noticed that depending on the resolution of http://hackage.haskell.org/trac/haskell-prime/wiki/KindInference , we might have a different sort of defaulting that examines exactly a whole module (which could also make it harder for my cyclic-module proposal to avoid recompilation? not sure) If we remove defaulting and the monomorphism restriction *and* don't add any other per-module semantics, then we get the module system out of the way of the semantics, which would make me very happy! There are a few GHC extensions that are still unfortunately per-module -- e.g. OverlappingInstances perhaps ought to be a notation or pragma on a class, rather than affecting all classes that happen to be defined in the module. (Pragmas aren't supposed to have an effect if they're not recognized; but sometimes people put OverlappingInstances on a class not because they're planning to make any such instances, but to allow users to define such instances; in which case the class and stock instances really can compile even in compilers that don't support overlapping instances) -Isaac

Ian Lynagh wrote:
http://hackage.haskell.org/trac/haskell-prime/wiki/Defaulting#Proposal4- removedefaulting
Here's a late response to the comments on that wiki page. It seems, to me, an extremely bad idea to remove defaulting *and* make that proposed change to (^) at the same time. Code can currently depend on defaulting, and if it does, then integers most likely default to the Integer type, not Int. If you just remove defaulting, then that code fails to compile, which is fine. If you remove defaulting and change (^) at the same time, then that code compiles but means something different, which is definitely not fine. It may initially seem like there's no problem, since no one would possibly want to use a number anywhere near 2 billion as an exponent for (^). The problem here is that if one is allowing types to be inferred (which is certainly true, if we're worried about defaulting), then that use of the horribly unsafe Int type can propogate through the code. Do I use this number as an exponent, and then also add it to x somewhere else? Then x is an Int as well. Then maybe I calculate (x * y) somewhere else? Okay, now y is an Int. And perhaps y is added to z? So, z is an Int. But maybe z overflows... and now a nasty bug, a numeric overflow in z, was introduced without changing my code, without a warning, by the change to the type of (^) which is used four functions away. -- Chris

Isaac Dupree wrote:
In the case of the proposed SOURCE imports without hs-boot files, GHC would ...
Ah, another difference from the .hs-boot system: in my proposal, when a file is imported with SOURCE and dependency chasing (e.g. of data-types) is done through its imports, it won't make a difference whether those imports have SOURCE pragmas; the compiler is in SOURCE-mode already, and will look at .hi files if there are any up-to-date ones available (e.g. the imported module isn't in the SCC / import loop), and otherwise will look at the source code (if it wanted, it could make some sort of .hi-boot out of it, I suppose). As opposed to the .hs-boot mechanism where .hs-boot files must choose carefully (and perhaps differently to the corresponding .hs file) whether their imports use SOURCE (they must if it's necessary to prevent loops, but must not if that module doesn't have a .hs-boot file that contains what's needed! But sometimes it doesn't make a difference, except for recompilation!) -Isaac

On Fri, 2008-08-15 at 09:27 -0400, Isaac Dupree wrote:
Haskell-98 specifies that module import cycles work automatically with cross-module type inference.
[...] I'd very much like you to consider in any proposal like this how easy it is to implement module dependency chasing. If the dependency chaser has to know too much about Haskell it makes it very difficult for tools like Cabal or hmake and we could be stuck with only ghc --make or ghc -M. Our plan with Cabal is to do dependency chasing which would enable incremental and parallel rebuilds. I'm not saying it's a problem with your proposal, I'd just like it to be taken into account. For example do dependency chasers need to grok just import lines and {-# SOURCE -#} pragmas or do they need to calculate fixpoints. Duncan

Duncan Coutts wrote:
[...]
I'm not saying it's a problem with your proposal, I'd just like it to be taken into account. For example do dependency chasers need to grok just import lines and {-# SOURCE -#} pragmas or do they need to calculate fixpoints.
Good point. What does the dependency chaser need to figure out? - exactly what dependency order files must be compiled (e.g., ghc -c) ? - what files (e.g., .hi) are needed to be findable by the e.g. (ghc -c) ? - recompilation avoidance? -Isaac

On Sat, 2008-08-16 at 13:51 -0400, Isaac Dupree wrote:
Duncan Coutts wrote:
[...]
I'm not saying it's a problem with your proposal, I'd just like it to be taken into account. For example do dependency chasers need to grok just import lines and {-# SOURCE -#} pragmas or do they need to calculate fixpoints.
Good point. What does the dependency chaser need to figure out? - exactly what dependency order files must be compiled (e.g., ghc -c) ? - what files (e.g., .hi) are needed to be findable by the e.g. (ghc -c) ? - recompilation avoidance?
It needs to work out which files the compiler will read when it compiles that module. So currently, I think we just have to read a single .hs file and discover what modules it imports. We then can map those to .hi or .hs-boot files in one of various search dirs or packages. We also need to look at {#- SOURCE #-} import pragmas since that means we look for a different file to ordinary imports. Calculating dependency order and recompilation avoidance are things the dep program has to do itself anyway. The basics is just working out what things compiling a .hs file depends on. Obviously it's somewhat dependent on the Haskell implementation. Duncan

Isaac Dupree wrote:
Duncan Coutts wrote:
[...]
I'm not saying it's a problem with your proposal, I'd just like it to be taken into account. For example do dependency chasers need to grok just import lines and {-# SOURCE -#} pragmas or do they need to calculate fixpoints.
Actually, good point, Duncan, that got me thinking about what we need in order to obviously not to lose much/any of the .hs-boot efficiency. (warning: another long post ahead, although the latter half of it is just an example from GHC's source) [and I re-read my post and wasn't sure about a few things, but maybe better to get feedback first -- please tell me if I'm being too verbose somewhere, too] Let's look at the total imports of a .hs and its .hs-boot, as they currently are for GHC. Either can be non-SOURCE imports (let's call them NOSOURCE), SOURCE imports, or not importing that. .hs:NOSOURCE, .hs-boot:NOSOURCE : okay .hs:NOSOURCE, .hs-boot:SOURCE : okay .hs:NOSOURCE, .hs-boot:not-imported : okay .hs:SOURCE, .hs-boot:NOSOURCE : bad, if the .hs needs SOURCE, then probably so does the .hs-boot .hs:SOURCE, .hs-boot:SOURCE : okay .hs:SOURCE, .hs-boot:not-imported : okay - the .hs-boot importing a module that the .hs doesn't is invalid, or at least useless [actually, see later example -- there may be reasons for this, but in that case, it doesn't hurt to also import the module in the .hs (assuming there's no syntactic/maintenance burden), and it provides better automatic error-checking to do so] Given the limited amount of information a .hs-boot file (or SOURCE-imported file, in my scheme) needs for being a boot-file, there is no advantage to import the modules it depends on as NOSOURCE. The compiler just has to be clever enough to ignore imports of functions that it can't find out the type of. Also, currently using SOURCE requires the imported module to have a .hs-boot. But it should work fine to look for a .hi and use that in the absence of .hi-boot, because it has strictly a superset of the information (so that my statement that "SOURCE is superior to NOSOURCE when it works" can be truer, for the sake of demonstration). [oops! I was wrong, it may need to NOSOURCE-import on occasion to find out a function's type - more on that in a later post?] Now, since the .hs-boot SOURCE vs NOSOURCE has been collapsed, I think we can move mostly-all .hs-boot info into the .hs file. If the .hs-boot file had imported something, the corresponding import in the .hs is imported with {-#SOURCE_FOLLOW#-} (in addition to {-#SOURCE#-} or {-#NOSOURCE#-}); otherwise it's imported with {-#SOURCE_NOFOLLOW#-} (ditto). For demonstration, I'll assume that all imports are annotated this way, with two bits of information. Presumably all imports that aren't part of an import loop are NOSOURCE (which includes all cross-package imports). Now let's look at the dependency chaser. NOSOURCE imports must not form a loop. They form dependency chains as normal. SOURCE imports depend on either a .hi or a .hi-boot for the imported module. When a X.hi-boot is demanded: only SOURCE_FOLLOW imports are dependency-chased from X.hs, through any .hs modules that don't already have a .hi or .hi-boot. In the case where .hs-boots worked, this *can* avoid cycles. If this SOURCE_FOLLOW dependency DAG doesn't have any cycles, then it should be as simple as calling (the fictional) `ghc -source X.hs` to produce X.hi. If there are cycles, and it is sometimes necessary*, GHC needs to be slightly smarter and be able to produce all the .hi-boot files at once from any graph SCCs (loops) that prevent it from being a DAG (e.g., `ghc -source X.hs Y.hs` to produce X.hi-boot and Y.hi-boot). Note that it doesn't need to be particularly smart here -- e.g., no type inference is done. *necessary loops: example 1, the data/declarations literally loop: module X1 where { import Y1(Y); data X a = End a | Both Y Y; } module Y1 where { import X1(X); data Y = Only (X (Maybe Y)); } (or kind annotations could be required for these loops in general, e.g. data X (a :: *) = ...) [hmm, in this case actually all we need is the data left-hand-side, so we could do this in two stages. But that wouldn't work out so well if their RHSs contained {-#UNPACK#-}!SomeNewtypeForInt where SomeNewtypeForInt was from the other module. But that's an optimization that it might be okay not to do, as long as it was consistently not done both for .hi-boot and .hi/.o; and it could perhaps be doable] example 2, there are just too many back-and-forths: module X2 where { import Y2(Yb); data Xa = Xa; data Xc = Xc Yb; } module Y2 where { import X2(Xa,Xc); data Yb = Yb Xa; data Yd = Yd Xc; } This second one "could" also be accomplished if multiple different .hs-boots were allowed per .hs, although it doesn't seem worth the annotation!! such as using SOURCE_FOLLOW[0] or [1], [2]... I'm not even going to try to write that! [oh wait, SOURCE[0->1] = SOURCE, SOURCE[1->1] = SOURCE_FOLLOW, SOURCE[1->null] = SOURCE_NOFOLLOW, maybe something can be done like that, more complicated in one way but perhaps a bit sounder in another] Now, SOURCE_NOFOLLOW is a bit of a hack, for a couple reasons: - instances (especially orphans, and especially overlapping instances) may not always be imported when they should be. - There may be some information that could be SOURCE-imported from the module if all its imports were SOURCE_FOLLOW, but not enough information was imported that way due solely to SOURCE_NOFOLLOW. That's probably okay though; after all, the presence/absence of explicit type signatures should have the same effect. Any information that the shallow -hi-boot-making search can't figure out, just doesn't go into the .hi-boot (possibly leading to erroring later... perhaps the .hi-boot could store info saying which information existed but it couldn't figure out, to enhance error messages if that info is ever demanded.) Obviously, Template Haskell can only be run if NOSOURCE-imported from another module. The stupid dependency chaser (the most complicated thing it does besides parsing import statements is computing graph SCCs) will, of course, still find a few more changed dependencies than really need to be recompiled; as always, this is where GHC's fancier recompilation checking will come into effect. Some compilers might benefit from (require?) explicit import or export lists in some places... also I wonder if perhaps items in export lists should be markable as whether they're exported to SOURCE-importers (although it doesn't seem necessary) Obviously, annotating every import with both [NO]SOURCE and SOURCE_[NO]FOLLOW is unreasonable! So let's look at inferring them. Any import that's not explicitly annotated with [NO]SOURCE can default to NOSOURCE if it's not part of an import-loop, or SOURCE if it is. NOSOURCE is allowed as a pragma here as well as SOURCE. That is, the dependency chaser assumes NOSOURCE, and if it finds a loop of imports that aren't explicitly SOURCE imports, it converts all that aren't *explicitly* NOSOURCE into SOURCE imports (if they're all explicitly NOSOURCE, it's an error). Since SOURCE_NOFOLLOW can easily break things, it really shouldn't be the default. (And there's never any need of it for imports of modules that aren't part of the current module cycle). However, we really don't want to have to specify it on all imports within the loop -- .hs-boots manage to only specify for modules that *are* needed. I suggest that SOURCE_[NO]FOLLOW be allowed as a top-level pragma that says all (following?) imports are annotated with that it they're not explicitly annotated SOURCE_[NO]FOLLOW. For example, let's take some random file from GHC's source that has a [l]hs-boot file: compiler/deSugar/DsExpr current lhs-boot: \begin{code} module DsExpr where import HsSyn ( HsExpr, LHsExpr, HsLocalBinds ) import Var ( Id ) import DsMonad ( DsM ) import CoreSyn ( CoreExpr ) dsExpr :: HsExpr Id -> DsM CoreExpr dsLExpr :: LHsExpr Id -> DsM CoreExpr dsLocalBinds :: HsLocalBinds Id -> CoreExpr -> DsM CoreExpr \end{code} current lhs: lots of imports, it will become obvious proposed new lhs, just like old lhs but with a few pragmas inserted: \begin{code} -- ... module DsExpr ( dsExpr, dsLExpr, dsLocalBinds, dsValBinds, dsLit ) where {-# SOURCE_NOFOLLOW #-} #include "HsVersions.h" import Match import MatchLit import DsBinds import DsGRHSs import DsListComp import DsUtils import DsArrows import {-# SOURCE_FOLLOW #-} DsMonad import Name #ifdef GHCI import PrelNames -- Template Haskell stuff iff bootstrapped import DsMeta #endif import {-# SOURCE_FOLLOW #-} HsSyn import TcHsSyn -- NB: The desugarer, which straddles the source and Core worlds, sometimes -- needs to see source types import TcType import Type import {-# SOURCE_FOLLOW #-} CoreSyn import CoreUtils import DynFlags import CostCentre -- hmm, actually Var was not imported by the lhs, -- only Id (which imports Var) ! It looks okay to -- just annotate the Id import here: import {-# SOURCE_FOLLOW #-} Id -- Are there times where this would ever -- be a terrible problem? Well, we could have -- added a line --import {-# SOURCE_FOLLOW #-} Var ( Id ) -- instead, which would not hurt much. -- (if Var.Id were a different type than Id.Id, -- compiling this DsExpr module would give a -- simple ambiguity error, no risk of -- hs vs. hs-boot inconsistency) import PrelInfo import DataCon import TysWiredIn import BasicTypes import PrelNames import SrcLoc import Util import Bag import Outputable import FastString \end{code} ... -Isaac

A very good paper on what it actually "means" to have recursive modules is presented in this paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.8816 in jhc, it is implemented in the module FrontEnd.Exports. I think the main rule that should be followed is that the name resolutions generated with the hs-boot files should _never_ conflict with the name resolution that would happen were the compiler to support full mutually recursive modules as described in the above paper. That way tools that need them can use the hs-boot files and tools that don't can ignore them and still be guarenteed to get the same results. This doesn't necessarily mean that implementations need to support full recursive modules, just that where they do, they don't conflict with a full implementation. In terms of dependency chasing, why not have the tools do it? ghc has the '-M' option, though, it its current form it isn't very convinient to use (I always have to postprocess its output), it shouldn't be too tough to beef it up a little. though, I would love it if haddock performed full recursive inter-module name resolution. If anyone wants to use jhc's code to achieve these goals, I will hapilly relicense any parts wanted under the MIT/2 clause bsd or ghc license upon asking. John -- John Meacham - ⑆repetae.net⑆john⑈

On Tue, Aug 26, 2008 at 04:31:33PM -0700, John Meacham wrote:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.8816
Doh! wrong paper. http://portal.acm.org/citation.cfm?id=581690.581692 anyone have a free link? John -- John Meacham - ⑆repetae.net⑆john⑈

Hi,
a free copy is available at:
http://www.purely-functional.net/yav/publications/modules98.pdf
(the source code, is also available at the same site).
Hope that this helps,
-Iavor
On Tue, Aug 26, 2008 at 4:33 PM, John Meacham
On Tue, Aug 26, 2008 at 04:31:33PM -0700, John Meacham wrote:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.8816
Doh! wrong paper.
http://portal.acm.org/citation.cfm?id=581690.581692
anyone have a free link?
John
-- John Meacham - ⑆repetae.net⑆john⑈ _______________________________________________ Haskell-prime mailing list Haskell-prime@haskell.org http://www.haskell.org/mailman/listinfo/haskell-prime

On Mon, Sep 01, 2008 at 10:16:50PM -0700, Iavor Diatchki wrote:
a free copy is available at: http://www.purely-functional.net/yav/publications/modules98.pdf (the source code, is also available at the same site). Hope that this helps,
Thanks. I liked this paper and hope we can come up with a similar formal treatment of the module system for haskell' in the specification itself. describing the result of import/export statements as the minimal fixpoint of a set of equations is delightfully concise and straightforward. -- John Meacham - ⑆repetae.net⑆john⑈
participants (6)
-
Chris Smith
-
Duncan Coutts
-
Ian Lynagh
-
Iavor Diatchki
-
Isaac Dupree
-
John Meacham