The base library and GHC 6.10

Hi all, We're trying to decide what to do with the base library for GHC 6.10, in terms of how much of it should be broken up into separate packages. Since the recent proposal about this, we may be rethinking what we want to do, and we would welcome your opinions. First, the motivation for splitting base up: It becomes possible to separately upgrade the parts, and makes it easier for different people to maintain different parts. It makes it easier to see what the hierarchy is, and to restructure the hierarchy, and work towards more of the code being shared between different Haskell implementations. Plus it means that people can't re-tangle the logically separate components, which is all too easy to do when you just have one huge package. It also means that packages are clearer about what they depend on. One possibility, which would be really cool, is to separate all the IO modules from the non-IO modules; between that and looking at the extensions used (e.g. TH and FFI) it would then be clear whether or not a library could do any IO. Of course, the Prelude is a hurdle for this goal. Also, GHC's current plan for the base library: http://hackage.haskell.org/trac/ghc/wiki/DarcsConversion#Planforlibraries essentially means forking base (as nhc98 would continue to use base in a darcs repo, while GHC would use it from a git repo, and there are no plans for any merging between these repos). Therefore any code that is to be shared between the implementations needs to not be in base, so from that point of view it would be good to pull out as much as possible. The main argument /against/ splitting base up is that at some point the dependencies of packages need to be updated to reflect the changes. However, GHC 6.10 will come with a base version 3, as well as the new base version 4, so the transition should be much smoother than the base 2 -> base 3 transition. Now, on with the proposed splitting. In the below, LoC stands for "Lines of Code". First the easy bit: The Data.Generics hierarchy is going to have a separate maintainer, and I think that everyone is agreed that it should be pulled out into an "syb package". I'll treat this as not part of base from here on. The only thing still being debated here is whether the Data class itself should remain in base or not. Some people believe that it should remain in base, as it is desirable to have Data instances for as many types as possible, and because there is a resistance among library writers against adding dependencies. The counter argument is that there are many other classes that the same is true of (e.g. uniplate, syb-with-class, binary), and it does not scale to put all of these classes into base. Also, by requiring a dep to be added even for the classes that have historically been included in base, adding dependencies for the sake of providing instances may become more socially acceptable. Now, on with the splitting. We have System.Console.GetOpt (129 LoC, 1 module) This doesn't really fit in with anything else in base, so the proposal is to split it off into its own getopt package. I don't think there is much objection to this one. Next we have the Control.Monad.ST Data.STRef (120 LoC, 6 modules) hierarchies. The proposal is to put these into an st package. The low-level implementation is still in base (69 LoC of in the GHC.ST and GHC.STRef), so to some extent this is a false separation. On the other hand, nhc98 doesn't support ST, so splitting this package off gets us closer to all implementations exposing the same modules from base. Then we have Control.Concurrent (490 LoC, 6 modules) hierarchy, along with System.Timeout (39 LoC) Data.Unique (32 LoC) (those modules depend on Control.Concurrent.*). The proposal is to put these into concurrent, timout and unique packages respectively. Again, this is a false separation, with 698 LoC left behind in GHC.Conc; at some time we'd hope that this could either be moved down to ghc-prim, or make a new ghc-concurrent package for it, depending on how the dependencies work out. Again, nhc doesn't support concurrent or its dependencies, so this gets us closer to a consistent base interface. Splitting off the above 5 packages would leave 106 modules and 16621 LoC in base. About 5% of the LoC, and 12.5% of the modules, would be in the new packages. Thanks Ian

Hello,
On Thu, Aug 28, 2008 at 13:12, Ian Lynagh
First the easy bit: The Data.Generics hierarchy is going to have a separate maintainer, and I think that everyone is agreed that it should be pulled out into an "syb package". I'll treat this as not part of base from here on.
The only thing still being debated here is whether the Data class itself should remain in base or not. Some people believe that it should remain in base, as it is desirable to have Data instances for as many types as possible, and because there is a resistance among library writers against adding dependencies. The counter argument is that there are many other classes that the same is true of (e.g. uniplate, syb-with-class, binary), and it does not scale to put all of these classes into base. Also, by requiring a dep to be added even for the classes that have historically been included in base, adding dependencies for the sake of providing instances may become more socially acceptable.
Is there a way not to have the Data class in base while still preserving the deriving mechanism? I think that one big reason for the popularity of SYB is not only the fact that it comes with GHC but also that you get support for generics on user-defined datatypes for "free". So if there is no way to have derivable Data with Data outside base, then I think Data should stay in base. Pedro

| > Is there a way not to have the Data class in base while still preserving the | > deriving mechanism? | | Yes, you can still have "deriving Data" if the class is in the syb | package. If you were to change the methods in Data, the deriving stuff would have to change too. That is true but I agree with Neil: Data and Typeable are the basic foundation on which we may build a variety of reflection/introspection libraries, SYB among them.

[Darn: somehow sent too early] | | > Is there a way not to have the Data class in base while still preserving the | | > deriving mechanism? | | | | Yes, you can still have "deriving Data" if the class is in the syb | | package. True, but if you were to change the methods in Data, the deriving stuff would have to change too. So putting Data in SYB would make it appear more separate than it truthfully is. Overall, I agree with Neil: Data and Typeable are the basic foundation on which we may build a variety of reflection/introspection libraries, SYB among them. Let's leave 'em in base. Simon

Hi
The only thing still being debated here is whether the Data class itself should remain in base or not. Some people believe that it should remain in base, as it is desirable to have Data instances for as many types as possible, and because there is a resistance among library writers against adding dependencies. The counter argument is that there are many other classes that the same is true of (e.g. uniplate, syb-with-class, binary), and it does not scale to put all of these classes into base.
My opinion is that Data should remain in base. Data is much lower than other classes, and provides reflection and examination of Haskell values at runtime. You can layer uniplate and binary on top of Data, and the Derive tool's next release will layer an additional 20 classes on top of Data. In some ways Data is more primitive, and more powerful, than other classes. The rest of SYB is a traversal mechansim, which is much more of a library concern, so deserves to be split off.
Also, by requiring a dep to be added even for the classes that have historically been included in base, adding dependencies for the sake of providing instances may become more socially acceptable.
Keeping dependencies short is good, and it will be hard to persuade most library authors (including me) otherwise.
Splitting off the above 5 packages would leave 106 modules and 16621 LoC in base. About 5% of the LoC, and 12.5% of the modules, would be in the new packages.
The goal of exposing a consistent base seems a sensible one, so all those changes look good. Thanks Neil

On Thu, Aug 28, 2008 at 12:28:18PM +0100, Neil Mitchell wrote:
The only thing still being debated here is whether the Data class itself should remain in base or not. Some people believe that it should remain in base, as it is desirable to have Data instances for as many types as possible, and because there is a resistance among library writers against adding dependencies. The counter argument is that there are many other classes that the same is true of (e.g. uniplate, syb-with-class, binary), and it does not scale to put all of these classes into base.
My opinion is that Data should remain in base.
OK; so I guess that means the whole Data.Generics.Basics module should stay in base. Thanks Ian

On Sat, Aug 30, 2008 at 01:01:34PM +0100, Ian Lynagh wrote:
On Thu, Aug 28, 2008 at 12:28:18PM +0100, Neil Mitchell wrote:
My opinion is that Data should remain in base.
OK; so I guess that means the whole Data.Generics.Basics module should stay in base.
Should Data.Generics.Instances (an orphanage) be folded into Data.Generics.Basics?

On Sun, Aug 31, 2008 at 01:10:56PM +0100, Ross Paterson wrote:
On Sat, Aug 30, 2008 at 01:01:34PM +0100, Ian Lynagh wrote:
On Thu, Aug 28, 2008 at 12:28:18PM +0100, Neil Mitchell wrote:
My opinion is that Data should remain in base.
OK; so I guess that means the whole Data.Generics.Basics module should stay in base.
Should Data.Generics.Instances (an orphanage) be folded into Data.Generics.Basics?
Sounds good to me. Thanks Ian

On Sun, Aug 31, 2008 at 01:16:33PM +0100, Ian Lynagh wrote:
On Sun, Aug 31, 2008 at 01:10:56PM +0100, Ross Paterson wrote:
On Sat, Aug 30, 2008 at 01:01:34PM +0100, Ian Lynagh wrote:
OK; so I guess that means the whole Data.Generics.Basics module should stay in base.
Should Data.Generics.Instances (an orphanage) be folded into Data.Generics.Basics?
Sounds good to me.
The name Data.Generics.Basics identifies it as the basic part of the generics library. If it's to be presented as a general class, perhaps the module should be renamed (with re-exports under the old names in syb). Data.Data?

On Sun, Aug 31, 2008 at 01:25:53PM +0100, Ross Paterson wrote:
On Sun, Aug 31, 2008 at 01:16:33PM +0100, Ian Lynagh wrote:
On Sun, Aug 31, 2008 at 01:10:56PM +0100, Ross Paterson wrote:
On Sat, Aug 30, 2008 at 01:01:34PM +0100, Ian Lynagh wrote:
OK; so I guess that means the whole Data.Generics.Basics module should stay in base.
Should Data.Generics.Instances (an orphanage) be folded into Data.Generics.Basics?
Sounds good to me.
The name Data.Generics.Basics identifies it as the basic part of the generics library. If it's to be presented as a general class, perhaps the module should be renamed (with re-exports under the old names in syb). Data.Data?
If the old names are in syb then existing libraries need to change either their dependencies or their imports. We could put the old names in base, but deprecate them? Thanks Ian

Hello Ian, Sunday, August 31, 2008, 6:13:50 PM, you wrote:
The name Data.Generics.Basics identifies it as the basic part of the generics library. If it's to be presented as a general class, perhaps the module should be renamed (with re-exports under the old names in syb). Data.Data?
If the old names are in syb then existing libraries need to change either their dependencies or their imports.
We could put the old names in base, but deprecate them?
i think that correct solution for all such cases is to provide compatibility reexport only in old base library. programs written with new base in mind should also accommodate changes in module names -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

"Ian Lynagh"
On Sun, Aug 31, 2008 at 01:10:56PM +0100, Ross Paterson wrote:
On Sat, Aug 30, 2008 at 01:01:34PM +0100, Ian Lynagh wrote:
On Thu, Aug 28, 2008 at 12:28:18PM +0100, Neil Mitchell wrote:
My opinion is that Data should remain in base.
OK; so I guess that means the whole Data.Generics.Basics module should stay in base.
Should Data.Generics.Instances (an orphanage) be folded into Data.Generics.Basics?
Sounds good to me.
I just stumbled into this thread by accident. No, the instances should not be folded into Basics. For one, Basics seems likely to remain in base, and half of the instances are under dispute (see previous SYB threads that were copied here). Splitting the instances into standard and dubious prior to deprecating the latter was one of the motivations for moving SYB out of base (apart from Basics and deriving) and for seeking a maintainer for syb. Since José Pedro Magalhães has offered to take on ownership of the syb package, it would be appropriate to cc him on any discussions related to this, so that he is aware of all developments and possible conflicts. Claus

On Sun, Aug 31, 2008 at 07:11:11PM +0100, Claus Reinke wrote:
"Ian Lynagh"
wrote in message news:20080831121633.GA17919@matrix.chaos.earth.li... On Sun, Aug 31, 2008 at 01:10:56PM +0100, Ross Paterson wrote:
On Sat, Aug 30, 2008 at 01:01:34PM +0100, Ian Lynagh wrote:
OK; so I guess that means the whole Data.Generics.Basics module should stay in base.
Should Data.Generics.Instances (an orphanage) be folded into Data.Generics.Basics?
Sounds good to me.
I just stumbled into this thread by accident. No, the instances should not be folded into Basics. For one, Basics seems likely to remain in base, and half of the instances are under dispute (see previous SYB threads that were copied here). Splitting the instances into standard and dubious prior to deprecating the latter was one of the motivations for moving SYB out of base (apart from Basics and deriving) and for seeking a maintainer for syb.
Since José Pedro Magalhães has offered to take on ownership of the syb package, it would be appropriate to cc him on any discussions related to this, so that he is aware of all developments and possible conflicts.
Hmm. Of course it's not possible to deprecate instances, and there's only GHC bug #2356 to protect against instance clashes. It does seem a bit contradictory to argue that a class is so basic that it belongs in the core, but its instances for core types are unclear. Well at least the instances for [], tuples, Maybe, Either, Array and type constants could go with the Data class, I presume. Presumably Complex could be given definitions of gfoldl and gmapT, and Ratio a definition of gfoldl, though perhaps not gmapT.

Leaving Data.Generics.Basics in base while moving Data.Generics.Instances to syb raises the interesting issue of dealing with the accidental re-exports of Data.Generics.Instances from various places. Here is that list again (*): $ find . -name '*hs' | grep -v _darcs | xargs grep -l 'Data.Generics' | grep -v Generics ./array/Data/Array.hs ./base/Data/Typeable.hs ./bytestring/Data/ByteString/Internal.hs ./bytestring/Data/ByteString/Lazy/Internal.hs ./bytestring/Data/ByteString/Unsafe.hs ./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./haskell-src/Language/Haskell/Syntax.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs ./template-haskell/Language/Haskell/TH/Quote.hs ./template-haskell/Language/Haskell/TH/Syntax.hs And here is a brief scan of what each of these is doing. References to 'standard' vs 'dubious' Data instances are wrt the suggested split in [1], with some possible refinements: - array: the Data instance for Array could be moved into array, avoiding the need for instance imports and syb dependency? - bytestring: uses deriving, which for Internal.hs depends on Data instances for Int [standard] and (ForeignPtr Word8) [dubious]; would need to depend on syb; and import both standard and dubious instances :-( perhaps Data instances for type constructors with phantom types should be re-classified into Standard, given that there are no data objects to be traversed? - containers: IntMap.hs, IntSet.hs, Map.hs, Sequencs.hs, Set.hs, Tree.hs define their own Data instances, or derive them in such a way that they do not need to import any instances :-) - haskell-src: uses deriving, will need to depend on syb; depends almost exclusively on standard instances (the only exception I can see in a quick scan is Rational); perhaps this is an argument in favour of moving the Data instance for 'Ratio a' from Dubious to Standard: the parameter type is never meant to be traversed, and tainting every client of 'Ratio a' with the really bad instances is not a good idea. Opinions? - network: uses deriving, will need to depend on syb; depends only on standard instances - packedstring: defines its own instances, no need to import any - template-haskell: uses deriving, roughly the same situation as for haskell-src? Claus [1] see the last page of http://www.haskell.org/pipermail/libraries/2008-July/010313.html [2] http://www.cs.kent.ac.uk/~cr3/toolbox/haskell/#syb-utils

Hello,
This issue has been discussed before [1] and I got the impression all
instances were going to be in the syb package and not in base. I think it's
preferable to deal with the dubious instances issue [2] once SYB has been
put into its own package, but maybe the "standard" instances could stay in
base.
Regarding the naming issue: I would very much be in favor of entirely
renaming SYB to Generics.SYB altogether. I think the current name doesn't
make much sense (why Data? it doesn't really define a datastructure), and
simply calling it Generics is too broad, given that this is only one of the
libraries for generic programming around. This would also fit nicely with
other (upcoming) libraries for generic programming.
Thanks,
Pedro
[1] http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9738
[2] http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9672
On Mon, Sep 1, 2008 at 00:02, Claus Reinke
Leaving Data.Generics.Basics in base while moving Data.Generics.Instances to syb raises the interesting issue of dealing with the accidental re-exports of Data.Generics.Instances from various places. Here is that list again (*):
$ find . -name '*hs' | grep -v _darcs | xargs grep -l 'Data.Generics' | grep -v Generics ./array/Data/Array.hs ./base/Data/Typeable.hs ./bytestring/Data/ByteString/Internal.hs ./bytestring/Data/ByteString/Lazy/Internal.hs ./bytestring/Data/ByteString/Unsafe.hs ./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./haskell-src/Language/Haskell/Syntax.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs ./template-haskell/Language/Haskell/TH/Quote.hs ./template-haskell/Language/Haskell/TH/Syntax.hs
And here is a brief scan of what each of these is doing. References to 'standard' vs 'dubious' Data instances are wrt the suggested split in [1], with some possible refinements:
- array: the Data instance for Array could be moved into array, avoiding the need for instance imports and syb dependency?
- bytestring: uses deriving, which for Internal.hs depends on Data instances for Int [standard] and (ForeignPtr Word8) [dubious]; would need to depend on syb; and import both standard and dubious instances :-(
perhaps Data instances for type constructors with phantom types should be re-classified into Standard, given that there are no data objects to be traversed?
- containers: IntMap.hs, IntSet.hs, Map.hs, Sequencs.hs, Set.hs, Tree.hs define their own Data instances, or derive them in such a way that they do not need to import any instances :-)
- haskell-src: uses deriving, will need to depend on syb; depends almost exclusively on standard instances (the only exception I can see in a quick scan is Rational);
perhaps this is an argument in favour of moving the Data instance for 'Ratio a' from Dubious to Standard: the parameter type is never meant to be traversed, and tainting every client of 'Ratio a' with the really bad instances is not a good idea. Opinions?
- network: uses deriving, will need to depend on syb; depends only on standard instances
- packedstring: defines its own instances, no need to import any
- template-haskell: uses deriving, roughly the same situation as for haskell-src?
Claus
[1] see the last page of http://www.haskell.org/pipermail/libraries/2008-July/010313.html [2] http://www.cs.kent.ac.uk/~cr3/toolbox/haskell/#syb-utilshttp://www.cs.kent.ac.uk/%7Ecr3/toolbox/haskell/#syb-utils
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Let's think of what we are trying to achieve.
Plan A:
· The class Data is defined in 'base'
· The SYB library builds on it
· Other libraries may build on it, independent of SYB
Plan B:
· The class Data is defined in SYB
· Other libraries that want reflection-like facilities must build on SYB
I think we were proposing to carry out Plan A. If so, then it would make sense for instances of Int, Maybe, tuples, etc to be in 'base' too; otherwise you have to import SYB anyway to get those instances.
Claus says that "half the instances of Data are controversial". Is that really right Claus? Isn't it just functions and IO?
My suggestion:
· Plan A
· Simple, obvious data instances (eg Bool, Maybe, lists) go in base
· A handful of controversial instances go in SYB
This more or less what Ross suggested.
Jose, should we be copying the 'generics' list?
We need a chairperson for this discussion, to drive us to a timely conclusion. Jose: would you be willing to play that role?
Simon
From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of José Pedro Magalhães
Sent: 01 September 2008 07:54
To: libraries@haskell.org; Claus Reinke
Subject: Re: The base library and GHC 6.10
Hello,
This issue has been discussed before [1] and I got the impression all instances were going to be in the syb package and not in base. I think it's preferable to deal with the dubious instances issue [2] once SYB has been put into its own package, but maybe the "standard" instances could stay in base.
Regarding the naming issue: I would very much be in favor of entirely renaming SYB to Generics.SYB altogether. I think the current name doesn't make much sense (why Data? it doesn't really define a datastructure), and simply calling it Generics is too broad, given that this is only one of the libraries for generic programming around. This would also fit nicely with other (upcoming) libraries for generic programming.
Thanks,
Pedro
[1] http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9738
[2] http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9672
On Mon, Sep 1, 2008 at 00:02, Claus Reinke

I thought class 'Data' was in 'Data.Generics.Basics' because it provides generic access to 'data'-definitions. SYB's generic programming library code (strategies for queries and transformations) builds on that, so does (one version of) Uniplate. Most other generic programming libraries are based on generic access to _type_ representations, the basics of which would more accurately appear somewhere in 'Types.Generics' - no conflict with SYB here. One could move the actual generic libraries into 'Generics.*', but until there is an actual need for that, I'd prefer things to stay stable, with libraries building on generic data access in 'Data.Generics' and libraries building on generic type access appearing in 'Types.Generics'. One could rename some of the SYB modules, eg, 'Data.Generics.Schemes' -> 'Data.Generics.SybSchemes' and so forth, but as long as other 'data'-based libraries are not deprived of namespace there, and other 'type'-based libraries either don't provide general traversal schemes or live in 'Types.Generics', there is no immediate need for such renaming, beyong putting the modules in a 'syb' package, is there? (note that 'Data.Typeable' is outside 'Data.Generics', though it is part of the basics that SYB depends on)
Claus says that "half the instances of Data are controversial". Is that really right Claus? Isn't it just functions and IO?
As an ideal, I'd like 'Data.Generics.Instances.Dubious' to be empty - 'Data' instances should either be standard, or not exist at all (at least not in library code). What I did was simply to take anything that looked dubious and move it into a separate module, to facilitate further discussion and more control over imports. The discussion I had hoped for didn't happen, so that is still were my code stands, but I do hope it isn't the final state. As for numbers, I currently have 32 instances in 'Data.Generics.Instances.Standard' and 11 instances in 'Data.Generics.Instances.Dubious' [1]. My initial split was mostly on the basis of 'gfold'/'gmapT' not traversing substructures, so some more of the 'Standard' instances are actually incomplete, and some of the 'Dubious' instances could possibly be declared "safe (with side conditions)", but then someone would still have to look at making the instances more complete/less certain to generate runtime errors. The current 'Dubious' list has things like - 'Ratio a': while values of type 'a' actually exist here, they are not meant to be visible in a concrete way, only via the abstract interface; and the abstract interface can support a 'data'-like view - various 'Ptr a': here the 'a' is a phantom type, there are no objects of type 'a' to be traversed; but neither is there much 'data'-like about these pointers.. - 'b->a', 'IO a', 'ST s a', 'STM a': these are thoroughly un-'data'-like; though the instances could be improved to provide transformation access to the 'a' values, the same doesn't work for queries, and the '(->)b' context is completely out of range for 'Data'. The current 'Standard' list has various instances that just bomb on some operations, including the 'Array a b' instance, which otherwise nicely demonstrates how to handle abstract types. Moving the more stable and standard 'Data' instances into base might not hinder development/debugging of the remaining instances, but right now, I don't think it will include sufficiently many instances to avoid dependencies on syb. As I explained in my previous email, the implicit presence of instances is itself a source of bugs, due to the propagation of instances in Haskell, not to mention ghc bug #2182. Since very few of the current 'Data' instance importers actually need those imports (they just happen to be included if one imports 'Data.Generics'), I'd prefer to remove the implicit imports (and implied re-exports), making the remaining real dependencies explicit by depending on syb (again, see previous email). I'd really like to see the real issues addressed before we start worrying about names, as this has turned out to be a rats nest of bugs, including: - incomplete 'Data' instances (operations that bomb now, but might be given better implementations) - incompleteable 'Data' instances (operations that cannot be implemented, suggesting that these instances shouldn't exist) - 'deriving Data' depending on 'Data' instances for everything, instead of skipping substructure types that cannot be handled anyway (smarter deriving could avoid dumb instances, by annotating types that should not be traversed instead of traversing these types via dummy instances that are then globally available/irreplaceable) - unneccessary 'Data' instance import/export (Data.IntMap has absolutely no business bringing 'instance Data (IO a)' into scope) - ghc sessions retaining instances (#2182), leading to build errors even in separate module hierarchies - ghc listing "orphan instances" as a performance issue, re-emphasized recently by warnings turned into errors, which has led some to believe they are a design fault, rather than a representation of a valid design decision - it doesn't help that Haskell doesn't support instance import/ export control (yes, the instances are unnamed, but naming class, type, and module would seem sufficient to block instance imports/exports where they are not wanted) Claus [1] http://www.cs.kent.ac.uk/~cr3/toolbox/haskell/#syb-utils

| I'd really like to see the real issues addressed before we start | worrying about names, as this has turned out to be a rats nest | of bugs, including: I'm all for addressing the real issues! But there are two time-frames involved a) In the next fortnight we must decide what stays in 'base' and what moves to the new 'syb' package. b) On a longer timescale, the 'syb' package maintainers can improve the library. The whole point of splitting syb out is to decouple (b) from (a). I'm currently focused on (a), because it is urgent, and this is the discussion that Jose is now kindly chairing. (b) remains important, but it is less urgent. Concerning the GHC-related parts of Claus's specific points: | - ghc sessions retaining instances (#2182), leading to build errors | even in separate module hierarchies Yes I agree this is bad, but I don't think I can fix it in time for 6.10; it's a consequence of an ill-made but fairly deeply wired in design choice. | - ghc listing "orphan instances" as a performance issue, | re-emphasized recently by warnings turned into errors, which | has led some to believe they are a design fault, rather than a | representation of a valid design decision They are not turned into errors. They are simply warnings. They only appear if you ask for them. If you use -Werror then they turn into errors, but then you asked for that! (The recent change is that previously they were second-class warnings that did not have this property.) Simon

I'm all for addressing the real issues! But there are two time-frames involved a) In the next fortnight we must decide what stays in 'base' and what moves to the new 'syb' package. b) On a longer timescale, the 'syb' package maintainers can improve the library. The whole point of splitting syb out is to decouple (b) from (a). I'm currently focused on (a), because it is urgent, and this is the discussion that Jose is now kindly chairing. (b) remains important, but it is less urgent.
understood and agreed. We just need to make sure that decoupling 'Data' from the decoupling of 'syb' doesn't ruin the intended decoupling!-) | - ghc sessions retaining instances (#2182), leading to build errors | even in separate module hierarchies
Yes I agree this is bad, but I don't think I can fix it in time for 6.10; it's a consequence of an ill-made but fairly deeply wired in design choice.
Just knowing whether you actually hope to do something about it, or whether it is so deeply ingrained that it can't be fixed, would help. | - ghc listing "orphan instances" as a performance issue, | re-emphasized recently by warnings turned into errors, which | has led some to believe they are a design fault, rather than a | representation of a valid design decision
They are not turned into errors. They are simply warnings. They only appear if you ask for them. If you use -Werror then they turn into errors, but then you asked for that! (The recent change is that previously they were second-class warnings that did not have this property.)
You have made your views quite clear in previous messages. Unfortunately, that doesn't keep others from drawing other conclusions, turning against any instance of orphans;-) Claus

| | - ghc sessions retaining instances (#2182), leading to build errors | | even in separate module hierarchies | | >Yes I agree this is bad, but I don't think I can fix it in time for 6.10; | >it's a consequence of an ill-made but fairly deeply wired in design choice. | | Just knowing whether you actually hope to do something about it, | or whether it is so deeply ingrained that it can't be fixed, would help. Oh yes, it can definitely be fixed, and I fully intend to fix it. It just needs a little uninterrupted thought. Simon

Simon Peyton-Jones wrote:
Plan A: · The class Data is defined in ‘base’ · The SYB library builds on it · Other libraries may build on it, independent of SYB
Plan B: · The class Data is defined in SYB · Other libraries that want reflection-like facilities must build on SYB
What about a new "introspection" package containing Typeable, Dynamic and Data? -- Ashley Yakeley

Claus Reinke wrote:
Leaving Data.Generics.Basics in base while moving Data.Generics.Instances to syb raises the interesting issue of dealing with the accidental re-exports of Data.Generics.Instances from various places. Here is that list again (*):
$ find . -name '*hs' | grep -v _darcs | xargs grep -l 'Data.Generics' | grep -v Generics ./array/Data/Array.hs ./base/Data/Typeable.hs ./bytestring/Data/ByteString/Internal.hs ./bytestring/Data/ByteString/Lazy/Internal.hs ./bytestring/Data/ByteString/Unsafe.hs ./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./haskell-src/Language/Haskell/Syntax.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs ./template-haskell/Language/Haskell/TH/Quote.hs ./template-haskell/Language/Haskell/TH/Syntax.hs
This raises the more general issue of instance-visibility. Since instances are automatically re-exported and therefore break abstraction barriers, the only sensible way to think about instances is as global properties. Attempting to limt the visibility of instances by putting them in separate packages or modules is futile. If an instance exists in a library *anywhere*, it potentially exists *everywhere*, and we should think of it as global. The only way to limit the visibility of instances is to not put them in a package. That means, for the particular case of the Data class, someone should decide once and for all whether there is an instance for IO, or functions, or whatever, and either define them along with the Data class or not at all. Cheers, Simon

On Wed, Sep 03, 2008 at 09:17:16AM +0100, Simon Marlow wrote:
This raises the more general issue of instance-visibility. Since instances are automatically re-exported and therefore break abstraction barriers, the only sensible way to think about instances is as global properties.
Attempting to limt the visibility of instances by putting them in separate packages or modules is futile. If an instance exists in a library *anywhere*, it potentially exists *everywhere*, and we should think of it as global. The only way to limit the visibility of instances is to not put them in a package.
That means, for the particular case of the Data class, someone should decide once and for all whether there is an instance for IO, or functions, or whatever, and either define them along with the Data class or not at all.
I agree with all that, but there may not be time to be confident of making the right decision in all these cases in time for 6.10.

This raises the more general issue of instance-visibility. Since instances are automatically re-exported and therefore break abstraction barriers, the only sensible way to think about instances is as global properties.
I've heard this fatalistic view expressed before, and even for impoverished Haskell 98, it just isn't true. Perhaps it has come about from years of being bitten by either #2182, by attempts to avoid "orphan" instances, by carelessly designed libraries, or by careless instance imports, all of which make combining libraries that provide instances of the same class for the same type a pain? A type class specifies a relation between types. Both the types and the class are named, and if instances are placed in separate modules, the modules are named as well. The combination of module, class and type names gives quite a bit of control over instance import/export, even if it is terribly cumbersome and limited (and easily defeated by just one library importing all instances "for convenience"). Neither the relation (class), nor its domain (types), nor its extent (instances) are "global". Here's an example: module A where class A a where a :: a module B where import A instance A Bool where a = True module C where import A f :: A a => a f = a -- g :: Bool -- g = a In 'C', class 'A' and type 'Bool' are available, but the instance is not, so delaying instance selection ('f') works, while forcing instance selection ('g') gives a "missing instance" error. module D where import A import B import C g :: Bool g = a main = print (f::Bool,g) In 'D', type, class, and instance are available, so 'g' is allowed here. The word you are looking for is perhaps "accumulative": no module provides fewer instances than its imports, so instances accumulate recursively along the import hierarchy, no matter what the import/ export specifications say. In Haskell 98, that means conflicting instances ought to raise an error in 'Main' at the latest, but GHC deviates from that on purpose (#2356), which makes the life of projects using multiple libraries a little easier. I have often wondered why Haskell doesn't provide control over instance import/export, even without going all the way to naming instances: module B(instance A Bool) or import B hiding (instance A Bool) should go quite a way to improving instance scope control. Is it just that people want more (named instances), or is there any real problem with this simple approach? I'd prefer explicit instance scope control import X hiding (instance C a) -- no 'C' instances from package 'x' import Y (instance C a) -- all 'C' instances from package 'y' over the current "you can have conflicting instances, as long as you don't use them in conflicting ways" (#2356).
That means, for the particular case of the Data class, someone should decide once and for all whether there is an instance for IO, or functions, or whatever, and either define them along with the Data class or not at all.
Ultimately, yes. And there have been several suggestions for improving the instances, or limiting the class to what is implementable, or avoiding the need for dummy instances to support deriving, etc. But even in the final analysis, there may be no better option than to provide dummy instances via explicit imports for those applications that want them. I mean I could decide that I'm in a "don't permit anything unsafe" mood and try to force this instance into any importer of 'Data' instance Fail a => Data (IO a) (where 'Fail' is a class that cannot have instances, as guaranteed by not exporting the class). Then I'd be happy until the day I actually need an instance, any instance of 'Data (IO a)' (or someone else does;-). The trick is to split the imperfect code we have in such a way that it does not prevent users from taking their pick, and making their own choices, suitable for their projects. Anyway, there is no hope of deciding the fate of these instances in time for the release candidate, and the decision to be made now is how to support a realistic amount of improvement after the release, by making suitable _preparations_ before the release. Claus

(sorry for the delay in replying to this...) Claus Reinke wrote:
This raises the more general issue of instance-visibility. Since instances are automatically re-exported and therefore break abstraction barriers, the only sensible way to think about instances is as global properties.
I've heard this fatalistic view expressed before, and even for impoverished Haskell 98, it just isn't true. Perhaps it has come about from years of being bitten by either #2182, by attempts to avoid "orphan" instances, by carelessly designed libraries, or by careless instance imports, all of which make combining libraries that provide instances of the same class for the same type a pain?
No, it has nothing to do with bugs or misfeatures in GHC, it's a fact of Haskell 98.
A type class specifies a relation between types. Both the types and the class are named, and if instances are placed in separate modules, the modules are named as well. The combination of module, class and type names gives quite a bit of control over instance import/export, even if it is terribly cumbersome and limited (and easily defeated by just one library importing all instances "for convenience"). Neither the relation (class), nor its domain (types), nor its extent (instances) are "global".
The point is that instances are unconditionally re-exported, which exposes knowledge about the import structure underneath a module. If we consider instances to be part of the API of a module, then the API of a module is changed simply by changing what is imported. This is clarly an abstraction failure, because we want a module to be able to control its API independently of its implementation. This is even worse at the level of packages. We can hide modules that are used internally to a package's implementation, but we can't hide the fact that a package used some non-standard instances internally, and furthermore we can't change this aspect of its implementation without changing the API. So thinking of instances as part of the API of a module is wrong, because it leads to the aforementioned abstraction failures. The only sensible way to think of instances is as global properties - one instance per class/type pair. Orphan instances are usually wrong unless the orphans are also exported via the standard API for either the class or the type. That is, orphans are ok in the implementation of a package, but not in the exposed API, because that makes it possible for a client to import both the class and type without getting the instance, which is what we have to avoid. Cheers, Simon

On Wed, Sep 24, 2008 at 03:59:12PM +0100, Simon Marlow wrote:
Claus Reinke wrote:
This raises the more general issue of instance-visibility. Since instances are automatically re-exported and therefore break abstraction barriers, the only sensible way to think about instances is as global properties.
I've heard this fatalistic view expressed before, and even for impoverished Haskell 98, it just isn't true. Perhaps it has come about from years of being bitten by either #2182, by attempts to avoid "orphan" instances, by carelessly designed libraries, or by careless instance imports, all of which make combining libraries that provide instances of the same class for the same type a pain?
No, it has nothing to do with bugs or misfeatures in GHC, it's a fact of Haskell 98.
Except for #2356, which some people think is a feature. (Not me, though)

(sorry for the delay in replying to this...)
No problem, though you seem to be restating your opinion intead of addressing my concrete points? For those who, like me, have lost the thread in the meantime, here is a link to the message you reply to: http://www.haskell.org/pipermail/libraries/2008-September/010623.html
.., the only sensible way to think about instances is as global properties. ..it has nothing to do with bugs or misfeatures in GHC, it's a fact of Haskell 98.
I thought my example demonstrated quite clearly that instances are *not* global in Haskell.
A type class specifies a relation between types. Both the types and the class are named, and if instances are placed in separate modules, the modules are named as well. The combination of module, class and type names gives quite a bit of control over instance import/export, even if it is terribly cumbersome and limited (and easily defeated by just one library importing all instances "for convenience"). Neither the relation (class), nor its domain (types), nor its extent (instances) are "global".
The point is that instances are unconditionally re-exported,
Yes, and I'm not disputing that point. What I am disputing are its consequences/interpretation. As I said, instances accumulate upwards along the import hierarchy. But they do not propagate downwards, so they do not have global scope, and one can exert some control over all of type relation (class), domain (types), and extent (instances). What one cannot do (in Haskell 98) is to have two instances of the same class, for the same types, in the same import hierarchy.
If we consider instances to be part of the API of a module, then the API of a module is changed simply by changing what is imported.
That is true, whether we like it or not. The language does not give the programmer a way to control this part of the API by limiting re-export of locally visible instances. The programmers may not like it, but those instances are visible to importers even if they distribute warning labels stating "these instances are not part of our intended API".
This is even worse at the level of packages. We can hide modules that are used internally to a package's implementation, but we can't hide the fact that a package used some non-standard instances internally, and furthermore we can't change this aspect of its implementation without changing the API.
How do packages make a difference here? As long as I don't import base:Control.Monad.Error, the base:Control.Monad.Instances instances of Functor, say, are not visible in base:Control.Monad.
So thinking of instances as part of the API of a module is wrong, because it leads to the aforementioned abstraction failures. The only sensible way to think of instances is as global properties - one instance per class/type pair.
I'm afraid that is the wrong way round: the abstracton failures occur because of lack of control over the API - trying to wish away the offending instances ("don't think of them as part of the API") won't help. And trying to think of instances as global fails to account for the limited control we do have, by arranging code inside the import hierarchy, as in the example I gave. So this interpretation is unhelpful, demonstrably wrong, and prevents more detailed analysis. There may be real theory reasons why the extent of type relations has to increase monotonically as one walks up the import hierarchy, and such reasons might invalidate the proposal I made about providing more control over instance re-export. I'd certainly be interested in hearing about such real arguments (the fact that moving code downwards in the import hierarchy, away from the branch that provides the instances, has the same effect, suggests that such reasons either do not exist or already point to issues with the current language definition). But simply jumping from "we don't have full control" to "we don't have any control, and we never will" is unhelpful. We need to understand these issues better, so that we can discuss whether the language can be improved, and so that we can see what needs to change so that Ghc and Haddock start supporting the existing language. Clarity of language precedes better understanding, I hope.
Orphan instances are usually wrong unless the orphans are also exported via the standard API for either the class or the type. That is, orphans are ok in the implementation of a package, but not in the exposed API, because that makes it possible for a client to import both the class and type without getting the instance, which is what we have to avoid.
Sadly, that has all been discussed to death already, and again, it is a matter of being precise. "Orphan" instances are not wrong per se - they encode and name the extent of type relations via modules, but one needs to think carefully about their intended use and whether that use is really supported by the language or just an illusion. Of the top of my head, I can think of two uses: (a) having two instances of the same class for the same types in the same program only works by "virtue" of #2356, so should be avoided unless and until the positive aspects of #2356 are moved from accident to design decision (b) giving clients control over which instances they want to use (eg, use set A or set B, or neither) should work, and mostly does, but may run into ghc #2182 and haddock #54. Also, it is advisable only for client applications, not for client libraries, as long as their users might run into unresolved aspect of (a). My preference would be to see ghc #2182, #2356 (for Haskell 98 mode) and haddock #54 fixed. #2356 (for Ghc mode) is documented behaviour, I believe, inherited from Ghc's handling of overlapping instances, but there is no LANGUAGE extension specifying this behaviour, so it isn't portable. Next, I'd like to see whether more control over instance re-export is permissible in theory and -if yes- would like to see it implemented and standardised. Currently, I see no reason why we couldn't support (a) and (b). Claus

On Wed, 2008-09-24 at 17:30 +0100, Claus Reinke wrote:
(sorry for the delay in replying to this...)
No problem, though you seem to be restating your opinion intead of addressing my concrete points? For those who, like me, have lost the thread in the meantime, here is a link to the message you reply to:
http://www.haskell.org/pipermail/libraries/2008-September/010623.html
.., the only sensible way to think about instances is as global properties. ..it has nothing to do with bugs or misfeatures in GHC, it's a fact of Haskell 98.
I thought my example demonstrated quite clearly that instances are *not* global in Haskell.
A type class specifies a relation between types. Both the types and the class are named, and if instances are placed in separate modules, the modules are named as well. The combination of module, class and type names gives quite a bit of control over instance import/export, even if it is terribly cumbersome and limited (and easily defeated by just one library importing all instances "for convenience"). Neither the relation (class), nor its domain (types), nor its extent (instances) are "global".
The point is that instances are unconditionally re-exported,
Yes, and I'm not disputing that point. What I am disputing are its consequences/interpretation. As I said, instances accumulate upwards along the import hierarchy. But they do not propagate downwards, so they do not have global scope, and one can exert some control over all of type relation (class), domain (types), and extent (instances).
What one cannot do (in Haskell 98) is to have two instances of the same class, for the same types, in the same import hierarchy.
Of course, when you consider that Main, by definition, imports every module in the program directly or indirectly, this is pretty close to saying `instances are global'. Isn't it? jcc

On Wed, Sep 24, 2008 at 09:28:04AM -0700, Jonathan Cast wrote:
On Wed, 2008-09-24 at 17:30 +0100, Claus Reinke wrote:
(sorry for the delay in replying to this...)
No problem, though you seem to be restating your opinion intead of addressing my concrete points? For those who, like me, have lost the thread in the meantime, here is a link to the message you reply to:
http://www.haskell.org/pipermail/libraries/2008-September/010623.html
.., the only sensible way to think about instances is as global properties. ..it has nothing to do with bugs or misfeatures in GHC, it's a fact of Haskell 98.
I thought my example demonstrated quite clearly that instances are *not* global in Haskell.
A type class specifies a relation between types. Both the types and the class are named, and if instances are placed in separate modules, the modules are named as well. The combination of module, class and type names gives quite a bit of control over instance import/export, even if it is terribly cumbersome and limited (and easily defeated by just one library importing all instances "for convenience"). Neither the relation (class), nor its domain (types), nor its extent (instances) are "global".
The point is that instances are unconditionally re-exported,
Yes, and I'm not disputing that point. What I am disputing are its consequences/interpretation. As I said, instances accumulate upwards along the import hierarchy. But they do not propagate downwards, so they do not have global scope, and one can exert some control over all of type relation (class), domain (types), and extent (instances).
What one cannot do (in Haskell 98) is to have two instances of the same class, for the same types, in the same import hierarchy.
Of course, when you consider that Main, by definition, imports every module in the program directly or indirectly, this is pretty close to saying `instances are global'. Isn't it?
No, because the Main module has a choice to only import certain modules. Thus when orphan instances are defined, we may have a choice of which instances we want to use. David

http://www.haskell.org/pipermail/libraries/2008-September/010623.html .. What one cannot do (in Haskell 98) is to have two instances of the same class, for the same types, in the same import hierarchy.
Of course, when you consider that Main, by definition, imports every module in the program directly or indirectly, this is pretty close to saying `instances are global'. Isn't it?
Close, but no cigar!-) The global view accepts strictly more programs than the Haskell 98 view (see the mesage I linked to for an example). It would also refuse some programs that Ghc (in non-Haskell 98 mode) accepts. And would stand in the way of looking for more precise instance import/export control (see both of my messages;-). Claus

On Wed, 2008-09-24 at 18:48 +0100, Claus Reinke wrote:
http://www.haskell.org/pipermail/libraries/2008-September/010623.html .. What one cannot do (in Haskell 98) is to have two instances of the same class, for the same types, in the same import hierarchy.
Of course, when you consider that Main, by definition, imports every module in the program directly or indirectly, this is pretty close to saying `instances are global'. Isn't it?
Close, but no cigar!-) The global view accepts strictly more programs than the Haskell 98 view (see the mesage I linked to for an example).
Right. I (of course) take the global view as axiomatic, and I view Haskell 98's implementation something like this: If an instance is defined (somewhere in existence), then it should be in scope everywhere. This is indistinguishable, in practice, from being in scope everywhere the class and type are both in scope. However, implementing this would require the implementation to search *every* module with an orphan instance when looking for instances. (Conceptually, this should include all of Hackage; if you use an instance from a package, but nothing else, under --make (or -e or -i) that package should (in principle) still be found and linked in.) Presumably, it's easier to only search the modules you've imported for instances. So a compromise is adopted: modules imported (directly or indirectly) have to be searched, but other modules can safely be ignored. So I view refusing to import one module so you can define/import a conflicting instance as deliberate exploitation of a mis-feature/limitation of the language.
It would also refuse some programs that Ghc (in non-Haskell 98 mode) accepts. And would stand in the way of looking for more precise instance import/export control (see both of my messages;-).
Intentionally. I see type class systems where the dictionary type is not intended to be a singleton as a substantively different language feature than what Haskell type classes are intended to be. Both are useful, IMHO; at least, adopting named instances (or what have you) should be viewed as removing one feature from the language and adding another. It's not just an enhancement/strengthening of the language. jcc

On Wed, Sep 24, 2008 at 5:18 PM, Jonathan Cast
Right. I (of course) take the global view as axiomatic, and I view Haskell 98's implementation something like this:
If an instance is defined (somewhere in existence), then it should be in scope everywhere. This is indistinguishable, in practice, from being in scope everywhere the class and type are both in scope. However, implementing this would require the implementation to search *every* module with an orphan instance when looking for instances. (Conceptually, this should include all of Hackage; if you use an instance from a package, but nothing else, under --make (or -e or -i) that package should (in principle) still be found and linked in.) Presumably, it's easier to only search the modules you've imported for instances. So a compromise is adopted: modules imported (directly or indirectly) have to be searched, but other modules can safely be ignored.
So I view refusing to import one module so you can define/import a conflicting instance as deliberate exploitation of a mis-feature/limitation of the language.
Another reason to prefer the global interpretation of instances is
that it allows for more efficient, non-dictionary-passing
implementations of type classes. Both JHC's type-case and Mark Jones's
partial evaluation techniques require instances that are the same
everywhere.
--
Dave Menendez

Another reason to prefer the global interpretation of instances is that it allows for more efficient, non-dictionary-passing implementations of type classes. Both JHC's type-case and Mark Jones's partial evaluation techniques require instances that are the same everywhere.
Why would partial evaluation require global instances? Mark's prototype implementation ignored separate compilation issues by requiring all modules to be available at specialisation time. But that does not imply that specialisation is limited to a one instance-per-program view, it just has to be clear which instance to specialise at any point. That needs to be clear for any implementation of type classes, with or without dictionaries, but the consequences differ slighly: in a dictionary-based implementation, undecided instances mean dictionaries parameters, to be supplied at usage points; in a dictionary-free implementation, undecided instances mean general code, to be specialised at usage points. Right? Claus

On Wed, Sep 24, 2008 at 05:30:27PM +0100, Claus Reinke wrote:
Orphan instances are usually wrong unless the orphans are also exported via the standard API for either the class or the type. That is, orphans are ok in the implementation of a package, but not in the exposed API, because that makes it possible for a client to import both the class and type without getting the instance, which is what we have to avoid.
Sadly, that has all been discussed to death already, and again, it is a matter of being precise. "Orphan" instances are not wrong per se - they encode and name the extent of type relations via modules, but one needs to think carefully about their intended use and whether that use is really supported by the language or just an illusion. Of the top of my head, I can think of two uses:
(a) having two instances of the same class for the same types in the same program only works by "virtue" of #2356, so should be avoided unless and until the positive aspects of #2356 are moved from accident to design decision
(b) giving clients control over which instances they want to use (eg, use set A or set B, or neither) should work, and mostly does, but may run into ghc #2182 and haddock #54. Also, it is advisable only for client applications, not for client libraries, as long as their users might run into unresolved aspect of (a).
In the interest of providing a concrete example, see the darcs bug: http://bugs.darcs.net/issue387 It's a wishlist bug that we can't implement because the Show instance for Control.Exception.Exception is *not* an orphan (and thus we cannot define a duplicate instance). If it were an orphan, we could define a second show instance and then we'd get an error message if anyone ever calls show on an exception (which is always a bug). Alas, we cannot do this, and so we're stuck perpetually auditing our code to ensure that noone calls this pernicious function. Admittedly, we only want this orphan instance as a workaround for a poor API. Nevertheless, I think it illustrates your point that orphan instances could be useful, *particularly* because we have very little control over instance propogation. David

David Roundy wrote:
In the interest of providing a concrete example, see the darcs bug:
Nice motivation for wanting to *not* import an instance. The first thing that occurs to me is to avoid using UserError - is that feasible? Cheers, Simon

On Wed, Sep 24, 2008 at 07:48:06PM +0100, Simon Marlow wrote:
David Roundy wrote:
In the interest of providing a concrete example, see the darcs bug:
Nice motivation for wanting to *not* import an instance.
The first thing that occurs to me is to avoid using UserError - is that feasible?
It's feasible, but extremely ugly, and it seems almost impossible to audit the code for this, as we'd have to look at every instance of fail to see if it might happen to be used in the IO monad. And, of course, we'd have to write our own version of error. Either of these sounds very tricky. fail is a great function (albeit much maligned), and I'd hate to have to replace it throughout the code. Now, if we could avoid importing the Monad instance of IO from the Prelude, then we could write our own instance that would have a fail such as fail = throw . AssertionFailed David

David Roundy wrote:
On Wed, Sep 24, 2008 at 07:48:06PM +0100, Simon Marlow wrote:
David Roundy wrote:
In the interest of providing a concrete example, see the darcs bug:
http://bugs.darcs.net/issue387 Nice motivation for wanting to *not* import an instance.
The first thing that occurs to me is to avoid using UserError - is that feasible?
It's feasible, but extremely ugly, and it seems almost impossible to audit the code for this, as we'd have to look at every instance of fail to see if it might happen to be used in the IO monad. And, of course, we'd have to write our own version of error. Either of these sounds very tricky. fail is a great function (albeit much maligned), and I'd hate to have to replace it throughout the code.
Now, if we could avoid importing the Monad instance of IO from the Prelude, then we could write our own instance that would have a fail such as
fail = throw . AssertionFailed
I'd strongly urge you *not* to use error (or anything with value _|_) for reporting errors to the user. The reason being that GHC (or indeed Haskell) doesn't guarantee that your program will report the error message that you think it will - all it guarantees is that you'll get *some* error message. As GHC gets more clever, we're seeing more an more cases of this. If GHC can prove that your program has value _|_, then it is free to explore different orders of evaluation that also produce _|_ and pick one of them. See http://hackage.haskell.org/trac/ghc/ticket/1171 now GHC doesn't follow the imprecise exception semantics as described in the paper, but that is because the semantics in the paper isn't liberal enough (in our opinion). Of course it's fine to use error for things that are bugs in your program, and perhaps that's the way it's used in darcs. Cheers, Simon

I'm trying to understand the implications of this discussion in the current splitting off of SYB from base. In the division that was more or less agreed [1], the Data class and its uncontested instances would be kept in base, whereas the rest would go into a new syb package. The instances were traditionally orphans in Data.Generics.Instances, but it was suggested that they should now be moved into Data.Generics.Basics. This would make SYB free of orphans, but no longer offers a user the possibility of avoid the import of Data.Generics.Instances to define its own instances. It also invalidates some of the changes proposed in [1], namely caring about the imports of Data.IntMap, Data.IntSet, etc. So my question is: should Data.Generics.Instances indeed be folded into Data.Generics.Basics (which contains the Data class and associated stuff)? (Worth noting is that the instances that are moved to the syb package will remain orphans, but the idea is that those might change or disappear in the future.) Thanks, Pedro [1] http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9962/focus=9981

On Tue, Sep 30, 2008 at 03:52:37PM +0200, José Pedro Magalhães wrote:
I'm trying to understand the implications of this discussion in the current splitting off of SYB from base. In the division that was more or less agreed [1], the Data class and its uncontested instances would be kept in base, whereas the rest would go into a new syb package. The instances were traditionally orphans in Data.Generics.Instances, but it was suggested that they should now be moved into Data.Generics.Basics. This would make SYB free of orphans, but no longer offers a user the possibility of avoid the import of Data.Generics.Instances to define its own instances. It also invalidates some of the changes proposed in [1], namely caring about the imports of Data.IntMap, Data.IntSet, etc.
I would expect that the uncontested instances (except for the derived Complex instance) would be in the same module as the Data class. The container modules would thus bring these instances into scope, but would not import modules from the syb package, and thus not bring the dubious instances into scope.

José Pedro Magalhães wrote:
I'm trying to understand the implications of this discussion in the current splitting off of SYB from base. In the division that was more or less agreed [1], the Data class and its uncontested instances would be kept in base, whereas the rest would go into a new syb package. The instances were traditionally orphans in Data.Generics.Instances, but it was suggested that they should now be moved into Data.Generics.Basics. This would make SYB free of orphans, but no longer offers a user the possibility of avoid the import of Data.Generics.Instances to define its own instances. It also invalidates some of the changes proposed in [1], namely caring about the imports of Data.IntMap, Data.IntSet, etc.
So my question is: should Data.Generics.Instances indeed be folded into Data.Generics.Basics (which contains the Data class and associated stuff)? (Worth noting is that the instances that are moved to the syb package will remain orphans, but the idea is that those might change or disappear in the future.)
My suggestion would be that yes, Data.Generics.Instances should be folded into Data.Generics.Basics. If there are orphans to be moved to the syb package, then I think we ought to export the methods rather than defining instances, so that clients can define the instances themselves if they wish (although applications only, not libraries - a library would have to use a newtype). This also means you don't have to break the API later by removing the orphans. Cheers, Simon

On Wed, Sep 24, 2008 at 05:30:27PM +0100, Claus Reinke wrote:
What one cannot do (in Haskell 98) is to have two instances of the same class, for the same types, in the same import hierarchy.
What do you mean by "in the same import hierarchy"? Do you mean "in the same program"? I've put an example showing how multiple instances can cause Data.Set to go wrong here: http://hackage.haskell.org/trac/ghc/ticket/2356#comment:8 Do you think that that behaviour is OK?
This is even worse at the level of packages.
How do packages make a difference here?
Generally, you control what other modules in your program do, but not what the packages that you depend on do. So if a package you use starts using an instance that you don't want, then you are stuck. Thanks Ian

What one cannot do (in Haskell 98) is to have two instances of the same class, for the same types, in the same import hierarchy.
What do you mean by "in the same import hierarchy"? Do you mean "in the same program"?
Actually, I was thinking of any import hierarchy with a common root, but it doesn't make much difference; Haskell'98 only rules this out for the whole program, but sub-hierarchies (libraries) with duplicate instances can thus never become part of any Haskell'98 program.
I've put an example showing how multiple instances can cause Data.Set to go wrong here: http://hackage.haskell.org/trac/ghc/ticket/2356#comment:8
Do you think that that behaviour is OK?
Note that I submitted that ticket;-) I think that the behaviour, the Haskell'98 vs Ghc discrepancy, the comments, and this thread, show that the rules need to be specified carefully and clearly, and that the current situation is not satisfactory. Such an investigation might have one of several consequences, including: 1 your example is accepted (*), and causes no problem, because noone is making assumptions about the instance being the same everywhere (one instance implying the absence of others) 2 your example is rejected (*), and alternative solutions are found for people who rely on this Ghc behaviour for overlapping and other instances (late resolution of overlaps, omitting errors if "late" can be pushed after full program compilation) 3 your example generates a warning (*), indicating that Ghc's interpretation is at odds with what people might expect from the Haskell'98 subset (I would tend to prefer this as an interim workaround, until one of 1/2/3 can be decided on) (*) in Ghc mode; in Haskell'98 mode, it should always cause an error message
How do packages make a difference here? Generally, you control what other modules in your program do, but not what the packages that you depend on do. So if a package you use starts using an instance that you don't want, then you are stuck.
Ah, ok. I was assuming that problem for any modules supplied by others, so packages don't make the situation worse for me (packages seem neither necessary nor sufficient to worsen the situation: I might provide my own package, which I could change; or I might use package-less modules that I can't change directly - we had that issue with Programatica and Strafunski a while back;-). Claus

Hi Claus, On Wed, Sep 24, 2008 at 07:21:19PM +0100, Claus Reinke wrote:
I've put an example showing how multiple instances can cause Data.Set to go wrong here: http://hackage.haskell.org/trac/ghc/ticket/2356#comment:8
Do you think that that behaviour is OK?
I'm not sure from your reply what your answer to this is. Is it "I don't know whether I think that that behaviour is OK or not"?
Note that I submitted that ticket;-) I think that the behaviour, the Haskell'98 vs Ghc discrepancy, the comments, and this thread, show that the rules need to be specified carefully and clearly,
While that would be helpful, I think that the behaviour you are interested in only works because it would be expensive to fix it (at least, that's how I understand Simon's "So I'm not sure what to do; and the only thing that comes to mind isn't cheap." comment), not because of any unclarity. I'd be interested to see a design that allows for local instances, in a way that doesn't cause problems like in the URL above, but as far as I know none exist yet. Thanks Ian

Hi Ian,
http://hackage.haskell.org/trac/ghc/ticket/2356#comment:8 Do you think that that behaviour is OK? I'm not sure from your reply what your answer to this is.
Is it "I don't know whether I think that that behaviour is OK or not"?
Your example is a variation of the one that opened the ticket, so no, I do not think this behaviour is OK. But I also think that the situation is too complex for a straightforward answer, hence my less-than-straightforward answer, which perhaps I can expand: (a) it is clearly not ok in Haskell'98 mode (that is the bug) (b) it seems to be an unavoidable consequence of a feature of Ghc's non-Haskell'98 mode (specifically Ghc's interpretation of allow-overlapping-instances) that actual code relies on, so one cannot simply change this behaviour in that mode People starting from viewpoint (a) have completely different expectations than people starting from viewpoint (b), for the same example. And that is clearly not good, so: (c) this issue highlights the need for a better specification, to reduce the confusion (d) the Ghc feature itself is only a work-around for Haskell's limited control over instance visibility. In particular, one cannot prevent re-export of instances in Haskell, so Ghc tries to ensure that such re-export only hurts those who actually try to use such re-exported instances. Haskellers using overlapping instances need to know what they are doing anyway, and Haskellers not using overlapping instances, but using libraries that use overlapping instances, are less likely to run into trouble this way. Therefore, my recommendation was: - follow the Haskell'98 report in Haskell'98 mode (solves (a)) - issue a warning whenever a program contains multiple conflicting or duplicate instances in non-Haskell'98 mode, so that (a)-people don't get hurt by a feature that (b)-people depend on (that warning will confuse, but not hurt, third-party users of (b)-people libraries - the ones meant to be helped by (d)) [this should be done anyway] - look for a real solution to instance re-export (that would avoid (b) by making (d) superfluous) - whatever plan is adopted, document it, as it is likely to be different from Haskell'98, and confusion (c) can lead to hard to find bugs. The new warning should point to this documentation, so that people know what they are being warned about. Is that any clearer?-)
Note that I submitted that ticket;-) I think that the behaviour, the Haskell'98 vs Ghc discrepancy, the comments, and this thread, show that the rules need to be specified carefully and clearly,
While that would be helpful, I think that the behaviour you are interested in only works because it would be expensive to fix it (at least, that's how I understand Simon's "So I'm not sure what to do; and the only thing that comes to mind isn't cheap." comment), not because of any unclarity.
Well, you could rule out this Ghc feature for any program not using allow-overlapping-instances, but then you get back to the traditional problem (a library uses overlapping instances, a library client doesn't; and the feature is meant to help users who do not want to know which of their libraries use overlapping instances). You can definitely rule it out for Haskell'98 programs, but that won't help much. You could argue that duplicate instances aren't overlapping instances, but that only introduces another special case, and the issue remains for overlapping instances. You cannot rule out this behaviour for overlapping instances, unless you fix instance re-export control first (that is the reason this behaviour was introduced in the first place). Simon's "not cheap" fix refers to the effort implied by fixing the issue for the special case of no-overlapping-instances (same comment, two paragraphs up). There is a righteous approach ("this shouldn't happen. forbid it!"), there is a pragmatic approach ("this does happen. help users somehow!"), but I don't know of any right approach unless the instance re-export control issue is fixed. So I assume fixing that is the right approach!-)
I'd be interested to see a design that allows for local instances, in a way that doesn't cause problems like in the URL above, but as far as I know none exist yet.
While this has come up now and then, most recently here http://www.haskell.org/pipermail/haskell-prime/2008-May/002601.html http://www.haskell.org/pipermail/libraries/2008-September/010623.html I'm also not aware of a proper design. As I mention in my reply to Simon M, there will actually be an implementation of it in ghci if #2182 is fixed, and it doesn't look too difficult to come up with a syntax, but a proper design would have to look at the theory (does it make sense to have modular logic over types? my guess is yes; if not, we better change Haskell'98 to propagate all instances through the whole program!) and practical consequences (and those tend to be unexpected;-). Simon PJ said that fixing #2182 will need some effort. But after that, perhaps Ghc can experiment with a simple syntax for instance import/ export control (I was thinking of something similar to .hs-boot file syntax, bare 'instance C t' to permit export/import of any matching instances, which would rely on being able to hide items that may not exist), so that users can help figuring out any of those unwanted consequences that are never obvious in theory. Claus

I'd be interested to see a design that allows for local instances, in a way that doesn't cause problems like in the URL above, but as far as I know none exist yet.
While this has come up now and then, most recently here
http://www.haskell.org/pipermail/haskell-prime/2008-May/002601.html http://www.haskell.org/pipermail/libraries/2008-September/010623.html
of course, as soon as I hit send I recalled: Wolfram Kahl and Kahl Scheffczyk, Named Instances for Haskell Type Classes http://www.cs.uni-bonn.de/~ralf/hw2001/4.pdf Derek Dreyer, Robert Harper, and Manuel M. T. Chakravarty Modular Type Classes http://www.cse.unsw.edu.au/~chak/papers/DHC07.html while searching for those urls, I also discovered that Paul replied to the 2001 paper in a 1992 email: http://www.cse.unsw.edu.au/~dons/haskell-1990-2000/msg00727.html That's what I call foresight!-) Claus Ps. there have also been encodings of named instances, iirc.

Claus Reinke wrote:
.., the only sensible way to think about instances is as global properties. ..it has nothing to do with bugs or misfeatures in GHC, it's a fact of Haskell 98.
I thought my example demonstrated quite clearly that instances are *not* global in Haskell.
Yes, I know that you can limit the visibilty of instances by using orphans in Haskell 98. My point was that it is a mistake to design libraries this way, because it leads to a loss of abstraction.
This is even worse at the level of packages. We can hide modules that are used internally to a package's implementation, but we can't hide the fact that a package used some non-standard instances internally, and furthermore we can't change this aspect of its implementation without changing the API.
How do packages make a difference here? As long as I don't import base:Control.Monad.Error, the base:Control.Monad.Instances instances of Functor, say, are not visible in base:Control.Monad.
If I write a package that uses Control.Monad.Instances internally, that becomes visible via its API, and I cannot abstract away from this aspect of the implementation of my package. This is an absolute failure of abstraction, and if we value abstraction (which we surely do?) then we should look for ways to fix the problem. So one way is to look at changing the language, perhaps by supporting explicit re-export of instances. Certainly that's a worthwhile direction to explore, but since our current language doesn't have this extension, we have to look for other ways to avoid the problem in the meantime. The only solution is to not have orphan instances in the original library API, which is what I'm arguing for. I think you misunderstood my point before (but you recognised that I repeated it :-). I'm not trying to say that Haskell 98 only supports global instances, rather I'm saying we should think of it that way and design our libraries with global instances in mind, because otherwise we have serious problems with abstraction. We agree that this is a failure of the language. You seem to be arguing that we should pretend that the language is not broken in the hope that it gets fixed in the future - I think that's the wrong approach.
Sadly, that has all been discussed to death already, and again, it is a matter of being precise. "Orphan" instances are not wrong per se - they encode and name the extent of type relations via modules, but one needs to think carefully about their intended use and whether that use is really supported by the language or just an illusion. Of the top of my head, I can think of two uses:
(a) having two instances of the same class for the same types in the same program only works by "virtue" of #2356, so should be avoided unless and until the positive aspects of #2356 are moved from accident to design decision
Right, agreed.
(b) giving clients control over which instances they want to use (eg, use set A or set B, or neither) should work, and mostly does, but may run into ghc #2182 and haddock #54. Also, it is advisable only for client applications, not for client libraries, as long as their users might run into unresolved aspect of (a).
I'm not sure what "client libraries" are, but I think I agree, if what you're saying is that orphan instances should be kept out of library APIs. I'd also argue that they shouldn't be used in application code either, but the reasons are less compelling there.
My preference would be to see ghc #2182, #2356 (for Haskell 98 mode) and haddock #54 fixed. #2356 (for Ghc mode) is documented behaviour, I believe, inherited from Ghc's handling of overlapping instances, but there is no LANGUAGE extension specifying this behaviour, so it isn't portable.
I'll just point out that if we stop using orphan instances in library APIs, then #2182 is much less of an issue.
Next, I'd like to see whether more control over instance re-export is permissible in theory and -if yes- would like to see it implemented and standardised.
Sure, please do! Cheers, Simon

.. I'm not trying to say that Haskell 98 only supports global instances, rather I'm saying we should think of it that way and design our libraries with global instances in mind, because otherwise we have serious problems with abstraction.
Thinking of it that way still doesn't make it so. If you want instances to be global, you have to change Haskell 98, by propagating instances through the whole program. Also, you don't mean "global", you mean "at most one per program". The latter matches what Haskell'98 does, but not what Ghc does. The former matches neither, so will lead to trouble in use (just because one user follows your rule, that doesn't mean everyone does, so anytime your user imports a not-invented-here library, his model breaks down).
We agree that this is a failure of the language. You seem to be arguing that we should pretend that the language is not broken in the hope that it gets fixed in the future - I think that's the wrong approach.
I have been arguing that we need to be more precise and explicit about what is going on, so that we can see exactly in what ways the language or its implementations are broken and so that we can avoid errors arising from mismatched implicit interpretations.
(b) giving clients control over which instances they want to use (eg, use set A or set B, or neither) should work, and mostly does, but may run into ghc #2182 and haddock #54. Also, it is advisable only for client applications, not for client libraries, as long as their users might run into unresolved aspect of (a).
I'm not sure what "client libraries" are, but I think I agree, if what you're saying is that orphan instances should be kept out of library APIs. I'd also argue that they shouldn't be used in application code either, but the reasons are less compelling there.
Assuming that LT, LC, and LIa/LIb, are libraries that provide type, class, and two sets of instances, respectively, then using any combination of these libraries that does not include both LIs in an application A is fine, using LT/LC in a library L is fine, but using any LI in another library is likely to cause trouble later (because the LI instances are implicitly re-exported from L, while L's clients might also have other instances for LT/LC's type and class).
Next, I'd like to see whether more control over instance re-export is permissible in theory and -if yes- would like to see it implemented and standardised.
Sure, please do!
Once #2182 is fixed, you will need to implement the basics of this anyway, for ghci sessions (where :m -M should remove access to M's instances), then we just need to provide some syntax for your implementation!-) Claus

Claus Reinke wrote:
.. I'm not trying to say that Haskell 98 only supports global instances, rather I'm saying we should think of it that way and design our libraries with global instances in mind, because otherwise we have serious problems with abstraction.
Thinking of it that way still doesn't make it so. If you want instances to be global, you have to change Haskell 98, by propagating instances through the whole program.
Sigh. I'm not having much luck convincing you, so let me try a different tack. Here's a scenario I'd like your opinion on: We have a test in GHC's test suite called instance-leak, which tests that none of the Haskell 98 libraries exports the Functor instance for (->). I broke this test recently by accident. How did I break it? By using Data.Map internally in System.Process. The Functor instance for (->) is exported from Control.Monad.Instances, makes its way into Data.Map, then into System.Process and thereby into System, which is a Haskell 98 module. In your opinion, who or what is at fault for this test failure? Think carefully about the ramifications of your answer. Oh, and "Haskell 98" is not a valid answer, because we can't change that. Cheers, Simon

Sigh. I'm not having much luck convincing you, so let me try a
And there I thought the goal of this discussion was to reach a common understanding, not to convince anyone of anything..
different tack. Here's a scenario I'd like your opinion on:
We have a test in GHC's test suite called instance-leak, which tests that none of the Haskell 98 libraries exports the Functor instance for (->). I broke this test recently by accident. How did I break it? By using Data.Map internally in System.Process. The Functor instance for (->) is exported from Control.Monad.Instances, makes its way into Data.Map, then into System.Process and thereby into System, which is a Haskell 98 module.
In your opinion, who or what is at fault for this test failure?
You, of course:-p But seriously: I can't see easily how the instance even gets into Data.Map, but since Data.Map doesn't seem to be using either Functor or fmap (other than defining an instance itself), it has no business importing (and thus re-exporting) the instance. If it is actually using any of the other instances (is it?), then perhaps Control.Monad.Instances needs to be split up. That is the approach taken for Data.Generics.Instances, which also has no business being imported in Data.Map. Btw, it would be useful to have a tool for tracing instances (not just final export and original definition, but the import path in between). And it would be useful to see exported instances for each module in Haddock (not just alongside class and type).
Think carefully about the ramifications of your answer.
That was my quick answer. Do you think it will change?-) Claus

We have a test in GHC's test suite called instance-leak, which tests that none of the Haskell 98 libraries exports the Functor instance for (->). I broke this test recently by accident. How did I break it? By using Data.Map internally in System.Process. The Functor instance for (->) is exported from Control.Monad.Instances, makes its way into Data.Map, then into System.Process and thereby into System, which is a Haskell 98 module.
In your opinion, who or what is at fault for this test failure?
You, of course:-p But seriously: I can't see easily how the instance even gets into Data.Map, but since Data.Map doesn't seem to be using either Functor or fmap (other than defining an instance itself), it has no business importing (and thus re-exporting) the instance. If it is actually using any of the other instances (is it?), then perhaps Control.Monad.Instances needs to be split up.
It seems that the ultimate offender is Control.Applicative, which imports Control.Monad.Instances() for no other reason than to infect its importers. That then propagates all over the place. I assume the rationale is "convenience" again? Saving one import in importers at the price of limiting the modules that can afford to import Control.Applicative in the first place? Since libraries@ is the maintainer for Control.Applicative, may I suggest to add Control.Applicative.Alt, differing from the original only in omitting the import of Control.Monad.Instances? Then all importers of Control.Applicative should be checked for whether import of Control.Applicative.Alt is sufficient, and be switched to that if possible. Or am I on the wrong track?-) Claus

Claus Reinke wrote:
We have a test in GHC's test suite called instance-leak, which tests that none of the Haskell 98 libraries exports the Functor instance for (->). I broke this test recently by accident. How did I break it? By using Data.Map internally in System.Process. The Functor instance for (->) is exported from Control.Monad.Instances, makes its way into Data.Map, then into System.Process and thereby into System, which is a Haskell 98 module.
In your opinion, who or what is at fault for this test failure? You, of course:-p But seriously: I can't see easily how the instance even gets into Data.Map, but since Data.Map doesn't seem to be using either Functor or fmap (other than defining an instance itself), it has no business importing (and thus re-exporting) the instance. If it is actually using any of the other instances (is it?), then perhaps Control.Monad.Instances needs to be split up.
It seems that the ultimate offender is Control.Applicative, which imports Control.Monad.Instances() for no other reason than to infect its importers. That then propagates all over the place.
Good! I completely disagree. Control.Monad.Instances is at fault for exposing an orphan instance in the first place. Every other answer to the question leads to problems, I claim. Why is it wrong to say that Control.Applicative, or Data.Map, or my use of Data.Map in System.Process, are at fault? Because all of these modules just happen to be importing something that exports an orphan instance, and they can't be held responsible for re-exposing the orphan instance because they have no control over that. You will argue that they have control over what they import, and can thereby control whether they export the instance. This is certainly true, but it ignores the need for module abstraction: the need to be able to change the implementation of a module without changing its API. We must have the property that the imports of a module do not affect its API - and the only way to have this property is to avoid orphan instances in library APIs. (hmm, I finally feel like I've explained this clearly. I hope it comes across that way.) There are basically only two sensible choices for the Functor instance for (->): (a) don't define one at all (b) define one in Control.Monad, and give up on Haskell 98 compliance The current situation, namely (c) define it as an orphan, and give up on module abstraction is not a sensible choice. Incedentally, Iavor Diatchki pointed out to me today a nice way to provide "optional" instances for things in libraries. In your library if you have an abstract type T and want to suggest, but not mandate, a Functor instance, then you could export a function fmapT :: (a->b) -> T a -> T b so that a client can easily say instance Functor T where fmap = fmapT if they want. (but not in a library! A library would have to wrap T in a newtype to avoid an orphan instance.)
I assume the rationale is "convenience" again? Saving one import in importers at the price of limiting the modules that can afford to import Control.Applicative in the first place?
Since libraries@ is the maintainer for Control.Applicative, may I suggest to add Control.Applicative.Alt, differing from the original only in omitting the import of Control.Monad.Instances? Then all importers of Control.Applicative should be checked for whether import of Control.Applicative.Alt is sufficient, and be switched to that if possible.
That justs moves the problem around, and doesn't fix it. Cheers, Simon

Good! I completely disagree.
Ok. Since we now agree to disagree on the solution, but seem to agree on the problem, if not the language used to describe it (sigh:), it might be useful to note that our suggested approaches are somewhat dual: A every instance should be available as widely as possible B no instance should be available more widely than necessary One consequence of following A is that it doesn't matter whether an imported module exports instances, because anyone possibly affected by them will have them anyway. That is less flexible than B, as A doesn't allow for localised instances, but that limitation is in line with Haskell'98 and means that the only trouble that can arise from conflicting instances is that a program doesn't compile. This means that some code can't be written, because missing instances can't be added to external libraries, which also means that there can't be two libraries providing or using different instances for a shared class/type. One consequence of following B is that very fine control over instance import and export needs to be exercised, with the benefit that localised instances become thinkable (they have been possible in Ghc for some time, but could be put on a less adhoc basis), at the expense of having to think carefully about possible conflicts arising from different libraries providing or using different instances for a shared class/type. This means that the current language limitations become very visible, as they interfere with the fine control of instance re-export, which also means that it is easy to mess up. As for orphan instances, approach A forbids them entirely, while approach B forbids importing them into libraries (in the present context, a library module is a module that can be imported). So, in B, orphan instances can only be imported into applications (modules that aren't themselves imported). If one absolutely needs to combine a library L with some orphan instance module O, one should provide two versions of L, one with and one without importing O. As for language limitations, approach A tends to adapt to them, while approach B tends to highlight them. I hope that summary isn't too biased?-)
instance Functor T where fmap = fmapT
Providing default instance method implementations instead of instances is indeed used in practice, eg Data.Traversable.
Since libraries@ is the maintainer for Control.Applicative, may I suggest to add Control.Applicative.Alt, differing from the original only in omitting the import of Control.Monad.Instances? Then all importers of Control.Applicative should be checked for whether import of Control.Applicative.Alt is sufficient, and be switched to that if possible.
That justs moves the problem around, and doesn't fix it.
While giving up Haskell'98 compatibility, as in your suggestion, cuts the Gordian Knot? I don't think so, but then we tend to disagree on such things;-) Would anyone else like to venture an opinion? Claus

On Fri, 2008-09-26 at 21:51 +0100, Simon Marlow wrote:
You will argue that they have control over what they import, and can thereby control whether they export the instance. This is certainly true, but it ignores the need for module abstraction: the need to be able to change the implementation of a module without changing its API. We must have the property that the imports of a module do not affect its API - and the only way to have this property is to avoid orphan instances in library APIs.
(hmm, I finally feel like I've explained this clearly. I hope it comes across that way.)
Yes, I think that's a clear explanation and one I think is quite convincing.
There are basically only two sensible choices for the Functor instance for (->):
(a) don't define one at all (b) define one in Control.Monad, and give up on Haskell 98 compliance
The current situation, namely
(c) define it as an orphan, and give up on module abstraction
is not a sensible choice.
Right. No orphan instances in libraries. Duncan

On Fri, 26 Sep 2008, Duncan Coutts wrote:
On Fri, 2008-09-26 at 21:51 +0100, Simon Marlow wrote:
There are basically only two sensible choices for the Functor instance for (->):
(a) don't define one at all (b) define one in Control.Monad, and give up on Haskell 98 compliance
The current situation, namely
(c) define it as an orphan, and give up on module abstraction
is not a sensible choice.
Right.
No orphan instances in libraries.
Me too. Although I'm afraid some of my packages still contain orphan instances. When I need a new instance to an existing class+type I ask the library author for that instance and in the meantime I use a 'newtype'. Frequently a newtype turns out to be better anyway.

Simon Marlow
We must have the property that the imports of a module do not affect its API - and the only way to have this property is to avoid orphan instances in library APIs.
To be more precise, I would say that each orphan instance in a library API must be the only export of a dedicated visible module, and only exported by that module. (The only importers of orphan instances are typically Main modules, or ``almost-Main'' modules.)
There are basically only two sensible choices for the Functor instance for (->):
(a) don't define one at all (b) define one in Control.Monad, and give up on Haskell 98 compliance
The current situation, namely
(c) define it as an orphan, and give up on module abstraction
is not a sensible choice.
I would say that there should the a separate module Data.Function.Functor () exporting only that instance. (And I find it a surprising decision of Haskell98 that something that is so obviously a functor in so obviously only one way must not be a Functor instance...) Wolfram

kahl@cas.mcmaster.ca wrote:
Simon Marlow
wrote: We must have the property that the imports of a module do not affect its API - and the only way to have this property is to avoid orphan instances in library APIs.
To be more precise, I would say that each orphan instance in a library API must be the only export of a dedicated visible module, and only exported by that module.
(The only importers of orphan instances are typically Main modules, or ``almost-Main'' modules.)
Why do you believe that? We seem to have evidence to the contrary, with Control.Monad.Instances being imported by library modules. Cheers, Simon

Simon Marlow
kahl@cas.mcmaster.ca wrote:
Simon Marlow
wrote: We must have the property that the imports of a module do not affect its API - and the only way to have this property is to avoid orphan instances in library APIs.
To be more precise, I would say that each orphan instance in a library API must be the only export of a dedicated visible module, and only exported by that module.
(The only importers of orphan instances are typically Main modules, or ``almost-Main'' modules.)
Why do you believe that?
Sorry, I was being idealistic. ;-) I mean: The only importers of orphan instances should typically be Main modules, or ``almost-Main'' modules. That does of course require conscious effort... And it also assumes that orphan instances are in dedicated modules, so they are imported only intentionally. (I would propose to never warn about orphan instances in modules that export only a single orphan instance.)
We seem to have evidence to the contrary, with Control.Monad.Instances being imported by library modules.
That's because at least some of its instances are not ``truly orphan'', but should, and reasonably can, be in scope whenever both type and class are in scope. (Yes, for Functor ((->) a) this means that Haskell98 is wrong.) A counterexample to that is a Read instance that pulls in a major dependency, for example Parsec --- that should not be automatically in scope wherever both type and class are in scope, but only where it is actually needed. Haskell software architecture can be quite tricky... Wolfram

On Fri, Sep 26, 2008 at 2:37 PM, Claus Reinke
It seems that the ultimate offender is Control.Applicative, which imports Control.Monad.Instances() for no other reason than to infect its importers. That then propagates all over the place.
Control.Applicative defines Applicative instances for (->) and (,).
Doing so requires Functor instances for (->) and (,), which are
defined in Control.Monad.Instances.
I suppose one solution would be to move the Applicative instances for
(->) and (,) to Control.Monad.Instances.
--
Dave Menendez

On Fri, Sep 26, 2008 at 2:37 PM, Claus Reinke
wrote: It seems that the ultimate offender is Control.Applicative, which imports Control.Monad.Instances() for no other reason than to infect its importers. That then propagates all over the place.
Control.Applicative defines Applicative instances for (->) and (,). Doing so requires Functor instances for (->) and (,), which are defined in Control.Monad.Instances.
Wouldn't it be possible just to delay the instance selection? instance Applicative ((->) a) where would become instance Functor ((->)a) => Applicative ((->) a) where But, that code doesn't seem to require the instances? Only I just noticed that Control.Applicative imports Control.Arrow, which imports Control.Monad.Fix, which imports Control.Monad.Instances. We can apply the constraint delay in Control.Monad.Fix, making instance Functor ((->)r) => MonadFix ((->)r) but that requires {-# LANGUAGE UndecidableInstances #-} {-# LANGUAGE FlexibleContexts #-} (the need for the former highlights another language limitation..) Claus
I suppose one solution would be to move the Applicative instances for (->) and (,) to Control.Monad.Instances.
-- Dave Menendez
http://www.eyrie.org/~zednenem/

Hello Ian, Thursday, August 28, 2008, 3:12:32 PM, you wrote:
Again, this is a false separation, with 698 LoC left behind in GHC.Conc
i propose to consider ghc.* as not the part of Base, but separate library (GhcPrim) which is bundled together with Base only due to technical limitations -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, Aug 28, 2008 at 03:42:40PM +0400, Bulat Ziganshin wrote:
Thursday, August 28, 2008, 3:12:32 PM, you wrote:
Again, this is a false separation, with 698 LoC left behind in GHC.Conc
i propose to consider ghc.* as not the part of Base, but separate library (GhcPrim) which is bundled together with Base only due to technical limitations
Just for interest's sake: Of those 106 modules and 16621 LoC left in base, 33 modules and 8555 LoC are in GHC.* (so about 1/3 and 1/2). The hugs and nhc-specific modules are in other packages. There's a lot of implementation-specific stuff ifdef'ed in the "shared" modules too. Thanks Ian

Hi
Again, this is a false separation, with 698 LoC left behind in GHC.Conc
i propose to consider ghc.* as not the part of Base, but separate library (GhcPrim) which is bundled together with Base only due to technical limitations
This is in fact exactly what Hoogle 4 does, compare: http://haskell.org/hoogle/?hoogle=map+%2Bghc http://haskell.org/hoogle/?hoogle=map Thanks Neil

On Thu, Aug 28, 2008 at 13:12, Ian Lynagh wrote:
Also, GHC's current plan for the base library: http://hackage.haskell.org/trac/ghc/wiki/DarcsConversion#Planforlibraries essentially means forking base (as nhc98 would continue to use base in a darcs repo, while GHC would use it from a git repo, and there are no plans for any merging between these repos). Therefore any code that is to be shared between the implementations needs to not be in base, so from that point of view it would be good to pull out as much as possible.
...
First the easy bit: The Data.Generics hierarchy is going to have a separate maintainer, and I think that everyone is agreed that it should be pulled out into an "syb package". I'll treat this as not part of base from here on.
For what it's worth, I would be happy to see "syb" in a git repository. And, if I understood everything correctly, that's what you plan on doing. BTW, thanks for putting out these updates, Ian. They are very helpful. Sean

| We're trying to decide what to do with the base library for GHC 6.10, in | terms of how much of it should be broken up into separate packages. | Since the recent proposal about this, we may be rethinking what we want | to do, and we would welcome your opinions. Thanks Ian. I found it helpful to number off the advantages and disadvantages so they are easy to refer to, so I enclose a slightly text-processed version of your message below. My thoughts * I find (D2), (D3), and (D4) -- see below -- quite strong reasons for maintaining the status quo * While (A1)-(A3) are advantages, I'm not sure they are powerful enough to want to disturb the status quo in the *short term* (ie before 6.10). * The exception is SYB, for which we have a willing and active maintainer, so (A1) is very strong. That isn't the case for any other package. So my suggestion would be: * for 6.10: split out SYB and nothing else * later: maybe more, let's see Simon =================== Text-processed version of Ian's message =================== We're trying to decide what to do with the base library for GHC 6.10. Specifically we want to work out how much of the current package "base" should be split into separate packages. Since the recent proposal about this (http://hackage.haskell.org/trac/ghc/ticket/1338), we may be rethinking what we want to do, and we would welcome your opinions. Motivation: why split up "base"? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A1. It becomes possible to separately upgrade the parts, and makes it easier for different people to maintain different parts. A2. It makes it easier to see what the hierarchy is, and to restructure the hierarchy, and work towards more of the code being shared between different Haskell implementations. Plus it means that people can't te-tangle the logically separate components, which is all too easy to do when you just have one huge package. A3. It also means that packages are clearer about what they depend on. One possibility, which would be really cool, is to separate all the IO modules from the non-IO modules; between that and looking at the extensions used (e.g. TH and FFI) it would then be clear whether or not a library could do any IO. Of course, the Prelude is a hurdle for this goal. Also, GHC's (still in flux) plan for the base library: http://hackage.haskell.org/trac/ghc/wiki/DarcsConversion#Planforlibraries essentially means forking base (as nhc98 would continue to use base in a darcs repo, while GHC would use it from a git repo, and there are no plans for any merging between these repos). Therefore any code that is to be shared between the implementations needs to not be in base, so from that point of view it would be good to pull out as much as possible. Why *not* split up "base"? ~~~~~~~~~~~~~~~~~~~~~~~~~~ D1. Splitting up base imposes costs on others. Specifically, the dependencies of packages need to be updated to reflect the changes. However, GHC 6.10 will come with a base version 3, as well as the new base version 4, so the transition should be much smoother than the base 2 -> base 3 transition. D2. It would be bad to make a change, and then make *another* change to the same thing. So anywhere there is doubt we should leave htings unchanged D3. Several people expressed reservations about a proliferation of packages containing only one module, or only a little code (less than 500 lines, say). D4. Splitting out a package whose *implementation* depends in an intimate way on "base" is a bit of a false separation. At one extreme a new package could simply re-export a bunch of types and functions from "base". If this is the case, none of A1-A3 hold. What I propose ~~~~~~~~~~~~~~ (In the below, LoC stands for "Lines of Code".) ----- SYB: generic programming ------- First the easy bit: The Data.Generics hierarchy is going to have a separate maintainer, and I think that everyone is agreed that it should be pulled out into an "syb package". I'll treat this as not part of base from here on. The only thing still being debated here is whether the Data class itself should remain in base or not. Some people believe that it should remain in base, as it is desirable to have Data instances for as many types as possible, and because there is a resistance among library writers against adding dependencies. The counter argument is that there are many other classes that the same is true of (e.g. uniplate, syb-with-class, binary), and it does not scale to put all of these classes into base. Also, by requiring a dep to be added even for the classes that have historically been included in base, adding dependencies for the sake of providing instances may become more socially acceptable. ----- GetOpt ------------ System.Console.GetOpt (129 LoC, 1 module) This doesn't really fit in with anything else in base, so I propose to split it off into its own getopt package. I don't think there is much objection to this one. [SLPJ: I am unconvinced.] ----- ST ---------------- Control.Monad.ST Data.STRef (120 LoC, 6 modules) hierarchies. I propose that we put these into an "st" package. The low-level implementation is still in base (69 LoC of in the GHC.ST and GHC.STRef), so to some extent this is a false separation (D4). On the other hand, nhc98 doesn't support ST, so splitting this package off gets us closer to all implementations exposing the same modules from base. ------ Concurrent -------- Control.Concurrent hierarchy (490 LoC, 6 modules) and System.Timeout (39 LoC) Data.Unique (32 LoC) (those latter modules depend on Control.Concurrent.*). I propose that we put these into "concurrent", "timeout" and "unique" packages respectively. Again, this is a false separation, with 698 LoC left behind in GHC.Conc; at some time we'd hope that this could either be moved down to ghc-prim, or make a new ghc-concurrent package for it, depending on how the dependencies work out. Again, nhc doesn't support concurrent or its dependencies, so this gets us closer to a consistent base interface. [SLPJ: I don't think we should split out concurrent yet. I'm pretty certain that we should not generate tiny new packages for "timeout" and "unique".] ------ Summary ------- Splitting off the above 5 packages would leave 106 modules and 16,621 LoC in base. About 5% of the LoC, and 12.5% of the modules, would be in the new packages. [SLPJ: the fact that the change is so small makes me think that A2, A3 are not being helpful. I think there is only a strong case for SYB, becuase of A1.]

On Thu, Aug 28, 2008 at 12:12:32PM +0100, Ian Lynagh wrote:
We're trying to decide what to do with the base library for GHC 6.10, in terms of how much of it should be broken up into separate packages. Since the recent proposal about this, we may be rethinking what we want to do, and we would welcome your opinions.
The decision is to split syb out from base (some details still being discussed) and leave the getopt st concurrent, timout and unique modules in base for GHC 6.10. Thanks Ian
participants (16)
-
Ashley Yakeley
-
Bulat Ziganshin
-
Claus Reinke
-
David Menendez
-
David Roundy
-
Duncan Coutts
-
Henning Thielemann
-
Ian Lynagh
-
Jonathan Cast
-
José Pedro Magalhães
-
kahl@cas.mcmaster.ca
-
Neil Mitchell
-
Ross Paterson
-
Sean Leather
-
Simon Marlow
-
Simon Peyton-Jones