Splitting SYB from the base package in GHC 6.10

Hello all, I'm initiating this discussion per suggestion of SP Jones and following from [1]. The issue is: SYB is being moved out of base into its own package. However, the Data class is, in a way, tied to base since it depends on the deriving mechanism. Therefore, it was suggested that the entire Data.Generics.Basics module [2] should remain in base. This module defines the Data class and several associated functions and datatypes. I don't think anyone objected to this so far: please correct me if I'm wrong, or object now. Then it was also suggested that Data.Generics.Instances [3] could stay in base (perhaps inside Basics as well). This, however, would prevent dealing with the "dubious" Data instances [4], and this was one of the motivating factors to split SYB from base. This refers concretely to the instances: instance (Data a, Data b) => Data (a -> b) instance Typeable a => Data (IO a) instance Typeable a => Data (Ptr a) instance Typeable a => Data (StablePtr a) instance Typeable a => Data (IORef a) instance Typeable a => Data (ForeignPtr a) instance (Typeable s, Typeable a) => Data (ST s a) instance Typeable a => Data (TVar a) instance Typeable a => Data (MVar a) instance Typeable a => Data (STM a) instance (Data a, Integral a) => Data (Ratio a) These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed. However, this implies that a user cannot redefine such instances even in the case where s/he knows extra information about these types that would allow for a more useful instance definition, for instance. Claus, please correct me if I'm wrong, but if the 11 "dubious" instances (or perhaps less, given your message in [5]) go in the syb package and the remaining, "standard" ones stay in base, we: - Mantain backwards compatibility regarding SYB in 6.10, and - Can still deal with the issue by releasing a new version of the syb package later, independently of GHC. Since the deadline for 6.10 is approaching I'm assuming that we should try to minimize the changes there, while keeping future development in the syb package as open as possible. Finally, there are module naming issues, which are probably secondary to the issue above and can be dealt with separately and later. Thanks, Pedro [1] The base library and GHC 6.10: http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9929 [2] http://www.haskell.org/ghc/dist/stable/docs/libraries/base/Data-Generics-Bas... [3] http://www.haskell.org/ghc/dist/stable/docs/libraries/base/Data-Generics-Ins... [4] http://www.haskell.org/pipermail/generics/2008-June/000347.html [5] http://article.gmane.org/gmane.comp.lang.haskell.libraries/9957

On Mon, Sep 01, 2008 at 02:49:01PM +0200, José Pedro Magalhães wrote:
Then it was also suggested that Data.Generics.Instances [3] could stay in base (perhaps inside Basics as well). This, however, would prevent dealing with the "dubious" Data instances [4], and this was one of the motivating factors to split SYB from base. This refers concretely to the instances:
instance (Data a, Data b) => Data (a -> b) instance Typeable a => Data (IO a) instance Typeable a => Data (Ptr a) instance Typeable a => Data (StablePtr a) instance Typeable a => Data (IORef a) instance Typeable a => Data (ForeignPtr a) instance (Typeable s, Typeable a) => Data (ST s a) instance Typeable a => Data (TVar a) instance Typeable a => Data (MVar a) instance Typeable a => Data (STM a) instance (Data a, Integral a) => Data (Ratio a)
These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed. However, this implies that a user cannot redefine such instances even in the case where s/he knows extra information about these types that would allow for a more useful instance definition, for instance.
These two statements appear to be contradictory. Perhaps an example of a possible instance would help. In any case, it seems there's no objection to placing the following instances together with the Data class: () Bool Char Double Float Handle Ordering Int Int8 Int16 Int32 Int64 Integer Word Word8 Word16 Word32 Word64 DataType TyCon TypeRep [] Maybe Either tuples Claus also mentioned the Ptr/Var types, which also don't seem to have a value that can be traversed. I think the Ratio instance really ought to be in base, as Rational is in the Prelude. What about Complex? Claus argued that -> and the monads could be treated by analogy with Show for these types. There is an additional problem with types like ThreadId, Array, ST, STM, TVar and MVar: they're notionally defined in other packages, even though they're actually defined in partially-hidden GHC.* modules in base and re-exported.

These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed. However, this implies that a user cannot redefine such instances even in the case where s/he knows extra information about these types that would allow for a more useful instance definition, for instance.
|These two statements appear to be contradictory. Perhaps an example of |a possible instance would help. "no other possible implementation" is an overstatement, though an easy one to make: those 'Data' instances are incomplete because better instances are hard to come by. One can perhaps do little improvements, like replace the effective 'gmapT = id' for 'IO a' and 'b -> a' with something like[1]: gmapT f fun = f . fun -- instead of gmapT f fun = fun gmapT f io = (return . f) =<< io -- instead of gmapT f io = io but that still doesn't make those instances complete. If it wasn't for the partial uses, like skipping 'IO a' and 'b -> a' as parts of derived 'Data' instances, one wouldn't want these instances at all, imho (at least not in their current form). Then there are abstract types, for which the current default when implementing reflection is to assume "no constructors", hence no basis for 'gunfold', hence more incomplete 'Data' instances and runtime errors. It might be possible to experiment with associating exactly one, abstract, constructor with each abstract type instead, but that isn't something I'd like to bake in without more experience. Another way to look at it: 'Data' tries to do too much in a single class, and the consequence are all those half-implemented 'Data' instances. The probable long-term solution is to split 'Data' into 2 or 3 classes, so that we know that a any type instantiating 'DataGfoldl' really supports 'gfoldl' b any type instantiating 'DataGunfold' really supports 'gunfold' c any type instantiating 'DataReflect' really supports 'Data' reflection Currently, too many types instantiate 'Data' without supporting b or c (-> runtime errors), and a few instances don't even support a. All of which suggests that 'Data' should probably leave 'base', as it needs to evolve further? |Claus argued that -> and the monads could be treated by analogy |with Show for these types. I had mentioned 'Text.Show.Functions' as an example of "improper" instances provided for optional import to support 'deriving Show'. But when I read your sentence, my first thought was: perhaps there's also a way to apply the showList trick? That would neatly avoid either changing the 'deriving' mechanism or having dummy instances. More reason for moving everything to 'syb', keeping it flexible for a while. |There is an additional problem with types like ThreadId, Array, ST, STM, |TVar and MVar: they're notionally defined in other packages, even though |they're actually defined in partially-hidden GHC.* modules in base and |re-exported. Would it be sufficient for 'syb' to depend on both 'base' and those notional source packages? It would be useful to keep the instances in 'syb' until the 'Data' story has settled down, after which the instances ought to move to their 'data' type source packages. Claus [1] http://www.haskell.org/pipermail/libraries/2008-July/010319.html

Hello,
On Mon, Sep 1, 2008 at 21:04, Claus Reinke
"no other possible implementation" is an overstatement, though an easy one to make: those 'Data' instances are incomplete because better instances are hard to come by. One can perhaps do little improvements, like replace the effective 'gmapT = id' for 'IO a' and 'b -> a' with something like[1]:
gmapT f fun = f . fun -- instead of gmapT f fun = fun
gmapT f io = (return . f) =<< io -- instead of gmapT f io = io
But wouldn't these introduce additional inconsistencies? At least if introduced in the library itself. I am used to think that gmapT is implemented using gfoldl, and is only inside the Data class to allow for more efficient implementations, and not for alternative implementations...
Another way to look at it:
'Data' tries to do too much in a single class, and the consequence are all those half-implemented 'Data' instances. The probable long-term solution is to split 'Data' into 2 or 3 classes,
so that we know that
a any type instantiating 'DataGfoldl' really supports 'gfoldl' b any type instantiating 'DataGunfold' really supports 'gunfold' c any type instantiating 'DataReflect' really supports 'Data' reflection
Currently, too many types instantiate 'Data' without supporting b or c (-> runtime errors), and a few instances don't even support a.
All of which suggests that 'Data' should probably leave 'base', as it needs to evolve further?
Just for my understanding, can you give me an example of a datatype which currently has (b) but not (c) and vice-versa? Anyway, I guess keeping Data inside base does not preclude such splitting of Data: for backward compatibility the original Data would have to remain available, right?
|Claus argued that -> and the monads could be treated by analogy |with Show for these types.
I had mentioned 'Text.Show.Functions' as an example of "improper" instances provided for optional import to support 'deriving Show'.
But when I read your sentence, my first thought was: perhaps there's also a way to apply the showList trick? That would neatly avoid either changing the 'deriving' mechanism or having dummy instances.
More reason for moving everything to 'syb', keeping it flexible for a while.
By "everything" do you mean all instances or all the "dubious" ones? IIRC, the argument for having the "standard" instances in base is that leaving Data alone without any instances would mean that in most cases you would have to import SYB anyway to get any functionality. Or are there other reasons? Thanks, Pedro

gmapT f fun = f . fun -- instead of gmapT f fun = fun
But wouldn't these introduce additional inconsistencies? At least if introduced in the library itself. I am used to think that gmapT is implemented using gfoldl, and is only inside the Data class to allow for more efficient implementations, and not for alternative implementations...
Well, I'd like to define 'gmapT' in terms of 'gfoldl' (in a non-trivial, sensible way). The default for gfoldl is 'gfoldl _ z = z', but that doesn't help much here since 'z's type is rather too polymorphic to be of use: 'forall c g . g -> c g'. I've wondered occasionally whether requiring 'Typeable g' there would help. The next try is to expand our function, so that we can pretend we have some constructor to work on in 'gfoldl': -- fun ==> \x->fun x ==> (\fun x->fun x) fun Then we can do (using scoped type variables to fix the 'a' and 'b'): gfoldl k z fun = z (\fun x->fun x) `k` fun -- gmapT f fun = f . fun gmapT f fun = unId $ gfoldl (k f) (Id) fun where k f (Id c) x = Id (c (case (cast x :: Maybe (a -> b)) of Just x -> fromJust $ cast (f . x) Nothing -> x)) but whether that is very enlightening, I wouldn't want to say;-)
Just for my understanding, can you give me an example of a datatype which currently has (b) but not (c) and vice-versa?
b ('toConstr'&co) usually comes with c ('gunfold'). I've defined some 'Data' instances which implemented b without c, but I don't think that is typical. My reason for splitting the functionality in three ('gfoldl', 'toConstr', 'gunfold') was just to be systematic, hoping in particular for implementations of 'gunfold' (or, more generally, constructing 'data' from parts) that do not depend on reflection.
Anyway, I guess keeping Data inside base does not preclude such splitting of Data: for backward compatibility the original Data would have to remain available, right?
It used to be the case that 'base' could not be updated, so anything in it would be fixed until the next ghc release. Preserving the original 'Data' would also preserve the original clients and incomplete instances, which is not what one would want (instead, one would want to instantiate just those component classes whose methods can be implemented and used without runtime errors, preserving compatibility of non-failing code). But that is all far future, 6.12 or so, not urgent now. I just mentioned it because there is very little about SYB that I'm sure about, and this is another example of something that might be worth looking into. And the more you keep in 'base', the less you can improve.
More reason for moving everything to 'syb', keeping it flexible for a while.
By "everything" do you mean all instances or all the "dubious" ones? IIRC, the argument for having the "standard" instances in base is that leaving Data alone without any instances would mean that in most cases you would have to import SYB anyway to get any functionality. Or are there other reasons?
Note the "for a while" there. If you are at liberty to change 'base' and users can update 'base' without waiting for the next ghc release, then you can do the changes in 'base'. Otherwise, everything that might change should be in a package you can change and users can update. Making that package 'syb' keeps things simple - later, after things have settled down again, one could spin off 'Data' and 'Typeable' into their own package ('data-reflection', 'introspection', ..). Or one could re-integrate 'Data' into 'base' to get smaller 'build-depends' (and less accurate Cabal dependencies..). But while you're looking into improving things, they need to be changeable, and 'base' usually isn't. Claus

The issue is: SYB is being moved out of base into its own package. However, the Data class is, in a way, tied to base since it depends on the deriving mechanism.
My understanding is that the deriving mechanism would still work if class 'Data' was moved into 'syb', but changes in 'Data' would still need to be matched in the deriving mechanism (which isn't auto-generated from 'base', either). As long as 'syb' remains a core library, we can thus focus on assigning modules to 'syb' or 'base' by functionality.
Therefore, it was suggested that the entire Data.Generics.Basics module [2] should remain in base. This module defines the Data class and several associated functions and datatypes. I don't think anyone objected to this so far: please correct me if I'm wrong, or object now.
Assuming this is based on 'Data.Generics.Basics' and 'Data.Typeable' being of more general use than the rest of 'syb' (justifying a preferred dependency on 'base' rather than 'syb'), not any implementation constraints, I don't object in general. It does suggest a separate 'data-reflect' package for these two modules, but that could be left for later. However, if 'Data' is in 'base', and the 'data' types are in 'base', then the 'Data' instances for those 'data' types should probably also be in base (*) (the instance for 'Array a b' ought to move to 'array'). And the short-term issue with this is that these instances, their location, and their importers, need some revision, while 'base' wants to be stable. The hope was that splitting off 'syb' from 'base' would contain the changes in a package with named maintainer, outside 'base'. Wouldn't it be easier to have all of 'Data' in 'syb', at least until 'Data' and 'Typeable' move into their own package? But if you can find a way to make the 'Data'-in-'base' route work, I'm not going to object.
Then it was also suggested that Data.Generics.Instances [3] could stay in base (perhaps inside Basics as well). This, however, would prevent dealing with the "dubious" Data instances [4], and this was one of the motivating factors to split SYB from base. This refers concretely to the instances:
Rearranging the list slightly, for easier reference: -- these have (or produce) substructures of type 'a', which aren't -- traversed by the current Data instances (contrary to what one -- would expect, say, from a generic 'fmap' over these types)
instance (Data a, Data b) => Data (b -> a) instance Typeable a => Data (IO a) instance (Typeable s, Typeable a) => Data (ST s a) instance Typeable a => Data (STM a) instance Typeable a => Data (IORef a) instance Typeable a => Data (TVar a) instance Typeable a => Data (MVar a)
-- here, the 'a' is a phantom type, without matching substructures
instance Typeable a => Data (Ptr a) instance Typeable a => Data (StablePtr a) instance Typeable a => Data (ForeignPtr a)
-- here, the 'a' corresponds to substructures that should only -- be visible through the abstract interface, on top of which a -- 'data'-like view can be provided
instance (Data a, Integral a) => Data (Ratio a)
In addition, a longer list of instances offer only runtime errors for some 'Data' operations (most notably for 'gunfold', though abstract types in general have a problem with reflection support). Are these necessary or would they profit from closer investigation? If the latter, those instances should probably not be in 'base'.
These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed.
Well, there are alternative instances that would at least improve traversal support [3], but that wouldn't work for queries, I think.
However, this implies that a user cannot redefine such instances even in the case where s/he knows extra information about these types that would allow for a more useful instance definition, for instance.
Indeed, the implicit presence of these instances is the main issue, and reducing their presence and propagation affects 'base' and other core and extra libaries, so needs to happen soon.
Claus, please correct me if I'm wrong, but if the 11 "dubious" instances (or perhaps less, given your message in [5]) go in the syb package and the remaining, "standard" ones stay in base, we: - Mantain backwards compatibility regarding SYB in 6.10, and - Can still deal with the issue by releasing a new version of the syb package later, independently of GHC.
issues to consider, of the top of my head: - to what extent can core libraries be updated independent of 'base'? - unless 'base' can now be updated (there are two versions of 'base' in ghc head), 'base' must not depend on 'syb' - which other core libraries depend on 'syb'? are they updateable? - the current importers of (parts of) 'Data.Generics' need to be revised [1] - instances cannot be deprecated - since all instances are in one module, one could deprecate the module, but are module deprecations propagated to their importers automatically? - would 'Data.Generics' need to be deprecated, in favour of a version that does not implicitly re-export any/all instances? [2] Maintaining strict backwards-compatibility in 6.10 while still allowing for changes in 'syb' is going to be difficult, if only because clients might depend on 'Data.IntSet' and the like to re-export all current 'Data' instances, which we certainly want to stop. My 'syb-utils' [2] has alternatives to 'Data.Generics' that export either only standard instances or no instances, which would allow to deprecate all 'Data.Generics*' modules that are less specific about their instance exports, but would require use of alternative module names..
Since the deadline for 6.10 is approaching I'm assuming that we should try to minimize the changes there, while keeping future development in the syb package as open as possible.
Definitely. But some choices need to be made now. Mainly what goes where, how to handle deprecation, and how to reduce implicit instance propagation. Claus [1] http://article.gmane.org/gmane.comp.lang.haskell.libraries/9957 [2] http://www.cs.kent.ac.uk/~cr3/toolbox/haskell/#syb-utils [3] http://www.haskell.org/pipermail/libraries/2008-July/010319.html (*) this isn't a firm rule, either: recently, it was decided to keep the 'Data' instances for 'ghc' types out of 'ghc'..

The issue is: SYB is being moved out of base into its own package.
However, the Data class is, in a way, tied to base since it depends on the deriving mechanism.
My understanding is that the deriving mechanism would still work if class 'Data' was moved into 'syb', but changes in 'Data' would still need to be matched in the deriving mechanism (which isn't auto-generated from 'base', either). As long as 'syb' remains a core library, we can thus focus on assigning modules to 'syb' or 'base' by functionality.
So, here's a (possible) summary from a general perspective. (1) Some people want to keep some parts of the SYB functionality in 'base', because these parts are closely linked to some parts of GHC. This is desired for convenience (and perhaps test coverage?). (2) Some people want to remove some parts of the SYB functionality from 'base', because they want to be able to maintain and release SYB separately. (3) Some people in group #2 are not sure what should be left in 'base' or extracted into 'syb.' My observations: (A) I don't see 'syb' ever becoming something other than a core library for GHC, considering it's close family ties. (B) I expect 'syb' to get updated and released more often than GHC. This is especially true considering the newfound interest. (C) I expect the 'syb' library will be tested using the current (and possibly past?) release(s) of GHC, because that's what releases will use in general. If something in a development version of GHC breaks SYB, then there may need to be a new 'syb' release for when that version of GHC is released. At that point, there may be a need for a temporary fork if other work is ongoing. (C) From a user's perspective I don't understand the splitting of SYB. Why is it that I can derive Data.Generics.Data, but I cannot actually use other functions built for it? So, given all of the above (assuming it's correct), it seems to me that the benefit leans towards migrating everything SYB-related into the 'syb' library. Is the motivation/argument for group #1 very strong? Hope this helps, Sean

Sean Your analysis is good, but missing the following You can build stuff on class Data *other than* SYB. That's a motivation for not identifying Data with SYB. That's really the argument for keeping Data in 'base', so that others can build on it without depending on the full glory of SYB. (A weaker argument is that GHC "knows" about Data, to support 'deriving'. But that's less important.) Simon From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Sean Leather Sent: 01 September 2008 21:23 To: Claus Reinke Cc: José Pedro Magalhães; ross@soi.city.ac.uk; Simon Peyton-Jones; libraries@haskell.org; generics@haskell.org; igloo@earth.li Subject: Re: Splitting SYB from the base package in GHC 6.10 The issue is: SYB is being moved out of base into its own package. However, the Data class is, in a way, tied to base since it depends on the deriving mechanism. My understanding is that the deriving mechanism would still work if class 'Data' was moved into 'syb', but changes in 'Data' would still need to be matched in the deriving mechanism (which isn't auto-generated from 'base', either). As long as 'syb' remains a core library, we can thus focus on assigning modules to 'syb' or 'base' by functionality. So, here's a (possible) summary from a general perspective. (1) Some people want to keep some parts of the SYB functionality in 'base', because these parts are closely linked to some parts of GHC. This is desired for convenience (and perhaps test coverage?). (2) Some people want to remove some parts of the SYB functionality from 'base', because they want to be able to maintain and release SYB separately. (3) Some people in group #2 are not sure what should be left in 'base' or extracted into 'syb.' My observations: (A) I don't see 'syb' ever becoming something other than a core library for GHC, considering it's close family ties. (B) I expect 'syb' to get updated and released more often than GHC. This is especially true considering the newfound interest. (C) I expect the 'syb' library will be tested using the current (and possibly past?) release(s) of GHC, because that's what releases will use in general. If something in a development version of GHC breaks SYB, then there may need to be a new 'syb' release for when that version of GHC is released. At that point, there may be a need for a temporary fork if other work is ongoing. (C) From a user's perspective I don't understand the splitting of SYB. Why is it that I can derive Data.Generics.Data, but I cannot actually use other functions built for it? So, given all of the above (assuming it's correct), it seems to me that the benefit leans towards migrating everything SYB-related into the 'syb' library. Is the motivation/argument for group #1 very strong? Hope this helps, Sean

On Tue, Sep 2, 2008 at 12:50, Simon Peyton-Jones wrote:
Sean Your analysis is good, but missing the following
You can build stuff on class Data **other than** SYB. That's a motivation for not identifying Data with SYB.
That's really the argument for keeping Data in 'base', so that others can build on it without depending on the full glory of SYB.
Ah, okay. Then, that is a stronger argument. Thanks, Sean

On Mon, Sep 01, 2008 at 02:49:01PM +0200, José Pedro Magalhães wrote:
with the "dubious" Data instances
FWIW, I don't believe in "dubious" instances. There are two reasons one might want to not have an instance: * You want to use a different instance for the same type, e.g. you want the (Ratio t) instance to descend into the t rather than just treating the type abstractly. This sounds nice in theory, but it is flawed in practice. Instances are global in Haskell, so if one library needs Ratio instance 1, and another library needs Ratio instance 2, then those libraries cannot be used in the same program. * You don't want to actually use Data with that type, and you want the compiler to tell you if a bug in your code means that the instance would be used ("Can't find instance Data (IO a)"). However, in my (admittedly not vast) experience, this isn't a problem that tends to crop up in practice. Also, note that the presence of these instances doesn't affect correct programs. Also, I've just spent a couple of minutes staring at the SYB and SBY-With-Class Data classes; am I right in thinking that neither can be implemented on top of the other? Here's part of the classes: ------------------------------------------------ {-# LANGUAGE UndecidableInstances, OverlappingInstances, FlexibleInstances, MultiParamTypeClasses, Rank2Types, ScopedTypeVariables, EmptyDataDecls, KindSignatures #-} import Data.Typeable class Typeable a => Data1 a where gfoldl1 :: (forall b c. Data1 b => w (b -> c) -> b -> w c) -> (forall g. g -> w g) -> a -> w a class (Typeable a, Sat (ctx a)) => Data2 ctx a where gfoldl2 :: Proxy ctx -> (forall b c. Data2 ctx b => w (b -> c) -> b -> w c) -> (forall g. g -> w g) -> a -> w a data Proxy (a :: * -> *) class Sat a where dict :: a ------------------------------------------------ Could uniplate etc have been built on top of SYBWC's Data, instead of SYB's Data? Is SYB's Data the basic building block only because there are more instances for it in the wild, and GHC can derive them? Thanks Ian

Ian Lynagh
* You want to use a different instance for the same type.
Instances are global in Haskell, so if one library needs Ratio instance 1, and another library needs Ratio instance 2, then those libraries cannot be used in the same program.
For me it's not so much about the libraries. I've run into cases (not in SYB) where I want my program to use a different definition of an instance than the one provided by a library, while still using the other stuff provided by that library. The pervasive globality of instances does seem to me to be a Haskell bug; it seems to me like they ought to be controlled using the same module export control rules as anything else. But I imagine there's some reason I don't understand that makes this hard or impossible...

On Tue, Sep 2, 2008 at 11:59 AM, Bart Massey
Ian Lynagh
writes: * You want to use a different instance for the same type.
Instances are global in Haskell, so if one library needs Ratio instance 1, and another library needs Ratio instance 2, then those libraries cannot be used in the same program.
For me it's not so much about the libraries. I've run into cases (not in SYB) where I want my program to use a different definition of an instance than the one provided by a library, while still using the other stuff provided by that library.
The pervasive globality of instances does seem to me to be a Haskell bug; it seems to me like they ought to be controlled using the same module export control rules as anything else. But I imagine there's some reason I don't understand that makes this hard or impossible...
I think there are some guarantees made by the type system that aren't
valid if class instances aren't global, but I can't remember the
details.
More practically, there is more than one way of implementing classes:
dictionary passing, partial evaluation (see "Dictionary-free
overloading by partial evaluation" by Mark Jones), and explicit type
parameters (used by JHC). The latter two enable optimizations that are
not possible with dictionary passing, but only work if instances are
global.
--
Dave Menendez

On Tue, Sep 2, 2008 at 11:59 AM, Bart Massey
wrote: Ian Lynagh
writes: * You want to use a different instance for the same type.
Instances are global in Haskell, so if one library needs Ratio instance 1, and another library needs Ratio instance 2, then those libraries cannot be used in the same program.
For me it's not so much about the libraries. I've run into cases (not in SYB) where I want my program to use a different definition of an instance than the one provided by a library, while still using the other stuff provided by that library.
The pervasive globality of instances does seem to me to be a Haskell bug; it seems to me like they ought to be controlled using the same module export control rules as anything else. But I imagine there's some reason I don't understand that makes this hard or impossible...
I missed this the first time around, but consider the following case (I ran into this specific issue trying to formulate arithmetic using first-class modules and implicits): class Multiplication alpha beta where type Product alpha beta (*) :: alpha -> beta -> Product alpha beta class AdditiveGroup alpha where zero :: alpha (+) :: alpha -> alpha -> alpha (-) :: alpha -> alpha -> alpha class (AdditiveGroup alpha, Multiplication alpha alpha, Product alpha alpha ~ alpha) => Field alpha where one :: alpha (/) :: alpha -> alpha -> alpha class (AdditiveGroup vector, Field scalar, Multiplication scalar vector, Product scalar vector ~ vector) => VectorSpace scalar vector And consider the definition multiply :: VectorSpace scalar scalar => scalar -> scalar -> scalar multiply = (*) VectorSpace scalar scalar has two Multiplication scalar scalar super-classes: one direct, and one inherited from Field. With type classes, this isn't a problem, because the program can only contain one instance of Multiplication scalar scalar anyway (for fixed scalar), so any instance must of VectorSpace scalar scalar necessarily picks up only one definition of (*). If you had named instances, and relaxed that uniqueness constraint, I'm not even sure how you'd express that when you needed it here. jcc

Hello,
On Tue, Sep 2, 2008 at 14:35, Ian Lynagh
FWIW, I don't believe in "dubious" instances.
Indeed not changing the instances is definitely still an option. Maybe it's best, for now, to focus on choosing a splitting solution which allows for all possible flexibility in the future development of SYB. After 6.10 is out, we then have all the proposed changes to SYB (including the dubious instances issue) analized and discussed before making the changes. I find Simon PJ's argument for keeping Data in base ("so that others can build on it without depending on the full glory of SYB") rather strong, and I guess any "interesting" change to the Data class would require a change in the internal deriving mechanism. I'm not saying that Data is perfect and should not be changed, but maybe it can be in base without fundamentally diminishing the oportunites for development in SYB. Do you agree with me, Claus? Also, I've just spent a couple of minutes staring at the SYB and
SBY-With-Class Data classes; am I right in thinking that neither can be implemented on top of the other?
I believe that is the case. syb-with-class is generally interpreted as another library for generic programming. Their interfaces differ (namely because syb-with-class requires abstraction over type classes, and since this is not possible it uses some trick). Could uniplate etc have been built on top of SYBWC's Data, instead of
SYB's Data? Is SYB's Data the basic building block only because there are more instances for it in the wild, and GHC can derive them?
I guess it could, but that would have implications on its usability. I guess SYB's Data is seen as a basic building block for generics because it's easy to use (and the automatic derivation plays a big role on that). Plus, it comes with GHC. A recent report compares various libraries for generic programming in Haskell and their expressiveness [1]. Thanks, Pedro [1] http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-010.pdf

Indeed not changing the instances is definitely still an option. Maybe it's best, for now, to focus on choosing a splitting solution which allows for all possible flexibility in the future development of SYB. After 6.10 is out, we then have all the proposed changes to SYB (including the dubious instances issue) analized and discussed before making the changes.
I find Simon PJ's argument for keeping Data in base ("so that others can build on it without depending on the full glory of SYB") rather strong, and I guess any "interesting" change to the Data class would require a change in the internal deriving mechanism. I'm not saying that Data is perfect and should not be changed, but maybe it can be in base without fundamentally diminishing the oportunites for development in SYB. Do you agree with me, Claus?
I find it curious to see you ask that question, as I've spent quite a bit of time answering it, and providing information needed to prepare a workable plan. Strangely, I had assumed you'd actually use that material to help you understand the issues involved in any decision made now. But my question haven't been answered, and now you simply repeat your question without reference to my previous answers and suggestions (have you tried playing with 'syb-utils', to see how a trying to fix the instance issues in a separate package on top of 'base' would work, or not?). Instead of repeating myself in full, I'd prefer you'd refer to my previous emails (ask if they were unclear), but here is the gist of it (with no claim of completeness): 0 if 'Data' stays in 'base', and the types are in 'base' there is no good reason for the instances to be elsewhere, right? (personally, I still like Ashley's suggestion of putting 'Data', 'Typeable' and 'Dynamic' in a separate package) 1 'base' traditionally cannot be updated between ghc releases. I asked whether that had changed, because otherwise leaving anything in 'base' will keep you from changing it. So if the technical limitations fixing 'base' are not gone, your adventure ends right there. 2 the "standard"/"dubious" separation of instances was entirely preliminary; following the recent discussion, I would suggest to split the instances into three groups, in three separate modules: [standard]: fully implemented 'Data' instances (no runtime errors). one should probably reclassify 'Ratio a' in here. [partial]: partially implemented instances (usually for abstract types, which 'Data' doesn't handle well; whether that can be mended without changing the class remains to be seen); these include 'Array a b', 'ThreadId', etc (previously in 'Standard') and the pointer types (previously in 'Dubious'); if these instances can be completed, existing clients will simply work better (fewer runtime crashes) [misfits]: 'IO a', 'b -> a', and other types that enclose a substructure type in a contravariant context ("realworld", "b", "state", etc); not only are the existing 'Data' instances incomplete, they skip substructures on both transformations and queries, and the type of query operators in 'Data' simply does not permit a complete implementation of substructure queries for these types (I think), so these instances are fairly hopeless at the moment, and should only be available if explicitly requested. 3 if the implicit re-export of misfit instances from 'Data.Generic*' isn't stopped now, any attempt to fix 'syb' is doomed (until the next ghc release). unless anyone wants to claim that it is technically possible to implement complete 'Data' instances for these types (in which case I'm all ears;-), the existing incomplete instances should not be available by default (requiring a specific import instead). I'd prefer the other partial instances also to be under explicit import only, so that noone can run into runtime crashes without having been warned (just because they use 'gunfold' instead of 'gfold', say, and happen to work with arrays instead of lists), but it seems I'm in the minority on this point. 4 even after providing selective import of non-standard instances, the existing importers of 'Data.Generic*' in core and extra libs need to be cleaned up now, so that they only import (and thus re-export) the miminum set of instances they depend on (in particular, no misfits); again, failure to do that now will doom any attempt to fix the 'syb' instances later. If the misfit instances are moved out of 'Data.Generic.Instances' into a module not imported anywhere else, while the other partial instances remain, no changes might be needed to the existing 'Data.Generics*' importers, but that needs to be checked. I have no idea whether these would give you sufficient flexibility to work on 'syb' after the ghc release, but I'm pretty sure that they are part of the necessary minimum of issues that need to be addressed now. Even after working around #2182, you can observe the issues when using my 'syb-utils' package, eg, the 'Data.Generics.GPS' module uses 'Data.IntMap', so it accidentally re-exports all the old instances, even though it carefully avoids importing them itself. If it wasn't for #2356, that would never work (and if you happen to import 'Data.Generics.GPS' before importing 'Data.Generics.Instances.*', instead, it still might not, at least while building the package). Claus PS I'll update my 'syb-utils' package to reflect the new instance split, as given in 2 above.

I have no idea whether these would give you sufficient flexibility to work on 'syb' after the ghc release
I think I've missed something: What is the work that can't be done before the instances are moved?
If I understand José's intentions correctly, the idea is to keep the 'Data' class intact this round, but to give the 'syb' package maintainer and users the flexibility to try and improve the situation with 'Data' instances; ie, either - improve the currently partial and incomplete instances, or - not provide/import those instances that can't be completed But if the existing incomplete instances are not moved out of the way, their implicit import/export will always interfere with whatever new instances 'syb' might try to provide. So it seems as if we need one of: - ability to update 'base', changing the instances in place - ability to selectively import the incomplete instances, so that we can chose to import either the old ones or the new ones (or neither, if fixing turns out to be impossible) Unless you want to keep everything as it is, that is; or have I missed something? Claus

On Wed, Sep 03, 2008 at 12:29:09AM +0100, Claus Reinke wrote:
If I understand José's intentions correctly, the idea is to keep the 'Data' class intact this round, but to give the 'syb' package maintainer and users the flexibility to try and improve the situation with 'Data' instances; ie, either
- improve the currently partial and incomplete instances, or - not provide/import those instances that can't be completed
But if the existing incomplete instances are not moved out of the way, their implicit import/export will always interfere with whatever new instances 'syb' might try to provide.
I think the area of contention is pretty small. As I understand it, no-one wants to change the instances you call [standard], for the types [] Maybe Either tuples () Bool Ordering Char Double Float Handle Integer Int Int8 Int16 Int32 Int64 Word Word8 Word16 Word32 Word64 DataType TyCon TypeRep So they might as well go with the Data class. That leaves the [partial] instances that would have to be taken out of syb to make most of the other packages independent of it: Ratio Complex Ptr StablePtr ForeignPtr Assuming something sensible is done for the first two, would there really be any need to change these?

I think the area of contention is pretty small. As I understand it, no-one wants to change the instances you call [standard], for the types
[] Maybe Either tuples () Bool Ordering Char Double Float Handle Integer Int Int8 Int16 Int32 Int64 Word Word8 Word16 Word32 Word64 DataType TyCon TypeRep
This list already includes some [partial] instances - just knowing that these types have 'Data' instances isn't sufficient to use 'Data' operations on them safely (while avoiding runtime errors).
So they might as well go with the Data class. That leaves the [partial] instances that would have to be taken out of syb to make most of the other packages independent of it:
Ratio Complex Ptr StablePtr ForeignPtr
'Ratio' has a complete 'Data' instance, it just isn't consistent; the 'gfoldl' operation assumes a nullary constructor/no visisble constructors (abstract type) while the 'gunfold' operation assumes a binary constructor (concrete type, internal constructor). 'Complex' seems to be missing an instance?
Assuming something sensible is done for the first two, would there really be any need to change these?
That depends on whether the [partial] 'Data' instances can be completed in a sensible way for these types. If they can't, then the way forward would be to split the 'Data' class into those operations supported by all current instances and those only supported by the types in [standard]. That way, one would the presence of instances would at least be indicative of the presence of instance methods, providing more static type safety. But if the 'Data' class and some instances are in 'base', there'd be no changing them (*). Claus (*) of course, it would be nice if ghc's deriving mechanism was based on a derived instance specification in the module providing the class, rather than being hardcoded into the compiler itself; so even with 'Data' out of 'base', that part currently cannot be changed without updating ghc. or if class aliases allowed the generated old-style 'Data' instances to be re-interpreted as 'DataA' and 'DataB' instances.. but none of that is realistic for the current split decision.

| Instead of repeating myself in full, I'd prefer you'd refer to my previous | emails (ask if they were unclear), but here is the gist of it (with no claim | of completeness): Claus, I found your summary very helpful. Thank you! Although you may have said it all before, I find it very difficult to keep a whole email thread in my head, and I find compact summaries really helpful. | I think the area of contention is pretty small. As I understand it, | no-one wants to change the instances you call [standard], for the types Ross's comments (reproduced below) look sensible to me. In the short term we probably don't know enough to be sure of doing the Permanently Right Thing. But time is short for 6.10, so I think for that near-term goal we should concentrate on something modest. My criterion is: base: stuff that (a) we don't expect to want to change and (b) that others might want to build on in a different way syb: stuff that may change There is a tension between (a) and (b); and the base/syb boundary may well be fuzzy. Concretely, I propose that we follow more or less Ross's suggestion: base: the Data class itself and [standard] instances, and any [partial] instances that we feel confident won't change syb: leftover [partial] instances, and [misfits] plus lots of SYB library code My reading of Claus's message is that, modulo discussion of some details of [partial], he'd go with that plan. I urge *against* making a new package for Data/Typeable, for now at least. The gain is slight or absent, and there is a cost in terms of new package dependencies, and the possibility that we may later decide to cut the cake another way. We recently decided to un-do the proposed splitting out of packages for 'getopt', 'st', 'concurrent', 'timeout' etc. for this very reason. See http://www.haskell.org/pipermail/libraries/2008-August/010543.html Is that acceptable to others? If so, someone (Jose) needs to sort out the details. 19 Sept is GHC's release candidate, so we need this done well before then. Simon | -----Original Message----- | From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Ross Paterson | Sent: 03 September 2008 09:14 | To: libraries@haskell.org | Cc: José Pedro Magalhães | Subject: Re: Splitting SYB from the base package in GHC 6.10 | | On Wed, Sep 03, 2008 at 12:29:09AM +0100, Claus Reinke wrote: | > If I understand José's intentions correctly, the idea is to keep the | > 'Data' class intact this round, but to give the 'syb' package maintainer | > and users the flexibility to try and improve the situation with 'Data' | > instances; ie, either | > | > - improve the currently partial and incomplete instances, or | > - not provide/import those instances that can't be completed | > | > But if the existing incomplete instances are not moved out of | > the way, their implicit import/export will always interfere with | > whatever new instances 'syb' might try to provide. | | I think the area of contention is pretty small. As I understand it, | no-one wants to change the instances you call [standard], for the types | | [] Maybe Either tuples | () Bool Ordering Char Double Float Handle | Integer Int Int8 Int16 Int32 Int64 Word Word8 Word16 Word32 Word64 | DataType TyCon TypeRep | | So they might as well go with the Data class. That leaves the [partial] | instances that would have to be taken out of syb to make most of the | other packages independent of it: | | Ratio Complex | Ptr StablePtr ForeignPtr | | Assuming something sensible is done for the first two, would there really | be any need to change these?

Hello,
On Wed, Sep 3, 2008 at 13:50, Simon Peyton-Jones
My reading of Claus's message is that, modulo discussion of some details of [partial], he'd go with that plan.
Is that acceptable to others? If so, someone (Jose) needs to sort out the details. 19 Sept is GHC's release candidate, so we need this done well before then.
Are the details sorted out? Summarizing: - Data.Generics.Basics stays in base; - All other SYB modules besides Data.Generics.Instances go into a new 'syb' package; - Data.Generics.Instances is split in two, one part remaining in base and the other moving to the syb package. Regarding the instances, two aspects have to be considered: 1. Avoiding the implicit re-export of Data instances from modules in base; 2. Deciding which instances go where. Regarding (1), the following changes need to be done: - Ratio has to be fixed to have a consistent instance: either it's seen as an abstract datatype (therefore with undefined gunfold) or a gfoldl has to be defined matching its gunfold; - Complex should have its derived instance restored (or otherwise a manual instance with a proper gfoldl); - Other places that have to be changed: - The Data instance of Data.Array should be moved from Data.Generics.Instances into Array, to avoid syb dependency; - The imports of SYB in the following modules should be fixed to avoid bringing into scope all the instances: ./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs - The following modules use of Ratio instances, which is not problematic if the instance for Ratio is fixed. Therefore no change is needed here: ./haskell-src/Language/Haskell/Syntax.hs ./template-haskell/Language/Haskell/TH/Quote.hs ./template-haskell/Language/Haskell/TH/Syntax.hs - These modules depend on an instance for ForeignPtr Word8. Instances for types with phantom types stay in base, since no suitable instance can be given (see (2) below). Therefore no change is needed here: ./bytestring/Data/ByteString/Internal.hs ./bytestring/Data/ByteString/Lazy/Internal.hs Regarding (2), the separation of the 44 Data instances in Data.Generics.Instances is the following: - These 26 [standard] instances are uncontroversial and can accompany the Data class: [a] (Maybe a) (Either a b) () (,) (,,) (,,,) (,,,,) (,,,,,) (,,,,,,) Bool Ordering Char Double Float Integer Int Int8 Int16 Int32 Int64 Word Word8 Word16 Word32 Word64 - These 2 instances could also be [standard] and stay in base, after fixing (see (1) above): Ratio Complex - These 2 instances have phantom types. No good suggestion for what to do with them has shown up, therefore they stay in base as they are: Ptr ForeignPtr - These 6 instances have abstract datatypes. Unless there is some reason to keep them in base, I suggest they go into the syb package: DataType TyCon TypeRep Handle ThreadId StablePtr - These 7 go in the syb package for future discussion: (a -> b) (IO a) (ST s a) (STM a) (IORef a) (TVar a) (MVar a) - The instance for Array is moved into Data.Array. Any remarks? Thanks, Pedro

On Mon, Sep 15, 2008 at 01:28:30PM +0200, José Pedro Magalhães wrote:
Are the details sorted out? Summarizing:
* Data.Generics.Basics stays in base; * All other SYB modules besides Data.Generics.Instances go into a new 'syb' package; * Data.Generics.Instances is split in two, one part remaining in base and the other moving to the syb package.
Sounds good. Those that remain in base need no longer be orphans. Regarding Complex, I think your point that it exports (:+) determines that the derived instance is appropriate.
* These 6 instances have abstract datatypes. Unless there is some reason to keep them in base, I suggest they go into the syb package:
DataType TyCon TypeRep Handle ThreadId StablePtr
I'd lean slightly toward keeping with their definitions in base, but it's not a blocker like the others. What shall we call the module defining the Data class? Data.Data?

Hi Pedro, On Tue, Sep 16, 2008 at 09:41:33AM +0100, Ross Paterson wrote:
On Mon, Sep 15, 2008 at 01:28:30PM +0200, José Pedro Magalhães wrote:
Are the details sorted out? Summarizing:
* Data.Generics.Basics stays in base; * All other SYB modules besides Data.Generics.Instances go into a new 'syb' package; * Data.Generics.Instances is split in two, one part remaining in base and the other moving to the syb package.
Sounds good. Those that remain in base need no longer be orphans. Regarding Complex, I think your point that it exports (:+) determines that the derived instance is appropriate.
* These 6 instances have abstract datatypes. Unless there is some reason to keep them in base, I suggest they go into the syb package:
DataType TyCon TypeRep Handle ThreadId StablePtr
I'd lean slightly toward keeping with their definitions in base, but it's not a blocker like the others.
What shall we call the module defining the Data class? Data.Data?
I'm a little lost as to exactly what changes you want. Can you send a darcs patch against the base and syb repos please?: http://darcs.haskell.org/ghc-6.10/packages/base/ http://darcs.haskell.org/ghc-6.10/packages/syb/ Currently everything is in the syb package. Thanks Ian

Hello Ian,
On Mon, Sep 22, 2008 at 21:54, Ian Lynagh
I'm a little lost as to exactly what changes you want. Can you send a darcs patch against the base and syb repos please?: http://darcs.haskell.org/ghc-6.10/packages/base/ http://darcs.haskell.org/ghc-6.10/packages/syb/ Currently everything is in the syb package.
That's fine, but do I need to build the entire compiler to be able to compile the base package? Currently if I try to `runhaskell setup.hs configure' on base I get `attempting to use module `Prelude' (Prelude.hs) which is not loaded'... Thanks, Pedro

Hi Pedro, On Tue, Sep 23, 2008 at 03:03:19PM +0200, José Pedro Magalhães wrote:
On Mon, Sep 22, 2008 at 21:54, Ian Lynagh
wrote: I'm a little lost as to exactly what changes you want. Can you send a darcs patch against the base and syb repos please?: http://darcs.haskell.org/ghc-6.10/packages/base/ http://darcs.haskell.org/ghc-6.10/packages/syb/ Currently everything is in the syb package.
That's fine, but do I need to build the entire compiler to be able to compile the base package?
If you want to move the Data class then you'll definitely have to rebuild GHC, and to change the line gENERICS = mkSybModule (fsLit "Data.Generics.Basics") to gENERICS = mkBaseModule (fsLit "<wherever you put it>") in compiler/prelude/PrelNames.lhs
Currently if I try to `runhaskell setup.hs configure' on base I get `attempting to use module `Prelude' (Prelude.hs) which is not loaded'...
You can't compile programs (like Setup.hs) in a base tree, as it will try to use the modules from the base tree rather than those from the installed base package. Thanks Ian

Hello Ian,
On Mon, Sep 22, 2008 at 21:54, Ian Lynagh
I'm a little lost as to exactly what changes you want. Can you send a darcs patch against the base and syb repos please?: http://darcs.haskell.org/ghc-6.10/packages/base/ http://darcs.haskell.org/ghc-6.10/packages/syb/ Currently everything is in the syb package.
I'm attaching patches to 4 repositories that split SYB between the syb and
base packages. (It's my first time doing this so I hope all is fine, but if
not just let me know.)
Basically, a new module Data.Data in base contains the previous
Data.Generics.Basics and most of the instances from Data.Generics.Instances.
Other changes are:
On Mon, Sep 15, 2008 at 13:28, José Pedro Magalhães
- Ratio has to be fixed to have a consistent instance: either it's seen as an abstract datatype (therefore with undefined gunfold) or a gfoldl has to be defined matching its gunfold;
Ratio had its gfoldl defined to match its gunfold. Since GHC.Ratio exports
the constructor (:%), I guess Ratio cannot be considered abstract.
- Complex should have its derived instance restored (or otherwise a manual instance with a proper gfoldl);
Complex had its derived instance restored.
- Other places that have to be changed: - The Data instance of Data.Array should be moved from Data.Generics.Instances into Array, to avoid syb dependency;
This was not done, since I am unsure if it would break some sort of Hugs compatibility. If the instance for Array is in Data.Data (note that it is not an orphan anymore), then it's provided independently of where the Array comes from. If it would be in Data.Array, then Hugs would have to provide an instance in its Array library too.
- The imports of SYB in the following modules should be fixed to avoid bringing into scope all the instances:
./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs
This was not done, given the discussion in [1] regarding instance visibility and orphans.
The following "dubious" instances are now in the syb package (in the Data.Generics.Instances module): DataType TyCon TypeRep Handle ThreadId StablePtr (a -> b) (IO a) (ST s a) (STM a) (IORef a) (TVar a) (MVar a). All the other instances are in the base package, Data.Data module. Ian, do you think this could be incorporated in any possible upcoming beta versions/release candidates before the final release? I did some sanity checking but it would be good if a wider audience could test it. Thanks, Pedro [1] http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9929/focus=27

Hi Pedro, On Sat, Oct 04, 2008 at 05:17:41PM +0200, José Pedro Magalhães wrote:
I'm attaching patches
Thanks!
Ian, do you think this could be incorporated in any possible upcoming beta versions/release candidates before the final release? I did some sanity checking but it would be good if a wider audience could test it.
I've applied it to HEAD and 6.10. For some reason the checksum on the base patch was wrong, but it looks OK anyway. Thanks Ian

On Sat, Oct 04, 2008 at 05:17:41PM +0200, José Pedro Magalhães wrote:
Basically, a new module Data.Data in base contains the previous Data.Generics.Basics and most of the instances from Data.Generics.Instances. Other changes are:
On Mon, Sep 15, 2008 at 13:28, José Pedro Magalhães
wrote: + Ratio has to be fixed to have a consistent instance: either it's seen as an abstract datatype (therefore with undefined gunfold) or a gfoldl has to be defined matching its gunfold;
Ratio had its gfoldl defined to match its gunfold. Since GHC.Ratio exports the constructor (:%), I guess Ratio cannot be considered abstract.
I don't think GHC.Ratio counts as part of the public interface. But Ratio isn't a showstopper.
o The imports of SYB in the following modules should be fixed to avoid bringing into scope all the instances:
./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs
This was not done, given the discussion in [1] regarding instance visibility and orphans.
I've pushed patches to HEAD to do this for array, containers, packedstring and template-haskell. I think we should be aiming to remove all dependencies of boot libs on syb, and possibly to making syb an extra-lib. The only boot lib left depending on syb is bytestring, but I understand that has other issues.

Hello,
On Sun, Oct 5, 2008 at 12:34, Ross Paterson
On Sat, Oct 04, 2008 at 05:17:41PM +0200, José Pedro Magalhães wrote:
Basically, a new module Data.Data in base contains the previous Data.Generics.Basics and most of the instances from Data.Generics.Instances. Other changes are:
On Mon, Sep 15, 2008 at 13:28, José Pedro Magalhães
wrote: + Ratio has to be fixed to have a consistent instance: either it's seen as an abstract datatype (therefore with undefined gunfold) or a gfoldl has to be defined matching its gunfold;
Ratio had its gfoldl defined to match its gunfold. Since GHC.Ratio exports the constructor (:%), I guess Ratio cannot be considered abstract.
I don't think GHC.Ratio counts as part of the public interface. But Ratio isn't a showstopper.
Sorry, I meant GHC.Real. But I am not sure on this definition either; no one before had suggested whether to remove or add functionality to this instance,just that the previous state was inconsistent. Thanks, Pedro

The following "dubious" instances are now in the syb package (in the Data.Generics.Instances module): DataType TyCon TypeRep Handle ThreadId StablePtr (a -> b) (IO a) (ST s a) (STM a) (IORef a) (TVar a) (MVar a). All the other instances are in the base package, Data.Data module.
Contrary to that list, 'Data (a->b)' is still in Data.Data? Claus

Hello,
On Sun, Oct 5, 2008 at 15:40, Claus Reinke
The following "dubious" instances are now in the syb package (in the Data.Generics.Instances module): DataType TyCon TypeRep Handle ThreadId StablePtr (a -> b) (IO a) (ST s a) (STM a) (IORef a) (TVar a) (MVar a). All the other instances are in the base package, Data.Data module.
Contrary to that list, 'Data (a->b)' is still in Data.Data?
Oops, my mistake. Thanks for noticing! Attaching patches. Thanks, Pedro

Hello Ian, I'm attaching some more patches to fix the haddock documentation (somewhat) and the export list of Data.Generics.Instances. Thanks, Pedro

Hello Ian,
Yet another haddock documentation fix: the link from Data.Typeable was
broken (went to Data.Generics, which is no longer in base; corrected to
Data.Data).
Thanks,
Pedro
On Sun, Oct 12, 2008 at 11:39, Ian Lynagh
Hi Pedro,
On Fri, Oct 10, 2008 at 02:31:00PM +0200, José Pedro Magalhães wrote:
I'm attaching some more patches to fix the haddock documentation
(somewhat)
and the export list of Data.Generics.Instances.
Thanks; applied to HEAD and 6.10.
Thanks Ian

On Mon, Oct 06, 2008 at 10:00:40AM +0200, José Pedro Magalhães wrote:
On Sun, Oct 5, 2008 at 15:40, Claus Reinke
wrote: The following "dubious" instances are now in the syb package (in the Data.Generics.Instances module): DataType TyCon TypeRep Handle ThreadId StablePtr (a -> b) (IO a) (ST s a) (STM a) (IORef a) (TVar a) (MVar a). All the other instances are in the base package, Data.Data module.
Contrary to that list, 'Data (a->b)' is still in Data.Data?
Oops, my mistake. Thanks for noticing! Attaching patches.
Applied to 6.10 and HEAD. Thanks Ian

Another attempt: These [standard] instances are uncontroversial and can accompany the Data class: [] Maybe Either () tuples Bool Ordering Char Double Float Integer Int Int8 Int16 Int32 Int64 Word Word8 Word16 Word32 Word64 These instances could also be [standard], after fixing: Ratio Complex These instances are [partial]: DataType TyCon TypeRep Handle Ptr StablePtr ForeignPtr
That depends on whether the [partial] 'Data' instances can be completed in a sensible way for these types. If they can't, then the way forward would be to split the 'Data' class into those operations supported by all current instances and those only supported by the types in [standard]. That way, one would the presence of instances would at least be indicative of the presence of instance methods, providing more static type safety. But if the 'Data' class and some instances are in 'base', there'd be no changing them (*).
Unless anyone can imagine a completion for these types, it seems the fix must await restructuring of the Data class, so these might as well accompany the Data class too. Stuff in base can be changed, just not easily or quickly. And given that Data is tied up with GHC, that may be unavoidable.

Hello,
On Wed, Sep 3, 2008 at 13:57, Ross Paterson
Another attempt:
These [standard] instances are uncontroversial and can accompany the Data class:
[] Maybe Either () tuples Bool Ordering Char Double Float Integer Int Int8 Int16 Int32 Int64 Word Word8 Word16 Word32 Word64
These instances could also be [standard], after fixing:
Ratio Complex
So Ratio would have to be given a proper gfoldl definition, to fits its
unfold:
gunfold k z c | constrIndex c == 1 = k (k (z (:%)))
Complex has a derived instance. In Data.Complex:
data (RealFloat a) => Complex a
= !a :+ !a
# if __GLASGOW_HASKELL__
deriving (Eq, Show, Read, Data)
# else
deriving (Eq, Show, Read)
# endif
Was there a problem with this?
Regarding the other places that have to be changed [1]:
./array/Data/Array.hs
The Data instance of Array should be moved from Data.Generics.Instances into
Array, to avoid syb dependency.
./containers/Data/IntMap.hs
./containers/Data/IntSet.hs
./containers/Data/Map.hs
./containers/Data/Sequence.hs
./containers/Data/Set.hs
./containers/Data/Tree.hs
./network/Network/URI.hs
./packedstring/Data/PackedString.hs
As far as I can see, these are not problematic, but the imports should be
fixed to avoid bringing into scope all the instances.
./haskell-src/Language/Haskell/Syntax.hs
./template-haskell/Language/Haskell/TH/Quote.hs
./template-haskell/Language/Haskell/TH/Syntax.hs
The use of Ratio instances here is not problematic if the instance for Ratio
is fixed?
./bytestring/Data/ByteString/Internal.hs
./bytestring/Data/ByteString/Lazy/Internal.hs
These depend on an instance for ForeignPtr Word8. What to do here?
Reclassify instances for types with phantom types into "standard", since
there is nothing to traverse?
On Wed, Sep 3, 2008 at 13:23, Claus Reinke
(*) of course, it would be nice if ghc's deriving mechanism was based on a derived instance specification in the module providing the class, rather than being hardcoded into the compiler itself; so even with 'Data' out of 'base', that part currently cannot be changed without updating ghc. or if class aliases allowed the generated old-style 'Data' instances to be re-interpreted as 'DataA' and 'DataB' instances.. but none of that is realistic for the current split decision.
I fully agree that the deriving mechanism should be customizable. That would make generic programming in general much nicer in Haskell. However, I don't think it's realistic to expect such changes for 6.10... Thanks, Pedro [1] http://article.gmane.org/gmane.comp.lang.haskell.libraries/9957

On Thu, Sep 04, 2008 at 09:44:34AM +0200, José Pedro Magalhães wrote:
Complex has a derived instance. In Data.Complex:
data (RealFloat a) => Complex a = !a :+ !a # if __GLASGOW_HASKELL__ deriving (Eq, Show, Read, Data) # else deriving (Eq, Show, Read) # endif
Was there a problem with this?
I was looking at the HEAD, where the instance was moved into Data.Generics.Instances, and used the default definition of gfoldl. Presumably if the Data class is in base, there's no problem with restoring the derived instance.
Regarding the other places that have to be changed [1]:
./array/Data/Array.hs
The Data instance of Array should be moved from Data.Generics.Instances into Array, to avoid syb dependency.
The instance needs only the Data class, so it would be independent of syb. Its proper place is the array package, but since GHC defines Array in GHC.Arr it might define the instance there too.
./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs
As far as I can see, these are not problematic, but the imports should be fixed to avoid bringing into scope all the instances.
These packages should be able to avoid depending on syb.
./haskell-src/Language/Haskell/Syntax.hs ./template-haskell/Language/Haskell/TH/Quote.hs ./template-haskell/Language/Haskell/TH/Syntax.hs
The use of Ratio instances here is not problematic if the instance for Ratio is fixed?
./bytestring/Data/ByteString/Internal.hs ./bytestring/Data/ByteString/Lazy/Internal.hs
I assume so.
These depend on an instance for ForeignPtr Word8. What to do here? Reclassify instances for types with phantom types into "standard", since there is nothing to traverse?
That was the suggestion. The types in question are Ptr, ForeignPtr and StablePtr. (I think the Var's are similar, but it's not essential to decide about them now.)

2008/9/4 José Pedro Magalhães
./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs
As far as I can see, these are not problematic, but the imports should be fixed to avoid bringing into scope all the instances.
I see some partial instances. For example, Data.Set throws exceptions
for toConstr and gunfold.
--
Dave Menendez

Hello,
On Thu, Sep 4, 2008 at 11:49, Ross Paterson
On Thu, Sep 04, 2008 at 09:44:34AM +0200, José Pedro Magalhães wrote:
Complex has a derived instance. In Data.Complex:
data (RealFloat a) => Complex a = !a :+ !a # if __GLASGOW_HASKELL__ deriving (Eq, Show, Read, Data) # else deriving (Eq, Show, Read) # endif
Was there a problem with this?
I was looking at the HEAD, where the instance was moved into Data.Generics.Instances, and used the default definition of gfoldl. Presumably if the Data class is in base, there's no problem with restoring the derived instance.
Oh, I didn't know this had changed. I was looking in 6.8.3. Then yes, it
should probably return to where it was.
On Thu, Sep 4, 2008 at 18:25, David Menendez
2008/9/4 José Pedro Magalhães
: ./containers/Data/IntMap.hs ./containers/Data/IntSet.hs ./containers/Data/Map.hs ./containers/Data/Sequence.hs ./containers/Data/Set.hs ./containers/Data/Tree.hs ./network/Network/URI.hs ./packedstring/Data/PackedString.hs
As far as I can see, these are not problematic, but the imports should be fixed to avoid bringing into scope all the instances.
I see some partial instances. For example, Data.Set throws exceptions for toConstr and gunfold.
Yes, but those are defined instances, not imported instances. I believe that is not this problem (it is a problem, just not the one we're focusing on right now). Those would have to wait for a possible "break up" of the Data class, as Claus suggested. Thanks, Pedro

José Pedro Magalhães wrote:
Complex has a derived instance. In Data.Complex:
data (RealFloat a) => Complex a = !a :+ !a # if __GLASGOW_HASKELL__ deriving (Eq, Show, Read, Data) # else deriving (Eq, Show, Read) # endif
Was there a problem with this?
Yes, isn't it exactly the same problem as Ratio? The derived instance 'leaks' the implementation detail of a complex number as a real and imaginary component. As part of a large structure I might want to apply some operation to all the 'Doubles' therein, but certainly not to the components of any Complex Doubles. You can imagine particular situations where I want to apply only to the real component, or something more subtle. You can also imagine that an alternative implementation of Complexes in polar coordinates (with a suitable solution to the principle value problem) is supposed to be abstractly the same, but would gain a totally different Data instance. Jules

Hello,
On Fri, Sep 5, 2008 at 14:34, Jules Bean
José Pedro Magalhães wrote:
Complex has a derived instance. In Data.Complex:
data (RealFloat a) => Complex a = !a :+ !a # if __GLASGOW_HASKELL__ deriving (Eq, Show, Read, Data) # else deriving (Eq, Show, Read) # endif
Was there a problem with this?
Yes, isn't it exactly the same problem as Ratio?
The derived instance 'leaks' the implementation detail of a complex number as a real and imaginary component. As part of a large structure I might want to apply some operation to all the 'Doubles' therein, but certainly not to the components of any Complex Doubles. You can imagine particular situations where I want to apply only to the real component, or something more subtle.
You can also imagine that an alternative implementation of Complexes in polar coordinates (with a suitable solution to the principle value problem) is supposed to be abstractly the same, but would gain a totally different Data instance.
I think the question here is if the datatype if abstract or not. Complex exports its (:+) constructor, so the implementation details are not hidden. Therefore it is correct to traverse its arguments. Ratio, however, does not export its (:%) constructor: Ratio is abstract, while Complex is not. I guess then Complex should have a Data instance as it is above. But regarding Ratio, do we want to change its gfoldl to match its current gunfoldl (as suggested before), or go the other way around, and change its gunfold, toConstr and dataTypeOf to match its gfoldl? If we change its gfoldl, the instance is fully defined, but does not respect the data abstraction. If we go the other way around, the instance becomes partial (gunfold and toConstr return errors), but respects the data abstraction. In any case, the behaviour would differ from the current. Any opinions on this? Thanks, Pedro

Hello Claus, Wednesday, September 3, 2008, 12:46:09 AM, you wrote:
(personally, I still like Ashley's suggestion of putting 'Data', 'Typeable' and 'Dynamic' in a separate package)
it looks like a good idea. we don't need Data in the base, just in some core library (i.e. library definitely installed with ghc) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tue, Sep 02, 2008 at 09:46:09PM +0100, Claus Reinke wrote:
2 the "standard"/"dubious" separation of instances was entirely preliminary; following the recent discussion, I would suggest to split the instances into three groups, in three separate modules:
[standard]: fully implemented 'Data' instances (no runtime errors). one should probably reclassify 'Ratio a' in here.
[partial]: partially implemented instances (usually for abstract types, which 'Data' doesn't handle well; whether that can be mended without changing the class remains to be seen); these include 'Array a b', 'ThreadId', etc (previously in 'Standard') and the pointer types (previously in 'Dubious'); if these instances can be completed, existing clients will simply work better (fewer runtime crashes)
If the Data class, your [standard] instances (including Ratio and Complex) and the instances for FFI pointer types are in base, then the packages you listed (array, bytestring, containers, haskell-src, network, packedstring and template-haskell) need no longer depend on syb, and syb need no longer be a boot package. That would give maximal freedom to develop syb independently, wouldn't it? (Changes to Data can't be independent of GHC, and the instances of the types listed are canonical, aren't they?) This wouldn't allow you to hide the partial instances for the pointer types, but it would be hard to stop them leaking anyway, as they're used by several of the above packages.

following the recent discussion, I would suggest to split the instances into three groups, in three separate modules: [standard]: fully implemented 'Data' instances (no runtime errors). [partial]: partially implemented instances (usually for abstract types,
If the Data class, your [standard] instances (including Ratio and Complex) and the instances for FFI pointer types are in base, then the packages you listed (array, bytestring, containers, haskell-src, network, packedstring and template-haskell) need no longer depend on syb, and syb need no longer be a boot package. That would give maximal freedom to develop syb independently, wouldn't it? (Changes to Data can't be independent of GHC, and the instances of the types listed are canonical, aren't they?)
This wouldn't allow you to hide the partial instances for the pointer types, but it would be hard to stop them leaking anyway, as they're used by several of the above packages.
That would be my guess, yes. It isn't quite "maximal" freedom because some of the things that one might want to improve would not even be in 'syb', but would stay in 'base'. But with all this surprising inertia, more freedom doesn't seem realistic anymore. Claus
participants (11)
-
Bart Massey
-
Bulat Ziganshin
-
Claus Reinke
-
David Menendez
-
Ian Lynagh
-
Jonathan Cast
-
José Pedro Magalhães
-
Jules Bean
-
Ross Paterson
-
Sean Leather
-
Simon Peyton-Jones