Splitting SYB from the base package in GHC 6.10

Hello all,

I'm initiating this discussion at the suggestion of SP Jones, following on from [1]. The issue is: SYB is being moved out of base into its own package. However, the Data class is, in a way, tied to base, since it depends on the deriving mechanism. Therefore, it was suggested that the entire Data.Generics.Basics module [2] should remain in base. This module defines the Data class and several associated functions and datatypes. I don't think anyone has objected to this so far: please correct me if I'm wrong, or object now.

Then it was also suggested that Data.Generics.Instances [3] could stay in base (perhaps inside Basics as well). This, however, would prevent dealing with the "dubious" Data instances [4], and this was one of the motivating factors for splitting SYB from base. Concretely, this refers to the instances:

    instance (Data a, Data b) => Data (a -> b)
    instance Typeable a => Data (IO a)
    instance Typeable a => Data (Ptr a)
    instance Typeable a => Data (StablePtr a)
    instance Typeable a => Data (IORef a)
    instance Typeable a => Data (ForeignPtr a)
    instance (Typeable s, Typeable a) => Data (ST s a)
    instance Typeable a => Data (TVar a)
    instance Typeable a => Data (MVar a)
    instance Typeable a => Data (STM a)
    instance (Data a, Integral a) => Data (Ratio a)

These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed. However, this implies that users cannot redefine these instances, even when they know extra information about these types that would allow for a more useful instance definition.
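The "do not traverse" behaviour described above can be sketched with base alone. This is a minimal illustration under stated assumptions, not the actual instance code from Data.Generics.Instances: Action, Config, everywhere' and mkT' are hypothetical names, with everywhere' and mkT' restated in the style of syb's everywhere and mkT so that only Data.Data (where the Data class lives in today's base) is needed. The opaque instance leaves gfoldl at its default, which visits no children, and lets toConstr and gunfold fail, so a derived traversal of Config still reaches the String field and passes the Action through untouched.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
-- Sketch of a non-traversing ("opaque leaf") Data instance using only base.
-- Action, Config, everywhere' and mkT' are hypothetical names.
import Data.Char (toUpper)
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical wrapper around an IO action, standing in for IO a, IORef a, etc.
newtype Action = Action (IO ())

-- gfoldl keeps its default (gfoldl _ z = z), so no children are ever visited.
-- Constructor-level reflection is unavailable: toConstr and gunfold just fail.
instance Data Action where
  toConstr _   = error "toConstr: Action is opaque"
  gunfold _ _  = error "gunfold: Action is opaque"
  dataTypeOf _ = mkNoRepType "Action"

-- A "regular" field next to a "dubious" one.
data Config = Config { name :: String, onStart :: Action }
  deriving (Data)

-- Minimal bottom-up traversal, in the style of syb's everywhere.
everywhere' :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere' f x = f (gmapT (everywhere' f) x)

-- Lift a monomorphic transformation to all types, in the style of syb's mkT.
mkT' :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT' f = fromMaybe id (cast f)

main :: IO ()
main = do
  let cfg  = Config { name = "server", onStart = Action (putStrLn "starting") }
      cfg' = everywhere' (mkT' toUpper) cfg
  putStrLn (name cfg')           -- the String field was traversed: SERVER
  let Action act = onStart cfg'  -- the opaque field passed through unchanged
  act                            -- prints: starting
```

The trade-off is the one described above: the opaque field is skipped instead of breaking the traversal, but since instances are global, a user who could do better for a particular type cannot substitute a different instance.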
Claus, please correct me if I'm wrong, but if the 11 "dubious" instances (or perhaps fewer, given your message in [5]) go into the syb package and the remaining "standard" ones stay in base, we:
- Maintain backwards compatibility regarding SYB in 6.10, and
- Can still deal with the issue by releasing a new version of the syb package later, independently of GHC.

Since the deadline for 6.10 is approaching, I'm assuming that we should try to minimize the changes there, while keeping future development in the syb package as open as possible.

Finally, there are module naming issues, which are probably secondary to the issue above and can be dealt with separately and later.

Thanks,
Pedro

[1] The base library and GHC 6.10: http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9929
[2] http://www.haskell.org/ghc/dist/stable/docs/libraries/base/Data-Generics-Bas...
[3] http://www.haskell.org/ghc/dist/stable/docs/libraries/base/Data-Generics-Ins...
[4] http://www.haskell.org/pipermail/generics/2008-June/000347.html
[5] http://article.gmane.org/gmane.comp.lang.haskell.libraries/9957

Hello Jose, Monday, September 1, 2008, 4:49:01 PM, you wrote:
These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed.
AFAIU, this solution isn't specific to SYB, so it doesn't make sense to move it into SYB. Alternatively, we could move these instances into a separate module, so that users can control whether these instances are imported.

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

The issue is: SYB is being moved out of base into its own package. However, the Data class is, in a way, tied to base since it depends on the deriving mechanism.
My understanding is that the deriving mechanism would still work if class 'Data' was moved into 'syb', but changes in 'Data' would still need to be matched in the deriving mechanism (which isn't auto-generated from 'base', either). As long as 'syb' remains a core library, we can thus focus on assigning modules to 'syb' or 'base' by functionality.
Therefore, it was suggested that the entire Data.Generics.Basics module [2] should remain in base. This module defines the Data class and several associated functions and datatypes. I don't think anyone objected to this so far: please correct me if I'm wrong, or object now.
Assuming this is based on 'Data.Generics.Basics' and 'Data.Typeable' being of more general use than the rest of 'syb' (justifying a preferred dependency on 'base' rather than 'syb'), not any implementation constraints, I don't object in general. It does suggest a separate 'data-reflect' package for these two modules, but that could be left for later. However, if 'Data' is in 'base', and the 'data' types are in 'base', then the 'Data' instances for those 'data' types should probably also be in base (*) (the instance for 'Array a b' ought to move to 'array'). And the short-term issue with this is that these instances, their location, and their importers, need some revision, while 'base' wants to be stable. The hope was that splitting off 'syb' from 'base' would contain the changes in a package with named maintainer, outside 'base'. Wouldn't it be easier to have all of 'Data' in 'syb', at least until 'Data' and 'Typeable' move into their own package? But if you can find a way to make the 'Data'-in-'base' route work, I'm not going to object.
Then it was also suggested that Data.Generics.Instances [3] could stay in base (perhaps inside Basics as well). This, however, would prevent dealing with the "dubious" Data instances [4], and this was one of the motivating factors to split SYB from base. This refers concretely to the instances:
Rearranging the list slightly, for easier reference:

    -- these have (or produce) substructures of type 'a', which aren't
    -- traversed by the current Data instances (contrary to what one
    -- would expect, say, from a generic 'fmap' over these types)
    instance (Data a, Data b) => Data (b -> a)
    instance Typeable a => Data (IO a)
    instance (Typeable s, Typeable a) => Data (ST s a)
    instance Typeable a => Data (STM a)
    instance Typeable a => Data (IORef a)
    instance Typeable a => Data (TVar a)
    instance Typeable a => Data (MVar a)

    -- here, the 'a' is a phantom type, without matching substructures
    instance Typeable a => Data (Ptr a)
    instance Typeable a => Data (StablePtr a)
    instance Typeable a => Data (ForeignPtr a)

    -- here, the 'a' corresponds to substructures that should only
    -- be visible through the abstract interface, on top of which a
    -- 'data'-like view can be provided
    instance (Data a, Integral a) => Data (Ratio a)
In addition, a longer list of instances offer only runtime errors for some 'Data' operations (most notably for 'gunfold', though abstract types in general have a problem with reflection support). Are these necessary or would they profit from closer investigation? If the latter, those instances should probably not be in 'base'.
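The third category above (the 'Ratio a' case) suggests a 'data'-like view built on the abstract interface. A minimal sketch of that idea, assuming a hypothetical abstract type Frac with a normalizing smart constructor mkFrac (Frac, MkFrac, mkFrac, numer, denom and mkT' are all illustrative names): the Data instance folds and unfolds through mkFrac and the accessors rather than the raw constructor, so generic code sees two Integer children but cannot produce an unnormalized value. mkT' is again a local stand-in for syb's mkT.

```haskell
{-# LANGUAGE RankNTypes #-}
-- Sketch of a Data instance that exposes an abstract type through its
-- public interface. All names below are hypothetical.
import Data.Data
import Data.Maybe (fromMaybe)

-- Invariant: the fraction is reduced and the denominator is positive.
-- A real module would export Frac, mkFrac, numer and denom, but not MkFrac.
data Frac = MkFrac Integer Integer

-- Smart constructor that re-establishes the invariant.
mkFrac :: Integer -> Integer -> Frac
mkFrac _ 0 = error "mkFrac: zero denominator"
mkFrac n d = MkFrac (signum d * n `div` g) (abs d `div` g)
  where g = gcd n d

numer, denom :: Frac -> Integer
numer (MkFrac n _) = n
denom (MkFrac _ d) = d

-- The generic view goes through mkFrac, never MkFrac, so generic
-- transformations cannot break the invariant.
instance Data Frac where
  gfoldl k z r = z mkFrac `k` numer r `k` denom r
  gunfold k z c
    | constrIndex c == 1 = k (k (z mkFrac))
    | otherwise          = error "gunfold: Frac"
  toConstr _   = fracConstr
  dataTypeOf _ = fracType

fracConstr :: Constr
fracConstr = mkConstr fracType "mkFrac" [] Prefix

fracType :: DataType
fracType = mkDataType "Frac" [fracConstr]

-- Cast-based helper in the style of syb's mkT, restated to need only base.
mkT' :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT' f = fromMaybe id (cast f)

main :: IO ()
main = do
  -- Doubling both children of 1/2 gives 2/4, which mkFrac reduces to 1/2.
  let r = gmapT (mkT' (* (2 :: Integer))) (mkFrac 1 2)
  print (numer r, denom r)  -- prints (1,2)
```

With this design the "constructor" reported by toConstr is the smart constructor, so gunfold-based generic producers also go through the invariant-preserving path.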
These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed.
Well, there are alternative instances that would at least improve traversal support [3], but those wouldn't work for queries, I think.
However, this implies that a user cannot redefine such instances even in the case where s/he knows extra information about these types that would allow for a more useful instance definition, for instance.
Indeed, the implicit presence of these instances is the main issue, and reducing their presence and propagation affects 'base' and other core and extra libraries, so it needs to happen soon.
Claus, please correct me if I'm wrong, but if the 11 "dubious" instances (or perhaps less, given your message in [5]) go in the syb package and the remaining, "standard" ones stay in base, we:
- Maintain backwards compatibility regarding SYB in 6.10, and
- Can still deal with the issue by releasing a new version of the syb package later, independently of GHC.
Issues to consider, off the top of my head:
- to what extent can core libraries be updated independently of 'base'?
- unless 'base' can now be updated (there are two versions of 'base' in ghc head), 'base' must not depend on 'syb'
- which other core libraries depend on 'syb'? are they updateable?
- the current importers of (parts of) 'Data.Generics' need to be revised [1]
- instances cannot be deprecated
- since all instances are in one module, one could deprecate the module, but are module deprecations propagated to their importers automatically?
- would 'Data.Generics' need to be deprecated, in favour of a version that does not implicitly re-export any/all instances? [2]

Maintaining strict backwards compatibility in 6.10 while still allowing for changes in 'syb' is going to be difficult, if only because clients might depend on 'Data.IntSet' and the like to re-export all current 'Data' instances, which we certainly want to stop. My 'syb-utils' [2] has alternatives to 'Data.Generics' that export either only standard instances or no instances, which would allow us to deprecate all 'Data.Generics*' modules that are less specific about their instance exports, but would require the use of alternative module names.
Since the deadline for 6.10 is approaching I'm assuming that we should try to minimize the changes there, while keeping future development in the syb package as open as possible.
Definitely. But some choices need to be made now, mainly: what goes where, how to handle deprecation, and how to reduce implicit instance propagation.

Claus

[1] http://article.gmane.org/gmane.comp.lang.haskell.libraries/9957
[2] http://www.cs.kent.ac.uk/~cr3/toolbox/haskell/#syb-utils
[3] http://www.haskell.org/pipermail/libraries/2008-July/010319.html

(*) this isn't a firm rule, either: recently, it was decided to keep the 'Data' instances for 'ghc' types out of 'ghc'.

The issue is: SYB is being moved out of base into its own package.
However, the Data class is, in a way, tied to base since it depends on the deriving mechanism.
My understanding is that the deriving mechanism would still work if class 'Data' was moved into 'syb', but changes in 'Data' would still need to be matched in the deriving mechanism (which isn't auto-generated from 'base', either). As long as 'syb' remains a core library, we can thus focus on assigning modules to 'syb' or 'base' by functionality.
So, here's a (possible) summary from a general perspective.

(1) Some people want to keep some parts of the SYB functionality in 'base', because these parts are closely linked to some parts of GHC. This is desired for convenience (and perhaps test coverage?).
(2) Some people want to remove some parts of the SYB functionality from 'base', because they want to be able to maintain and release SYB separately.
(3) Some people in group #2 are not sure what should be left in 'base' or extracted into 'syb'.

My observations:

(A) I don't see 'syb' ever becoming something other than a core library for GHC, considering its close family ties.
(B) I expect 'syb' to get updated and released more often than GHC. This is especially true considering the newfound interest.
(C) I expect the 'syb' library will be tested using the current (and possibly past?) release(s) of GHC, because that's what releases will use in general. If something in a development version of GHC breaks SYB, then a new 'syb' release may be needed for when that version of GHC is released. At that point, a temporary fork may be needed if other work is ongoing.
(D) From a user's perspective, I don't understand the splitting of SYB. Why is it that I can derive Data.Generics.Data, but I cannot actually use other functions built for it?

So, given all of the above (assuming it's correct), it seems to me that the benefit leans towards migrating everything SYB-related into the 'syb' library. Is the motivation/argument for group #1 very strong?

Hope this helps,
Sean

Sean,

Your analysis is good, but it misses the following: you can build stuff on class Data *other than* SYB. That's a motivation for not identifying Data with SYB, and it's really the argument for keeping Data in 'base', so that others can build on it without depending on the full glory of SYB. (A weaker argument is that GHC "knows" about Data, to support 'deriving'. But that's less important.)

Simon

From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Sean Leather
Sent: 01 September 2008 21:23
To: Claus Reinke
Cc: José Pedro Magalhães; ross@soi.city.ac.uk; Simon Peyton-Jones; libraries@haskell.org; generics@haskell.org; igloo@earth.li
Subject: Re: Splitting SYB from the base package in GHC 6.10
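Simon's point that other libraries can build on class Data without depending on SYB can be illustrated with two small generic functions that use only the Data class and its reflection API (Data.Data in today's base), with no syb import. This is a sketch, not code from any particular library; Tree, gsize and constrNames are hypothetical names.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
-- Generic functions written against class Data alone, with no syb import.
-- Tree, gsize and constrNames are hypothetical names.
import Data.Data

data Tree = Leaf | Node Tree Int Tree
  deriving (Data)

-- Number of constructor nodes in a value (each Int counts as one node).
gsize :: Data a => a -> Int
gsize x = 1 + sum (gmapQ gsize x)

-- Constructor names in pre-order, via toConstr/showConstr reflection.
constrNames :: Data a => a -> [String]
constrNames x = showConstr (toConstr x) : concat (gmapQ constrNames x)

main :: IO ()
main = do
  let t = Node Leaf 3 (Node Leaf 4 Leaf)
  print (gsize t)  -- prints 7
  print (constrNames t)
```

Nothing here needs the traversal schemes or type-case combinators of SYB; gmapQ, toConstr and showConstr all come with the class itself.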

On Tue, Sep 2, 2008 at 12:50, Simon Peyton-Jones wrote:
Sean, your analysis is good, but it misses the following:
You can build stuff on class Data *other than* SYB. That's a motivation for not identifying Data with SYB.
That's really the argument for keeping Data in 'base', so that others can build on it without depending on the full glory of SYB.
Ah, okay. Then, that is a stronger argument. Thanks, Sean
participants (5): Bulat Ziganshin, Claus Reinke, José Pedro Magalhães, Sean Leather, Simon Peyton-Jones