Re: Splitting SYB from the base package in GHC 6.10

These instances are defined in such a way that they do not traverse the datatype. In fact, there is no other possible implementation, and this implementation at least allows for datatypes which contain both "regular" and "dubious" elements to still have their "regular" elements traversed. However, this implies that a user cannot redefine such instances even in the case where s/he knows extra information about these types that would allow for a more useful instance definition, for instance.
|These two statements appear to be contradictory. Perhaps an example of
|a possible instance would help.

"no other possible implementation" is an overstatement, though an easy one to make: those 'Data' instances are incomplete because better instances are hard to come by. One can perhaps do little improvements, like replace the effective 'gmapT = id' for 'IO a' and 'b -> a' with something like[1]:

    gmapT f fun = f . fun             -- instead of gmapT f fun = fun
    gmapT f io  = (return . f) =<< io -- instead of gmapT f io = io

but that still doesn't make those instances complete. If it wasn't for the partial uses, like skipping 'IO a' and 'b -> a' as parts of derived 'Data' instances, one wouldn't want these instances at all, imho (at least not in their current form).

Then there are abstract types, for which the current default when implementing reflection is to assume "no constructors", hence no basis for 'gunfold', hence more incomplete 'Data' instances and runtime errors. It might be possible to experiment with associating exactly one, abstract, constructor with each abstract type instead, but that isn't something I'd like to bake in without more experience.

Another way to look at it: 'Data' tries to do too much in a single class, and the consequences are all those half-implemented 'Data' instances. The probable long-term solution is to split 'Data' into 2 or 3 classes, so that we know that

    a) any type instantiating 'DataGfoldl' really supports 'gfoldl'
    b) any type instantiating 'DataGunfold' really supports 'gunfold'
    c) any type instantiating 'DataReflect' really supports 'Data' reflection

Currently, too many types instantiate 'Data' without supporting b or c (-> runtime errors), and a few instances don't even support a.

All of which suggests that 'Data' should probably leave 'base', as it needs to evolve further?

|Claus argued that -> and the monads could be treated by analogy
|with Show for these types.

I had mentioned 'Text.Show.Functions' as an example of "improper" instances provided for optional import to support 'deriving Show'.

But when I read your sentence, my first thought was: perhaps there's also a way to apply the showList trick? That would neatly avoid either changing the 'deriving' mechanism or having dummy instances.

More reason for moving everything to 'syb', keeping it flexible for a while.

|There is an additional problem with types like ThreadId, Array, ST, STM,
|TVar and MVar: they're notionally defined in other packages, even though
|they're actually defined in partially-hidden GHC.* modules in base and
|re-exported.

Would it be sufficient for 'syb' to depend on both 'base' and those notional source packages? It would be useful to keep the instances in 'syb' until the 'Data' story has settled down, after which the instances ought to move to their 'data' type source packages.

Claus

[1] http://www.haskell.org/pipermail/libraries/2008-July/010319.html
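For readers unfamiliar with the "showList trick" mentioned above, here is a minimal sketch of the idea as 'Show' uses it: the class carries an extra list method with a default, so one list instance can serve every element type while 'Char' alone overrides how its lists are rendered. The class 'Pretty' and its methods are hypothetical names chosen for this illustration; how (or whether) the trick would carry over to 'Data' is not worked out in this thread.

```haskell
module Main where

import Data.List (intercalate)

-- Hypothetical class, named Pretty only for this sketch.
class Pretty a where
  pretty :: a -> String
  -- Extra method with a default: how to render a list of 'a'.
  prettyList :: [a] -> String
  prettyList xs = "[" ++ intercalate "," (map pretty xs) ++ "]"

instance Pretty Int where
  pretty = show

instance Pretty Char where
  pretty c = [c]
  -- Override only the list case, so that strings render in quotes.
  prettyList s = "\"" ++ s ++ "\""

-- A single list instance that defers to the element type's prettyList,
-- so no overlapping instance for [Char] is needed.
instance Pretty a => Pretty [a] where
  pretty = prettyList

main :: IO ()
main = do
  putStrLn (pretty [1, 2, 3 :: Int])
  putStrLn (pretty "abc")
```

The point of the design is that the special case lives in the element type's instance rather than in a second, overlapping list instance.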

Hello,
On Mon, Sep 1, 2008 at 21:04, Claus Reinke wrote:
> "no other possible implementation" is an overstatement, though an easy one to make: those 'Data' instances are incomplete because better instances are hard to come by. One can perhaps do little improvements, like replace the effective 'gmapT = id' for 'IO a' and 'b -> a' with something like[1]:
>
>     gmapT f fun = f . fun -- instead of gmapT f fun = fun
>     gmapT f io = (return . f) =<< io -- instead of gmapT f io = io
But wouldn't these introduce additional inconsistencies, at least if introduced in the library itself? I am used to thinking that gmapT is implemented using gfoldl, and that it is only inside the Data class to allow for more efficient implementations, not for alternative implementations...
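The consistency expectation described here can be sketched with today's 'Data.Data' from base: for a derived instance, 'gmapT' written purely in terms of 'gfoldl' agrees with the class method. This is only an illustration under assumptions; the names 'gmapTViaGfoldl', 'T' and 'bump' are made up for the sketch and are not part of any library.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Data (Data (..))
import Data.Functor.Identity (Identity (..))
import Data.Maybe (fromMaybe)
import Data.Typeable (cast)

-- gmapT recovered from gfoldl alone (hypothetical helper name).
gmapTViaGfoldl :: Data a => (forall d. Data d => d -> d) -> a -> a
gmapTViaGfoldl f = runIdentity . gfoldl step Identity
  where
    -- Rebuild the constructor application, transforming each child with f.
    step :: Data d => Identity (d -> b) -> d -> Identity b
    step (Identity c) x = Identity (c (f x))

-- Example type for the sketch, with a derived Data instance.
data T = T Int Bool
  deriving (Show, Eq, Data)

-- Adds one to an Int and leaves values of every other type alone.
bump :: Data d => d -> d
bump x = fromMaybe x (cast x >>= \n -> cast (n + 1 :: Int))

main :: IO ()
main = do
  print (gmapT bump (T 1 True))
  print (gmapTViaGfoldl bump (T 1 True))
```

An instance that keeps 'gfoldl _ z = z' but defines 'gmapT f fun = f . fun' would break this agreement, which seems to be the inconsistency in question.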
> Another way to look at it:
> 'Data' tries to do too much in a single class, and the consequences are all those half-implemented 'Data' instances. The probable long-term solution is to split 'Data' into 2 or 3 classes,
> so that we know that
>
>     a) any type instantiating 'DataGfoldl' really supports 'gfoldl'
>     b) any type instantiating 'DataGunfold' really supports 'gunfold'
>     c) any type instantiating 'DataReflect' really supports 'Data' reflection
>
> Currently, too many types instantiate 'Data' without supporting b or c (-> runtime errors), and a few instances don't even support a.
> All of which suggests that 'Data' should probably leave 'base', as it needs to evolve further?
Just for my understanding, can you give me an example of a datatype which currently has (b) but not (c) and vice-versa? Anyway, I guess keeping Data inside base does not preclude such splitting of Data: for backward compatibility the original Data would have to remain available, right?
> |Claus argued that -> and the monads could be treated by analogy
> |with Show for these types.
> I had mentioned 'Text.Show.Functions' as an example of "improper" instances provided for optional import to support 'deriving Show'.
> But when I read your sentence, my first thought was: perhaps there's also a way to apply the showList trick? That would neatly avoid either changing the 'deriving' mechanism or having dummy instances.
> More reason for moving everything to 'syb', keeping it flexible for a while.

By "everything" do you mean all instances or all the "dubious" ones? IIRC, the argument for having the "standard" instances in base is that leaving Data alone without any instances would mean that in most cases you would have to import SYB anyway to get any functionality. Or are there other reasons?

Thanks,
Pedro

>> gmapT f fun = f . fun -- instead of gmapT f fun = fun
> But wouldn't these introduce additional inconsistencies, at least if introduced in the library itself? I am used to thinking that gmapT is implemented using gfoldl, and that it is only inside the Data class to allow for more efficient implementations, not for alternative implementations...
Well, I'd like to define 'gmapT' in terms of 'gfoldl' (in a non-trivial, sensible way). The default for gfoldl is 'gfoldl _ z = z', but that doesn't help much here since 'z's type is rather too polymorphic to be of use: 'forall c g . g -> c g'. I've wondered occasionally whether requiring 'Typeable g' there would help.

The next try is to expand our function, so that we can pretend we have some constructor to work on in 'gfoldl':

    -- fun ==> \x->fun x ==> (\fun x->fun x) fun

Then we can do (using scoped type variables to fix the 'a' and 'b'):

    gfoldl k z fun = z (\fun x->fun x) `k` fun

    -- gmapT f fun = f . fun
    gmapT f fun = unId $ gfoldl (k f) (Id) fun
      where k f (Id c) x = Id (c (case (cast x :: Maybe (a -> b)) of
                                    Just x  -> fromJust $ cast (f . x)
                                    Nothing -> x))

but whether that is very enlightening, I wouldn't want to say;-)
> Just for my understanding, can you give me an example of a datatype which currently has (b) but not (c) and vice-versa?
b ('toConstr'&co) usually comes with c ('gunfold'). I've defined some 'Data' instances which implemented b without c, but I don't think that is typical. My reason for splitting the functionality in three ('gfoldl', 'toConstr', 'gunfold') was just to be systematic, hoping in particular for implementations of 'gunfold' (or, more generally, constructing 'data' from parts) that do not depend on reflection.
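A rough sketch of what such a three-way split might look like, purely as an assumption-laden illustration: the class names come from this thread, but the primed method names, their signatures (modelled on 'Data.Data'), and the helper types 'Handler' and 'Count' are invented here. The point it tries to show is that a type with nothing to reflect on can instantiate only the traversal class, so asking for its constructors becomes a compile-time error instead of a runtime one.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Data (Constr, DataType, Fixity (Prefix), mkConstr, mkDataType, showConstr)
import Data.Typeable (Typeable)

-- Traversal only (signature assumed, modelled on Data.Data's gfoldl).
class Typeable a => DataGfoldl a where
  gfoldl' :: (forall d b. DataGfoldl d => c (d -> b) -> d -> c b)
          -> (forall g. g -> c g)
          -> a -> c a

-- Construction from parts (declared for completeness; no instances here).
class Typeable a => DataGunfold a where
  gunfold' :: (forall b r. DataGunfold b => c (b -> r) -> c r)
           -> (forall r. r -> c r)
           -> Constr -> c a

-- Reflection on constructors and datatypes.
class Typeable a => DataReflect a where
  toConstr'   :: a -> Constr
  dataTypeOf' :: a -> DataType

instance DataGfoldl Int where
  gfoldl' _ z = z

instance DataGfoldl a => DataGfoldl (Maybe a) where
  gfoldl' _ z Nothing  = z Nothing
  gfoldl' k z (Just x) = z Just `k` x

-- Hypothetical abstract type: traversable as an opaque leaf, nothing else.
newtype Handler = Handler (Int -> Int)

instance DataGfoldl Handler where
  gfoldl' _ z = z

maybeType :: DataType
maybeType = mkDataType "Maybe" [nothingConstr, justConstr]

nothingConstr, justConstr :: Constr
nothingConstr = mkConstr maybeType "Nothing" [] Prefix
justConstr    = mkConstr maybeType "Just" [] Prefix

instance Typeable a => DataReflect (Maybe a) where
  toConstr' Nothing  = nothingConstr
  toConstr' (Just _) = justConstr
  dataTypeOf' _      = maybeType

-- Phantom counter used to count immediate subterms via gfoldl'.
newtype Count a = Count { getCount :: Int }

immediateChildren :: DataGfoldl a => a -> Int
immediateChildren x =
  getCount (gfoldl' (\(Count m) _ -> Count (m + 1)) (\_ -> Count 0) x)

main :: IO ()
main = do
  print (immediateChildren (Just (3 :: Int)))
  print (immediateChildren (Handler (+ 1)))
  putStrLn (showConstr (toConstr' (Just 'x')))
```

With this layout, 'toConstr' (Handler id)' is rejected by the type checker because 'Handler' has no 'DataReflect' instance. Note that 'gunfold'' as sketched still takes a 'Constr', so it does not yet achieve construction independent of reflection.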
> Anyway, I guess keeping Data inside base does not preclude such splitting of Data: for backward compatibility the original Data would have to remain available, right?
It used to be the case that 'base' could not be updated, so anything in it would be fixed until the next ghc release. Preserving the original 'Data' would also preserve the original clients and incomplete instances, which is not what one would want (instead, one would want to instantiate just those component classes whose methods can be implemented and used without runtime errors, preserving compatibility of non-failing code). But that is all far future, 6.12 or so, not urgent now. I just mentioned it because there is very little about SYB that I'm sure about, and this is another example of something that might be worth looking into. And the more you keep in 'base', the less you can improve.
>> More reason for moving everything to 'syb', keeping it flexible for a while.
> By "everything" do you mean all instances or all the "dubious" ones? IIRC, the argument for having the "standard" instances in base is that leaving Data alone without any instances would mean that in most cases you would have to import SYB anyway to get any functionality. Or are there other reasons?

Note the "for a while" there. If you are at liberty to change 'base' and users can update 'base' without waiting for the next ghc release, then you can do the changes in 'base'. Otherwise, everything that might change should be in a package you can change and users can update.

Making that package 'syb' keeps things simple - later, after things have settled down again, one could spin off 'Data' and 'Typeable' into their own package ('data-reflection', 'introspection', ..). Or one could re-integrate 'Data' into 'base' to get smaller 'build-depends' (and less accurate Cabal dependencies..). But while you're looking into improving things, they need to be changeable, and 'base' usually isn't.

Claus
participants (2)
- Claus Reinke
- José Pedro Magalhães