On Tue, Jan 4, 2011 at 11:51 AM, Christian Maeder
classes and data types are usually created independently and one cannot expect all data type maintainers to continuously adapt their packages to provide instances for new classes.
That is a downside of the approach I'm proposing. It's small in comparison to the downsides of the alternatives. Note that the opposite approach, having the package that defines the type class define the instances, would mean that the maintainers of that package would have to add a new instance for every data type (that makes sense) that gets added to Hackage. Since the number of data types is much larger than the number of type classes, my proposal distributes the work among many more maintainers.

The problem with having the package defining the class depend on the packages defining the data types is that this approach doesn't scale. If we applied it consistently, the deepseq package would have to depend on every package on Hackage that defines a data type (i.e. every package on Hackage), as NFData instances make sense for just about every data type. The current deepseq package picks two at semi-random (i.e. for legacy reasons).
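As a rough illustration of the proposal above (the instance lives in the package that defines the data type), here is a minimal sketch. The `Tree` type is hypothetical and stands in for a type exported by some library; only `NFData`, `rnf` and `deepseq` come from the deepseq package.

```haskell
module Main (main) where

import Control.DeepSeq (NFData (..), deepseq)

-- Hypothetical data type, standing in for one exported by a library.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- The instance is defined next to the type, so it is not an orphan.
-- The defining package pays for this with a dependency on deepseq.
instance NFData a => NFData (Tree a) where
  rnf Leaf         = ()
  rnf (Node l x r) = rnf l `seq` rnf x `seq` rnf r

main :: IO ()
main = do
  let t = Node Leaf (1 :: Int) (Node Leaf 2 Leaf)
  -- deepseq evaluates the whole tree before the message is printed.
  t `deepseq` putStrLn "fully evaluated"
```

Users of the hypothetical library then get the instance whenever they import the type, and no second package can introduce a conflicting one without also redefining the type.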
Will the next step be to move the Binary instances from the binary package to containers, too? (There are plenty of other serialization classes!)
Aside: I think the binary package should be split into two packages. It currently exports two completely separate pieces of functionality:

 * Low-level primitives for reading/writing machine native types: integers (unsigned or not) and binary blobs.

 * A specific and undocumented data format, via the Binary class.

I don't think a type class approach is necessarily the right thing. However, if we stick with one I suggest that the package provide instances for the data types in base, as the package (and indeed almost all packages) has to depend on base anyway.

If the data format is to be extended to serialize containers, the instances should be defined in terms of another type class and not in terms of a particular type. Tying a data format to a particular implementation of a container doesn't sound like a great idea. Here's a sketch of what this could look like:

    class IMap m where
        empty :: ...
        insert :: ...

    instance IMap m => Binary m where
        get = do -- decode data and call insert repeatedly.

The binary package would depend on the package that defines this type class.
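The IMap idea above can be fleshed out into something runnable. What follows is only a sketch under several assumptions: the associated types `Key` and `Val`, the `toPairs` method, and the functions `putIMap` and `getIMap` are all hypothetical names, and the wire format (a serialized list of key/value pairs) is just one possible choice. It uses plain functions rather than a blanket `instance IMap m => Binary m`, since such an instance would overlap with every other Binary instance.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
module Main (main) where

import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.IntMap as IntMap
import qualified Data.Map as Map

-- Hypothetical class abstracting over map-like containers.
class IMap m where
  type Key m
  type Val m
  empty   :: m
  insert  :: Key m -> Val m -> m -> m
  toPairs :: m -> [(Key m, Val m)]

instance Ord k => IMap (Map.Map k v) where
  type Key (Map.Map k v) = k
  type Val (Map.Map k v) = v
  empty   = Map.empty
  insert  = Map.insert
  toPairs = Map.toList

instance IMap (IntMap.IntMap v) where
  type Key (IntMap.IntMap v) = Int
  type Val (IntMap.IntMap v) = v
  empty   = IntMap.empty
  insert  = IntMap.insert
  toPairs = IntMap.toList

-- Serialize any IMap as a list of key/value pairs (assumed format).
putIMap :: (IMap m, Binary (Key m), Binary (Val m)) => m -> BL.ByteString
putIMap = encode . toPairs

-- Decode the pairs and call insert repeatedly, as in the sketch above.
getIMap :: (IMap m, Binary (Key m), Binary (Val m)) => BL.ByteString -> m
getIMap bytes = foldr (uncurry insert) empty (decode bytes)

main :: IO ()
main = do
  let original = Map.fromList [(1, "one"), (2, "two")] :: Map.Map Int String
      bytes    = putIMap original
      -- The same bytes can be read back into a different container type.
      asIntMap = getIMap bytes :: IntMap.IntMap String
  print (IntMap.toList asIntMap)  -- prints [(1,"one"),(2,"two")]
```

Because the format is defined against the class and not against Data.Map, data written from one container implementation can be read into another, which is the decoupling argued for above.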
Orphan instances should be avoided as they can cause hard to prevent and hard to fix breakages in large code bases.
The only problem in large code bases is duplicate orphan instances that are added as non-separated parts of other code.
If all code were based on the same instances (provided by a central package on Hackage!) I see no problem.
If I understand you correctly, you're suggesting that the problem is solved by convention, e.g. that all instances belonging to a particular type class are found in a particular package and no one should define an instance of this type class elsewhere? Is that a correct interpretation? I think there are at least three issues with this approach:

 * The package that defines the instances might have to depend on all of Hackage (see above). This is the same as when the package that defines the type class also defines the instances.

 * I don't think this is enforceable, even by convention. Where do you put instances for data types not on Hackage (e.g. in some company's source control repo)?

 * You need to tell everyone not to define instances for data types in the same package as they define the data type, as all instances are supposed to be in the special instances package. If they do add the instance in the package that defines the data type, things will break when the *-instances package adds the same instance.
If NFData is such an important class it should go into the base package. Since other data types depend on base anyway, there is no need to change the dependency from "base" to "base, deepseq" for many data packages.
Any package dependency problem can be solved by having only one package (e.g. base). But not all Haskell code (not even all type classes people might ever define) can live in base. We can try to avoid the question of where to declare what by saying that some type classes are "important" and go into base, and pretend other type classes don't exist. This doesn't really answer the question of where to define type classes and their instances in general, though.
Is it possible to find out where (if) NFData instances of container types are actually used (for hackage packages)?
They are used in a bunch of Criterion benchmarks for different packages at the moment.

Johan