
On Sat, Jan 23, 2010 at 4:57 PM, Jeremy Shaw wrote:
On Sat, Jan 23, 2010 at 7:57 AM, Neil Mitchell wrote:

No, that's definitely not correct, or even remotely scalable as we increase the number of abstract types in disparate packages.
Yes. happstack is facing another aspect of this scalability issue as well. We have a class, Serialize, which is used to serialize and deserialize data. It builds on the binary library, but adds the ability to version your data types and migrate data from older versions to newer ones. This has a serious scalability issue, though, because it requires that every type a user might want to serialize has a Serialize instance. So do we:

1. provide Serialize instances for as many data types from libraries on Hackage as we can, resulting in a dependency on a large number of packages that people are required to install, even though they will only use a small fraction of them?

2. convince people that Serialize deserves the same status as Data, and then convince authors to create Serialize instances for their types? It would be nice, but authors will start complaining if they are asked to provide a zillion other instances for their types as well. And they will be annoyed if their library has to depend on a bunch of other libraries just so they can provide some instances that only a small fraction of their users might use. So this method does not scale as the number of 'interesting' classes grows.

3. let individual users define the Serialize instances as they need them? Unfortunately, if two different library authors defined a Serialize instance for Text in their libraries, you could not use both libraries in your application because of the conflicting Serialize instances. So this method does not scale as the number of libraries using the Serialize class grows.

I'm not really sure what the workaround is. #1 could work if there were some way to selectively install just the pieces you need. But the only way to do this now would be to create a lot of cabal packages that each define a single instance (happstack-text, happstack-map, happstack-time, happstack-etc.), one for each package that has types we want to create a serialization instance for...
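To make the versioning idea concrete, here is a minimal base-only sketch of a class that tags serialized data with a version number and migrates old data on read. It is NOT happstack's actual Serialize class, which builds on binary. The class `Versioned`, the types `Person`/`PersonV1`, the function `migrate` and the "version:payload" string format are all hypothetical. The point it illustrates is the one above: every type you want to store needs its own instance written somewhere.

```haskell
-- Hypothetical, base-only sketch of a versioned serialization class.
-- This is NOT happstack's Serialize class (which builds on binary);
-- the class, types and string format below are illustrative.
import Text.Read (readMaybe)

-- Old on-disk shape of the data (version 1).
data PersonV1 = PersonV1 String
  deriving (Show, Read)

-- Current shape (version 2): an age field was added.
data Person = Person String Int
  deriving (Show, Read, Eq)

-- How to upgrade a version-1 value; the default age 0 is an arbitrary choice.
migrate :: PersonV1 -> Person
migrate (PersonV1 name) = Person name 0

-- Every type that should be storable needs an instance of this class.
class Versioned a where
  serialize   :: a -> String
  deserialize :: String -> Maybe a

instance Versioned Person where
  -- Always write the newest version, prefixed with its version number.
  serialize p = "2:" ++ show p
  -- Accept either version; version-1 data is migrated while reading.
  deserialize s = case break (== ':') s of
    ("1", _ : rest) -> migrate <$> readMaybe rest
    ("2", _ : rest) -> readMaybe rest
    _               -> Nothing
```

Load it in GHCi. `deserialize "1:PersonV1 \"Bob\"" :: Maybe Person` yields `Just (Person "Bob" 0)`, so old data keeps loading after the type changes.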
Any other suggestions?

- jeremy
The only safe rule is: if you don't control the class C, or you don't control the type constructor T, don't make instance C T.

Application writers can often relax that rule. The set of dependencies for the whole application is known, and in many cases any reasonable instance for a class C and constructor T is acceptable. Under those conditions, the worst-case scenario is that the application writer may need to remove an instance declaration when migrating to new versions of the dependencies.

When you control a class C, you should make as many (relevant) type constructors instances of it as is reasonably possible, i.e. without adding any extensive dependencies. So, at the very least, all standard type constructors. The same goes for those who control a type constructor T. This is for convenience. These correspond to solutions #1 and #2, only significantly weakened. Making a package depend on tons of other packages just to add instances is definitely NOT the correct solution.

The problem case is library writers who depend on one package for a class and another package for a type. There are three potential solutions here, each of which essentially reduces the problem to one of the three cases above:

- introduce a new type and make it an instance of the class;
- introduce a new class and make the types instances of it;
- try to push the resolution of such things onto the application writer.

The first two options have the benefit that they also protect you from upstream libraries introducing instances that won't work for you. Their drawback is that they are usually less convenient to use. The last option has the benefit that it usually corresponds to a more flexible, generic library. In some cases you can even go so far as to remove your dependence on those libraries altogether.

One solution to this problem is simply not to use the class mechanism except as a convenience, though this usually can't be done after the fact.
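The first two options can be sketched with classes and types from base. `Semigroup` and `String` belong to someone else, so a library that wants a different combining behaviour for strings wraps them in its own newtype. base itself follows this pattern with `Sum` and `Product` in `Data.Monoid`. Conversely, a new class that the library owns can safely have instances for `Int` and `Bool`. The names `Longest` and `Describe` are hypothetical.

```haskell
-- Option 1: introduce a new type. We own `Longest`, so this instance is
-- safe even though we control neither Semigroup nor String.
newtype Longest = Longest String
  deriving (Show, Eq)

-- Keep the longer string; on a tie the left operand wins (still associative).
instance Semigroup Longest where
  Longest a <> Longest b = if length b > length a then Longest b else Longest a

-- Option 2: introduce a new class. We own `Describe`, so instances for
-- types we don't control (Int, Bool) cannot clash with anyone else's.
class Describe a where
  describe :: a -> String

instance Describe Int where
  describe n = "the number " ++ show n

instance Describe Bool where
  describe b = if b then "yes" else "no"
```

Either way, the instance lives next to a definition the library author controls, so no other package can legally define a conflicting one.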
This has the benefit that it usually leads to more flexibility, and it helps to realize the third option above. Using Monoid as an example, one can provide functions of the form:

    f :: m -> (m -> m -> m) -> ...

and then also provide:

    f' = f mempty mappend :: Monoid m => ...

The parameters can be collected into a record as well. You could even systematize this into:

    class C a where getCDict :: CDict a

and then write:

    f :: CDict a -> ...
    f' = f getCDict :: C a => ...

Whatever one does, do NOT add instances of type constructors you don't control to classes you don't control. This can lead to cases where two libraries can't be used together at all.
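The dictionary-passing style can be filled in as follows. This is a minimal sketch in which the elided `...` of `f` is taken to be "fold a list". `combineAll` plays the role of `f`, `combineAll'` the role of `f'`, and the field names of the record `CDict` are made up for illustration. Because the core functions take their operations as ordinary arguments, a caller can use `Int` with either addition or multiplication without any `Monoid Int` instance or newtype.

```haskell
-- Sketch of "classes only as a convenience", using Monoid as the example.
-- combineAll, CDict's fields, sumD and productD are illustrative names.

-- Core function: the monoid operations are plain parameters,
-- in the shape  f :: m -> (m -> m -> m) -> ...
combineAll :: m -> (m -> m -> m) -> [m] -> m
combineAll e op = foldr op e

-- Class-based convenience wrapper:  f' = f mempty mappend
combineAll' :: Monoid m => [m] -> m
combineAll' = combineAll mempty mappend

-- The same parameters collected into a record.
data CDict a = CDict { cEmpty :: a, cAppend :: a -> a -> a }

combineAllD :: CDict a -> [a] -> a
combineAllD d = foldr (cAppend d) (cEmpty d)

-- A class whose only job is to supply a default dictionary.
class C a where
  getCDict :: CDict a

combineAllD' :: C a => [a] -> a
combineAllD' = combineAllD getCDict

-- Default dictionary for lists: empty list and concatenation.
instance C [b] where
  getCDict = CDict [] (++)

-- Callers can pass any dictionary directly, with no instance needed.
sumD, productD :: CDict Int
sumD     = CDict 0 (+)
productD = CDict 1 (*)
```

For example, `combineAllD productD [1,2,3,4]` is `24` and `combineAllD sumD [1,2,3,4]` is `10`. The class-based wrappers remain available for users who are happy with the default instance.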