
On 10 June 2010 12:38, sterl
There's a big range of issues here, and to be honest I'm not sure if our ability to distinguish between them is helped by the title of this thread, which somewhat begs the question. That is to say, it isn't clear to me that calling the proposed changes to the fgl "rewriting a library" is necessarily accurate -- it seems more the case that these are incremental improvements of a library that require breaking API changes.
Except it is a re-write in the truest sense of the word: we started completely from scratch. We did compare our API with the current API (in an attempt to keep function names, etc. the same where possible because I'm hopeless at choosing names) but we didn't exactly take the class as-is and then change it. On the other hand, we were both familiar with the current version of FGL and how it's laid out, so there are probably some implicit influences from there as well. On the other hand, it can also be considered as incremental improvements: we wanted to keep the terminology and fundamental concepts as similar as possible to avoid having a jarring change in how it's used. Instead, we focused on improving the current version: using explicit data types for Context and Edge (for why Edge needs a data type of its own, read the Graph section of "Fun with type functions" by Oleg, SPJ and Chung-chieh Shan) rather than tuple aliases; allowing restrictions on the label types (though we've just come across a problem where this doesn't play nicely with mapping functions); increasing the scope for per-instance optimisations, etc. So in a sense we did a re-write that happened to come close to the current definition. This is not to say that this is because the current API is close to an ideal perfect API, but rather because we were focussing on developing something _like_ the current version without worrying about compatibility too much.
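As a rough illustration of the "explicit data type rather than tuple alias" point, here is a minimal sketch. All names in it (Edge, fromNode, toNode, edgeLabel, reverseEdge, toTuple, fromTuple) are hypothetical and chosen for this example only; they are not taken from the new fgl code. The tuple alias mirrors how the current fgl represents a labelled edge.

```haskell
-- Hypothetical sketch only: these names are illustrative and are NOT the new fgl API.

type Node = Int

-- Current fgl style: a labelled edge is just an alias for a tuple.
type LEdgeTuple b = (Node, Node, b)

-- An explicit record: the fields are named, and the type can carry its own
-- instances instead of sharing those of every other 3-tuple.
data Edge n e = Edge
  { fromNode  :: n
  , toNode    :: n
  , edgeLabel :: e
  } deriving (Eq, Ord, Show)

-- Mapping over an edge changes only its label, never its endpoints.
instance Functor (Edge n) where
  fmap g (Edge f t l) = Edge f t (g l)

-- Swap the endpoints and leave the label untouched.
reverseEdge :: Edge n e -> Edge n e
reverseEdge (Edge f t l) = Edge t f l

-- Conversions to and from the tuple form, for code that still uses tuples.
toTuple :: Edge Node e -> LEdgeTuple e
toTuple (Edge f t l) = (f, t, l)

fromTuple :: LEdgeTuple e -> Edge Node e
fromTuple (f, t, l) = Edge f t l
```

With a tuple alias, a Functor instance like the one above could not be written for edges alone, since the instance would apply to every 3-tuple.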
So on the concrete issue at hand, I'd be for the new fgl version being developed under some new provisional name, and taking pains to provide a compatibility layer where possible. Then, after we see what the changes really are, coming to some informed decision on whether to rebrand it as the real fgl version 6. If so, the old stable fgl can be put up on hackage as fgl98, which lets packages which want to stick with it do so while avoiding any possibility of the dread diamond dependency.
Considering the "rename the old version" issues first:
* It won't solve the problem of people not specifying correct constraints on the version of fgl used, since it means they'd have to edit their dependencies to use fgl98 or whatever anyway.
* Calling it "fgl98" is on a slippery slope: what happens when GHC-6.14 comes out with Haskell2010 support? Do we then release an fgl2010 version as well? (I believe Ross brought this problem up already).
* I'm wanting people to move _off_ of the old version of fgl. The only real advantage (though how practical this will be in the real world is debatable IMHO) is that the current version doesn't use any extensions whereas the new one does (and they're needed to provide the asked-for functionality of letting instance writers constrain the types of labels - i.e. the reason why Set isn't an instance of Functor - and to have custom Node types). Since (like it or not) for the most part when people write Haskell code they use GHC and GHC supports these extensions, I do not think this is that much of a problem (I am open to being convinced otherwise about this though; I think it would be _great_ if there were other Haskell compilers that were as good as if not better than GHC in terms of runtime, etc. ... until I start considering how to manage two different compilers in Gentoo, etc. :p).
As for having a temporary name for the testing releases, I am open to doing so, but this in effect pollutes the package name-space with packages that shouldn't/wouldn't be used. I would prefer to host it elsewhere and just tell people to grab a copy and see what they think rather than use a temporary name and then change it later when it's "stabilised". It would be preferable IMO that if we were going to change package names then it should be done once and then not changed again.
More broadly, we have to accept that breaking API changes are an irritating but necessary fact of life. As much as the parsec and quickcheck issues have caused some modest pain, there's been equal hassle from things like the strictness behavior of binary, or even the type change in tagsoup. Splitting out Category from Arrow caused me probably the most hassle. In retrospect it was the right thing to do. But how it was done was particularly abrupt and painful. Exceptions got it right in pretty much every respect, but still migration necessarily took some work. We want our packages to grow, including our core packages. Otherwise we get fragmentation and duplicated effort. When we want to grow, but don't know exactly how, then we get experimentation. But experimentation without some organization can lead to the wrong sort of fragmentation -- like the mtl mess, whose resolution now thankfully seems to be in hand.
Right. It's this abrupt change that I'm trying to avoid by publicly warning people ahead of time that they should fix their package dependencies and then have a series of preview releases to see what people think. I think to an extent the base-3 to -4 transition coupled with exceptions was a jarring/coming-of-age point for the Haskell community in terms of dependencies. We'd already had the split base issue for base-2 to -3, but that mainly involved using the split-base flag in our .cabal files and adding dependencies on containers, arrays, etc. where needed (and could almost have been automated). However, with the transition to base-4 we really started to get serious about proper versioned dependencies (which is what I was trying to avoid by starting this whole chain of emails) because developers were blindly specifying either just base or "base >= 3" (and in some cases silly things like "base < 5"; this has also occurred in packages that came out after GHC 6.10.1 was released by people that should know better resulting in packages that didn't build with base-4). In cases like QuickCheck, Parsec, etc. this version dependency issue is only half the problem (the other half being diamond dependencies). But with fgl, the problem isn't that severe since there are very few libraries that use FGL; most usages seem to be for applications. As such, the diamond dependency problem isn't that much of a consideration in this case. However, the mtl vs. transformers issue is to an extent a problem here: if users have both versions of fgl installed (which ghc-pkg lets you do), then there will be issues with developers trying to use one but not the other. For this problem, the best solution is probably to make a concerted effort with all package maintainers that use fgl to do a mass upgrade release at the same time the new version of fgl is publicly released (in terms of actually being worth using rather than "hey, how about we do it this way? you happy with this now?" preview releases).
Some lessons I think we can learn from the past about changes to widely-used stable APIs:
* Clear and documented upgrade paths.
We're planning on writing suitable upgrade documentation.
* Preferably a compat layer (Exceptions and Parsec both did a killer job with this).
Probably not going to happen here unfortunately. However, there are a few pseudo-compatibility options that can help resolve this:
* When I get the generic graph class written (in about a month's time at AusHack), people should start migrating their code to as low a class in the hierarchy as they can (so if they don't need the inductive nature of fgl, then there's no reason for them to specify doing so in their type signatures).
* If you're writing an application rather than a library for graphs, pick an appropriate graph type and stick with it (using various type aliases where necessary). That way, rather than having to have polymorphic type signatures with type family notation (so stuff like "Num (EdgeLabel g), NodeLabel g ~ ()") you can just use the actual type or an alias of the graph type you're using. We might be able to provide this type of alias notation for a default graph type, but it will probably require a different module to be imported than what is currently used; i.e. new fgl still won't be a drop-in replacement for old fgl.
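To make the contrast between a polymorphic signature and a concrete alias more tangible, here is a small sketch. Only the names EdgeLabel and NodeLabel come from the text above; the class Graph as written here, its methods labNodes and labEdges, the RoadGraph type, the RoadMap alias and both functions are assumptions made up for this example, and should not be read as the new fgl API.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}

-- Hypothetical class for illustration only; NOT the actual new fgl class.
class Graph g where
  type NodeLabel g
  type EdgeLabel g
  labNodes :: g -> [(Int, NodeLabel g)]
  labEdges :: g -> [(Int, Int, EdgeLabel g)]

-- Library style: works for any graph with numeric edge labels and unit node
-- labels, at the cost of a signature full of type family constraints.
summary :: (Graph g, Num (EdgeLabel g), NodeLabel g ~ ())
        => g -> ([(Int, ())], EdgeLabel g)
summary g = (labNodes g, sum [w | (_, _, w) <- labEdges g])

-- Application style: pick one concrete graph type (here a toy one built
-- from plain lists) and stick with it.
data RoadGraph = RoadGraph [Int] [(Int, Int, Double)]

instance Graph RoadGraph where
  type NodeLabel RoadGraph = ()
  type EdgeLabel RoadGraph = Double
  labNodes (RoadGraph ns _) = [(n, ()) | n <- ns]
  labEdges (RoadGraph _ es) = es

-- An alias keeps application signatures short and free of type families.
type RoadMap = RoadGraph

totalDistance :: RoadMap -> Double
totalDistance = snd . summary
```

The application-level function totalDistance needs no constraints at all, because everything the polymorphic version had to ask for is fixed by the choice of graph type.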
* No, or demonstrably minimal performance regressions.
In this case, the actual library itself is just a type class with a couple of default instances so there should be no regressions. In fact, we're increasing the scope of per-instance optimisations so the default graph type (based upon what is currently in Data.Graph.Inductive.PatriciaTree) may end up being faster in some situations (e.g. mapping over the labels).
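One way to picture "per-instance optimisations" is a class method with a generic default that an instance may override. The sketch below assumes a toy class; LabelledNodes, toNodes, fromNodes, mapLabels, ListGraph and IntMapGraph are all hypothetical names invented here, not part of fgl. The IntMap-backed type is only loosely modelled on the idea behind Data.Graph.Inductive.PatriciaTree.

```haskell
import qualified Data.IntMap as IM

-- Hypothetical class for illustration only; NOT the new fgl API.
class LabelledNodes g where
  toNodes   :: g a -> [(Int, a)]
  fromNodes :: [(Int, a)] -> g a

  -- Generic default: take the structure apart into a list and rebuild it.
  mapLabels :: (a -> b) -> g a -> g b
  mapLabels f = fromNodes . map (\(n, l) -> (n, f l)) . toNodes

-- A naive instance that is happy with the default mapLabels.
newtype ListGraph a = ListGraph [(Int, a)] deriving (Eq, Show)

instance LabelledNodes ListGraph where
  toNodes (ListGraph ns) = ns
  fromNodes = ListGraph

-- An IntMap-backed instance that overrides mapLabels.
newtype IntMapGraph a = IntMapGraph (IM.IntMap a) deriving (Eq, Show)

instance LabelledNodes IntMapGraph where
  toNodes (IntMapGraph m) = IM.toList m
  fromNodes = IntMapGraph . IM.fromList
  -- Override: map over the labels directly, without going through a list.
  mapLabels f (IntMapGraph m) = IntMapGraph (IM.map f m)
```

Both instances give the same results for mapLabels; the override only avoids the intermediate list and the rebuild.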
* Strong release notes and other documentation, either duplicating or supplementing what existed prior.
Definitely. We're even considering using the new instance-level documentation that Haddock 2.7 provides.
* For particularly long-lived stable APIs, forking off a maintenance-mode-only version may make good sense, especially when the subset of language extensions used differs significantly.
I'm hoping that in this case that won't be necessary. What might happen is that along with the preview releases (whether fgl-6.x or otherwise), we might slowly start backporting some features (e.g. usage of the generic graph class library) to the 5.y series.
Some lessons to us API consumers who write somewhat-less-core packages:
* Upper version bounds.
Pretty please? :p I'm really looking forward to when Cabal supports PVP opt-in so that Hackage will complain if you don't have proper bounds on packages that follow the PVP.
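For anyone unsure what "proper bounds" look like in practice, a build-depends field along these lines is one possibility. The version numbers are purely illustrative assumptions; use the versions you have actually built and tested against.

```cabal
-- Illustrative bounds only: substitute the versions you have actually tested.
build-depends: base >= 4   && < 5,
               fgl  >= 5.4 && < 5.5
```

Each dependency gets both a lower and an upper bound, so a new major release of fgl with breaking API changes will not be picked up until the bounds are deliberately widened.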
* If at all possible, don't move to the fancy new thing until the fancy new thing is fully baked, and on track to widespread adoption. (early adopters of new mtl implementations, I'm looking at you :-))
To an extent, this is a bit of a mixed bag; in cases like mtl vs transformers, if we didn't have the early adopters then we would have no impetus for _anyone_ to use the new version. That said, in this case, DON'T USE THE NEW VERSION OF FGL UNTIL WE SAY IT'S OK (probably the 7.x series)!!!!!!!!!!!!!!!!!!!!!!!
* If at all possible, try to stay compatible with at least the prior GHC version as well as the current.
At the moment, this is rather easy to do unless you want to take advantage of (or are being bitten by) the new locale-aware stuff in GHC 6.12.
* Don't pull in big packages for small reasons unless really necessary -- minor duplication of trivial code is often the lesser evil.
I would argue that for this reason big packages may want to consider being split up into smaller, more manageable packages. For example, we're going to split off the Data.Graph.Inductive.Query.* modules into an fgl-algorithms package to make them easier to maintain, etc.
Some lessons for folks exploring new variants:
* Don't step on already-used module names.
This depends on the situation; transformers (+ monads-fd) was meant to serve as a drop-in replacement for mtl; this, however, obviously causes problems when people are trying to use one or the other in ghci and have both installed. In this case, for fgl we're wanting to do a library upgrade, so IMO it makes sense to use the same module names.
Some technical issues that will help as time goes on (many already underway):
* Deprecation of packages on hackage/redirects. (Makes it easier to establish upgrade / migration / transition paths).
There is already some support for this: packages on Hackage can be explicitly marked as being deprecated and as such won't appear on the default package listing (IIUC) but will still be pulled in by cabal-install if necessary.
* Tree organization of packages on hackage. (Reduces the noise generated by lots of small packages, and so encourages splitting things out).
Not sure what you mean by this. If you're talking about per-category trees, this won't quite work: some packages will appear in multiple categories (e.g. data structures + graphs) and as such this won't be a tree.
* Wikilike documentation features on hackage (lets users contribute and share upgrade paths, etc. more directly and simply -- hopefully will help with community documentation of packages in general).
Coming soon (as soon as someone codes it)!
* The "local usage" annotation for cabal files to help avoid the dread diamond dependency.
I understand that this requires support in ghc-pkg first.
* A DSL to describe transforms of Haskell programs for at least simple API migrations. Yes, this is a bit more "out there" but it's a great space to explore. The upside is not only better tools to help authors migrate their code, but a strong representation of what exactly the API changes are. So even if the spec language describes things that can't be applied automatically, it can still formalize what authors need to do. A standard format for an API change log as a hackage plugin would be a good start to this.
I would be wary of anything that tried to automagically upgrade my code, since there would most likely be subtleties that it won't get right.
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com