
On Fri, Apr 29, 2005 at 10:39:13AM +0100, Simon Marlow wrote:
On 28 April 2005 16:02, John Goerzen wrote:
Yes, but they'll all be hardlinked together, so no matter how many copies you get, the old history is only stored on disk once.
If I do 'darcs get' to get a bunch of different repositories from cvs.haskell.org to my local filesystem, they won't all end up hard-linked together, surely?
Not automatically in that case, no. But you could use darcs optimize --relink to restore them to linked status. Or better yet:

1) Check out the most recent common ancestor
2) darcs get it n times across the local filesystem (resulting in a bunch of hardlinked patches)
3) darcs pull the appropriate repo that you want in each one of them
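To illustrate why hardlinked copies cost no extra disk space, here is a minimal Python sketch. It is not darcs code: the file names are made up for illustration, and it only shows the filesystem behaviour that the hardlinking relies on, namely that two names can point at one stored file.

```python
import os
import tempfile


def link_count_after_sharing(payload):
    """Write one file, hardlink it under a second name, and report
    (link count, whether both names refer to the same stored file)."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical names standing in for the same patch in two repos.
        first = os.path.join(root, "repo_a.patch")
        second = os.path.join(root, "repo_b.patch")

        with open(first, "wb") as handle:
            handle.write(payload)

        # Create a second directory entry for the same data on disk.
        os.link(first, second)

        same_file = os.path.samefile(first, second)
        return os.stat(first).st_nlink, same_file


count, same_file = link_count_after_sharing(b"old history")
print(count, same_file)
```

Both names report a link count of 2 and resolve to the same file, so the data is stored once no matter how many names refer to it. Hard links only work within a single filesystem, which is why this applies to local copies.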
I know it's not ideal, but trying to manually convert each part of the CVS repo, one at a time, is far, far less ideal. Not something I care to attempt, anyway :-)
I don't understand why converting each part of the repo separately is so hard? (but I've never used the tools - I'm just curious about why it's so difficult).
These are the main issues.

1) Logical projects have changed names/paths within the repo. To properly preserve the full history of each individual project will require research and manual intervention to get right. For instance, the directory currently known as greencard used to be known as green-card. If I blindly convert only the greencard directory, the history in green-card will be forever lost to the darcs repo. Tracking all the history and spending the time to do this right could be very time-consuming. If we don't care about the full history, it becomes easier.

2) Bidirectional mirroring with CVS. It's a complex enough thing to set up to begin with, and it may not be practical to do a bidirectional mirror of only part of a CVS repo. (I don't know yet.)

Here's another thought: perhaps having fptools in darcs doesn't require all the CVS history; maybe we just start with a big import, have a very specific cut-over date, and keep the CVS repo around in read-only mode after that if there's a need to find older history. That would make it pretty easy to split up into separate darcs repos.
But what worries me is: if I just want to check out e.g. Haddock, I have to get the entire fptools repo (350M+, wasn't it?). I can build a source distribution with just the bits I want, but I can't get a darcs tree with anything but the whole lot.
True. But OTOH, they will only need to download about 18MB of data. (14MB for the latest checkpoint, plus about another 4MB for the inventory file + more recent patches.) This expands to roughly 304MB on-disk, since darcs by default creates two copies of the checked-out files (a pristine tree and the working tree). By way of comparison, the Linux kernel source comes as a 35MB tar.bz2 and, when built, consumes a little more space. So I don't consider it to be really out of line with what people would expect to do to participate in a major project these days. Of course, downloading a 50K checkpoint plus 10K of extra data for a small project like happy would be faster.

But I'd say that the more long-term question isn't technical but organizational: how do we think fptools development will shake out in the next few years? Will we still see a lot of cross-project commits? Or will we see fragmentation, where individual projects get adopted by different people? Also, I think it's easier to split a darcs repo than it is to join them.
So, here's two potential solutions:
1. Make it possible to 'darcs get' just part of a tree. Patches that don't touch any files in the "live" parts of the tree are discarded. (I don't know if this is possible, or how difficult it is).
That's an interesting question. It's not a darcs feature now, but I also don't know how hard it is.
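As a rough sketch of what option 1 would have to do, here is a small Python model of the filtering step. This is not a darcs feature or darcs code; the Patch type, the patch names, and the paths are all hypothetical. It also ignores dependencies between patches, which a real implementation would need to handle, since a kept patch may depend on a discarded one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical stand-in for a patch: a name plus the paths it touches."""
    name: str
    touched_paths: tuple


def touches_live_tree(patch, live_prefixes):
    """True if any path in the patch lies inside one of the live directories."""
    return any(
        path == prefix or path.startswith(prefix.rstrip("/") + "/")
        for path in patch.touched_paths
        for prefix in live_prefixes
    )


def filter_patches(patches, live_prefixes):
    """Keep patches touching the live parts of the tree, in original order."""
    return [p for p in patches if touches_live_tree(p, live_prefixes)]


# Made-up history for illustration only.
history = [
    Patch("fix haddock parser", ("haddock/src/Parser.hs",)),
    Patch("ghc typechecker tweak", ("ghc/compiler/Tc.hs",)),
    Patch("update shared build rules", ("mk/rules.mk", "haddock/Makefile")),
]

kept = filter_patches(history, ["haddock", "mk"])
for patch in kept:
    print(patch.name)
```

A patch that touches both live and non-live files is kept whole here. Whether such a patch should instead be split or trimmed is one of the open design questions.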
2. Create separate repositories for GHC, Happy, Haddock etc., and duplicate the shared fptools structure in each project. Each time we modify something in the shared part of the tree, we pull the patch into the other trees. (is it possible to cherry-pick from a tree that doesn't have a common ancestor? If not, can we make the repositories appear to have common ancestry?).
No, you can't cherry-pick if there's no common ancestor, but you can make this appear to have a common ancestor. The idea is basically to start each one from a repo that has only the common parts, in their own directory, and then merge in the relevant patches to make each unique project. I do that with my sgml-common system, which is a set of scripts and support for building documentation and manpages from DocBook SGML sources. I use it in several of my projects and it works well.

-- John