Re: [Haskell-cafe] fptools in darcs now available

29 Apr 2005

      On Fri, Apr 29, 2005 at 10:39:13AM +0100, Simon Marlow wrote:
...
On 28 April 2005 16:02, John Goerzen wrote:
...
Yes, but they'll all be hardlinked together, so no matter how many
copies you get, the old history is only stored on disk once.
If I do 'darcs get' to get a bunch of different repositories from
cvs.haskell.org to my local filesystem, they won't all end up
hard-linked together, surely?
Not automatically in that case, no.  But you could use darcs optimize
--relink to restore them to linked status.  Or better yet:

1) Check out the most recent common ancestor
2) darcs get it n times across the local filesystem (resulting in a
bunch of hardlinked patches)
3) darcs pull the appropriate repo that you want in each one of them
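
The space saving in step 2 comes from ordinary filesystem hard links: several directory entries point at the same data, so the patch contents are stored once. Below is a minimal Python sketch of that filesystem behaviour only. It is not darcs code, and the file names are made up for illustration.

```python
import os
import tempfile

def hardlinked_copies(n, payload):
    """Write one file, then create n-1 hard links to it. Returns all paths."""
    root = tempfile.mkdtemp()
    original = os.path.join(root, "repo0.patch")  # hypothetical file name
    with open(original, "wb") as f:
        f.write(payload)
    paths = [original]
    for i in range(1, n):
        link = os.path.join(root, "repo%d.patch" % i)
        os.link(original, link)  # new directory entry, same inode
        paths.append(link)
    return paths

paths = hardlinked_copies(3, b"patch data" * 1000)
inodes = {os.stat(p).st_ino for p in paths}   # distinct inodes behind the paths
link_count = os.stat(paths[0]).st_nlink       # directory entries sharing the data
```

All three paths resolve to a single inode, so the data occupies disk space once regardless of how many copies exist.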
...
...
I know it's not ideal, but trying to manually convert each part of the
CVS repo, one at a time, is far, far less ideal.  Not something I care
to attempt, anyway :-)
I don't understand why converting each part of the repo separately is so
hard?  (but I've never used the tools - I'm just curious about why it's
so difficult).
These are the main issues.

1) Logical projects have changed names/paths within the repo.  
   Properly preserving the full history of each individual project will
   require research and manual intervention to get right.

   For instance, the directory currently known as greencard used to be
   known as green-card.  If I blindly convert only the greencard
   directory, the history in green-card will be forever lost to the
   darcs repo.  Tracking all the history and spending the time to do
   this right could be very time-consuming.

   If we don't care about the full history, it becomes easier.

2) Bidirectional mirroring with CVS.  It's a complex enough thing to set
   up to begin with, and it may not be practical to do a bidirectional
   mirror of only part of a CVS repo.  (I don't know yet.)

Here's another thought: perhaps having fptools in darcs doesn't require
all the CVS history; maybe we just start with a big import, have a very
specific cut-over date, and keep the CVS repo around in read-only mode
after that if there's a need to find older history.  That would make it
pretty easy to split up into separate darcs repos.
...
But what worries me is: if I just want to check out e.g. Haddock, I have
to get the entire fptools repo (350M+, wasn't it?).  I can build a
source distribution with just the bits I want, but I can't get a darcs
tree with anything but the whole lot.
True.  But OTOH, they will only need to download about 18MB of data.
(14MB for the latest checkpoint, plus about another 4MB for the
inventory file + more recent patches.)  This expands to roughly 304MB
on-disk, since darcs by default creates two copies of the checked-out
files (a pristine tree and the working tree).
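
The arithmetic behind those figures, as a small Python sketch. The even split of the on-disk total between the two trees is an assumption made here for illustration; it ignores whatever space the patches themselves take.

```python
# Figures quoted in the message above, in MB.
checkpoint_mb = 14
inventory_and_patches_mb = 4
download_mb = checkpoint_mb + inventory_and_patches_mb

on_disk_mb = 304
copies = 2  # pristine tree + working tree

# Assumption: the on-disk total is split evenly between the two trees.
per_tree_mb = on_disk_mb / copies

# How much larger the checkout is than the download.
expansion = on_disk_mb / download_mb
```

Under that assumption each tree is about 152MB, and the checkout is roughly 17 times the size of the download.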

By way of comparison, the Linux kernel source comes as a 35MB tar.bz2
and, when built, consumes a little more space.

So I don't consider it to be really out of line with what people would
expect to do to participate with a major project these days.

Of course, downloading a 50K checkpoint plus 10K of extra data for a
small project like happy would be faster.

But I'd say that the more long-term question isn't technical but
organizational: how do we think fptools development will shake out in
the next few years?  Will we still see a lot of cross-project commits?
Or will we see fragmentation, where individual projects get adopted by
different people?

Also, I think it's easier to split a darcs repo than it is to join them.
...
So, here's two potential solutions:
1. Make it possible to 'darcs get' just part of a tree.  Patches
     that don't touch any files in the "live" parts of the tree
     are discarded.  (I don't know if this is possible, or how
     difficult it is).
That's an interesting question.  It's not a darcs feature now, and I
don't know how hard it would be to add.
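
To make the idea in proposal 1 concrete, here is a hedged Python sketch of the filtering rule only: keep a patch if it touches any path under a "live" directory, discard it otherwise. This is not a darcs feature or darcs code. The patch names and paths are hypothetical, and the sketch ignores dependencies between patches and renames, which a real implementation would have to handle.

```python
def touches_live(patch_paths, live_prefixes):
    """True if any path in the patch lies in or under a live directory."""
    return any(
        path == prefix or path.startswith(prefix + "/")
        for path in patch_paths
        for prefix in live_prefixes
    )

def filter_patches(patches, live_prefixes):
    """Return the names of patches that touch the live part of the tree."""
    return [name for name, paths in patches
            if touches_live(paths, live_prefixes)]

# Hypothetical history: (patch name, paths it touches).
patches = [
    ("fix haddock parser", ["haddock/src/Parse.y"]),
    ("ghc typechecker tweak", ["ghc/compiler/TcExpr.hs"]),
    ("update shared build rules", ["mk/config.mk", "haddock/Makefile"]),
]

kept = filter_patches(patches, ["haddock", "mk"])
```

With `haddock` and `mk` as the live directories, the GHC-only patch is dropped and the other two are kept.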
...
2. Create separate repositories for GHC, Happy, Haddock etc., and
     duplicate the shared fptools structure in each project.  Each
     time we modify something in the shared part of the tree, we
     pull the patch into the other trees.  (is it possible to
     cherry-pick from a tree that doesn't have a common ancestor?
     If not, can we make the repositories appear to have common
     ancestry?).
No, you can't cherry-pick if there's no common ancestor, but you can
make the repos appear to have a common ancestor.  The idea is basically to
start each one from a repo that has only the common parts, in their own
directory, and then merge in the relevant patches to make each unique
project.  I do that with my sgml-common system, which is a set of
scripts and support for building documentation and manpages from DocBook
SGML sources.  I use it in several of my projects and it works well.
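
A toy Python model of that arrangement, treating a repo as nothing more than a set of named patches. This is a deliberate simplification and not how darcs stores or commutes patches; all patch names are hypothetical.

```python
# Base repo holding only the common parts.
common = frozenset({"add sgml-common scripts", "add sgml-common Makefile"})

def branch_from(base, own_patches):
    """Start a project repo from the common base, then add its own patches."""
    return set(base) | set(own_patches)

project_a = branch_from(common, {"add project A docs"})
project_b = branch_from(common, {"add project B manpage"})

# Both projects share the base patches, which is the common ancestry.
shared = project_a & project_b

# A later fix to the shared part, recorded in A and pulled on its own into B.
fix = "fix sgml-common Makefile"
project_a.add(fix)
project_b.add(fix)
```

Because both projects start from the same base, a fix to the shared part can move between them without dragging project-specific patches along.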

-- John