[Haskell-cafe] a newbies confusion with repositories - darcs or git

1 Mar 2009

      Hi Günther,
...
> But since I have the ambition to become a real haskeller I was gonna
> make myself acquainted with darcs. Should I skip that and head straight
> for git?
My extremely biased opinion is that you skip git and go straight for darcs ;-)

Why I love darcs
================
Git is a fine revision control system; I like it and I intend to learn more
about it.  On the other hand, darcs does something unique, which makes it
similar to Haskell (vs. other programming languages) in a way: one often has a
hard time explaining to folks what it is that makes it so special (the good
news is that the Haskell community is getting better and better at this, and I
hope that the same will happen for darcs).

So let's give this a shot.  Why darcs?  In my opinion, the number one killer
feature of darcs is that it knows how to merge changes without doing *any*
guessing.  It doesn't say "well, I have this patch that's supposed to tweak
line number 4, but oops, I think I see that the line must have moved down to
line number 7 because it looks sort of similar".  Instead darcs knows with
100% certainty where the right place to apply your patch is.  This is because
it uses the full history of patches to work out where everything goes.  There is a lot of
talk about a theory of patches, and this is largely what it's about in the
darcs world, finding a precise way to talk about patches.  See why some
Haskellers love it?  It's not just that it's written in Haskell; it's that the
whole thing is driven by something which is very clean, powerful and which
delivers a concrete impact in the real world.
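
To make the "no guessing" point concrete, here is a toy sketch in Python.  It
is not darcs's implementation or API: the names Insert, commute and
float_to_front are made up for illustration, and the only kind of patch it
models is "insert one line at a position" (real darcs patches and commutation
rules are much richer, and some pairs of patches do not commute at all).  What
it shows is that when two patches swap order, the new position is computed by
arithmetic from the patch history, not by looking for similar text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Toy patch (hypothetical): insert `text` so it lands at 0-based index `pos`."""
    pos: int
    text: str


def apply_patch(patch, lines):
    """Return a new list of lines with the patch applied."""
    return lines[:patch.pos] + [patch.text] + lines[patch.pos:]


def apply_all(patches, lines):
    """Apply a sequence of patches in order."""
    for patch in patches:
        lines = apply_patch(patch, lines)
    return lines


def commute(p, q):
    """Given 'p then q', return (q2, p2) so that 'q2 then p2' has the same effect."""
    if q.pos <= p.pos:
        # q lands at or above p's line, so p's line is pushed down by one.
        return Insert(q.pos, q.text), Insert(p.pos + 1, p.text)
    # q lands below p's line, so without p it sits one line higher.
    return Insert(q.pos - 1, q.text), Insert(p.pos, p.text)


def float_to_front(history):
    """Commute the last patch past every earlier patch, one swap at a time."""
    patches = list(history)
    for k in range(len(patches) - 1, 0, -1):
        patches[k - 1], patches[k] = commute(patches[k - 1], patches[k])
    return patches


base = ["a", "b", "c", "d", "e"]
# Three header lines are added at the top, then a line is added at line 7.
history = [Insert(0, "h1"), Insert(1, "h2"), Insert(2, "h3"), Insert(6, "NEW")]
reordered = float_to_front(history)
print(reordered[0])  # Insert(pos=3, text='NEW'): line 4 of the original file
print(apply_all(reordered, base) == apply_all(history, base))  # True
```

The patch recorded against line 7 becomes a patch against line 4 once the three
header patches are commuted out of its way, and both orderings produce the same
file.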

We love this kind of precision not just because we're theory-fans, but because
it delivers the goods.  Being able to apply and unapply patches precisely means
that we can merge changes all we want -- we know that darcs will always
preserve user intent -- plus we can adopt all sorts of new behaviours like
cherry-picking incoming patches ("I'll take patch A, C and E; but not B and D
please; oh, you tell me I can't have E without D? Very well, then") or
cherry-picking changes that you want to undo ("Oops, I actually didn't mean to
write that thing in lines 58 and 98, but I still want to keep that thing in line
68, so I'll darcs revert them"... or "Oops, actually I shouldn't have pulled
in change C, but I still want to keep changes D and E, so I'll just say 'darcs
obliterate -p C' to get rid of it").  These kinds of operations happen seamlessly
in darcs.  There is no fuss, there is no worry; everything you can do in darcs,
you can un-do.  Anything you un-do in darcs, you can re-do.
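
The "you can't have E without D" exchange can be sketched as a small
dependency calculation.  This is a toy model under a loud assumption: the
dependencies are written out by hand in a table, whereas darcs works them out
itself from which patches fail to commute.  The function names
with_dependencies and dependents are hypothetical and are not part of darcs.

```python
def with_dependencies(wanted, depends_on):
    """Return the wanted patches plus everything they transitively depend on."""
    selected = set()
    stack = list(wanted)
    while stack:
        name = stack.pop()
        if name not in selected:
            selected.add(name)
            stack.extend(depends_on.get(name, ()))
    return selected


def dependents(target, depends_on):
    """Return every patch that transitively depends on `target`."""
    found = set()
    changed = True
    while changed:
        changed = False
        for name, deps in depends_on.items():
            if name not in found and (target in deps or found & set(deps)):
                found.add(name)
                changed = True
    return found


# Hand-written assumption for this example: E depends on D, nothing else does.
depends_on = {"E": ["D"]}

# "I'll take A, C and E" drags D along with it.
print(sorted(with_dependencies(["A", "C", "E"], depends_on)))  # ['A', 'C', 'D', 'E']

# Nothing depends on C, so it can be removed while D and E stay.
print(sorted(dependents("C", depends_on)))  # []
```

In this model, removing D would also mean giving up E, since dependents("D",
depends_on) contains E; removing C costs nothing else.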

Darcs past, present and future
==============================
So darcs has a clean theory which makes it possible to use a revision control
system in a new seamless way.  Why isn't everybody using it?

Conflicts: It used to be the case that darcs had theoretical problems dealing
with large conflicts (representing a conflict would take exponential space
depending on the patches involved).  It took the darcs community many years to
address, but we eventually did, by releasing darcs 2.  Darcs 2 repositories add
enough information to the representation of conflicts to let us avoid
the exponential blow-up most of the time.  For the most part, the problem is
solved.  There are still cases where this blow-up can happen (we call them
"conflict fights"), but they are well understood and can be avoided.  In the
meantime, we have some folks in the community who are doing some long-term work
refining the darcs theory yet again.

Performance: In addition to the conflicts problem, darcs has suffered from a
lack of optimisation.  (In fact, many users may have complained about patch
theory problems, when they may really have been suffering more from performance
issues or a mixture of the two).  We have had and still have some practical
day-to-day performance issues.  For example, some of my larger repositories
suffer when I use them on an NFS share.  Rather than being instantaneous,
operations take a few seconds and I get annoyed.  Or perhaps darcs makes too
many connections over a network to fetch a lot of little files instead of
having them sent over in one pack.  Now that darcs 2 is
out, we have been focusing the bulk of our energy on tackling these basic
performance problems.   For other Haskellers in the room, I should point out
that hacking darcs performance would be a great way for you to hone your skills
and to give back to the community.  No patch theory needed!  Anyway, I wouldn't
be too worried about performance issues.  Darcs performs just fine for small to
medium sized repositories.  The darcs darcs repository has over 7000 patches
over 6 years with over 160 contributors, no sweat.

Windows: darcs support for Windows used to be a bit uneven.  There were a few
details here and there that none of us had time to get to (especially since
none of us were using Windows).  Things have gotten quite a lot better since,
and we now have a dedicated Windows Czar who is helping us to bring our Windows
support completely up to speed.

GUIs: This too is changing, although it may take a while.  Most of us are quite
happy to use the command line interface to darcs, but we would love to see what
kind of graphical interfaces folks will come out with in the future.  In the
meantime, we have recently exposed the darcs Haskell modules as a sort of darcs
library, which we hope will be useful to folks working on third party tools.
Note that TortoiseDarcs has recently had a new release supporting darcs 2.

Hosting:  Aside from the haskell.org services, there was no hosting for darcs,
commercial or otherwise.  Now we have patch-tag.com which provides free hosting
for open source projects and paid commercial hosting for private projects.

So putting all this together:  Darcs does precision merging, which allows you
to do cool practical things like seamless merging and cherry picking (these
make *my* life easier anyway).  There used to be conflicts issues, but they are
mostly solved.  There are performance problems, but they only affect large
repositories and they are getting better.  There was poor Windows support; this
has improved a lot and has a dedicated czar.  There is TortoiseDarcs if you're
interested in a GUI.  Hosting is available on haskell.org and patch-tag.com.

Darcs and Haskell?
==================
As a final note, darcs would be a good hacking project to get
interested in as a Haskeller.  It's very much a real world program,
so there are plenty of things to learn from it and plenty of things to
teach it.  We need lots of help.  For example, we could use an army of
Haskellers to scrub out our 2003 code and help us hunt down those pesky
performance bugs.

We now have a stable and more sustainable development team (I'm working on 20%
darcs time at my job with the University of Brighton and have committed myself
to spending no more than 8 hours a week on darcs to avoid burnout; a lot of our
jobs have been assigned to dedicated managers -- hello!) and are adopting some
new practices (hacking sprints, 6-month time based release schedule) to keep
development running more smoothly.

The future is bright.  Come join us!

Cheers,

Eric

P.S. Darcs will be hosting its second hacking sprint in Utrecht this
     17-19 April as part of the Haskell Hackathon.  It'll be a great
     chance to meet some darcs hackers and fix some easy bugs :-)

P.P.S. All darcs patches require an inverse.  Can a git user please
       write an inverse of this email so that Günther can make a
       more informed choice?

-- 
Eric Kow http://www.nltg.brighton.ac.uk/home/Eric.Kow
PGP Key ID: 08AC04F9
