
We are happy to announce the first prerelease version of darcs 2! Darcs 2 will feature numerous improvements, and this prerelease will also feature a few regressions, so we're looking for help from both Haskell developers and users willing to try this release out. Read below to see how you can benefit from this new release, and how you can help us to make the final darcs 2 release the best ever!

(For the latter, see http://wiki.darcs.net/index.html/DarcsTwo/HowToHelp)

(For an expanded version of this announcement, see http://wiki.darcs.net/index.html/DarcsTwo)

Darcs 2 features user-visible changes in two broad categories, and several under-the-hood improvements designed to improve code stability and safety. The user-visible changes are a new "hashed" repository format and the new darcs-2 conflict handling. The new "hashed" repository format can be used interchangeably with older darcs: although older versions of darcs cannot read the hashed format, darcs 2 allows you to exchange patches between repositories in new and old formats. The new conflict handling benefits from the new hashed format, but also requires a repository conversion that is not backwards-compatible, so projects switching to the darcs-2 format will have to require that all their users upgrade to darcs 2.

=== Getting darcs 2 ===

You can get a prerelease version of darcs 2 either by getting the latest unstable darcs

    darcs get http://darcs.net/repos/unstable

or by downloading the prerelease tarball from http://darcs.net/darcs-2.0.0pre1.tar.gz. Once you've compiled your new darcs, you could take it for a test drive by getting a fresh copy of darcs with the hashed repository format:

    darcs get http://darcs.net/repos/unstabled-hashed

= Hashed repository format =

We expect that most testers of darcs 2 will only try the hashed repository format.
While we'd prefer to also have many users testing out actual darcs-2 format repositories, the two codebases have much in common, so tests of the hashed format will greatly help us in improving darcs 2 as a whole.

The hashed repository format has a number of changes that are visible to users.

1. The hashed format allows for greater atomicity of operations. This makes for greater safety and simultaneously greater efficiency. These benefits, however, have not been fully realized in this release. For instance, with a hashed repository, there is no need for darcs push to require a repository lock, so you could record patches while waiting for a push to finish (for instance, if it's waiting on the test suite).

2. The _darcs/pristine directory no longer holds the pristine cache. This disallows certain hackish short-cuts, but also dramatically reduces the danger of third-party programs (e.g. DreamWeaver) recursing into the pristine cache and corrupting darcs repositories.

3. Darcs get is now much faster, and always operates in a "lazy" fashion, meaning that patches are downloaded only when they are needed. This gives us many of the benefits of --partial repositories, without most of their disadvantages. This approach, however, does have several new dangers. First, some operations may unexpectedly require the download of large numbers of patches, which could be slow (but you could always interrupt with ^C). Secondly, if the source repository disappears, or you lose network connectivity, some operations may fail. I do not believe these dangers will prove particularly problematic, but we may need to fine-tune the user interface to make it clearer what is going on.

4. Darcs now supports caching of patches and file contents to reduce bandwidth and save disk space. See below for how to enable this. In my opinion, this is actually the most exciting new feature, as it greatly speeds up a number of operations, and is essentially transparent.
The only reason we don't enable it by default is that I'm uncomfortable creating a large directory in ~/.darcs/cache without the user's explicit consent.

=== Creating a repository in the hashed format ===

Creating a hashed repository is as easy as

    darcs get --hashed oldrepository newrepository

or alternatively you could create a fresh repository with

    darcs initialize --hashed

You can push, pull and send patches at will between hashed and old-fashioned repositories, so you should be able to experiment with this format even on projects that you do not control.

=== Enabling a global cache ===

It is very simple to enable a global cache. Simply execute

    $ mkdir -p $HOME/.darcs/cache
    $ echo cache:$HOME/.darcs/cache > $HOME/.darcs/sources

This will cause darcs to store hard links in ~/.darcs/cache. It is always safe to delete this directory.

= Darcs-2 merging =

The future of darcs is in the darcs-2 repository format, which features a new merge algorithm that introduces two major user-visible changes:

1. It should no longer be possible to confuse darcs or freeze it indefinitely by merging conflicting changes. However, this feature '''needs to be tested''', so please, do your worst, and let us know how darcs handles it!

2. Identical primitive changes no longer conflict. This is a long-requested feature, and has far-reaching implications. See below (the section on "new semantics") for a discussion of these implications.

=== Creating a repository in the darcs-2 format ===

Converting an existing repository to the darcs-2 format is as easy as

    darcs convert oldrepository newrepository

However, the convert command does run rather slowly. Moreover, you should ideally only perform this command once per project, as the conversion is not reversible, and its result is dependent on the order of patches in your repository. Of course, you can experiment all you like, but projects should switch to darcs-2 format in unison, and only after the final release of darcs 2.
You can also create a fresh repository with

    darcs initialize --darcs-2

== Changes in semantics ==

When using the darcs-2 format, darcs treats identical primitive patches as the ''same'' patch. This has dramatic implications for how darcs-2 will define dependencies. In particular, dependencies (except those explicitly created by the user with --add-deps) are always dependencies on a given ''primitive'' patch, not on a given named patch. This means that the change named "foo" may in effect depend on ''either the change named "bar" or the change named "baz"''.

This prerelease of darcs 2 has not been fully converted to always take advantage of these new semantics--it will not cause corruption, but under unusual circumstances, could exit with an error. We need to decide how to handle these semantics in the user interface.

=== A simple example ===

Let me illustrate what could happen with a story. Steve creates changes "A" and "B":

    steve$ echo A > foo
    steve$ darcs add foo
    steve$ darcs record -m A
    steve$ echo B > foo
    steve$ darcs record -m B

Meanwhile, Monica also decides she'd like a file named foo, and she also wants it to contain A, but she also wants to make some other changes:

    monica$ echo A > foo
    monica$ darcs add foo
    monica$ echo Z > bar
    monica$ darcs add bar
    monica$ darcs record -m AZ

At this point, Monica pulls from Steve:

    monica$ darcs pull ../steve

but she decides she prefers her AZ change to Steve's A change, and being a harsh person, she decides to obliterate his change:

    monica$ darcs obliterate --match 'exact A' --all

Here darcs 1 would complain, pointing out that patch B depends on patch A. However, darcs 2 will happily obliterate patch A, because patch AZ provides the primitive patches that B depends upon.

At this point, however, we run into the limitations of this prerelease version of darcs 2: if Steve pulls from Monica, his darcs will fail, because the common set of patches (which is only B) cannot exist without either A or AZ.
I plan to fix this behavior, but the internal API for doing so is not at all clear to me, which is why I'm looking for input from others. But note that this situation can only occur if users take advantage of the new semantics, which I suspect will be relatively rare until we give them tools to more easily do so (see below).

=== A few implications ===

At first this may look like a regression. Certainly, it took me a long conversation (with Steve, immortalized in the above example--but in truth, Monica would never be so unkind as to obliterate Steve's change) to determine that this behavior is actually a Good Thing, and that the potential confusion among users is a relatively small danger.

The main lesson regarding these new semantics is that ''patches depend on primitive patches, not on named patches''. A named patch is really just a set of primitive patches. Once we train darcs to take advantage of this feature, several tantalizing possibilities open up:

1. As the above example illustrates, in certain circumstances we can obliterate patches that are depended upon by other patches. We could automate this, by enabling obliterate (perhaps given a flag?) to break apart the patch it's trying to obliterate, and leave behind only those primitive patches which are depended upon by other patches. Perhaps we could call this new feature "atomization". Of course, this applies equally well to unrecord.

2. Recognizing that amend-record is equivalent to unrecord followed by record in a clean repository, it becomes clear that with atomization we could amend even patches that later patches depend upon.

I guess these two (three?) examples are all that come to mind, but they're big examples, features that have been requested many times over the years. Of course, there would be some debris left over, a portion of the patch that got atomized, but this debris would be minimal, and seems to me to be necessary.
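
To make the rule "patches depend on primitive patches, not on named patches" concrete, here is a toy model of the Steve and Monica story in Python. It is only a sketch and not darcs code: the names NamedPatch, debris_after_obliterate and can_obliterate are hypothetical, the primitive patch labels are invented for illustration, and it assumes (as a simplification) that B depends on both primitives contained in A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedPatch:
    """Hypothetical model: a named patch is just a set of primitive patches."""
    name: str
    provides: frozenset              # primitive patches it contains
    needs: frozenset = frozenset()   # primitive patches it depends on


def debris_after_obliterate(target, repo):
    """Primitives of `target` that would have to stay behind if it were removed.

    A primitive must stay if some other patch needs it and no other named
    patch in the repository provides an identical primitive.
    """
    others = [p for p in repo if p is not target]
    still_provided = frozenset().union(*(p.provides for p in others))
    needed = frozenset().union(*(p.needs for p in others))
    return (target.provides & needed) - still_provided


def can_obliterate(target, repo):
    """True if `target` can be removed whole, leaving no debris."""
    return not debris_after_obliterate(target, repo)


# Invented labels for the primitive patches in the story.
ADD_FOO, FOO_A, FOO_B = "addfile foo", "foo: +A", "foo: A -> B"
ADD_BAR, BAR_Z = "addfile bar", "bar: +Z"

A = NamedPatch("A", frozenset({ADD_FOO, FOO_A}))
B = NamedPatch("B", frozenset({FOO_B}), needs=frozenset({ADD_FOO, FOO_A}))
AZ = NamedPatch("AZ", frozenset({ADD_FOO, FOO_A, ADD_BAR, BAR_Z}))

monica = [AZ, A, B]   # Monica's repository after pulling from Steve
steve = [A, B]        # Steve's repository

print(can_obliterate(A, monica))   # True: AZ still provides what B needs
print(can_obliterate(A, steve))    # False: nothing else provides B's needs
print(sorted(debris_after_obliterate(A, steve)))
```

In this model Monica can drop A because AZ supplies identical primitives, whereas in Steve's repository both primitives of A would have to stay behind. That remainder corresponds to the "debris" that atomization would leave.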
========================================

The version of this announcement on the wiki has more discussion of implementation plans and how you can contribute. I urge you to join in the fun of testing and optimizing this new version of darcs!

David