
I am pleased to announce the availability of the second prerelease of darcs two, darcs 2.0.0pre2. This release fixes several severe performance bugs that were present in the first prerelease. These issues were identified and fixed thanks to the helpful testing of Simon Marlow and Peter Rockai. We also added support for compilation under ghc 6.4, so even more users should be able to test this release.

As before, some information about the prerelease is available at:

http://wiki.darcs.net/index.html/DarcsTwo

You can either use darcs to grab the latest darcs, or you can download a tarball at:

http://darcs.net/darcs2.0.0pre2.tar.gz

A few outstanding performance issues are:

1. darcs whatsnew performance has dropped for hashed repositories due to no longer tracking file-modification times. We need to reenable this feature, but it's not quite clear how best to do so.

2. darcs get on a hashed repository is not as fast as the older darcs get --partial. We could fix this by enabling the downloading of a single file _darcs/pristine.hashed.tar.gz. This eliminates the potential benefits of caching file downloads on darcs get (currently a second "get" of the same remote repository is almost free if you've enabled a global cache), so we may want to make this behavior optional.

I hope we can get even more testing with this release, and look forward to finding and fixing any remaining performance regressions (or bugs, of course)!

David

David Roundy wrote:
I am pleased to announce the availability of the second prerelease of darcs two, darcs 2.0.0pre2.
Thanks!

Continuing my performance tests, I tried unpulling and re-pulling a bunch of patches in a GHC tree. I'm unpulling about 400 patches using --from-tag, and then pulling them again from a local repo. Summary: darcs2 is about 10x slower than darcs1 on unpull, and on pull it is 100x slower in user time but only 20x slower in elapsed time.

In both cases, the repository was on an NFS filesystem. In the darcs2 case, the repository I was pulling from was on the local disk, and I'm also using a cache (NFS-mounted). The darcs2 repository has been optimized, but the darcs1 repository has not (at least, not recently). I did all of these a couple of times to eliminate the effects of cache preloading etc.; the times reported are from the second run.

------- darcs 1:

$ time darcs unpull --from-tag 2007-09-25 -a
Finished unpulling.
35.17s real 5.77s user 1.00s system 19% darcs unpull --from-tag 2007-09-25 -a

$ time darcs pull ~/ghc-HEAD -a
Pulling from "/home/simonmar/ghc-HEAD"...
33.51s real 3.62s user 1.05s system 13% darcs pull ~/ghc-HEAD -a

------- darcs 2:

$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
385.22s real 52.18s user 12.62s system 16% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull /64playpen/simonmar/ghc-darcs2 -a
Finished pulling and applying.
668.75s real 290.74s user 15.03s system 45% darcs2 pull /64playpen/simonmar/ghc-darcs2 -a

Cheers,
Simon
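The slowdown factors in the summary can be checked by dividing the reported times. The sketch below is only arithmetic on the figures from the transcripts above; the dictionary layout and the `slowdown` helper are illustrative names, not part of darcs. On these numbers the elapsed-time ratios come out at roughly 11x (unpull) and 20x (pull), and the user-time ratios at roughly 9x and 80x, so the "100x" figure reads as a rounded-up estimate.

```python
# Timings in seconds, copied from the `time` output quoted above.
timings = {
    ("darcs1", "unpull"): {"real": 35.17, "user": 5.77},
    ("darcs1", "pull"): {"real": 33.51, "user": 3.62},
    ("darcs2", "unpull"): {"real": 385.22, "user": 52.18},
    ("darcs2", "pull"): {"real": 668.75, "user": 290.74},
}


def slowdown(op, metric):
    """Return how many times slower darcs2 is than darcs1 for one operation."""
    return timings[("darcs2", op)][metric] / timings[("darcs1", op)][metric]


# Print one ratio per operation and per metric (elapsed vs. user CPU time).
for op in ("unpull", "pull"):
    for metric in ("real", "user"):
        print(f"{op:6s} {metric}: {slowdown(op, metric):.1f}x")
```

Note that these ratios mix different storage setups (NFS versus local disk), so they are indicative rather than a controlled comparison.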

Thanks for the timings. Alas, I'm leaving in the morning for vacation, so I'm not sure when I'll have time to profile these operations. And I'm still puzzling over how to speed up darcs get (i.e. the long discussion of http pipelining, which will not, of course, do anything to help the poor folks stuck with ssh...).

The 100x slower pull is indeed a bit puzzling. I can't help but wonder if it's the hash-checking (but that seems very unlikely)...

David

On Mon, Dec 17, 2007 at 12:29:20PM +0000, Simon Marlow wrote:
Continuing my performance tests, I tried unpulling and re-pulling a bunch of patches in a GHC tree. I'm unpulling about 400 patches using --from-tag, and then pulling them again from a local repo. Summary: darcs2 is about 10x slower than darcs1 on unpull, and on pull it is 100x slower in user time but only 20x slower in elapsed time.
In both cases, the repository was on an NFS filesystem. In the darcs2 case, the repository I was pulling from was on the local disk, and I'm also using a cache (NFS-mounted). The darcs2 repository has been optimized, but the darcs1 repository has not (at least, not recently). I did all of these a couple of times to eliminate the effects of cache preloading etc.; the times reported are from the second run.
------- darcs 1:
$ time darcs unpull --from-tag 2007-09-25 -a
Finished unpulling.
35.17s real 5.77s user 1.00s system 19% darcs unpull --from-tag 2007-09-25 -a

$ time darcs pull ~/ghc-HEAD -a
Pulling from "/home/simonmar/ghc-HEAD"...
33.51s real 3.62s user 1.05s system 13% darcs pull ~/ghc-HEAD -a

------- darcs 2:

$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
385.22s real 52.18s user 12.62s system 16% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull /64playpen/simonmar/ghc-darcs2 -a
Finished pulling and applying.
668.75s real 290.74s user 15.03s system 45% darcs2 pull /64playpen/simonmar/ghc-darcs2 -a

--
David Roundy
Department of Physics
Oregon State University

On Mon, Dec 17, 2007 at 12:29:20PM +0000, Simon Marlow wrote:
David Roundy wrote:
I am pleased to announce the availability of the second prerelease of darcs two, darcs 2.0.0pre2.
Thanks!
Continuing my performance tests, I tried unpulling and re-pulling a bunch of patches in a GHC tree. I'm unpulling about 400 patches using --from-tag, and then pulling them again from a local repo. Summary: darcs2 is about 10x slower than darcs1 on unpull, and on pull it is 100x slower in user time but only 20x slower in elapsed time.
I'm not seeing this behavior right now, but am unsure whether it's because of something in my testing, or if I've improved something. I definitely fixed a problem in the no-patches-to-pull case, where we were unnecessarily reading the entire repository because of inadequate laziness.

Another possibility is that the problem is one that shows up because of the way the repositories were generated. Incidentally, I *think* that with the latest convert, optimize should be a noop, and if it's not, I'd be interested to hear (but haven't gotten around to testing...).

I suspect that if you perform the same sequence in the reverse direction (unpull and repull, but running the commands in the other repository), then you'll see much better performance. My suspicion is that the trouble is that optimize only optimizes for changes since the last tag, so by unpulling this many patches you're going back into "unoptimized" territory. I'm toying with making optimize do a "deep" optimize, but then it'll always be O(N^2), which is a little scary. On the other hand, since we now auto-optimize, making the real thing more expensive shouldn't hurt as much (since it'll not be needed very often).

Anyhow, could you retry this test with the above change in methodology, and let me know if (a) the pull is still slow the first time and (b) if it's much faster the second time (after the reverse unpull/pull)?

Thanks!

David (who is doing darcs hacking in the morning, before the other grownups wake up)
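The no-patches-to-pull fix mentioned above comes down to stopping as soon as the answer is known instead of reading everything first. The following is a minimal sketch of that idea in Python, not darcs's actual Haskell code: it assumes a hypothetical linear history of 1000 patches read newest-first, and the names `read_patches` and `patches_to_pull` are invented for illustration.

```python
def read_patches(log):
    """Lazily yield patch names newest-first, recording each read in `log`."""
    for n in range(1000, 0, -1):  # hypothetical repository of 1000 patches
        log.append(n)  # stands in for the cost of reading one patch from disk
        yield f"patch-{n}"


def patches_to_pull(remote, local_newest):
    """Collect remote patches until reaching one the local repo already has."""
    new = []
    for patch in remote:
        if patch == local_newest:
            break  # everything older is already present locally
        new.append(patch)
    return new


# Lazy case: the local repo is up to date, so only one patch is ever read.
lazy_reads = []
new = patches_to_pull(read_patches(lazy_reads), "patch-1000")

# Eager case: materialising the whole history first reads all 1000 patches.
eager_reads = []
same = patches_to_pull(list(read_patches(eager_reads)), "patch-1000")

print(new, len(lazy_reads), len(eager_reads))
```

Both calls return the same (empty) result; the difference is only in how much of the repository gets read along the way. Real darcs history is not a simple linear list, so this models the laziness issue only in outline.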

David Roundy wrote:
Anyhow, could you retry this test with the above change in methodology, and let me know if (a) the pull is still slow the first time and (b) if it's much faster the second time (after the reverse unpull/pull)?
I think I've done it in both directions now, and it got faster, but still much slower than darcs1:

$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
58.68s real 50.64s user 6.36s system 97% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull -a ../ghc-darcs2
Pulling from "../ghc-darcs2"...
Finished pulling and applying.
53.28s real 44.62s user 7.10s system 97% darcs2 pull -a ../ghc-darcs2

This is still an order of magnitude slower than darcs1 for the same operation. (these times are now on the local filesystem, BTW)

Cheers,
Simon

On Thu, Jan 03, 2008 at 11:11:40AM +0000, Simon Marlow wrote:
David Roundy wrote:
Anyhow, could you retry this test with the above change in methodology, and let me know if (a) the pull is still slow the first time and (b) if it's much faster the second time (after the reverse unpull/pull)?
I think I've done it in both directions now, and it got faster, but still much slower than darcs1:
$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
58.68s real 50.64s user 6.36s system 97% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull -a ../ghc-darcs2
Pulling from "../ghc-darcs2"...
Finished pulling and applying.
53.28s real 44.62s user 7.10s system 97% darcs2 pull -a ../ghc-darcs2
This is still an order of magnitude slower than darcs1 for the same operation. (these times are now on the local filesystem, BTW)
Is this with the latest darcs-unstable? I made some improvements shortly before Christmas (or was it after Christmas?) that ought to improve the speed of pulls dramatically. We were doing O(N^2) operations in our handling of "pending" changes, which I fixed (I think). So I'll wait on investigating this until you've confirmed which version this was tested with. And thanks for the testing!

--
David Roundy
Department of Physics
Oregon State University
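One common way an O(N^2) cost of this kind arises is re-copying an accumulated list each time a change is added. The sketch below illustrates only that general pattern; it is an assumption for the sake of example and does not describe what darcs's "pending" handling actually did. The function names and the step counters are hypothetical.

```python
def accumulate_quadratic(changes):
    """Rebuild the pending list on every addition, copying all earlier items."""
    pending, steps = [], 0
    for change in changes:
        pending = pending + [change]  # allocates a new list of the full length
        steps += len(pending)  # count the elements written for this addition
    return pending, steps


def accumulate_linear(changes):
    """Append in place, so each addition costs a constant amount of work."""
    pending, steps = [], 0
    for change in changes:
        pending.append(change)
        steps += 1
    return pending, steps


changes = [f"change-{i}" for i in range(400)]  # 400, as in the pull being timed
quad_list, quad_steps = accumulate_quadratic(changes)
lin_list, lin_steps = accumulate_linear(changes)
print(quad_steps, lin_steps)
```

For N additions the first variant performs N(N+1)/2 element writes against N for the second, which is why a fix of this kind matters more as the number of patches grows.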

David Roundy wrote:
On Thu, Jan 03, 2008 at 11:11:40AM +0000, Simon Marlow wrote:
David Roundy wrote:
Anyhow, could you retry this test with the above change in methodology, and let me know if (a) the pull is still slow the first time and (b) if it's much faster the second time (after the reverse unpull/pull)?
I think I've done it in both directions now, and it got faster, but still much slower than darcs1:

$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
58.68s real 50.64s user 6.36s system 97% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull -a ../ghc-darcs2
Pulling from "../ghc-darcs2"...
Finished pulling and applying.
53.28s real 44.62s user 7.10s system 97% darcs2 pull -a ../ghc-darcs2
This is still an order of magnitude slower than darcs1 for the same operation. (these times are now on the local filesystem, BTW)
Is this with the latest darcs-unstable? I made some improvements shortly before Christmas (or was it after Christmas?) that ought to improve the speed of pulls dramatically. We were doing O(N^2) operations in our handling of "pending" changes, which I fixed (I think). So I'll wait on investigating this until you've confirmed which version this was tested with. And thanks for the testing!
This is using a binary I compiled up from the latest sources yesterday, so it should have those improvements.

Cheers,
Simon

On Thu, Jan 03, 2008 at 11:11:40AM +0000, Simon Marlow wrote:
Anyhow, could you retry this test with the above change in methodology, and let me know if (a) the pull is still slow the first time and (b) if it's much faster the second time (after the reverse unpull/pull)?
I think I've done it in both directions now, and it got faster, but still much slower than darcs1:
$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
58.68s real 50.64s user 6.36s system 97% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull -a ../ghc-darcs2
Pulling from "../ghc-darcs2"...
Finished pulling and applying.
53.28s real 44.62s user 7.10s system 97% darcs2 pull -a ../ghc-darcs2
This is still an order of magnitude slower than darcs1 for the same operation. (these times are now on the local filesystem, BTW)
I've recently found the problem leading to this slowdown (I believe) and get about an order-of-magnitude improvement in the speed of a pull of 400 patches in the ghc repository. It turned out to be an issue that scaled with the size (width) of the repository, not with the number of patches (which had been the obvious suspect), which was causing trouble when applying to the pristine cache.

At this point, darcs-2 outperforms darcs-1 on most tests that I've tried, so it'd be a good time to find some more performance problems, if you can... and I don't doubt that there are more out there.

--
David Roundy
Department of Physics
Oregon State University
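To make the distinction between scaling with repository width and scaling with patch count concrete, here is a rough cost model in Python. It is a sketch under stated assumptions, not a reconstruction of the darcs bug: the tree of 5000 files, the one-file patches, and both function names are hypothetical.

```python
def apply_visiting_whole_tree(tree, patches):
    """Apply patches while visiting every file in the tree for each patch."""
    visits = 0
    for patch in patches:
        for path in tree:  # cost grows with repository width
            visits += 1
            if path in patch:
                tree[path] = patch[path]
    return visits


def apply_visiting_changed_only(tree, patches):
    """Apply patches while visiting only the files each patch touches."""
    visits = 0
    for patch in patches:
        for path, content in patch.items():
            visits += 1
            tree[path] = content
    return visits


# Hypothetical repository: 5000 files wide, 400 patches touching one file each.
tree_a = {f"file{i}": "old" for i in range(5000)}
tree_b = dict(tree_a)
patches = [{f"file{i}": f"v{i}"} for i in range(400)]

wide = apply_visiting_whole_tree(tree_a, patches)
narrow = apply_visiting_changed_only(tree_b, patches)
print(wide, narrow)
```

Both variants leave the tree in the same final state, but the first does work proportional to patches times files, so it slows down on a wide repository such as GHC even when the number of patches stays fixed.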

David Roundy wrote:
On Thu, Jan 03, 2008 at 11:11:40AM +0000, Simon Marlow wrote:
Anyhow, could you retry this test with the above change in methodology, and let me know if (a) the pull is still slow the first time and (b) if it's much faster the second time (after the reverse unpull/pull)?
I think I've done it in both directions now, and it got faster, but still much slower than darcs1:

$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
58.68s real 50.64s user 6.36s system 97% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull -a ../ghc-darcs2
Pulling from "../ghc-darcs2"...
Finished pulling and applying.
53.28s real 44.62s user 7.10s system 97% darcs2 pull -a ../ghc-darcs2
This is still an order of magnitude slower than darcs1 for the same operation. (these times are now on the local filesystem, BTW)
I've recently found the problem leading to this slowdown (I believe) and get about an order-of-magnitude improvement in the speed of a pull of 400 patches in the ghc repository. It turned out to be an issue that scaled with the size (width) of the repository, not with the number of patches (which had been the obvious suspect), which was causing trouble when applying to the pristine cache.
At this point, darcs-2 outperforms darcs-1 on most tests that I've tried, so it'd be a good time to find some more performance problems, if you can... and I don't doubt that there are more out there.
Certainly a lot faster, nice work! Though it's still not as fast as darcs-1 here. New figures:

$ time darcs2 unpull --from-tag 2007-09-25 -a
Finished unpulling.
18.83s real 15.27s user 1.53s system 89% darcs2 unpull --from-tag 2007-09-25 -a

$ time darcs2 pull ../ghc-darcs2-other -a
Finished pulling and applying.
10.38s real 7.69s user 1.50s system 88% darcs2 pull ../ghc-darcs2-other -

I repeated the darcs-1 timings for comparison:

$ time darcs unpull --from-tag 2007-09-25 -a
Finished unpulling.
8.04s real 7.14s user 0.90s system 99% darcs unpull --from-tag 2007-09-25 -a

$ time darcs pull ~/ghc-HEAD -a
Finished pulling and applying.
7.90s real 4.90s user 0.98s system 74% darcs pull ~/ghc-HEAD -a

In this case darcs-1 is pulling more patches (530 vs. 400), because I'm using the latest GHC HEAD repo. Also the darcs-1 repository being pulled from is on a different, NFS mounted, filesystem, whereas the darcs-2 timings were made using repos on the same local filesystem. In all cases I tried things a few times to let caches etc. fill up.

Can you repeat these?

Cheers,
Simon
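Because the two pulls move different numbers of patches, one way to compare them is elapsed time per patch. The sketch below is plain arithmetic on the pull figures quoted above, using the patch counts given in the message (400 and 530); the `runs` table and `ms_per_patch` helper are illustrative names. It ignores the filesystem difference noted above, so treat the result as a rough indication only.

```python
# Elapsed ("real") pull times and patch counts as reported in the message above.
runs = {
    "darcs-2": {"real_seconds": 10.38, "patches": 400},
    "darcs-1": {"real_seconds": 7.90, "patches": 530},
}


def ms_per_patch(run):
    """Return average elapsed milliseconds per pulled patch for one run."""
    return 1000.0 * run["real_seconds"] / run["patches"]


raw_ratio = runs["darcs-2"]["real_seconds"] / runs["darcs-1"]["real_seconds"]
per_patch_ratio = ms_per_patch(runs["darcs-2"]) / ms_per_patch(runs["darcs-1"])

for name, run in runs.items():
    print(f"{name}: {ms_per_patch(run):.1f} ms per patch")
print(f"raw ratio {raw_ratio:.2f}x, per-patch ratio {per_patch_ratio:.2f}x")
```

On these figures darcs-2 is about 1.3x slower in raw elapsed time and about 1.7x slower per patch, so the gap is somewhat larger than the raw times suggest once the extra 130 patches are accounted for.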
participants (2)
- David Roundy
- Simon Marlow