darcs hacking sprint 4 report

Hi everybody,
Here's our report from ZuriHac. It has also been posted to the Darcs blog,
with a couple of photos stolen from Johan's blog, some Darcs rebase
scribbling and a screenshot of Darcsden's intriguing fork-tracking
feature:
http://blog.darcs.net/2010/03/darcs-hacking-sprint-4-report.html
The Fourth Darcs Hacking Sprint took place last weekend (19 to 21 March)
as part of the Zurich Haskell Hackathon. It was a very productive
sprint: we wrote a bit of code, polished off many key discussions, and had
a little beer and a lot of fun.
Overview
======================================================================
In this sprint, we worked on finishing some performance work for the
upcoming Darcs 2.5 release this summer (hashed storage, patch index,
global caches, inventory hashing), planning our work for the Darcs 2.6
release next year (smart servers, cache cleanup, darcs rebase), and
working with new users of the Darcs library.
Issues resolved
---------------
* issue643 darcs send -o output - Guillaume Hoffmann
* issue1473 annotate command line - Stefan Wehr
* issue1456 portable darcs dist - Guillaume Hoffmann
New Darcs Hackers
======================================================================
We're always happy to work with new Darcs developers. At this sprint,
we were joined by four new contributors.
Guillaume Hoffmann
------------------
Guillaume has been writing our Darcs Weekly News articles for a year
now. Over the weekend he got his first taste of Darcs hacking, knocking
out three ProbablyEasy bugs (darcs dist internals, darcs send -o UI,
darcs apply with gzipped patch bundles). Guillaume reports that he can
see himself doing more of this in the future!
Steven Keuchel
--------------
Steven worked on a new feature to display the hashed file contents
associated with any patch. This makes it easier for third-party tools
to inspect the patch files behind Darcs.
Stefan Wehr and David Leuschner
--------------------------------
Stefan and David mostly worked on the Darcs Patch Manager, but to warm
up, they tackled a couple of ProbablyEasy bugs, in particular a bug in
darcs annotate that was affecting Redmine.
Hacking continued...
======================================================================
Bugfix: Darcs on Windows shares
-------------------------------
Salvatore tracked down the regression in Darcs 2.4 that prevented
Darcs from working on Windows shares.
Performance: Fast darcs annotate
--------------------------------
Benedikt Schmidt continued his work on the patch index (formerly known
as the filecache). The patch index keeps track of which patches affect
which files. This index will bring a big boost to darcs annotate
performance, particularly for files that are affected by a relatively
small number of patches.
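To make the idea concrete, here is a minimal Haskell sketch of such an index. It assumes patches are identified by their position in the history and files by their path. The design discussed at the sprint uses permanent file identifiers instead of paths, so that renames are tracked. `Patch`, `buildIndex` and `patchesTouching` are hypothetical names for illustration, not the Darcs API.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified types: a patch is identified by its position in
-- the history and records which file paths it touches.
type PatchId = Int

data Patch = Patch
  { patchId      :: PatchId
  , touchedFiles :: [FilePath]
  }

-- | Map each file path to the patches that affect it, oldest first.
type PatchIndex = Map.Map FilePath [PatchId]

-- | Build the index in one pass over the history (oldest patch first).
-- 'flip (++)' appends later patches after earlier ones for the same file.
buildIndex :: [Patch] -> PatchIndex
buildIndex history =
  Map.fromListWith (flip (++))
    [ (f, [patchId p]) | p <- history, f <- touchedFiles p ]

-- | The only patches annotate has to examine for a given file.
patchesTouching :: FilePath -> PatchIndex -> [PatchId]
patchesTouching = Map.findWithDefault []
```

With the index in place, annotating a file visits only the patches listed for that file rather than walking the whole history. This is why files touched by few patches benefit the most.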
Performance: Global cache
-------------------------
Luca continued his work on breaking up the global cache
($HOME/.darcs/cache) into buckets for faster access. Working with
Reinier and Petr, Luca has developed an approach to migrating from
old-style caches to the new bucketed ones. He has also improved the
implementation to use hard links, which avoids doubling disk usage and
preserves backwards compatibility with prior versions of Darcs.
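Here is a rough Haskell sketch of how bucketing and migration fit together. It assumes that cached files are named by a hex hash and that the bucket is the first two hex digits. That layout is an assumption for illustration, not necessarily the scheme Luca implemented, and the function names are hypothetical. On POSIX systems, each pair in the plan could be hard-linked with `createLink` from the `unix` package's `System.Posix.Files`. The old flat path then stays valid for older Darcs versions without storing the data twice.

```haskell
-- | Hypothetical bucketing scheme: a cached file named by a hex hash lives
-- in a subdirectory named after the hash's first two hex digits, so no
-- single directory has to hold the whole cache.
bucketPath :: FilePath -> String -> FilePath
bucketPath cacheDir hash = cacheDir ++ "/" ++ take 2 hash ++ "/" ++ hash

-- | Old, flat location of the same cached file.
flatPath :: FilePath -> String -> FilePath
flatPath cacheDir hash = cacheDir ++ "/" ++ hash

-- | Migration plan as (old flat path, new bucketed path) pairs. Linking
-- rather than copying each pair means both layouts share one copy on disk.
migrationPlan :: FilePath -> [String] -> [(FilePath, FilePath)]
migrationPlan cacheDir hashes =
  [ (flatPath cacheDir h, bucketPath cacheDir h) | h <- hashes ]
```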
Windows installer
-----------------
Salvatore put together a nice Windows installer using the `bamse package
<http://hackage.haskell.org/package/bamse>`_. It looks like we will be
able to use this for the planned Darcs 2.5 release this summer. This
work will also open the door to nicer integration with Windows tools,
for example, using a bundled Tortoise SSH for a better experience when
working with SSH passphrases.
Interactive cherry picking
--------------------------
Florent improved the quality of the Darcs cherry picking code, making it
easier to fine-tune our user interface and someday support graphical
interfaces via the Darcs library. Witnessed list zippers for the win?
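The Darcs code uses witnessed zippers, which track repository states at the type level. The Haskell sketch below is a much simplified version without witnesses. It shows the basic idea: a list zipper driving an interactive keep/skip selection over patches, with the ability to step back. `Zipper`, `answer`, `back` and `selected` are hypothetical names.

```haskell
-- | The user's answer for one patch.
data Choice = Keep | Skip
  deriving (Eq, Show)

-- | A list zipper over patches: answered patches (nearest first), the patch
-- currently shown to the user, and the patches still to be asked about.
data Zipper a = Zipper
  { answered :: [(a, Choice)]
  , current  :: a
  , upcoming :: [a]
  }

-- | Start a selection session; there is nothing to ask for an empty list.
start :: [a] -> Maybe (Zipper a)
start []       = Nothing
start (x : xs) = Just (Zipper [] x xs)

-- | Answer the current patch and move on. 'Left' carries the finished list
-- of answers (in original order) once every patch has been answered.
answer :: Choice -> Zipper a -> Either [(a, Choice)] (Zipper a)
answer c (Zipper done x [])       = Left (reverse ((x, c) : done))
answer c (Zipper done x (y : ys)) = Right (Zipper ((x, c) : done) y ys)

-- | Go back one patch, discarding its previous answer; a no-op at the start.
back :: Zipper a -> Zipper a
back z@(Zipper [] _ _)             = z
back (Zipper ((p, _) : done) x us) = Zipper done p (x : us)

-- | The patches the user chose to keep.
selected :: [(a, Choice)] -> [a]
selected answers = [p | (p, Keep) <- answers]
```

Using `Either` for "finished" lets a scripted session be written as a chain such as `answer Keep z >>= answer Skip >>= answer Keep`. Once the last patch is answered, the chain short-circuits with the final answers.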
Interactive diff
----------------
Florent also started work on adding Darcs's interactive cherry picking
to darcs diff, making it possible to choose a set of patches to view as
a diff.
Performance: Hashed storage completion
--------------------------------------
Darcs has a representation of file and directory trees called slurpies.
Petr polished off his work to replace the slurpies with his more
efficient, general purpose hashed-storage library. Slurpies are going
away, and Darcs will be faster for it. He and Ganesh also discussed
how to gracefully transition from repositories created before the
hashed-storage refactor.
Performance: Using tags when writing patches
--------------------------------------------
Petr ported work by David Roundy to solve a `scalability
regression <http://bugs.darcs.net/issue1106>`_ in hashed repositories.
For darcs commands that write out patches, we had a naive hashing
operation that did not account for the fact that patches behind tags
cannot be modified. Darcs was unnecessarily traversing the entire
sequence of patches (i.e. O(n) time in the total number of patches)
when it only needed to traverse the patches recorded since the last tag.
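Here is a simplified Haskell sketch of the difference. It assumes the history is a list stored newest first and that nothing behind the most recent tag can change. It ignores the extra conditions real Darcs checks before treating a tag as a boundary. `Patch` and both functions are hypothetical names.

```haskell
-- | Hypothetical, simplified patch representation.
data Patch = Patch
  { patchName :: String
  , isTag     :: Bool
  }

-- | Naive approach: consider every patch in the repository, which costs
-- O(n) time in the total number of patches.
patchesToHashNaive :: [Patch] -> [Patch]
patchesToHashNaive history = history

-- | Tag-aware approach, assuming the list is stored newest first and that
-- patches behind the most recent tag are frozen. 'takeWhile' is lazy, so
-- the part of the history behind the tag is never forced.
patchesSinceLastTag :: [Patch] -> [Patch]
patchesSinceLastTag = takeWhile (not . isTag)
```

Because the list is consumed lazily, `patchesSinceLastTag` returns as soon as it reaches a tag. The cost therefore depends on the number of patches since the last tag, not on the length of the whole history.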
UTF-8 metadata
--------------
Reinier continued to improve the encoding of Darcs patch metadata.
Darcs is completely agnostic with respect to the encoding of your
files. Unfortunately, this agnosticism extends to patch metadata (patch
name, patch author), making it difficult for people to collaborate
across different locales. To address this problem, Reinier has been
working to make Darcs store its patch metadata in a single encoding
(UTF-8) while gracefully supporting older patches (with metadata in
potentially any encoding).
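One possible strategy for reading metadata of unknown encoding is to decode it as strict UTF-8 and fall back to Latin-1 when the bytes are not valid UTF-8. Latin-1 maps every byte to a character, so the fallback never fails. This strategy is assumed here purely for illustration and is not necessarily how Reinier's code handles legacy patches. The function names are hypothetical.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- | Strict UTF-8 decoder: 'Nothing' for truncated sequences, stray
-- continuation bytes, overlong encodings, surrogates or out-of-range values.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) 0x10000
  | otherwise          = Nothing
  where
    -- Read n continuation bytes after the lead byte, combine their payload
    -- bits, and reject code points that should have used a shorter form.
    multi :: Int -> Word8 -> Int -> Maybe String
    multi n lead minCp =
      let (cont, rest) = splitAt n bs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                     (fromIntegral lead) cont
      in if length cont == n && all isCont cont
              && cp >= minCp && cp <= 0x10FFFF
              && not (cp >= 0xD800 && cp <= 0xDFFF)
           then (chr cp :) <$> decodeUtf8 rest
           else Nothing
    isCont c = c .&. 0xC0 == 0x80

-- | Latin-1 fallback: every byte becomes the code point with the same value.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- | Prefer UTF-8; fall back to Latin-1 for legacy metadata.
decodeMetadata :: [Word8] -> String
decodeMetadata bytes = fromMaybe (decodeLatin1 bytes) (decodeUtf8 bytes)
```

The order matters. Trying Latin-1 first would accept every input, so UTF-8 metadata such as "café" would come out as the mojibake "cafÃ©".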
Discussions
======================================================================
Release process
---------------
The Darcs 2.4 release was quite a tricky one to navigate. We found that bugs
were only being flushed out at release candidate time, and sometimes after the
release proper.
We would like to encourage more people to try out work-in-progress versions of
Darcs and give us feedback early in the release process. After chatting about
this, Reinier (with Ganesh, Eric and Petr) decided that, as Release Manager, he
would put out a Darcs alpha every 4 weeks.
In the future we may investigate automatic nightly builds via the buildbot
and a platform support policy such as the one used by Tahoe.
Darcs patch index (fast darcs annotate)
---------------------------------------
Benedikt updated us on the recent status of his ongoing patch index work
(formerly known as the filecache). We discussed the things that make
the patch index convincing (permanent, repo-local, unique identifiers
for files), the interaction between the patch index and the type
witnesses, and ways of tuning the patch index performance and
keeping it small.
We're looking forward to sharing the new patch index optimisation with
you in upcoming releases. Darcs annotate may become a lot more useful
in the next couple of releases!
Readable darcs annotate
-----------------------
Fast darcs annotate won't be useful if nobody can read it. Benedikt and
Eric worked on designing a better output format for darcs annotate. Taking
a page from git blame, there will be one line per source file line, with
columns for patch identifier, author name, date and finally the line.
One of the design questions was how we should best refer to darcs
patches, the current best candidate being a prefix of the darcs patch
metadata hash.
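Here is a rough Haskell sketch of the proposed one-line-per-source-line layout. It assumes an 8-character hash prefix and a 20-column author field, both arbitrary choices for this example. `Annotation`, `formatLine` and `formatAnnotated` are hypothetical names, not part of Darcs.

```haskell
-- | What annotate knows about the patch that last touched a line.
data Annotation = Annotation
  { patchHash :: String  -- ^ metadata hash of the patch
  , author    :: String
  , date      :: String  -- ^ kept as text to stay dependency-free
  }

-- | Pad (or truncate) a field to a fixed width so the columns line up.
fixedWidth :: Int -> String -> String
fixedWidth n s = take n (s ++ repeat ' ')

-- | One output line: short patch id, author, date, then the source line.
formatLine :: Annotation -> String -> String
formatLine a line =
  unwords [take 8 (patchHash a), fixedWidth 20 (author a), date a, line]

-- | Annotate a whole file, given one annotation per source line.
formatAnnotated :: [Annotation] -> [String] -> String
formatAnnotated anns ls = unlines (zipWith formatLine anns ls)
```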
Fast darcs over networks
------------------------
Darcs get over networks is slow, painfully slow. Petr has suggested two
priorities for improving the performance of network operations. The
first would be to introduce a `darcs optimize --http
<http://bugs.darcs.net/issue1771>`_ feature, which would optimise the
Darcs repository for fetching over a network (for example, by creating
a "snapshot" of the pristine cache to be fetched in one go). The second
priority would be to develop a `smart server
<http://bugs.darcs.net/issue1773>`_ that would provide darcs clients
with only the files they need, in the optimal number of chunks.
The two ideas combined would make an excellent Google Summer of Code
project.
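As a very rough illustration of the smart-server idea, the server could compare the hashed files the repository needs with the ones the client already has. It would then send only the missing files, in a few batches rather than one request per file. This is a hypothetical sketch, not a protocol Darcs implements. The names and the fixed batch size are assumptions, and choosing the "optimal" number of chunks is exactly the open problem.

```haskell
import qualified Data.Set as Set

-- | Hypothetical: hashed files identified by their hash string.
type FileHash = String

-- | Files the client still needs: everything the server's repository refers
-- to that the client does not already have cached (in ascending order).
missingFiles :: Set.Set FileHash -> Set.Set FileHash -> [FileHash]
missingFiles serverHas clientHas = Set.toList (Set.difference serverHas clientHas)

-- | Group files into batches of n, so the client makes a few large requests
-- instead of one round trip per file.
batches :: Int -> [a] -> [[a]]
batches n xs
  | null xs   = []
  | n <= 0    = [xs]  -- degenerate size: send everything in one go
  | otherwise = chunk : batches n rest
  where
    (chunk, rest) = splitAt n xs
```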
Darcs rebase
------------
Prior to the sprint, Ganesh had been working on a `darcs rebase
<http://wiki.darcs.net/Ideas/RebaseDesign>`_ feature, which will help
Darcs users work with long-term branches and other cases where patch
commutation by itself is not enough. At the sprint, Ganesh explained
his work to everyone interested. Together we settled on a rough plan
for the user interface. It looks like our new rebase command will offer
a typically Darcs-ish twist: interactive cherry picking.
Darcs library
-------------
Ganesh and Florent talked with three teams building software in the
Darcs ecosystem (DPM: Stefan Wehr and David Leuschner, Mac Darcs record
GUI: Benedikt Huber and David Markvica, DarcsDen: Alex Suraci). There
was a surprising degree of commonality.
The conversations have given us a much stronger sense of direction with
the Darcs library. In particular, Ganesh is convinced that we should
commit to our use of type witnesses: at the very least getting them
completely finished so we can run with them, probably turning them on by
default, and quite possibly dropping the builds without witnesses.
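For readers unfamiliar with witnesses: they are phantom type parameters that record the repository state before and after each patch. The type checker then rejects sequences of patches that do not fit together. The Haskell sketch below is heavily simplified and only loosely modeled on the forward-list (`FL`) type in the Darcs source. `Patch` here is a stand-in, not the real patch type.

```haskell
{-# LANGUAGE GADTs #-}

-- | Stand-in patch type. The phantom parameters wX and wY name the
-- repository state before and after the patch; the data itself ignores them.
newtype Patch wX wY = Patch String

-- | A forward list of patches: each patch must start in the state where the
-- previous one ended, and the type checker enforces this.
data FL p wX wZ where
  NilFL :: FL p wX wX
  (:>:) :: p wX wY -> FL p wY wZ -> FL p wX wZ
infixr 5 :>:

-- | Appending only type-checks if the first sequence ends where the second
-- one begins.
appendFL :: FL p wX wY -> FL p wY wZ -> FL p wX wZ
appendFL NilFL ys      = ys
appendFL (x :>: xs) ys = x :>: appendFL xs ys

-- | Collect the names of the stand-in patches, in order.
patchNames :: FL Patch wX wY -> [String]
patchNames NilFL              = []
patchNames (Patch n :>: rest) = n : patchNames rest
```

Suppose the states are made concrete and mismatched, for example a `Patch S0 S1` followed by a `Patch S2 S3`. Applying `:>:` is then a compile-time error instead of a runtime bug.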
Default switches
----------------
We held a quick roundtable discussion to settle some decisions on
Darcs default switches that have been hanging in the air. Our decisions
for Darcs 2.5:
* --no-set-scripts-executable [unchanged]
* pull/push/send --no-set-default
* send --edit-description
* record --no-test
* check --no-test
Performance presentations
-------------------------
Petr and Benedikt gave lightning talks, showing some of our recent performance
work to the Haskell community. Some exciting numbers from Benedikt's work
(`notes <http://wiki.darcs.net/Ideas/PatchIndex>`_) include a 6 second darcs
annotate on a file in the GHC repository (previously this did not complete
within half an hour). With Petr's work, we are able to [TODO numbers].
Google Summer of Code
----------------------
We discussed our priorities for this year's Google Summer of Code and
decided to focus our attention on performance issues. If we had two
GSoC students this year, we would mainly be interested in dividing them
between:
* network performance: developing a smart server for much faster darcs
  get and pull over a network
* local performance: performing a comprehensive overhaul of the Darcs
  hashed file cache handling
We also discussed ways to make the best use of our students' time. The Darcs
team has participated in GSoC twice and has learned a lot from the experience.
This year we would like to see if we could publish some clear guidelines both
on what we expect from GSoC students and what they can expect from us. Watch
the mailing list for more discussion on this topic.
Budding Ecosystem
======================================================================
We were pleasantly surprised to find ourselves with users of the (still
unstable) Darcs API. These new arrivals give us the feeling that the
collection of `related software
<http://wiki.darcs.net/RelatedSoftware>`_ is coalescing into a new
Darcs ecosystem.
Darcs Patch Manager
-------------------
David Leuschner and Stefan Wehr worked on an exciting new patch management
program for project maintainers. The Darcs Patch Manager (DPM) offers a new
way for repository maintainers to keep track of incoming Darcs patches,
including their amendments and dependencies. ::

    $ dpm -r MAIN_REPO -s DPM_DB list
    very cool feature [State: OPEN]
    2481 Tue Mar 16 17:50:23 2010 Dave Devloper

Eric Y. Kow