
It's time to consider again whether we should migrate GHC development from darcs to (probably) git.

From our perspective at GHC HQ, the biggest problem that we would hope to solve by switching is that darcs makes branching and merging very difficult for us. We have a few branches of HEAD that are very painful to keep merged with HEAD, and we would almost certainly have more branches if the overhead were lower. In some sense the overhead is self-inflicted because we have the no-conflict policy in the mainline repository, but that is to avoid problems with darcs' merging algorithms (both performance and correctness).

We are still using darcs v1 patches rather than v2, but there are known problems with v2 which are preventing us from upgrading. The darcs team have been making great strides with performance, but conflict handling remains a serious problem. The darcs roadmap (http://wiki.darcs.net/Roadmap) doesn't show this being fixed in the near future. Rebase support is coming, and it does work, though the workflow is a bit laborious.

Besides the branching/merging/conflict issue, switching to git would give us plenty of side benefits, notably access to a wealth of tool support. Making contribution easy is important to us too, and there are a lot of people using git.

The cost of switching is quite high, which is one reason we decided to stay with darcs last time. We have multiple repos that need to be converted, and for some of them, where the repo is shared with other projects, we may have to mirror rather than convert in place. We're prepared to put in the effort if the gains would be worthwhile, though (offers of help are more than welcome!).

We're interested in opinions from both active and potential GHC developers/contributors. Let us know what you think - would this make life harder or easier for you? Would it make you less likely or more likely to contribute?

Cheers,
Simon

On 10 January 2011 11:19, Simon Marlow wrote:
Let us know what you think - would this make life harder or easier for you? Would it make you less likely or more likely to contribute?
Well, as a sometime-contributor I would certainly be happier hacking on GHC if it were git based. When working on a GHC branch, it is rather irritating to spend time working around the inevitable Darcs bugs rather than hacking on the code.

However, I remember the last time this came up there were some issues that might make migration painful. Off the top of my head:

1) Some people expressed concern that they would have to use two revision control systems to work on GHC, because not all GHC dependencies would be git-based.

2) There was also concern that Git isn't so great on Windows. I have heard that this is less of an issue now, but I never personally suffered from any problems, so can't be sure. (FWIW I used Git on Windows industrially ~1 year ago for 3 months and didn't have problems, though the people around me occasionally had issues with e.g. case insensitivity causing obscure error messages.)

3) The git transition has the potential to make Ian's workflow (i.e. maintaining branches for old GHC releases) harder. AFAIK currently Ian just "darcs pulls" a particular patch from GHC HEAD into e.g. the GHC 7 branch. Darcs automatically works out which earlier patches that patch depends on and pulls them in as well (if they are not already present).

If GHC HQ transitioned to Git but committed all patches to master, then Ian's job might be significantly harder because he would have to use "git cherry-pick" to pick out any bug fix patches that should be merged back to e.g. GHC 7. Unfortunately, Git does not provide any mechanism for automatically working out which earlier commits the cherry-picked commit depends on, so this will fail quite often. Ian will then have to identify the dependencies manually.

To make Ian's life easier GHC HQ could adopt a new workflow. For example, bug fixers could fix their bugs on *new* branches (one per bug) which start from the last major GHC release (right now, GHC 7). After fixing the bug on that branch, they can then merge the branch into master. Now when Ian merges a bugfix into a GHC 7 patchlevel release he just needs to merge that bug-fixing branch into the ongoing GHC 7 branch.

Naturally other workflows are possible and I'm sure other list members will chime in with their own favourites :-)

Has GHC HQ thought about these workflow issues? Are you happy with any changes that might be required to your workflows?

Ultimately I'm quite concerned with keeping GHC HQ happy (as you guys do the lion's share of the work!). I feel we should only make the switch if the most frequent committers (i.e. Simon, Simon and Ian) are *totally happy* with it and any associated workflow changes that may be required.

Cheers,
Max
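A toy model may help make the per-bug branch workflow concrete. It assumes that a branch can be treated as the set of commits reachable from its tip and that a merge simply adds the other branch's commits; it ignores merge commits, conflicts and everything else real git does. All branch and commit names are hypothetical.

```python
"""Toy model of fixing a bug on its own branch started from the last release.

A branch is modelled as the set of commits reachable from its tip, and merging
adds every commit reachable from the other branch. All names are hypothetical.
"""


def merge(target: set[str], source: set[str]) -> set[str]:
    """Return the commits reachable after merging `source` into `target`."""
    return target | source


# Both long-lived branches start from the GHC 7 release commit.
release = {"ghc-7.0.1"}
ghc_7 = set(release)
master = release | {"new-codegen", "type-families-rework"}

# The fix is developed on a fresh branch that starts at the release, not at master.
fix_branch = release | {"fix-4567"}

master = merge(master, fix_branch)   # HEAD gets the fix...
ghc_7 = merge(ghc_7, fix_branch)     # ...and so does the stable branch

# The stable branch now holds the fix but none of the unrelated HEAD work.
print(sorted(ghc_7))  # → ['fix-4567', 'ghc-7.0.1']
```

Merging master itself into the GHC 7 branch would instead drag in new-codegen and friends; starting the fix from the release commit is what keeps the stable branch clean.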

On 10/01/2011 13:02, Max Bolingbroke wrote:
On 10 January 2011 11:19, Simon Marlow wrote:
Let us know what you think - would this make life harder or easier for you? Would it make you less likely or more likely to contribute?
Well, as a sometime-contributor I would certainly be happier hacking on GHC if it were git based. When working on a GHC branch, it is rather irritating to spend time working around the inevitable Darcs bugs rather than hacking on the code.
However, I remember the last time this came up there were some issues that might make migration painful. Off the top of my head:
1) Some people expressed concern that they would have to use two revision control systems to work on GHC, because not all GHC dependencies would be git-based.
It would be a prerequisite to switching that a GHC developer only has to use one VCS. So we either migrate dependencies to git, or mirror them in GHC-specific git branches.
2) There was also concern that Git isn't so great on Windows. I have heard that this is less of an issue now, but I never personally suffered from any problems, so can't be sure. (FWIW I used Git on Windows industrially ~1 year ago for 3 months and didn't have problems, though the people around me occasionally had issues with e.g. case insensitivity causing obscure error messages).
Again, it would be a prerequisite that all our workflows work on Windows too. We'd have to do some research to check for problems.
3) The git transition has the potential to make Ian's workflow (i.e. maintaining branches for old GHC releases) harder.
AFAIK currently Ian just "darcs pulls" a particular patch from GHC HEAD into e.g. the GHC 7 branch. Darcs automatically works out which earlier patches that patch depends on and pulls them in as well (if they are not already present).
If GHC HQ transitioned to Git but committed all patches to master, then Ian's job might be significantly harder because he would have to use "git cherry-pick" to pick out any bug fix patches that should be merged back to e.g. GHC 7. Unfortunately, Git does not provide any mechanism for automatically working out which earlier commits the cherry-picked commit depends on, so this will fail quite often. Ian will then have to identify the dependencies manually.
To make Ian's life easier GHC HQ could adopt a new workflow. For example, bug fixers could fix their bugs on *new* branches (one per bug) which start from the last major GHC release (right now, GHC 7). After fixing the bug on that branch, they can then merge the branch into master. Now when Ian merges a bugfix into a GHC 7 patchlevel release he just needs to merge that bug-fixing branch into the ongoing GHC 7 branch.
Naturally other workflows are possible and I'm sure other list members will chime in with their own favourites :-)
I don't think the dependencies get very deep in most cases, and my impression is that we often don't want to pull the dependencies anyway, so darcs forces us to merge the patch manually (Ian would be able to say for sure how often this happens). However, if it turned out that we had to change this workflow it wouldn't be the end of the world. Fixing bugs on the stable branch rather than HEAD would be a slight inconvenience, but is arguably the right thing anyway.
Has GHC HQ thought about these workflow issues? Are you happy with any changes that might be required to your workflows?
Ultimately I'm quite concerned with keeping GHC HQ happy (as you guys do the lion's share of the work!). I feel we should only make the switch if the most frequent committers (i.e. Simon, Simon and Ian) are *totally happy* with it and any associated workflow changes that may be required.
Speaking for myself, I tend slightly towards making the switch, because I'm keen to make branching less painful. However, I think if it were just the three of us, there probably wouldn't be enough motivation to overcome the cost of switching, but if there is enough interest from the rest of the community that might just swing it. Cheers, Simon

Please please consider Mercurial if migration from darcs is inevitable :) P.

On Mon, Jan 10, 2011 at 2:34 PM, Pavel Perikov wrote:
Please please consider Mercurial if migration from darcs is inevitable :)
While Mercurial is a fine choice, I think there are more Haskellers that use Git than Mercurial. Probably because GitHub is such an awesome service. Johan

On 10.01.2011, at 16:40, Johan Tibell wrote:
While Mercurial is a fine choice, I think there are more Haskellers that use Git than Mercurial. Probably because GitHub is such an awesome service.
Interesting. It would be great to see some numbers (really, just curious). Bitbucket seems to be OK too :) For me, having got used to darcs, Mercurial just seemed so much leaner, simpler, etc. And it presumably has better support on Windows, btw (I personally only use Macs, though).

On Mon, Jan 10, 2011 at 2:43 PM, Pavel Perikov wrote:
On 10.01.2011, at 16:40, Johan Tibell wrote:
While Mercurial is a fine choice, I think there are more Haskellers that use Git than Mercurial. Probably because GitHub is such an awesome service.
Interesting. It would be great to see some numbers (really, just curious).
No real numbers. I've just observed what other Haskellers talk about and where I usually find projects (when they are not in Darcs). We could probably pull the numbers from Hackage. Johan

On 10.01.2011, at 18:59, Johan Tibell wrote:
I've just observed what other Haskellers talk about and where I usually find projects (when they are not in Darcs). We could probably pull the numbers from Hackage.
Probably most valuable are the opinions of the GHC development team, of course :) Git really seems to be more popular; Mercurial just seems more streamlined to me :) P.

On Mon, Jan 10, 2011 at 5:08 PM, Pavel Perikov wrote:
Probably most valuable are the opinions of the GHC development team, of course :) Git really seems to be more popular; Mercurial just seems more streamlined to me :)
Their preference is of course very important, but they partly wanted to make the change to get more contributors, so in my opinion it makes sense to switch to something that the majority of Haskellers use. I'm not trying to get into a Git vs Mercurial argument here. I have more important things to do, like writing code. :) Johan

On Mon, Jan 10, 2011 at 5:34 AM, Pavel Perikov wrote:
Please please consider Mercurial if migration from darcs is inevitable :)
For what it's worth, Mercurial generally interoperates quite well with git and github, using the hg-git plugin. As a longtime Mercurial user and an occasional GHC contributor, it wouldn't be a practical problem for me if GHC were to switch to git.

On 11.01.2011, at 0:29, Bryan O'Sullivan wrote:
For what it's worth, Mercurial generally interoperates quite well with git and github, using the hg-git plugin. As a longtime Mercurial user and an occasional GHC contributor, it wouldn't be a practical problem for me if GHC were to switch to git.
Good news, Bryan! Thanks for your books, btw!

On Mon, Jan 10, 2011 at 01:27:17PM +0000, Simon Marlow wrote:
On 10/01/2011 13:02, Max Bolingbroke wrote:
2) There was also concern that Git isn't so great on Windows. I have heard that this is less of an issue now, but I never personally suffered from any problems, so can't be sure. (FWIW I used Git on Windows industrially ~1 year ago for 3 months and didn't have problems, though the people around me occasionally had issues with e.g. case insensitivity causing obscure error messages).
Again, it would be a prerequisite that all our workflows work on Windows too. We'd have to do some research to check for problems.
The environment provided by msysgit is reasonably usable, and performance hasn't been much trouble for me personally. The binaries from it can also be used in syntevo's SmartGit product, which has a free license for non-commercial purposes, so if the gitk tools of msysgit are considered inadequate, SmartGit tends to get the job done. The only problem I've had with msysgit is that it has its own set of compilers and msys tools, which might conflict with the existing toolchains used with GHC; I'm unsure whether that is solvable, as Git has a horrible tendency to rely on having shell tools to do its ill deeds. As for whether to change or not, I'm neutral, as I'm not a contributor and have no plans to become one.
-- Lars Viklund | zao@acc.umu.se

On Mon, Jan 10, 2011 at 01:27:17PM +0000, Simon Marlow wrote:
I don't think the dependencies get very deep in most cases, and my impression is that we often don't want to pull the dependencies anyway, so darcs forces us to merge the patch manually (Ian would be able to say for sure how often this happens).
I'm not sure OTTOMH, but I think that it's more common that another patch gets pulled in than a manual merge is done. Other patches are generally comment changes, whitespace fixes, or things like build system tweaks that weren't worth merging for the sake of it. Thanks Ian
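The darcs behaviour Ian relies on can be approximated as a dependency closure: pulling one fix also pulls every patch it transitively depends on that the target branch does not yet have. A minimal sketch, assuming dependencies are given as an explicit table; real darcs works them out from patch commutation, and the patch names here are hypothetical.

```python
"""Approximate what pulling one patch brings along: the patch plus every patch
it transitively depends on that the target branch lacks. The dependency table
is a hypothetical stand-in for darcs's commutation-based analysis."""


def patches_to_pull(wanted: str, depends_on: dict[str, list[str]],
                    already_have: set[str]) -> list[str]:
    """Return the patches to apply, prerequisites first."""
    order: list[str] = []
    seen: set[str] = set(already_have)   # copy, so the caller's set is untouched

    def visit(patch: str) -> None:
        if patch in seen:
            return
        seen.add(patch)
        for dep in depends_on.get(patch, []):
            visit(dep)                   # apply prerequisites before the patch
        order.append(patch)

    visit(wanted)
    return order


# HEAD history: the bug fix touches lines that a whitespace cleanup also edited.
depends_on = {
    "fix-typechecker-bug": ["whitespace-cleanup"],
    "whitespace-cleanup": ["build-system-tweak"],
    "build-system-tweak": [],
}
stable_branch = {"build-system-tweak"}

print(patches_to_pull("fix-typechecker-bug", depends_on, stable_branch))
# → ['whitespace-cleanup', 'fix-typechecker-bug']
```

With `git cherry-pick` there is no such step, so the whitespace cleanup in this example would have to be spotted and picked (or the fix hand-merged) separately.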

On Mon, Jan 10, 2011 at 01:27:17PM +0000, Simon Marlow wrote:
It would be a prerequisite to switching that a GHC developer only has to use one VCS. So we either migrate dependencies to git, or mirror them in GHC-specific git branches.
I think it's hard to know how well it's going to work in advance (I was going to try redoing all the GHC 7 merges with git, but that won't tell us much if they would have been recorded differently in a git workflow), so perhaps we should change only the GHC repo at first, and see how it goes? That way, if we decide it's worse, we haven't done a lot of work migrating dependencies or setting up mirrors, and to roll back we only need to migrate the new git patches into the old darcs repo. Thanks Ian

Hello,
I have been working on a GHC branch for the last few months and, for me,
switching to git would be a win because I find it quite difficult to keep my
branch and HEAD synchronized. I allocate about a day, probably about once a
month, to redo my repository so that it is in sync with HEAD.
My background is that I use many VCSes for work (although lately, mostly
git), and git for my non-work projects. I am by no means an advanced VCS
user. I do like git's graph-based non-mutable history model and it has been
the only VCS where I've been able to work out how to do something more or
less "from first principles". I don't really use the command line interface
much; I tend to use mostly tools like "gitk" and "git gui".
-Iavor
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On 10/01/2011, at 13:27, Simon Marlow wrote:
On 10/01/2011 13:02, Max Bolingbroke wrote:
However, I remember the last time this came up there were some issues that might make migration painful. Off the top of my head:
1) Some people expressed concern that they would have to use two revision control systems to work on GHC, because not all GHC dependencies would be git-based.
It would be a prerequisite to switching that a GHC developer only has to use one VCS. So we either migrate dependencies to git, or mirror them in GHC-specific git branches.
I'm not sure how that is going to work. It might well be possible to build GHC using only git. But most GHC developers also contribute to various libraries which are often quite intimately linked to GHC. In particular, GHC patches are often accompanied by library patches. Unless all those libraries switch to git, too, we'll have to use both git and darcs which would be *really* annoying.

Personally, I rather dislike git, mostly for the reasons that Malcolm already mentioned. Compared to darcs, it seems to get in the way much too often. It also seems to make finding buggy patches rather hard. But maybe I just don't know how to use it properly. In any case, a switch to git wouldn't deter me from contributing to GHC, but neither would a switch to any other VCS. I would certainly swear more often while developing, though.

Roman

I agree with Roman's position. I would prefer to stay with darcs (it has its advantages and disadvantages, but it has definitely been improving a lot). In any case, all of GHC, including all dependencies, must be available and patchable with a *single* VCS. Mixing VCSes will lead to madness.

Manuel

PS: This talk about contributing to a project if it changes its VCS seems a bit lame to me. You contribute to a project in a serious way because you care about the project and because you need whatever improvements you are implementing, not because you like the VCS.

I'm not sure if your statement regarding the decoupling between contributors and VCSes holds water. The VCS is definitely a factor, but certainly not the only one. I've been demotivated by VCSes before and it has directly impacted whether I continued my involvement. Granted, the VCS was SCCS, but still...
Sample size of one; not the basis for a system of government, and statements regarding watery bints lying in ponds apply.
-scooter
Sent from my Verizon Wireless BlackBerry
-----Original Message-----
From: Manuel M T Chakravarty

On Mon, 10 Jan 2011, Roman Leshchinskiy wrote:
It also seems to make finding buggy patches rather hard.
Have a look at `git bisect`.
Tony.
--
f.anthony.n.finch

On 11/01/2011, at 16:14, Tony Finch wrote:
On Mon, 10 Jan 2011, Roman Leshchinskiy wrote:
It also seems to make finding buggy patches rather hard.
Have a look at `git bisect`.
I'm aware of git bisect. It doesn't do what I want. I usually have a pretty good idea of which patch(es) might have caused a problem and I want to unpull it and its dependencies. This is easy in darcs; I have no idea how to do that in git. Roman

On 11 January 2011 19:07, Roman Leshchinskiy wrote:
On 11/01/2011, at 16:14, Tony Finch wrote:
On Mon, 10 Jan 2011, Roman Leshchinskiy wrote:
It also seems to make finding buggy patches rather hard.
Have a look at `git bisect`.
I'm aware of git bisect. It doesn't do what I want. I usually have a pretty good idea of which patch(es) might have caused a problem and I want to unpull it and its dependencies. This is easy in darcs; I have no idea how to do that in git.
This form of dependency tracking is done manually in Git via topic/feature branches. Undoing the patch would then mean undoing the merge, which can be done via "git rebase -i". (The -i part is just for a nicer user interface.) Now whether manual dependency tracking is better than darcs' automatic tracking is another question.

On 11/01/2011 19:07, Roman Leshchinskiy wrote:
On 11/01/2011, at 16:14, Tony Finch wrote:
On Mon, 10 Jan 2011, Roman Leshchinskiy wrote:
It also seems to make finding buggy patches rather hard.
Have a look at `git bisect`.
I'm aware of git bisect. It doesn't do what I want. I usually have a pretty good idea of which patch(es) might have caused a problem and I want to unpull it and its dependencies. This is easy in darcs; I have no idea how to do that in git.
We can't even do this reliably with darcs. Several times I've tried to unpull one of Simon's patches to work around a bug, and the dependencies end up being more than just the textual dependencies. Then I have to fall back to unpulling by date, which is what git would do. And then sometimes there's the separate problem that you have to retreat the library repos too, and there you have to unpull by date and some guesswork too. Cheers, Simon
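The two ways of backing out a suspect patch can be compared on a toy history: darcs-style removal of the patch together with everything recorded later that depends on it, versus rewinding to just before it (unpulling by date, which is what resetting a git branch amounts to). This sketch assumes the history list is already in dependency order and that the dependency table is complete; as Simon says, real dependencies are sometimes more than textual, and a missing entry is exactly what forces the fall-back. Names are hypothetical.

```python
"""Compare two ways of backing out a suspect patch. Names are hypothetical."""


def unpull_with_dependents(history: list[str], bad: str,
                           depends_on: dict[str, set[str]]) -> list[str]:
    """darcs-style: drop `bad` and every patch that transitively depends on it.

    Assumes `history` is in dependency order (prerequisites come first), so a
    single forward pass finds all transitive dependents.
    """
    removed = {bad}
    for patch in history:
        if depends_on.get(patch, set()) & removed:
            removed.add(patch)
    return [p for p in history if p not in removed]


def rewind_before(history: list[str], bad: str) -> list[str]:
    """Rewind ("unpull by date"): keep only what was recorded before `bad`."""
    return history[:history.index(bad)]


history = ["rts-fix", "simplifier-change", "uses-simplifier-change",
           "docs-update", "builds-on-that"]
depends_on = {
    "uses-simplifier-change": {"simplifier-change"},
    "builds-on-that": {"uses-simplifier-change"},
}

print(unpull_with_dependents(history, "simplifier-change", depends_on))
# → ['rts-fix', 'docs-update']
print(rewind_before(history, "simplifier-change"))
# → ['rts-fix']
```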

We can't even do this reliably with darcs. Several times I've tried to unpull one of Simon's patches to work around a bug, and the dependencies end up being more than just the textual dependencies. Then I have to fall back to unpulling by date, which is what git would do. And then sometimes there's the separate problem that you have to retreat the library repos too, and there you have to unpull by date and some guesswork too.
Perhaps it is possible to take the guesswork out of this latter problem? For all the repos to be linked, maintain a single file "patch-history.txt", and add a posthook to all repos so that every commit will be logged as a line in patch-history.txt:

    repo-id : patch-id : short commit message, or other greppable info

Then, if you have a patch id in the GHC repo, you just have to search backward from that id in patch-history.txt until you have matching last-patch ids for the other repos. That search (and darcs-all (un-)pulling up to the patch ids for all repos) could probably be scripted, so it would become a single command (input: repo-id/patch-id for a patch in one of the repos; output: list of repo-ids/patch-ids identifying a consistent set of repo states).

Could that be made to work?

Claus
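The backward search could look roughly like this, assuming each line of patch-history.txt has the form `repo-id : patch-id : message` and is appended in commit order by the hooks. The function name, log contents and repo names are hypothetical, and wiring the result into darcs-all is left out.

```python
"""Sketch of the proposed shared patch-history.txt: for every repo, find the
last patch logged at or before a given patch. The log format
"repo-id : patch-id : message" and all names are hypothetical."""


def consistent_state(log_text: str, repo: str, patch: str) -> dict[str, str]:
    """Return {repo-id: last patch-id} as of the given repo/patch entry."""
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Split on at most two colons so messages may contain colons.
        repo_id, patch_id, _message = (f.strip() for f in line.split(":", 2))
        entries.append((repo_id, patch_id))

    # Locate the target entry (ValueError if absent), then scan backwards.
    end = entries.index((repo, patch))
    state: dict[str, str] = {}
    for repo_id, patch_id in reversed(entries[: end + 1]):
        state.setdefault(repo_id, patch_id)   # first hit going back = latest
    return state


LOG = """\
ghc : p101 : Add new primop
base : p550 : Export new primop wrapper
ghc : p102 : Fix typo in comment
testsuite : p900 : Test for new primop
ghc : p103 : Refactor the simplifier
"""

print(consistent_state(LOG, "ghc", "p102"))
# → {'ghc': 'p102', 'base': 'p550'}
```

Repos absent from the result (testsuite, when asking about p102) had no logged patches at that point, so they would be left at their initial state.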

Hello,
On Mon, Jan 10, 2011 at 12:49 PM, Roman Leshchinskiy wrote:
On 10/01/2011, at 13:27, Simon Marlow wrote:
It would be a prerequisite to switching that a GHC developer only has to use one VCS. So we either migrate dependencies to git, or mirror them in GHC-specific git branches.
I'm not sure how that is going to work. It might well be possible to build GHC using only git. But most GHC developers also contribute to various libraries which are often quite intimately linked to GHC. In particular, GHC patches are often accompanied by library patches. Unless all those libraries switch to git, too, we'll have to use both git and darcs which would be *really* annoying.
If GHC and the libraries on which it depends were in git (migrated, or mirrored), then we could use git sub-modules to track the dependencies between changes to GHC and changes to the libraries. Roughly, the workflow would be like this:

1. Make a change to the library and commit it.
2. Make a change to GHC.
3. Make a GHC commit which records the change and the dependency on the commit in the library repository.

This is useful because when someone gets the changes to GHC, they would know that they need to update their library as well (and there is tool support to make all updates automatically). This kind of dependency is not at all obvious with our current workflow. The same method works for going back to a previous state of the project, where one can "rewind" the libraries to their old versions too.

-Iavor

On 11/01/2011, at 21:41, Iavor Diatchki wrote:
If GHC and the libraries on which it depends were in git (migrated, or mirrored), then we could use git sub-modules to track the dependencies between changes to GHC and changes to the libraries.
Roughly, the workflow would be like this:
1. Make a change to the library and commit it.
2. Make a change to GHC.
3. Make a GHC commit which records the change and the dependency on the commit in the library repository.
What about dependencies which go the other way? Actually, the dependency is often mutual: the GHC change won't work without the library change and the library change won't work without the GHC change. Does git support this?
This is useful because when someone gets the changes to GHC, they would know that they need to update their library as well (and there is tool support to make all updates automatically). This kind of dependency is not at all obvious with our current workflow.
IMO, darcs-all works pretty well. I don't think I ever really had problems with missing library patches.
The same method works for going back to a previous state of the project, where one can "rewind" the libraries to their old versions too.
This would be useful. Unfortunately, git's rewinding seems rather crippled compared to darcs. Roman

On 11/01/11 21:57, Roman Leshchinskiy wrote:
On 11/01/2011, at 21:41, Iavor Diatchki wrote:
If GHC and the libraries on which it depends were in git (migrated, or mirrored), then we could use git sub-modules to track the dependencies between changes to GHC and changes to the libraries.
Roughly, the workflow would be like this:
1. Make a change to the library and commit it.
2. Make a change to GHC.
3. Make a GHC commit which records the change and the dependency on the commit in the library repository.
What about dependencies which go the other way? Actually, the dependency is often mutual: the GHC change won't work without the library change and the library change won't work without the GHC change. Does git support this?
As I understand it, the GHC repo would specify the required version of the library repo. Right now with darcs we don't get to do this, so if you want to back out the GHC tree to a previous state, it's impossible to back the libraries up to the right point too (I've found this quite annoying when tracking down regressions in the past). With submodules, when you make a combined GHC/library change, the relationship between the two changes would be recorded in the GHC repo, which is exactly what you want.
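The property described here, each GHC commit recording the library commit it needs, can be modelled as follows. This is a model of what submodules store (git keeps a gitlink entry with the submodule's commit id in each superproject tree), not a use of any git API; the paths and commit ids are made up. It also suggests one answer to the question about mutual dependencies: the library commit and the GHC change that needs it are tied together by a single GHC commit.

```python
"""Model of submodule pinning: each superproject (GHC) commit records the exact
commit of every library it needs. Paths and commit ids are hypothetical."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class GhcCommit:
    cid: str
    parent: Optional["GhcCommit"]
    pins: dict[str, str]          # submodule path -> pinned library commit


def record(parent: Optional[GhcCommit], cid: str,
           changed: dict[str, str]) -> GhcCommit:
    """Make a GHC commit; libraries not in `changed` keep the parent's pin."""
    pins = dict(parent.pins) if parent else {}
    pins.update(changed)
    return GhcCommit(cid, parent, pins)


def pins_at(tip: GhcCommit, cid: str) -> dict[str, str]:
    """Rewind GHC to commit `cid` and report which library commits go with it."""
    node: Optional[GhcCommit] = tip
    while node is not None:
        if node.cid == cid:
            return node.pins
        node = node.parent
    raise KeyError(cid)


c1 = record(None, "ghc-1", {"libraries/base": "base-10", "libraries/array": "array-3"})
# Combined change: the new base commit and the GHC change needing it land together.
c2 = record(c1, "ghc-2", {"libraries/base": "base-11"})
c3 = record(c2, "ghc-3", {})  # GHC-only change; library pins carried over

print(pins_at(c3, "ghc-1"))
# → {'libraries/base': 'base-10', 'libraries/array': 'array-3'}
```

In real git, the counterpart of `pins_at` would be checking out the old GHC commit and then running `git submodule update`, which checks each library out at the commit recorded there.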
This is useful because when someone gets the changes to GHC, they would know that they need to update their library as well (and there is tool support to make all updates automatically). This kind of dependency is not at all obvious with our current workflow.
IMO, darcs-all works pretty well. I don't think I ever really had problems with missing library patches.
I often see problems where someone has done 'darcs pull' rather than './darcs-all pull' and ended up with a weird compilation error as a result. If we could eliminate this source of errors, it would be a major win. If submodules actually work for what we want to do, this would be a good reason to move to git, I think.
The same method works for going back to a previous state of the project, where one can "rewind" the libraries to their old versions too.
This would be useful. Unfortunately, git's rewinding seems rather crippled compared to darcs.
In what way?

Cheers, Simon

BTW, I just translated the GHC darcs repo into git using darcs-fastconvert (cabal install darcs-fastconvert). It took less than 10 minutes and seems to have done the right thing. I'll try to put this up tomorrow for people to play with.

On 11/01/2011, at 22:20, Simon Marlow wrote:
On 11/01/11 21:57, Roman Leshchinskiy wrote:
IMO, darcs-all works pretty well. I don't think I ever really had problems with missing library patches.
I often see problems where someone has done 'darcs pull' rather than './darcs-all pull' and ended up with a weird compilation error as a result. If we could eliminate this source of errors, it would be a major win.
A quick look at the docs seems to indicate that we'd need to do `git pull` followed by `git submodule update`, which doesn't look like a win over darcs-all. Also, I completely fail to understand what `git submodule update` does. It doesn't seem to pull all patches from the master repo. The git submodule docs are even worse than the rest of the git docs, which is rather discouraging.
This would be useful. Unfortunately, git's rewinding seems rather crippled compared to darcs.
In what way?
Thomas says that it doesn't do automatic dependency tracking, which looks like a huge weakness to me. Personally, I haven't been able to successfully unpull non-consecutive chunks of patches with git so far, but I only tried 2 or 3 times before giving up. Roman

On Tue, Jan 11 2011, Roman Leshchinskiy wrote:
On 11/01/2011, at 22:20, Simon Marlow wrote:
On 11/01/11 21:57, Roman Leshchinskiy wrote:
This would be useful. Unfortunately, git's rewinding seems rather crippled compared to darcs.
In what way?
Thomas says that it doesn't do automatic dependency tracking which looks like a huge weakness to me. Personally, I haven't been able to successfully unpull non-consecutive chunks of patches with git so far but I only tried 2 or 3 times before giving up.
I think the confusion might just be in terminology and model. Going back to earlier versions in git is trivial, much easier than in darcs. Remember, git doesn't store patches. Git stores full snapshots of the tree, with a digraph of dependencies. The trees and history are immutable. The main advantage of darcs is that it can manipulate the sequence of patches better than git. The main advantage of git is that every version is accurately named. If two people have a commit with a given hash, they will have exactly the same files and history. Most projects seem to want most of the history to be immutable, and only do manipulation on recent stuff. "Removal" of an earlier patch is done with an additional patch that undoes it, rather than by removing it from the history. David
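A minimal sketch of the snapshot model described above, run in a throwaway repository (the file name Main.hs and the commit messages are made up for illustration). It shows an earlier state being read back by its hash, and a "removal" being recorded as a new commit with git revert rather than by rewriting history.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com

git init -q demo && cd demo
echo v1 > Main.hs && git add Main.hs && git commit -q -m "first"
first=$(git rev-parse HEAD)        # this hash names one exact snapshot
echo v2 > Main.hs && git commit -q -am "second"

# An earlier state can be read straight out of history by its hash ...
git show "$first:Main.hs"          # prints v1
# ... or checked out as a whole tree (detached HEAD) and then left again.
git checkout -q "$first" && cat Main.hs
git checkout -q -

# "Removing" the second change is recorded as an additional commit;
# the existing commits are left exactly as they were.
git revert --no-edit HEAD > /dev/null
cat Main.hs                        # prints v1
git rev-list --count HEAD          # prints 3
```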

The main advantage of darcs is that it can manipulate the sequence of patches better than git.
The main advantage of git is that every version is accurately named. If two people have a commit with a given hash, they will have exactly the same files and history.
I've been wondering about this darcs disadvantage, and have a question:
In my understanding, the unorderedness of patch history in darcs is there to make distributed repos easier (fewer constraints: same set of patches, but not same order; can mix local commits and pulls from various repos, no need for a central repo), and because darcs has a causal rather than a temporal view of patch history (which patch depends on which other patches, instead of which patch came first).
Now, the GHC workflow does single out one central (set of) repo(s) that receives all patches that ever make it into production use. Currently, there is no requirement that all patches in remote repos come in via that central repo, so there is no ordering guarantee for remote repos, but darcs makes no effort to permute the patches in the central repos. So, shouldn't it be possible to use the central repos as a reference point for patch ordering? To restore an earlier combination of repo states, refer to the central repos and their (otherwise irrelevant for darcs) ordering of patch history. darcs-all could then record/restore the state of a set of repos by referring to the relation of patches in the central version of these repos.
And if one really wanted to enforce the same patch ordering on all repos, one could change the workflow: never mix local and common patches, always remove local patches after pushing to the central repos, then pull those patches again from the central repos (to get them in the same order as everyone else). In other words, always keep a branch/repo that only pulls from the central repos (no other source of patches). One could still have other branches/repos for development/testing, but the pristine copy of the central repos would reflect the reference order of patches (wouldn't it?).
Would this help with the problem of finding a consistent set of older revisions of the GHC/library repos? Claus

On Wed, 12 Jan 2011, Claus Reinke wrote:
In my understanding, the unorderedness of patch history in darcs is there to make distributed repos easier (fewer constraints: same set of patches, but not same order; can mix local commits and pulls from various repos, no need for a central repo),
Apart from variable patch ordering all of that is true of all DVCSs.
and because darcs has a causal rather than a temporal view of patch history (which patch depends on which other patches, instead of which patch came first).
You can emulate darcs's patch re-ordering in git if you put each independent sequence of patches on a separate branch. Then you can re-merge the branches in whatever order you want. This is a fairly common git workflow.
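The topic-branch approach can be tried out with a small sketch, assuming two independent changes on made-up branches topic-a and topic-b. Merging them in either order gives different merge commits (the histories differ) but an identical tree, which is roughly the property darcs gets from commuting independent patches.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com

git init -q repo && cd repo
echo base > README && git add README && git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

# Two independent changes, each on its own topic branch.
git checkout -q -b topic-a "$main"
echo a > a.txt && git add a.txt && git commit -q -m "add a"
git checkout -q -b topic-b "$main"
echo b > b.txt && git add b.txt && git commit -q -m "add b"

# Integrate them in both orders, on two separate branches.
git checkout -q -b int-ab "$main"
git merge -q --no-ff --no-edit topic-a
git merge -q --no-ff --no-edit topic-b
git checkout -q -b int-ba "$main"
git merge -q --no-ff --no-edit topic-b
git merge -q --no-ff --no-edit topic-a

git rev-parse int-ab int-ba                    # two different merge commits ...
git rev-parse 'int-ab^{tree}' 'int-ba^{tree}'  # ... with the same tree hash
```

If the two topics touched the same lines, the second merge would stop with a conflict instead; this only emulates reordering for changes that really are independent.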
In other words, always keep a branch/repo that only pulls from the central repos (no other source of patches).
It is normal in git to keep a pristine branch for each remote repository
that you pull from - git sets these branches up by default. There can be
many remotes in a git repository.
Tony.
--
f.anthony.n.finch

You can emulate darcs's patch re-ordering in git if you put each independent sequence of patches on a separate branch. Then you can re-merge the branches in whatever order you want. This is a fairly common git workflow.
What happens after the merges? Does one maintain the branches somehow, or does one lose the (in-)dependency information? Claus

On Wed, 12 Jan 2011, Claus Reinke wrote:
What happens after the merges? Does one maintain the branches somehow, or does one lose the (in-)dependency information?
Remember that a branch in git is just a name for a point in the revision
graph. When you commit to a branch the name is updated to point to the new
commit. Names are local to a particular repository.
When you do a merge, you do it on a particular branch which is updated to
point to the merge commit. The other branches that were merged in (there's
usually one but you can create octopus merges if you want) remain as they
were. The merge commit contains un-named pointers to its parent commits
for use by git, and conventionally records the names of the branches that
were merged in the commit message for the convenience of humans. You can
commit to the other branches to extend them, or delete and reconstruct
them differently, without affecting the state represented by the merge.
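The points about branch names and merge parents can be checked with a short sketch (branch and file names are illustrative only). After the merge, only the current branch name moves, the merge commit's parents are reachable as HEAD^1 and HEAD^2, and later commits on the topic branch leave the merge untouched.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com

git init -q repo && cd repo
echo base > README && git add README && git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b topic
echo one > topic.txt && git add topic.txt && git commit -q -m "topic: one"
topic_at_merge=$(git rev-parse topic)
git checkout -q "$main"
echo x > main.txt && git add main.txt && git commit -q -m "main: x"

# Merge on the main branch: only that branch name moves to the new commit.
git merge -q --no-edit topic
merge=$(git rev-parse HEAD)
git log -1 --format=%s "$merge"      # message names the merged branch, for humans
git rev-parse "$merge^1" "$merge^2"  # the un-named parent pointers git uses

# Extending the topic branch afterwards does not affect the merge.
git checkout -q topic
echo two >> topic.txt && git commit -q -am "topic: two"
git checkout -q "$main"
```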
Have a look at the way "topic branches" are used in the maintenance of
git itself as an example of how to deal with a collection of independent
patches.
http://git.kernel.org/?p=git/git.git;a=blob;f=MaintNotes;hb=refs/heads/todo
Tony.

On 12 January 2011 22:13, Claus Reinke
You can emulate darcs's patch re-ordering in git if you put each independent sequence of patches on a separate branch. Then you can re-merge the branches in whatever order you want. This is a fairly common git workflow.
What happens after the merges? Does one maintain the branches somehow, or does one lose the (in-)dependency information?
If you are interested in this approach, you can check out Iolaus, which is David Roundy's attempt at getting Darcs-style revision control in Git: https://github.com/droundy/iolaus. I think that it relies on you, the user, to accurately specify which patches a new one of your own depends on, which is a bit of a limitation. Cheers, Max

On 11/01/2011 23:11, Roman Leshchinskiy wrote:
On 11/01/2011, at 22:20, Simon Marlow wrote:
On 11/01/11 21:57, Roman Leshchinskiy wrote:
IMO, darcs-all works pretty well. I don't think I ever really had problems with missing library patches.
I often see problems where someone has done 'darcs pull' rather than './darcs-all pull' and ended up with a weird compilation error as a result. If we could eliminate this source of errors, it would be a major win.
A quick look at the docs seems to indicate that we'd need to do
git pull
git submodule update
which doesn't look like a win over darcs-all. Also, I completely fail to understand what git submodule update does. It doesn't seem to pull all patches from the master repo. The git submodule docs are even worse than the rest of the git docs which is rather discouraging.
True, however the build system could automatically check whether you had missed this step, because it could check the hashes.
This would be useful. Unfortunately, git's rewinding seems rather crippled compared to darcs.
In what way?
Thomas says that it doesn't do automatic dependency tracking which looks like a huge weakness to me. Personally, I haven't been able to successfully unpull non-consecutive chunks of patches with git so far but I only tried 2 or 3 times before giving up.
Right, not being able to automatically commute patches is a regression compared to darcs. Git isn't universally "better" than darcs, which is why we're having this discussion - the question is, do the advantages outweigh the disadvantages. For example, you might well consider the lack of a working annotate to be a "huge weakness" in darcs, and the lack of good conflict handling is something that causes us real problems. Cheers, Simon
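The idea that the build system could check the hashes might look something like the following sketch. check_submodules is a hypothetical helper, not part of GHC's build system; it relies on git submodule status prefixing a line with + when the checked-out commit differs from the one recorded in the index, - when the submodule is not initialized, and U when it has merge conflicts. The repositories base and ghc are throwaway stand-ins, and protocol.file.allow is set because recent git versions refuse to clone submodules from local paths by default.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow local-path submodules

git init -q base
echo v1 > base/Base.hs && git -C base add Base.hs && git -C base commit -q -m "base v1"
git init -q ghc && cd ghc
git submodule add "$HOME/base" libraries/base
git commit -q -m "add base as a submodule"

# Hypothetical helper: fail if any submodule is not at its recorded commit.
check_submodules() {
  # Prefixes from `git submodule status`: "+" = checked-out commit differs
  # from the recorded one, "-" = not initialized, "U" = merge conflicts.
  bad=$(git submodule status | grep '^[-+U]' || true)
  if [ -n "$bad" ]; then
    printf 'Submodules out of sync; try "git submodule update":\n%s\n' "$bad" >&2
    return 1
  fi
}

check_submodules && echo "in sync"
# Move base away from the recorded commit (here with a local commit).
echo v2 > libraries/base/Base.hs
git -C libraries/base commit -q -am "local change"
if check_submodules; then echo "missed it"; else echo "caught it"; fi
git submodule update               # back to the recorded commit
check_submodules && echo "in sync again"
```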

On 12/01/2011, at 09:22, Simon Marlow wrote:
On 11/01/2011 23:11, Roman Leshchinskiy wrote:
A quick look at the docs seems to indicate that we'd need to do
git pull
git submodule update
which doesn't look like a win over darcs-all. Also, I completely fail to understand what git submodule update does. It doesn't seem to pull all patches from the master repo. The git submodule docs are even worse than the rest of the git docs which is rather discouraging.
True, however the build system could automatically check whether you had missed this step, because it could check the hashes.
That would be an improvement. How do you pull submodule patches which the main repo doesn't depend on, though? Out of curiosity, has anyone here used submodules for something similar to what we would need?
Thomas says that it doesn't do automatic dependency tracking which looks like a huge weakness to me. Personally, I haven't been able to successfully unpull non-consecutive chunks of patches with git so far but I only tried 2 or 3 times before giving up.
Right, not being able to automatically commute patches is a regression compared to darcs. Git isn't universally "better" than darcs, which is why we're having this discussion - the question is, do the advantages outweigh the disadvantages.
Oh, definitely, I wasn't implying that one is somehow objectively better than the other. All I'm saying is that darcs is much better suited to my personal workflow than git. Or at least the very small part of git that I've been able to figure out (I do have to say that I've probably read about 3x as much about git as I ever read about darcs, though). Roman

Hello,
On Wed, Jan 12, 2011 at 11:44 AM, Roman Leshchinskiy
On 12/01/2011, at 09:22, Simon Marlow wrote:
On 11/01/2011 23:11, Roman Leshchinskiy wrote:
A quick look at the docs seems to indicate that we'd need to do
git pull
git submodule update
which doesn't look like a win over darcs-all. Also, I completely fail to understand what git submodule update does. It doesn't seem to pull all patches from the master repo. The git submodule docs are even worse than the rest of the git docs which is rather discouraging.
True, however the build system could automatically check whether you had missed this step, because it could check the hashes.
That would be an improvement. How do you pull submodule patches which the main repo doesn't depend on, though? Out of curiosity, has anyone here used submodules for something similar to what we would need?
A "submodule" is basically a "pointer" to a particular state of a remote repo. So when you do "git pull" in GHC, you get changes to the code, and also changes to this "pointer", but it won't automatically modify your local version of the sub-module repo. So at this point, if you started "git gui" you'd see that there is a mismatch between your local copy of the sub-module and the expected version. When you issue the command "git submodule update", you are telling git to advance the sub-module repo to the "expected version" (i.e., where the pointer points to). The reason this does not happen automatically is that you might have also made changes to the submodule, so you might want to do some merging there, instead of just pulling. One thing to note is that if we were to set things up with sub-modules, then every now and then we would have to advance GHC's "expected pointer" for various libraries to the latest (or a newer) version. Of course, we could have a script do this but, at least in theory, when someone makes a commit which updates the version of a sub-module, they are asserting that things ought to work with the newer version of the sub-module. -Iavor PS: I've only used sub-modules on one project at work. At first I too was quite confused about what was going on, but I've come to think that submodules are a pretty reasonable way to deal with a situation which is inherently complex.
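The "pointer" behaviour can be reproduced locally. This sketch assumes throwaway repositories named base, ghc and alice as stand-ins for the real GHC layout (they are not GHC's actual repositories), and sets protocol.file.allow because recent git versions refuse to clone submodules from local paths by default. After alice pulls, git submodule status marks her checkout with a leading + until git submodule update moves it to the recorded commit.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins: "base" is the library, "ghc" the superproject.
git init -q base
echo v1 > base/Base.hs && git -C base add Base.hs && git -C base commit -q -m "base v1"
git init -q ghc
git -C ghc submodule add "$HOME/base" libraries/base
git -C ghc commit -q -m "add base as a submodule"

# A second developer clones GHC, with base at the recorded commit.
git clone -q --recurse-submodules ghc alice

# Upstream base moves on, and the ghc repo advances its pointer.
echo v2 > base/Base.hs && git -C base commit -q -am "base v2"
git -C ghc/libraries/base pull -q --ff-only
git -C ghc commit -q -am "bump base"

# The superproject stores only a commit id for base (a mode 160000 entry).
git -C ghc ls-tree HEAD libraries/base

# Alice pulls: her pointer changes, but her base checkout does not.
git -C alice pull -q --ff-only
git -C alice submodule status    # leading "+": checkout differs from pointer
git -C alice submodule update    # check out the recorded commit
git -C alice submodule status    # leading " ": in sync again
```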

On 12/01/2011, at 22:22, Iavor Diatchki wrote:
When you issue the command "git submodule update", you are telling git to advance the sub-module repo to the "expected version" (i.e., where the pointer points to). The reason this does not happen automatically is that you might have also made changes to the submodule, so you might want to do some merging there, instead of just pulling.
Thank you so much for the explanation. Sadly, I'm still confused. Are you saying that "submodule update" is the wrong thing to do if I have changes in some of the submodules?
One thing to note is that if we were to set things up with sub-modules, then every now and then we would have to advance GHC's "expected pointer" for various libraries to the latest (or a newer) version. Of course, we could have a script do this but, at least in theory, when someone makes a commit which updates the version of a sub-module, they are asserting that things ought to work with the newer version of the sub-module.
How would we get the current functionality of darcs-all pull? Is it even possible? Suppose I want to hack on GHC and base (base is a submodule of GHC). For this, I want to:
- pull the latest patches to both GHC and base
- write code
- record my patches in both GHC and base
- pull again to get whatever patches have been pushed while I was hacking
- validate
- push my patches to both GHC and base
Which commands would accomplish this? The git docs still don't make any sense to me. FWIW, I'd be very wary of using any features that are so badly documented. Or programs, for that matter. Roman

Excerpts from Roman Leshchinskiy's message of Wed Jan 12 18:20:25 -0500 2011:
How would we get the current functionality of darcs-all pull? Is it even possible?
Here is the rebase-y workflow. Untested, so I might have gotten one or two details wrong.
Suppose I want to hack on GHC and base (base is a submodule of GHC). For this, I want to:
- pull the latest patches to both GHC and base
# pull the latest patches for GHC, and stick your patchset on top
git pull --rebase
# <resolve any conflicts that occurred during rebase>
# register any new submodules (if any)
git submodule init
# make your submodules reflect the latest version GHC has
git submodule update --rebase
# <resolve any conflicts that occurred during rebase>
- write code
- record my patches in both GHC and base
cd libraries/base
git commit -asm "Base patch"
cd ../..
git commit -asm "GHC patch"
Note that your commit to libraries/base changed which commit the submodule points to, so your GHC commit will then pick up the changed submodule. If you do the commits in the opposite order, this won't happen. So commit in all submodules first.
- pull again to get whatever patches have been pushed while I was hacking
git pull --rebase
git submodule update --rebase
- validate
- push my patches to both GHC and base
git send-email --to=cvs-ghc@haskell.org $PATCHES
cd libraries/base
git send-email --to=cvs-ghc@haskell.org $PATCHES
Cheers, Edward
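The advice to commit in the submodule first can be checked in a sandbox (repository and file names are hypothetical stand-ins, and protocol.file.allow is set only because recent git refuses local-path submodule clones by default). Because git commit -a stages the changed submodule pointer along with ordinary files, the GHC commit ends up recording the new base commit.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow local-path submodules

git init -q base
echo v1 > base/Base.hs && git -C base add Base.hs && git -C base commit -q -m "base v1"
git init -q ghc
echo main > ghc/Main.hs && git -C ghc add Main.hs
git -C ghc submodule add "$HOME/base" libraries/base
git -C ghc commit -q -m "initial"

# Commit in the submodule first ...
echo v2 > ghc/libraries/base/Base.hs
git -C ghc/libraries/base commit -q -am "Base patch"
# ... then commit -a in GHC, which also stages the moved base pointer.
echo main2 > ghc/Main.hs
git -C ghc commit -q -am "GHC patch"

git -C ghc ls-tree HEAD libraries/base     # recorded base commit ...
git -C ghc/libraries/base rev-parse HEAD   # ... matches base's HEAD
```

In a real setup the base commit also has to be pushed before the GHC commit that points at it; otherwise other people's git submodule update cannot find the recorded commit.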

On 12 Jan 2011, at 23:31, "Edward Z. Yang"
Excerpts from Roman Leshchinskiy's message of Wed Jan 12 18:20:25 -0500 2011:
How would we get the current functionality of darcs-all pull? Is it even possible?
Here is the rebase-y workflow.
Thank you for making things clearer!
# pull the latest patches for GHC, and stick your patchset on top
git pull --rebase
# <resolve any conflicts that occurred during rebase>
# register any new submodules (if any)
git submodule init
# make your submodules reflect the latest version GHC has
git submodule update --rebase
This doesn't pull in all base patches, though, just the ones that GHC depends on, right? How would I get all base patches?
Roman

On 13 January 2011 08:54, Roman Leshchinskiy
On 12 Jan 2011, at 23:31, "Edward Z. Yang" wrote:
Excerpts from Roman Leshchinskiy's message of Wed Jan 12 18:20:25 -0500 2011:
How would we get the current functionality of darcs-all pull? Is it even possible?
Here is the rebase-y workflow.
Thank you for making things clearer!
# pull the latest patches for GHC, and stick your patchset on top
git pull --rebase
# <resolve any conflicts that occurred during rebase>
# register any new submodules (if any)
git submodule init
# make your submodules reflect the latest version GHC has
git submodule update --rebase
This doesn't pull in all base patches, though, just the ones that GHC depends on, right? How would I get all base patches?
cd libraries/base
git pull [--rebase]

On 12/01/2011 22:22, Iavor Diatchki wrote:
Hello,
On Wed, Jan 12, 2011 at 11:44 AM, Roman Leshchinskiy <rl@cse.unsw.edu.au> wrote:
On 12/01/2011, at 09:22, Simon Marlow wrote:
> On 11/01/2011 23:11, Roman Leshchinskiy wrote:
>> A quick look at the docs seems to indicate that we'd need to do
>>
>> git pull
>> git submodule update
>>
>> which doesn't look like a win over darcs-all. Also, I completely fail to understand what git submodule update does. It doesn't seem to pull all patches from the master repo. The git submodule docs are even worse than the rest of the git docs which is rather discouraging.
>
> True, however the build system could automatically check whether you had missed this step, because it could check the hashes.
That would be an improvement. How do you pull submodule patches which the main repo doesn't depend on, though? Out of curiosity, has anyone here used submodules for something similar to what we would need?
A "submodule" is basically a "pointer" to a particular state of a remote repo. So when you do "git pull" in GHC, you get changes to the code, and also changes to this "pointer", but it won't automatically modify your local version of the sub-module repo. So at this point, if you started "git gui" you'd see that there is a mismatch between your local copy of the sub-module and the expected version.
When you issue the command "git submodule update", you are telling git to advance the sub-module repo to the "expected version" (i.e., where the pointer points to). The reason this does not happen automatically is that you might have also made changes to the submodule, so you might want to do some merging there, instead of just pulling.
One thing to note is that if we were to set things up with sub-modules, then every now and then we would have to advance GHC's "expected pointer" for various libraries to the latest (or a newer) version. Of course, we could have a script do this but, at least in theory, when someone makes a commit which updates the version of a sub-module, they are asserting that things ought to work with the newer version of the sub-module.
-Iavor PS: I've only used sub-modules on one project at work. At first I too was quite confused about what was going on, but I've come to think that submodules are a pretty reasonable way to deal with a situation which is inherently complex.
I spent quite some time yesterday playing with submodules to see if they would work for GHC. I'm fairly sure there are no fundamental reasons that we couldn't use them, but there are enough gotchas to put me off. I wrote down what I discovered here: http://hackage.haskell.org/trac/ghc/wiki/DarcsConversion#Submodules The workflow is quite involved - more steps than are required with darcs-all (understandable, because we're storing more information). However, git isn't particularly helpful if you make a mistake or forget to do something. I foresee spending a lot of time digging myself and Simon out of bizarre repository states. I discovered that Google have this tool called "repo" which is their darcs-all for the Android source tree. That might be worth looking at as an alternative in the future: https://sites.google.com/a/android.com/opensource/download/using-repo If we go with git, I suggest we stick with sync-all for the time being and think about either submodules or repo as possibilities for the future. Cheers, Simon

Hi,
Just as a point of information, the following rules can help avoid some of
the gotchas:
- Treat submodules as read-only (i.e., don't make commits there). The
reason for this is that a submodule is usually not on a branch, and so
making a commit would result in a detached head.
- When you pull (or change branches) use "git submodule update" to move the
submodules to their correct versions (yes, it's annoying that one has to do
that).
- Changes to a sub-module should be done in a separate repo (not GHC's
submodule). This is where you switch "hats" and become a "base" developer
rather than a "GHC" developer for a bit, and use whatever workflow you
normally use for development.
- Every now and then you update the sub-module "pointer" of your GHC branch
to a newer version of the sub-module. You do this by setting the
sub-module to the desired version (e.g., by a pull from its repo), and then
committing the change to the submodule version (perhaps with other GHC
changes).
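The detached-HEAD gotcha behind the first rule shows up in a fresh clone. In this sketch (base, ghc, work and base-dev are hypothetical stand-in repositories, and protocol.file.allow is set only so recent git accepts local-path submodules), git clone --recurse-submodules leaves libraries/base at the recorded commit with no branch checked out, while an ordinary clone of base, as in the third rule, starts on a normal branch.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow local-path submodules

git init -q base
echo v1 > base/Base.hs && git -C base add Base.hs && git -C base commit -q -m "base v1"
git init -q ghc
git -C ghc submodule add "$HOME/base" libraries/base
git -C ghc commit -q -m "add base as a submodule"

# A fresh clone checks base out at the recorded commit, not on a branch,
# so a commit made there would not belong to any branch.
git clone -q --recurse-submodules ghc work
if git -C work/libraries/base symbolic-ref -q HEAD; then
  echo "on a branch"
else
  echo "detached HEAD"
fi

# Third rule: do base development in a separate clone, which is on a branch.
git clone -q base base-dev
git -C base-dev symbolic-ref --short HEAD
```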
I agree with Simon's assessment that it is probably a good idea to start
without submodules, at least until all developers are comfortable with the
rest of git's model.
-Iavor

On Thu, Jan 13 2011, Simon Marlow wrote:
I discovered that Google have this tool called "repo" which is their darcs-all for the Android source tree. That might be worth looking at as an alternative in the future:
https://sites.google.com/a/android.com/opensource/download/using-repo
If we go with git, I suggest we stick with sync-all for the time being and think about either submodules or repo as possibilities for the future.
The author of Gerrit/Repo has stated that he intends to have better integration between repo and git submodules, but so far he hasn't been putting all that much effort into it. If that were completed, repo would just be a much better way of managing submodules. David

On 01/13/2011 12:49 AM, Simon Marlow wrote:
I spent quite some time yesterday playing with submodules to see if they would work for GHC. I'm fairly sure there are no fundamental reasons that we couldn't use them, but there are enough gotchas to put me off. I wrote down what I discovered here:
http://hackage.haskell.org/trac/ghc/wiki/DarcsConversion#Submodules
I think the "what works" section there is already pretty compelling -- for example, it's an annoyance that "darcs-all diff" produces a diff file which mashes together all the subrepos and can't be applied at the top level. It's another annoyance that "darcs diff" doesn't produce unified diffs by default; what's the point of a diff that can't be |patch-ed? It seems from your discussion that subrepos are intended for your category "the rest of libraries (e.g. filepath, containers, bytestring, editline)" i.e. things that you expect to passively track and occasionally pick up new patches from. What's the argument against using subrepos for those? To me, the major gotcha is "git submodule update" detaching the changes; however, changing the default to --merge would fix that for me. What about that don't you like? Would you rather want a "git submodule update --just-complain-and-exit"? The last 2 drawbacks you mention (pushing to submodules first and needing to commit to GHC for every subrepo commit) are IMHO the price to pay for a guarantee that you're always able to check out a coherent set of changes. If that's too onerous, maybe some of those libraries just belong in the main GHC repo. I'm another interested bystander who thinks that git would be a step forward, btw.

On 13/01/2011 19:11, Brian Bloniarz wrote:
On 01/13/2011 12:49 AM, Simon Marlow wrote:
I spent quite some time yesterday playing with submodules to see if they would work for GHC. I'm fairly sure there are no fundamental reasons that we couldn't use them, but there are enough gotchas to put me off. I wrote down what I discovered here:
http://hackage.haskell.org/trac/ghc/wiki/DarcsConversion#Submodules
I think the "what works" section there is already pretty compelling -- for example, it's an annoyance that "darcs-all diff" produces a diff file which mashes together all the subrepos and can't be applied at the top level. It's another annoyance that "darcs diff" doesn't produce unified diffs by default; what's the point of a diff that can't be |patch-ed?
It seems from your discussion that subrepos are intended for your category "the rest of libraries (e.g. filepath, containers, bytestring, editline)" i.e. things that you expect to passively track and occasionally pick up new patches from. What's the argument against using subrepos for those?
I think we'd want it to be all-or-none, i.e. use subrepos consistently or not at all. Some of these subrepos are developed quite actively and concurrently with GHC, particularly base. Indeed if it were the case that we were just consumers of an upstream repo, then I would agree with you that subrepos are a clear win.
To me, the major gotcha is "git submodule update" detaching the changes, however changing the default to be a --merge would fix that for me. What about that don't you like? Would you rather want a "git submodule update --just-complain-and-exit"?
--merge might be good sometimes, but other times you might want --rebase, or indeed --complain (which isn't provided, what you get is --hide-my-changes-and-detach-my-head, which incidentally is an aptly-named concept). With sync-all we're getting --merge by default, and you can ask for --rebase, but we're not getting the head-detaching. Cheers, Simon
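The difference between the default update and --merge can be seen in a sandbox. This sketch assumes a developer who keeps libraries/base on a local branch (the hypothetical name local) with an unpushed patch; the repositories are throwaway stand-ins, and GIT_MERGE_AUTOEDIT=no just stops git from opening an editor for the merge commit. A plain git submodule update would detach HEAD at the recorded commit, whereas --merge merges that commit into the current branch and leaves it checked out.

```shell
set -e
# Throwaway sandbox with its own HOME, so your real git config is untouched.
export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL; cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global protocol.file.allow always   # allow local-path submodules
export GIT_MERGE_AUTOEDIT=no                      # no editor for merge commits

git init -q base
echo v1 > base/Base.hs && git -C base add Base.hs && git -C base commit -q -m "base v1"
git init -q ghc
git -C ghc submodule add "$HOME/base" libraries/base
git -C ghc commit -q -m "add base as a submodule"
git clone -q --recurse-submodules ghc work

# Upstream: base gets a new patch and ghc records the new pointer.
echo v2 > base/Base.hs && git -C base commit -q -am "base v2"
git -C ghc/libraries/base pull -q --ff-only
git -C ghc commit -q -am "bump base"

# Developer: base is on a local branch with an unpushed patch.
cd work
git -C libraries/base checkout -q -b local
echo "module Local where" > libraries/base/Local.hs
git -C libraries/base add Local.hs
git -C libraries/base commit -q -m "local patch"

# Pull GHC, then merge the newly recorded base commit into the local branch.
git pull -q --ff-only
git submodule update --merge
git -C libraries/base symbolic-ref --short HEAD   # still on "local"
```

git submodule update --rebase behaves similarly but replays the local patch on top of the recorded commit instead of creating a merge.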

On Mon, Jan 10, 2011 at 2:02 PM, Max Bolingbroke
Naturally other workflows are possible and I'm sure other list members will chime in with their own favourites :-)
Here's the flow I use: http://nvie.com/posts/a-successful-git-branching-model/ with the exception of having the master branch be the development branch, which is what most Git users expect.
Ultimately I'm quite concerned with keeping GHC HQ happy (as you guys do the lion's share of the work!). I feel we should only make the switch if the most frequent committers (i.e. Simon, Simon and Ian) are *totally happy* with it and any associated workflow changes that may be required.
I agree with this sentiment. If GHC HQ believes Git will make their job harder, I'm not in favor of a switch. (From personal experience I think it will make it easier after an initial short learning curve, but I cannot know this for certain of course.) Johan

On Mon, Jan 10 2011, Max Bolingbroke wrote:
2) There was also concern that Git isn't so great on Windows. I have heard that this is less of an issue now, but I never personally suffered from any problems, so can't be sure. (FWIW I used Git on Windows industrially ~1 year ago for 3 months and didn't have problems, though the people around me occasionally had issues with e.g. case insensitivity causing obscure error messages).
As a Linux kernel subsystem maintainer (but probably not a GHC developer), I would probably still recommend git. Mercurial and git are getting fairly similar feature-wise, but I find git easier for working with lots of branches. As for Windows performance, git is slower there than on Linux, often vastly so, but on Windows it tends to be only about as slow as other VCSes. The gap shows up mainly in comparison with the Linux version, which is heavily tuned for the large history and numerous branches of the kernel. David

On 10.01.2011 14:02, Max Bolingbroke wrote:
2) There was also concern that Git isn't so great on Windows. I have heard that this is less of an issue now, but I never personally suffered from any problems, so can't be sure. (FWIW I used Git on Windows industrially ~1 year ago for 3 months and didn't have problems, though the people around me occasionally had issues with e.g. case insensitivity causing obscure error messages).
We are using GIT for a (way smaller) C++ project here, which is mostly Linux-based. Two of the developers (including me) were using Windows and VS for our part, while building the whole thing (using the same sources) on Linux. The files were mounted via Samba. The main issue is line endings, which have to be set correctly. This is a mess in TortoiseGit, though the problem was really just that we were using the same sources on both systems. However, we do not exploit the full spectrum of tools and possibilities of GIT. Actually, I am unaware of many of its features... I am not contributing to GHC and am not going to. Regards, Heiko

On Mon, Jan 10, 2011 at 12:19 PM, Simon Marlow
We're interested in opinions from both active and potential GHC developers/contributors. Let us know what you think - would this make life harder or easier for you? Would it make you less likely or more likely to contribute?
I would also be happier hacking on GHC if it was git based. My experience of integrating the new I/O manager wasn't very pleasant, due to having to re-record patches and jumping through other hoops. We also lost all the project history [1]. I also find git's tools for working with project history (e.g. using the PickAxe feature and better blame support) better. Being able to keep a first-class copy of the GHC repo on GitHub also appeals a lot to me. I keep all my personal projects on GitHub and in my experience it has led to more contributions. While we had a Git clone of the GHC repo on GitHub in the past, it was a second-class citizen, and since you couldn't actually make your changes against that GitHub repo, I never bothered using it. Cheers, Johan
[1] While we managed to convert the git commits to Darcs patches, all the patches needed to be re-recorded as one big patch before submitting. I'm not entirely certain why; perhaps Simon M could elaborate.

I fully support this (especially if it lived on github), but we should
probably sort the top contributors to GHC in the past year or so and
consider their opinions on the matter in that order :) I certainly would not
be on that list. A git(hub)-based workflow would however facilitate any
minor contributions I might make (and I'd imagine those of many others).
Dan
_______________________________________________ Cvs-ghc mailing list Cvs-ghc@haskell.org http://www.haskell.org/mailman/listinfo/cvs-ghc

On 2011-01-10 16:39, Daniel Peebles wrote:
(especially if it lived on github)
Even if GitHub is used you should probably arrange some other kind of backup solution, because GitHub reserves the right to delete your repository "for any reason at any time" (http://help.github.com/terms/). -- /NAD

On Mon, Jan 10, 2011 at 5:25 PM, Nils Anders Danielsson
Even if GitHub is used you should probably arrange some other kind of backup solution, because GitHub reserves the right to delete your repository "for any reason at any time" (http://help.github.com/terms/).
If that would ever happen (probably less likely than someone breaking into our own machine and deleting the repo) we could take a repo from anyone's machine and put it on any old Linux machine. Go distributed version control! Johan

As everyone has been saying, the primary issue is the workflow of the main contributors and the cost of the transition.
However, I made the transition to Git and GitHub earlier this year and that initial investment has been repaid handsomely (it’s the first system I have felt truly comfortable with).
I suspect a transition to Git would work out well in the long run and make the GHC sources more accessible.
Chris
From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of Daniel Peebles
Sent: 10 January 2011 15:40
To: Simon Marlow
Cc: GHC CVS list; glasgow-haskell-users@haskell.org
Subject: Re: RFC: migrating to git
I fully support this (especially if it lived on github), but we should probably sort the top contributors to GHC in the past year or so and consider their opinions on the matter in that order :) I certainly would not be on that list. A git(hub)-based workflow would however facilitate any minor contributions I might make (and I'd imagine those of many others).
Dan

I'd be for a move, but haven't contributed much lately. I use Git for
all my personal projects, so I consider Git to be useful. I
personally find sending patches via Git to be harder than with Darcs,
but if we use Github the pull-request-based model should work well.
I used Git on Windows two years ago and didn't have any problems (the
case sensitive file name issue has a well-documented setting to avoid
issues). I think I used msysGit and used msys to build GHC, so those
should work well together. (Granted, though, I used Git only for a
small code base at the time.)
We'd probably have to adopt the workflow that Johan linked to
(separate branch for every larger change, merge with --no-ff) but that
might actually improve things (e.g., unmerging a branch if necessary).
The important issues, mentioned by Max, remain and I agree that GHC HQ
should have the last decision on these.
-- Push the envelope. Watch it bend.

It's time to consider again whether we should migrate GHC development from darcs to (probably) git.
I'd be thrilled to see GHC migrate to git, and I'd be much more likely to make new contributions to the back end. The rest of this email contains observations about my own experience with source-code control.
From 2007 through 2009, I spent at least several months each doing real projects in each of darcs, hg, and git. After this experience I settled on git for all my personal projects and most of the projects done in my lab. I like git well enough to have migrated several projects from legacy systems like CVS and (brace for it) RCS.
I love the alleged features of darcs, but the reality of the performance is disappointing, and I once lost two weeks' work, which I had to painstakingly re-create by hand. I also have had difficulty learning the ancillary tools that support darcs. My workflow has never involved much cherry-picking, and I tried revising history ('rebasing') once and didn't like it. But I use git's "cheap branching and merging" workflow *very* heavily. (The only part of git I use more heavily is the graphical commit tool.)
I left with a very poor impression of Mercurial. The simplicity is more apparent than real. Two grave faults are:
- Crucial functionality is provided by plugins in a configuration file. The configuration is not itself under revision control, and if different replicas have different configurations, results can be very confusing.
- As far as I can tell, conflicts *must* be handled by a plugin. Every single one of these plugins requires a graphical tool, and I found each tool more confusing than the next. To make some merges work I had to get help from students---proving the old adage that the most clueless user of your software is not a graduate student; it is a tenured professor.
There are plenty of other problems with Mercurial (commits don't have unique names; it doesn't cope well with big files; the graphical commit tool is a usability disaster; yada yada yada).
After the initial settling-in period, I've been very happy with git. Don't get me wrong: git is a terrible tool---but it's the best of a bad breed. Anybody switching to git should be prepared:
- Learning git is very unpleasant. I would say that the design is overly complex, but I see no evidence that any activity called 'designing the system' ever took place. (Example: in the world of git, 'push' and 'pull' are not dual.) Some of what makes git strange *does* make good design sense, but it is not explained well. (Example: it took me forever to understand that the mysterious 'index' is simply a device for packaging a group of changes into a single, atomic 'commit'.)
- Git doesn't do what it says on the tin. In particular, certain combinations of actions are known to lead to breakage. Examples I have encountered personally include:
  - Pushing to a repo that has changes in its working directory
  - Changing history in a repository not utterly private
  - Naming a branch 'head', which works fine on Unix and causes baffling failures on other filesystems
Using git successfully requires that you avoid the vermin in the dark corners. But my top three activities---commit, publish, branch/merge---are all well supported and happen *quickly*. The graphical commit tool and history browser are reasonably good.
A final comment: asking people to make the transition to git on their own is asking a lot. If GHC Central want to make this change, we should plan on some kind of tutorial, perhaps at the next Haskell Implementors' Workshop, to help people migrate. Norman

On Mon, Jan 10, 2011 at 12:47:43PM -0500, Norman Ramsey wrote:
My workflow has never involved much cherry-picking, and I tried revising history ('rebasing') once and didn't like it. But I use git's "cheap branching and merging" workflow *very* heavily.
Do you mean you've used this to do something similar to maintaining a GHC stable branch? Thanks Ian

I am very interested in contributing to GHC, though the state of development with darcs makes me hesitate. A switch to git would make contribution to the project much easier. --trevor
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On Jan 10, 2011, at 5:19 AM, Simon Marlow wrote:
We're interested in opinions from both active and potential GHC developers/contributors. Let us know what you think - would this make life harder or easier for you? Would it make you less likely or more likely to contribute?
+1 for moving to git.
As an infrequent contributor I would welcome the move to git. I think the biggest advantage from my perspective would be enabling branches, which I have avoided up to now because of the painful process I hear about from others.
Another possible advantage to git would be its support for submodules[1]. If we made the switch to git for all the repositories that GHC uses, then we could set them up as submodules. The advantage of submodules is that the GHC repo would contain pointers to the exact commit needed in each remote repository, and those pointers would be under version control. Having submodules for the other repos would be similar to the darcs_all script, but would not have the danger of leaving [dangling pointers][2] when making a new branch.
[1] http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html
[2] http://www.haskell.org/pipermail/cvs-ghc/2010-November/057573.html

On Mon, Jan 10, 2011 at 06:22:03PM -0600, David Peixotto wrote:
Another possible advantage to git would be its support for submodules[1]. If we made the switch to git for all the repositories that GHC uses, then we could set them up as submodules. The advantage of submodules is that the GHC repo would contain pointers to the exact commit needed in the remote repository, and they would be under version control. Having submodules for the other repos would be similar to the darcs_all script, but would not have the danger of leaving [dangling pointers][2] when making a new branch.
[1] http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html [2] http://www.haskell.org/pipermail/cvs-ghc/2010-November/057573.html
During a list conversation about migrating to Git for another software project, there was a discussion about submodules vs. subtrees for tracking other projects. A subtree seems to be a way of getting the contents of a branch merged at a non-root location. It might be a relevant read and something to evaluate. http://progit.org/book/ch6-7.html -- Lars Viklund | zao@acc.umu.se

On 25 January 2011 09:35, Lars Viklund
A subtree seems to be a way of getting the contents of a branch merged at a non-root location. It might be a relevant read and something to evaluate.
There is also the git-subtree project (https://github.com/apenwarr/git-subtree). They explain the difference from the subtree merge strategy as: "The main difference is that, besides merging the other project as a subdirectory, you can also extract the entire history of a subdirectory from your project and make it into a standalone project. Unlike the subtree merge strategy you can alternate back and forth between these two operations. If the standalone library gets updated, you can automatically merge the changes into your project; if you update the library inside your project, you can "split" the changes back out again and merge them back into the library project." Might be useful. A simple example of its use is shown at http://ayende.com/Blog/archive/2011/01/10/git-subtree.aspx Cheers, Max

On Mon, 10 Jan 2011, Simon Marlow wrote:
It's time to consider again whether we should migrate GHC development from darcs to (probably) git.
From our perspective at GHC HQ, the biggest problem that we would hope to solve by switching is that darcs makes branching and merging very difficult for us.
I don't develop GHC so you shouldn't really listen to me, but if you think Darcs makes branching and merging very difficult then you won't like Git et al., which make branching and merging impossible since their semantic model is broken.
One of the main tasks of a distributed revision control system is to handle branching and merging, so you'd think they'd get that core functionality right. But almost every distributed revision control system fails to satisfy the merge law, which states:
Merging a change from branch B into branch A, followed by merging a subsequent change from branch B into branch A, will have the same result as merging both changes from branch B into branch A at once.
So if your history looks like
  A
  | \
  B1 C
  |
  B2
then the result, C2, of merging B1 into C and then merging B2 into that
  A
  | \
  B1 C
  | \|
  B2 C1
    \|
     C2
should be equal to the result, C3, of merging B2 into C
  A
  | \
  B1 C
  |  |
  B2 |
    \|
     C3
Git and basically every other DVCS fails to satisfy this law. See http://web.archive.org/web/20070603113858/zooko.com/badmerge/simple.html for an example where Git will fail to satisfy the merge law (granted, I haven't tested Git on this example lately). AFAIK the only DVCSs that satisfy this law are Darcs and Codeville.
Remember what Dijkstra said, "People willing to trade correctness for speed deserve neither and will lose both", or something like that.
-- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''
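The merge law above can be checked concretely. The following is a minimal Python sketch, not the algorithm git or darcs actually uses: `hunks` and `merge3` are hypothetical helpers implementing a naive line-based three-way merge on top of `difflib.SequenceMatcher`, which applies each side's edits against the common ancestor whenever they do not overlap (roughly what context-based, diff3-style merging does for non-conflicting changes). The one-letter "lines" are made up for illustration, in the spirit of the badmerge example linked above: B1 adds a GGG block above the original text, B2 then adds a fresh copy of the text above that block, and C changes C to X in the original text.

```python
from difflib import SequenceMatcher


def hunks(base, other):
    """Edits turning `base` into `other`, as (start, end, new_lines) ranges of `base`."""
    sm = SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def merge3(base, ours, theirs):
    """Naive three-way merge: apply both sides' hunks to `base`; fail if they overlap.

    Hypothetical illustration only; real tools treat adjacent or
    identical hunks more carefully than this does.
    """
    edits = sorted(hunks(base, ours) + hunks(base, theirs), key=lambda h: (h[0], h[1]))
    result, pos = [], 0
    for start, end, new in edits:
        if start < pos:
            raise ValueError("conflicting edits at line %d" % start)
        result.extend(base[pos:start])  # unchanged context before this hunk
        result.extend(new)              # this side's replacement lines
        pos = end
    result.extend(base[pos:])           # unchanged tail
    return result


# Each character stands for one line of a file.
A = list("ABCDE")           # common ancestor
B1 = list("GGGABCDE")       # branch B: add GGG above the original text
B2 = list("ABCDEGGGABCDE")  # branch B: add a new copy of the text at the top
C = list("ABXDE")           # branch C: change C to X in the original text

# Merge B1 into C, then merge B2 into the result (the merge base is now B1).
C1 = merge3(A, C, B1)
C2 = merge3(B1, C1, B2)

# Merge B2 into C in one step (the merge base is A).
C3 = merge3(A, C, B2)

print("".join(C2))  # ABCDEGGGABXDE: X lands in the original text
print("".join(C3))  # ABXDEGGGABCDE: X lands in the newly added copy
```

Because the one-step merge only sees the ancestor and the two tips, it matches the original text against the first copy in B2 and applies C's change there, while the two-step merge carries the change along with B1's history. The two results differ, which is exactly a violation of the merge law as stated.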

On 11/01/2011 00:36, roconnor@theorem.ca wrote:
On Mon, 10 Jan 2011, Simon Marlow wrote:
It's time to consider again whether we should migrate GHC development from darcs to (probably) git.
From our perspective at GHC HQ, the biggest problem that we would hope to solve by switching is that darcs makes branching and merging very difficult for us.
I don't develop GHC so you shouldn't really listen to me, but if you think Darcs makes branching and merging very difficult then you won't like Git et al., which make branching and merging impossible since their semantic model is broken.
Thanks for this. I distilled your example into a shell script that uses git, and demonstrates that git gets the merge wrong: http://hpaste.org/42953/git_mismerge Still, git could get this merge right, it just doesn't (I know there are more complex cases that would be very hard for git to get right). I suspect that in practice this rarely matters, because context-based merging usually does the right thing. Cheers, Simon

On Tue, 11 Jan 2011, Simon Marlow wrote:
Thanks for this. I distilled your example into a shell script that uses git, and demonstrates that git gets the merge wrong:
I've posted an annotation at http://hpaste.org/paste/42953/git_mismerge_annotation#p42966 which shows the difference between pulling patches one at a time, and pulling both patches together.
Still, git could get this merge right, it just doesn't (I know there are more complex cases that would be very hard for git to get right). I suspect that in practice this rarely matters, because context-based merging usually does the right thing.
The operative word being *usually*. Remember what Dijkstra said. :) -- Russell O'Connor http://r6.ca/

* Simon Marlow:
Thanks for this. I distilled your example into a shell script that uses git, and demonstrates that git gets the merge wrong:
http://hpaste.org/42953/git_mismerge
Still, git could get this merge right, it just doesn't (I know there are more complex cases that would be very hard for git to get right). I suspect that in practice this rarely matters, because context-based merging usually does the right thing.
Git will have a very hard time getting this right because it is not that history-aware. It's also unlikely that this will be implemented, because this mismatching of changes happens only rarely, unless you have a coding style which heavily relies on copy-and-paste. (It has happened in real-world merges, though. It is also easy to construct similar examples involving file renames, I believe.) I know only one criterion for merge correctness: developers working serially on the code base would end up with the same result. (This is based on the concept of serializability in transaction processing systems.) It is clear that no system can satisfy this. For instance, suppose you have a LaTeX document for a one-page flyer. Obviously, there is a very hard requirement that you can have only one page of text. Two parallel edits can each satisfy this constraint, but their automatic merge might not. (Zooko's example is different in that there is an apparently correct solution, so it is not absolutely necessary to bail out, but of course, the authors could likely squeeze their content onto a single page, too.) Inevitably, you have to make trade-offs. The Git approach seems to suit more developers and codebases than the darcs approach. Git mismerges are much rarer than non-completing darcs merges. On the other hand, speaking as a non-contributor, the requirement to deal with multiple version control systems seems awkward. But the current sub-tree approach also feels a bit clunky (same as for OpenJDK, by the way).
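The one-page flyer argument above can be made concrete with a toy model. This is a hedged Python sketch under two stated assumptions: a hypothetical `MAX_LINES` limit stands in for "fits on one page", and a naive line-based three-way merge (a hypothetical `merge3` built on `difflib.SequenceMatcher`, not git's or darcs' algorithm) stands in for any textual merge tool. Each branch's edit respects the limit on its own, and the two edits touch different lines, so they merge without any textual conflict, yet the merged document breaks the limit. A developer making the same two edits one after the other would have noticed the overflow.

```python
from difflib import SequenceMatcher

MAX_LINES = 5  # hypothetical stand-in for "the flyer fits on one page"


def fits_on_page(doc):
    return len(doc) <= MAX_LINES


def merge3(base, ours, theirs):
    """Naive line-based three-way merge that applies both sides' non-overlapping edits."""
    def hunks(other):
        sm = SequenceMatcher(a=base, b=other, autojunk=False)
        return [(i1, i2, other[j1:j2])
                for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

    result, pos = [], 0
    for start, end, new in sorted(hunks(ours) + hunks(theirs), key=lambda h: (h[0], h[1])):
        if start < pos:
            raise ValueError("textual conflict")
        result.extend(base[pos:start])  # unchanged lines before this edit
        result.extend(new)              # the edit itself
        pos = end
    return result + base[pos:]


base = ["Title", "Date", "Venue"]
ours = ["Title", "Speaker bio", "Date", "Venue"]      # one branch adds a bio
theirs = ["Title", "Date", "Venue", "Map", "Parking"]  # the other adds directions

merged = merge3(base, ours, theirs)
print(fits_on_page(ours), fits_on_page(theirs))  # True True
print(len(merged), fits_on_page(merged))         # 6 False
```

No purely textual merge can catch this, because the constraint lives outside the text being merged; that is the sense in which trade-offs are unavoidable.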

I'm inclined to vote +1 for a move to git. JP and I seem to collaborate just
fine using github for EclipseFP and scion, FWIW. I tend to develop on ad hoc
branches before I merge changes back onto the master branch.
I can't say that either of us has run into significant problems, although I
did hose myself once merging some of JP's changes onto my branch. Not a big
deal, since one will run into those problems from time to time.
Win32/64: Current msysgit hasn't caused me any significant pain. I'm not
entirely focused on performance, just getting latest patch sets from the
github repo.
Since it's just the two of us working on separate forks, I'm not sure that
either of us is pushing git to its limits. I suspect that if there are
multiple branches being developed, merging code from multiple branches into
your branch will be tough. Not sure that any VCS will help you there.
Where things will get really nasty is merging later changes to the
scion-server code back with nominolo's latest "mega patch". This is where
theory of patches or any VCS theory will just break, since the changes are
substantial. I'm not sure any VCS or DVCS will ever solve the problem of
major divergence.
-scooter

On 10 January 2011 22:19, Simon Marlow
We're interested in opinions from both active and potential GHC developers/contributors. Let us know what you think - would this make life harder or easier for you? Would it make you less likely or more likely to contribute?
I would really like GHC to move to git. I find darcs pretty annoying when working in a branch, and the performance still just isn't good enough (e.g. I can't use 'annotate'). I've been a big fan of git from pretty much as soon as I started using it; its interface is badly designed at times, but with the help of Google I've always been able to get it to do what I want. Definitely can't say the same for darcs. It didn't stop me contributing, but when I first started hacking on GHC I was very put off by darcs, as I had only recently got over the nightmares of the very common exponential merge issue from the 1.x days.

Simon Marlow wrote:
The darcs team have been making great strides with performance, but conflict handling remains a serious problem. The darcs roadmap doesn't show this being fixed in the near future
I've just updated the roadmap for darcs 2.8 (the next major release, due this Summer) to mention a few items about conflict handling:
- "Include old text in conflict marking"
  If you have a conflict where you start with xy, and in one branch change that to x and in another change it to xyz, the current marking is pretty hard to decipher:
  v v v v v
  x
  *********
  xyz
  ^ ^ ^ ^ ^
  The new marking will give you
  v v v v v
  xy
  =========
  x
  *********
  xyz
  ^ ^ ^ ^ ^
  which makes things a lot clearer. This is committed already so will definitely be in (in fact I'm now tempted to include it in the upcoming patch release, 2.5.1).
- "reduce the number of alternatives presented when there are complicated conflicts"
  Complicated conflicts often result in a mess of conflicting options being presented to you, where some of the alternatives are a seemingly randomly chosen subset of the one side of the conflict. I've got some patches that improve this problem, though they don't eliminate it completely - this will probably be in 2.8 (so far the patches only work for v2 format, but it should be possible to make them work for v1 too).
- "include patch names in conflict markers"
  I have done some implementation work towards this and am hoping to have something complete for 2.8, though there are still some issues to resolve (might need a compatible patch format change, akin to the old-fashioned -> hashed conversion process). At the very least the work I've done so far will mean that it'll be available for conflicts arising during a rebase.
There are other problems with conflict handling for which fixes are further off and much more uncertain, which I'm happy to discuss further if people are interested.
Cheers, Ganesh

On Mon, Jan 10, 2011 at 12:19 PM, Simon Marlow
It's time to consider again whether we should migrate GHC development from darcs to (probably) git.
From our perspective at GHC HQ, the biggest problem that we would hope to solve by switching is that darcs makes branching and merging very difficult for us. We have a few branches of HEAD that are very painful to keep merged with HEAD, and we would almost certainly have more branches if the overhead were lower. In some sense the overhead is self-inflicted because we have the no-conflict policy in the mainline repository, but that is to avoid problems with darcs' merging algorithms (both performance and correctness). We are still using darcs v1 patches rather than v2, but there are known problems with v2 which are preventing us from upgrading.
The darcs team have been making great strides with performance, but conflict handling remains a serious problem. The darcs roadmap doesn't show this being fixed in the near future: http://wiki.darcs.net/Roadmap
Rebase support is coming, and it does work, though the workflow is a bit laborious.
Besides the branching/merging/conflict issue, switching to git would give us plenty of side benefits, notably via access to a wealth of tool support. Making contribution easy is important to us too, and there are a lot of people using git.
The cost of switching is quite high, which is one reason we decided to stay with darcs last time. We have multiple repos that need to be converted, and for some of them, where the repo is being shared with other projects, we may have to mirror rather than convert in place. We're prepared to put in the effort if the gains would be worthwhile though (offers of help are more than welcome!).
We're interested in opinions from both active and potential GHC developers/contributors. Let us know what you think - would this make life harder or easier for you? Would it make you less likely or more likely to contribute?
Cheers, Simon
I've contributed a small patch or two to GHC before, but nothing major. I expect the future to be similar: I don't anticipate doing any major work on GHC, but if I come across an itch which I see as within my capabilities and worthwhile to scratch, I might scratch it. I didn't have any problem with darcs after getting over the (not so steep) learning curve, and if GHC were to switch to git, I don't think I would have any problem with that either. So I guess I'll just be fine either way.
_______________________________________________ Cvs-ghc mailing list Cvs-ghc@haskell.org http://www.haskell.org/mailman/listinfo/cvs-ghc
-- Work is punishment for failing to procrastinate effectively.

I've made git mirrors of the current GHC HEAD repos (all of them), so people can try out their workflows with git. Hopefully this should work:

  git clone http://darcs.haskell.org/ghc-git/ghc.git
  cd ghc
  perl sync-all get

You have to use sync-all instead of darcs-all, but the syntax is the same, e.g. to pull from upstream:

  perl sync-all pull

Local clones should work:

  git clone <local-ghc-repo>
  cd ghc
  perl sync-all get

and then a future 'perl sync-all pull' will pull from the source, or you can pull from upstream with

  perl sync-all -r http://darcs.haskell.org/ghc-git pull

The -r flag takes a remote or local repository, and works with push/pull/get, just like darcs-all.

Note that sync-all is not executable, which is why I used "perl sync-all" rather than ./sync-all. You can chmod it, but the chmod will be seen as a local change by git, which will get in the way of future pulls, and you'll need to stash or merge or rebase the change (welcome to git :-).

The mirroring is manual right now. I might make it automatic, but it's not cheap (a few minutes of CPU time each time it is invoked).

The GHC repo is missing a few tags; I'm currently trying to sort that out. I haven't set up .gitignore yet; that's also on the todo list.

Cheers, Simon

On Thu, Jan 13, 2011 at 4:03 PM, Simon Marlow
I've made git mirrors of the current GHC HEAD repos (all of them), so people can try out their workflows with git.
Poking around in the different repos works for me and is fast. For example:

Find new files in base:

  $ cd libraries/base
  $ git status

Find the definition and uses of threadWaitRead, skipping git metadata:

  $ git grep threadWaitRead

See when threadWaitRead was added (and introduced in different files):

  $ git log -SthreadWaitRead

(In 2001, by Simon M.)
Note that sync-all is not executable, which is why I used "perl sync-all" rather than ./sync-all. You can chmod it, but the chmod will be seen as a local change by git which will get in the way of future pulls, and you'll need to stash or merge or rebase the change (welcome to git :-).
This particular problem is due to darcs (which we are mirroring) not supporting executable permissions on files. We can just set the executable bit on the file and commit it.

We should set up a git daemon at some point, as it's much more efficient than pulling over HTTP.

Johan
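A minimal sketch of what "set the executable bit on the file and commit it" looks like in git, run in a throwaway repository so it is safe to try. The temporary directory, the demo author identity, and the commit messages are illustrative assumptions, not part of the GHC setup; in a real checkout only the chmod/add/commit steps would apply. It also assumes a filesystem where git tracks the executable bit (core.fileMode true, the usual default on Unix).

```shell
#!/bin/sh
# Sketch: record the executable bit for sync-all in git.
# Runs in a throwaway repository (an assumption, not the GHC tree).
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical demo identity so the commits succeed without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Start with a non-executable sync-all, as in the darcs-derived mirror.
printf '#!/usr/bin/perl\n' > sync-all
git add sync-all
git commit -q -m "Add sync-all"

# Set the executable bit and commit it; git records this as mode 100755.
chmod +x sync-all
git add sync-all
git commit -q -m "Make sync-all executable"

# Print the file mode now recorded in the index for sync-all.
git ls-files -s sync-all | cut -d' ' -f1
```

Once such a commit is in the shared repository, a fresh clone gets an executable sync-all and ./sync-all works without leaving a local change behind. On a filesystem without an executable bit (e.g. Windows), `git update-index --chmod=+x sync-all` records the same mode change directly in the index.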

On 13 January 2011 15:30, Johan Tibell
We should set up a git daemon at some point, as it's much more efficient than pulling over HTTP.
As of version 1.6.6, Git is much more efficient over HTTP than it used to be.

http://progit.org/2010/03/04/smart-http.html

In fact, GitHub are now using it as their default transport; they mention it in this blog post:

https://github.com/blog/767-recent-services-interruptions

Benedict

On Thu, Jan 13 2011, Benedict Eastaugh wrote:
On 13 January 2011 15:30, Johan Tibell wrote:
We should set up a git daemon at some point, as it's much more efficient than pulling over HTTP.
As of version 1.6.6, Git is much more efficient over HTTP than it used to be.
You do have to install the git smart-HTTP backend (git-http-backend) on the server; otherwise clients only get the "dumb" HTTP protocol.

David

Hello,
Thanks for this, Simon! I've ported my work on the type-naturals feature as
a git branch, and everything seems to be working as expected so far. I've
put my modified repos at http://code.galois.com/cgi-bin/gitweb (their names
all start with the "type-naturals" prefix). I am sending the link to the
repos because this server is running the gitweb interface, in case people
wanted to play around with it.
-Iavor
On Thu, Jan 13, 2011 at 9:40 AM, David Brown
On Thu, Jan 13 2011, Benedict Eastaugh wrote:
On 13 January 2011 15:30, Johan Tibell wrote:
We should set up a git daemon at some point, as it's much more efficient than pulling over HTTP.
As of version 1.6.6, Git is much more efficient over HTTP than it used to be.
You do have to install the git smart-http plugin in the server, or it only uses the "dumb" HTTP protocol.
David
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Hello Simon,
I've made git mirrors of the current GHC HEAD repos (all of them), so people can try out their workflows with git. Hopefully this should work:
  git clone http://darcs.haskell.org/ghc-git/ghc.git
  cd ghc
  perl sync-all get
Thank you for this work.

I cloned the git repository and tried to compile GHC. But "perl boot" does not work.

  % perl boot
  Unpacking time
  Failed to open stamp file: No such file or directory at boot-pkgs line 41.
  Running boot-pkgs failed: 512 at boot line 23.

How can I create the "configure" script with GHC from this repository?

--Kazu

On 14/01/2011 02:32, Kazu Yamamoto (山本和彦) wrote:
Hello Simon,
I've made git mirrors of the current GHC HEAD repos (all of them), so people can try out their workflows with git. Hopefully this should work:
  git clone http://darcs.haskell.org/ghc-git/ghc.git
  cd ghc
  perl sync-all get
Thank you for this work.
I cloned the git repository and tried to compile GHC. But "perl boot" does not work.
  % perl boot
  Unpacking time
  Failed to open stamp file: No such file or directory at boot-pkgs line 41.
  Running boot-pkgs failed: 512 at boot line 23.
How can I create the "configure" script with GHC from this repository?
Yes, I noticed that too. I'll push a fix today. In the meantime you can 'mkdir libraries/stamp'. Cheers, Simon

Thanks to everyone who responded on this thread! It's great to see so much feedback. Of the people who responded, most were in favour of a switch to git, with a few notable exceptions. Here at GHC HQ, I'm slightly in favour of switching while Ian and Simon PJ are agnostic. So, we've decided to try switching to git.

The changeover will be staged: first we'll switch the GHC repository, and if all goes well we'll switch the libraries and other sub-repositories. This means we can experiment with git for the GHC repo while we establish exactly what strategy to use for each sub-repository (change the master to git, or leave the master in darcs and mirror). It does mean for a short while GHC developers will have to use two VCS tools, but the sync-all tool will hide the difference to some extent.

The next step is to set up the git repo, probably based on the mirror I set up last week, and sort out various details (commit emails, buildbots, wiki updates etc.). I'll send out a notification once things are set up and we have a date/time for the git repo going live.

Cheers, Simon

On Mon, January 17, 2011 11:08 pm, Simon Marlow wrote:
So, we've decided to try switching to git.
That's very sad!
The changeover will be staged: first we'll switch the GHC repository, and if all goes well we'll switch the libraries and other sub-repositories. This means we can experiment with git for the GHC repo while we establish exactly what strategy to use for each sub-repository (change the master to git, or leave the master in darcs and mirror). It does mean for a short while GHC developers will have to use two VCS tools, but the sync-all tool will hide the difference to some extent.
The next step is to set up the git repo, probably based on the mirror I set up last week, and sort out various details (commit emails, buildbots, wiki updates etc.). I'll send out a notification once things are set up and we have a date/time for the git repo going live.
Would it be possible to also produce a definite timeline for when the other repos will switch to git? Having to work with two different VCS in one project is a bit above my pain threshold so I'd like to know when I'll be able to get back into action after the switch. Roman

On 17/01/2011 14:08, rl@cse.unsw.EDU.AU wrote:
On Mon, January 17, 2011 11:08 pm, Simon Marlow wrote:
So, we've decided to try switching to git.
That's very sad!
The changeover will be staged: first we'll switch the GHC repository, and if all goes well we'll switch the libraries and other sub-repositories. This means we can experiment with git for the GHC repo while we establish exactly what strategy to use for each sub-repository (change the master to git, or leave the master in darcs and mirror). It does mean for a short while GHC developers will have to use two VCS tools, but the sync-all tool will hide the difference to some extent.
The next step is to set up the git repo, probably based on the mirror I set up last week, and sort out various details (commit emails, buildbots, wiki updates etc.). I'll send out a notification once things are set up and we have a date/time for the git repo going live.
Would it be possible to also produce a definite timeline for when the other repos will switch to git? Having to work with two different VCS in one project is a bit above my pain threshold so I'd like to know when I'll be able to get back into action after the switch.
Absolutely. The reason I didn't give a definite timescale yet is that we're currently finalizing our future release plans and the two are connected, because we want to minimize the amount of merging we have to do between a git branch and a darcs branch. Anyway, I don't expect the multi-VCS situation will persist for more than a month or so, and certainly not across a release. Cheers, Simon

On Mon, Jan 10, 2011 at 11:19:23AM +0000, Simon Marlow wrote:
It's time to consider again whether we should migrate GHC development from darcs to (probably) git.
The Boost project has been having similar discussions about when, how and if to migrate to Git, together with discussions on whether to modularize the project. During the massive thread (many times the size of this bikeshed thread), an interesting link was posted with a post-mortem of how the PostgreSQL project migrated from CVS to Git:

http://lwn.net/Articles/409635/

Some key points noted there are that it took several false starts over a period of several years to migrate properly, with problems including corrupted and lost history, and that adjusting tools and processes to fit the different working model is much more bothersome than converting the repository itself.

Key learnings (from the end of the article, reproduced for visibility):

* Start with a Git mirror.
* Designate a specific "Git migration team". Make sure they have lots of free time.
* Your first attempt to migrate will probably fail, so you need to be prepared for more than one.
* Changing your infrastructure, workflow, and build tool dependencies is harder than the repository conversion.
* Make friends with the conversion tool authors.
* Write lots of docs about the new tools and workflow.
* The more history you have on your current system, the more work conversion is going to be.
* Things which are broken in your current history are not going to fix themselves when you migrate.
* When testing the conversion, make sure to look at more than HEAD and branch-tips.

-- Lars Viklund | zao@acc.umu.se

In my one serious attempt to use git for one of my own projects, some seemingly-innocuous operation deleted a file on me and I lost a couple of hours of work. I agree with the people who have said that git's documentation and semantics are highly confusing, more so than darcs's. For example, what does it mean to "stage" a commit? Why is there an entire GUI window for this presumably-important action, and why do things I think I've committed not appear in the change history or mysteriously reverse themselves?

If ghc went to git, it wouldn't make me less likely to contribute, but I would do so by checking everything into a local darcs repo and using that to track my own changes, then letting somebody else do the work of getting them into git! That would probably reduce the likelihood of my patch being accepted, but I consider git a complete waste of my time and have zero interest in learning to use it.

Plus, while I admire everyone's willingness to consider a VCS that isn't Haskell-based, I have to admit that there's a Haskell partisan in me. And there are real advantages to being a tight-knit community. If the GHC maintainers go to the Darcs maintainers and say "We absolutely need feature X or we will have to stop using you", the Darcs maintainers are likely to say "It'll be tough but we'll find a way to do it." But we aren't by any means the biggest project using Git, so the Git maintainers would be likely to say "That's nice, keep in touch."

Obligatory disclaimer - I've never written any code actually in GHC, although I have used the API (I am the author of direct-plugins). But I frequently read its code to clarify how things work, and I do expect that it's a near-certainty that I'll be hacking GHC itself at some point in the future.

-- Dan Knapp "An infallible method of conciliating a tiger is to allow oneself to be devoured." (Konrad Adenauer)

I just noticed that the discussion has been concluded and I was replying to an
old thread. I apologize for the noise.
On Wed, Feb 9, 2011 at 6:56 PM, Dan Knapp
In my one serious attempt to use git for one of my own projects, some seemingly-innocuous operation deleted a file on me and I lost a couple of hours of work. I agree with the people who have said that git's documentation and semantics are highly confusing, more so than darcs's. For example, what does it mean to "stage" a commit? Why is there an entire GUI window for this presumably-important action, and why do things I think I've committed not appear in the change history or mysteriously reverse themselves?
If ghc went to git, it wouldn't make me less likely to contribute, but I would do so by checking everything into a local darcs repo and using that to track my own changes, then letting somebody else do the work of getting them into git! Which probably would reduce the likelihood of my patch being accepted, but I consider git a complete waste of my time and have zero interest in learning to use it.
Plus, while I admire everyone's willingness to consider a VCS that isn't Haskell-based, I have to admit that there's a Haskell partisan in me. And there are real advantages to being a tight-knit community. If the GHC maintainers go to the Darcs maintainers and say "We absolutely need feature X or we will have to stop using you", the Darcs maintainers are likely to say "It'll be tough but we'll find a way to do it." But we aren't by any means the biggest project using Git, so the Git maintainers would be likely to say "That's nice, keep in touch."
Obligatory disclaimer - I've never written any code actually in GHC, although I have used the API (I am the author of direct-plugins). But I frequently read its code to clarify how things work, and I do expect that it's a near-certainty that I'll be hacking GHC itself at some point in the future.
-- Dan Knapp "An infallible method of conciliating a tiger is to allow oneself to be devoured." (Konrad Adenauer)
participants (34)
- Benedict Eastaugh
- Brian Bloniarz
- Bryan O'Sullivan
- Chris Dornan
- Claus Reinke
- Dan Knapp
- Daniel Peebles
- David Brown
- David Peixotto
- David Terei
- Edward Z. Yang
- Florian Weimer
- Gábor Lehel
- Heiko Studt
- Ian Lynagh
- Iavor Diatchki
- Johan Tibell
- Kazu Yamamoto
- Lars Viklund
- Manuel M T Chakravarty
- Max Bolingbroke
- Nils Anders Danielsson
- Norman Ramsey
- Pavel Perikov
- rl@cse.unsw.EDU.AU
- roconnor@theorem.ca
- Roman Leshchinskiy
- scooter.phd@gmail.com
- Scott Michel
- Simon Marlow
- Sittampalam, Ganesh
- Thomas Schilling
- Tony Finch
- Trevor Elliott