Relocating (some of) GHC's core-libraries to github.com/haskell

Hi GHC devs,
In accordance with Edward and Austin, I want to move the primary home of the non-GHC specific core-library packages to the http://github.com/haskell/ organization.
Specifically, I plan to move the following package Git repositories to the github.com/haskell organization:
- array
- directory
- filepath
- old-locale
- old-time
- process
- stm
- unix (N.B.: the 'Win32' package already lives at github.com/haskell)
I'm not sure yet about the following packages, which are not officially maintained by the committee (@Simon, maybe you can state your preference where you want your packages to be hosted/issue-tracked in the future?)
- parallel
- deepseq
- hpc
- hoopl
- haskell2010
- haskell98
The practical effects to GHC developers of this switch would be that:
- Issue tracking for those packages moves over to GitHub. (the official maintaining entity is the core-library committee)
- In ghc.git those packages will be *turned into Git submodules* (i.e. they will be handled just like the other existing 3rd party upstream packages such as 'Win32' or 'Cabal' are already handled[1], see also next item)
- Using pull-requests on GitHub is highly recommended for submitting changes which are not urgent (instead of diverging the respective git.haskell.org lagged mirror repo) -- note also that pull-requests will be validated by Travis-CI for multiple GHC versions to help detect compat-breaking changes.
However, GHC developers can easily be given push-rights to the respective repos in the github.com/haskell/ organization should this turn out to be a more appropriate workflow.
[1]: https://ghc.haskell.org/trac/ghc/wiki/Repositories/Upstream#Modifiyinglibrariesforwhichthereisanupstreamrepository
PS: 'haddock.git' is planned to move its issue tracking over to GitHub as well, however it's going to be handled slightly differently (mostly because haddock is tightly coupled to the GHC API) and will be explained in more detail in a future separate email.
Cheers, hvr

So let me check that I'm understanding correctly. Right now the source of truth for these repos is under git.haskell.org/ghc, and you're proposing that we move the source of truth to github. In addition we would still need the git.haskell.org/ghc repos, but they would become lagging repos tracking the github upstream?
So the situation for pushing to these repos becomes more complex, because we have to push to upstream first, then the lagging repo, and finally update the submodule link.
I've no objection to hosting issue trackers on github, but I'm concerned about the repo structure and the workflow for pushing becoming more complex.
Cheers, Simon
On 27/04/2014 10:14, Herbert Valerio Riedel wrote:
[...]

Hello Simon,
On 2014-04-28 at 10:16:39 +0200, Simon Marlow wrote:
> So let me check that I'm understanding correctly. Right now the source of truth for these repos is under git.haskell.org/ghc, and you're proposing that we move the source of truth to github. In addition we would still need the git.haskell.org/ghc repos, but they would become lagging repos tracking the github upstream?
> So the situation for pushing to these repos becomes more complex, because we have to push to upstream first, then the lagging repo, and finally update the submodule link.
Yes, that'd be the extreme case (and we have that kind of complexity already for packages such as transformers/time, where we even have to bridge the darcs/git gap).
However, we can configure the lagged mirror such that we'd automatically mirror github's 'master' branch into our lagged mirror (we'd still be free to create local wip/* or ghc-7.10 branches at git.haskell.org if needed).
Then you'd only have to do the 2-step workflow, i.e. updating github's master branch (or for more experimental stuff, a git.haskell.org-local wip/ branch), and update the gitlink in ghc.git.
> I've no objection to hosting issue trackers on github, but I'm concerned about the repo structure and the workflow for pushing becoming more complex.
I'd like to point out that, while it will become more complex in one way or another (if we want to get away from the current loosely-coupled sub-repo setup), breaking changes in GHC HEAD requiring immediate action happen rather infrequently (after all, we tend to avoid such breakages in the first place, as they'd usually affect a larger portion of Hackage then as well).
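The automatic mirroring of GitHub's 'master' branch into the lagged mirror, as described in this message, could be sketched roughly as follows. This is only an illustration and not the actual git.haskell.org setup: the function name mirror_master, the bare-repository layout, and the example path and URL are assumptions.

```shell
# mirror_master MIRROR_GIT_DIR UPSTREAM_URL
#
# Hypothetical helper: copy upstream's 'master' into a bare mirror repository.
# Only refs/heads/master is named in the refspec, so branches that exist only
# in the mirror (e.g. wip/* or ghc-7.10) are left untouched. The refspec has
# no leading '+', so a non-fast-forward update is rejected instead of applied.
mirror_master() {
    git --git-dir="$1" fetch --quiet "$2" refs/heads/master:refs/heads/master
}

# Example invocation (path and URL are assumptions, shown commented out):
# mirror_master /srv/git/packages/array.git https://github.com/haskell/array.git
```

Leaving out the '+' is a choice made for this sketch: it means a rewritten upstream history shows up as a failed fetch in the mirror rather than silently replacing commits that ghc.git may already reference.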

On 28/04/2014 09:32, Herbert Valerio Riedel wrote:
> Hello Simon,
> On 2014-04-28 at 10:16:39 +0200, Simon Marlow wrote:
>> So let me check that I'm understanding correctly. Right now the source of truth for these repos is under git.haskell.org/ghc, and you're proposing that we move the source of truth to github. In addition we would still need the git.haskell.org/ghc repos, but they would become lagging repos tracking the github upstream?
>> So the situation for pushing to these repos becomes more complex, because we have to push to upstream first, then the lagging repo, and finally update the submodule link.
> Yes, that'd be the extreme case (and we have that kind of complexity already for packages such as transformers/time, where we even have to bridge the darcs/git gap)
> However, we can configure the lagged mirror such that we'd automatically mirror github's 'master' branch into our lagged mirror (we'd still be free to create local wip/* or ghc-7.10 branches at git.haskell.org if needed)
I think that's fine. As Simon points out, we already have lagging repo functionality in the form of the submodule links, so the repo on git.haskell.org can be a pure mirror.
Let me make one suggestion: have a sync-all command that automatically checks out a submodule onto a branch and sets the push-url to the appropriate upstream. Something like
  $ ./sync-all checkout-submodule array
so you would do this before modifying 'array', and then a git push inside that submodule would do the right thing.
Cheers, Simon
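To make the suggestion concrete, here is a minimal sketch of what such a checkout-submodule step might do. It is not part of sync-all; the function name checkout_submodule, the libraries/PKG location, and the github.com/haskell/PKG push URL are assumptions made for illustration.

```shell
# checkout_submodule PKG [BRANCH]
#
# Hypothetical sketch of a `./sync-all checkout-submodule PKG` step, meant to
# be run from the top of a ghc.git checkout. The libraries/PKG location and
# the github.com/haskell/PKG push URL are assumptions.
checkout_submodule() {
    pkg=$1
    branch=${2:-master}
    (
        cd "libraries/$pkg" || exit 1
        # Submodules are usually left on a detached HEAD; switch to a branch
        # so that new commits are not orphaned.
        git checkout --quiet "$branch" || exit 1
        # Fetches keep using the configured URL; only pushes are redirected.
        git remote set-url --push origin "git@github.com:haskell/$pkg.git"
    )
}
```

After running it for 'array', a plain git push inside libraries/array would go to the push URL set above, while git fetch would still use the URL recorded for the submodule.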

Hello Simon,
On 2014-04-28 at 11:28:35 +0200, Simon Marlow wrote:
[...]
>> However, we can configure the lagged mirror such that we'd automatically mirror github's 'master' branch into our lagged mirror (we'd still be free to create local wip/* or ghc-7.10 branches at git.haskell.org if needed)
> I think that's fine. As Simon points out, we already have lagging repo functionality in the form of the submodule links, so the repo on git.haskell.org can be a pure mirror.
Just so I get this right, does "pure mirror" here mean that we don't want users to be able to push to the automatically mirrored repo on git.haskell.org at all, but rather the only way to get any commits into the git.haskell.org mirrored repo would be to push them via the GitHub repo?
(I'd like that, as it would make the set-up easier and hopefully less confusing, as there'd be only a single data-flow path)
Cheers, hvr

On 29/04/2014 10:58, Herbert Valerio Riedel wrote:
> Hello Simon,
> On 2014-04-28 at 11:28:35 +0200, Simon Marlow wrote:
> [...]
>>> However, we can configure the lagged mirror such that we'd automatically mirror github's 'master' branch into our lagged mirror (we'd still be free to create local wip/* or ghc-7.10 branches at git.haskell.org if needed)
>> I think that's fine. As Simon points out, we already have lagging repo functionality in the form of the submodule links, so the repo on git.haskell.org can be a pure mirror.
> Just so I get this right, does "pure mirror" here mean that we don't want users to be able to push to the automatically mirrored repo on git.haskell.org at all, but rather the only way to get any commits into the git.haskell.org mirrored repo would be to push them via the GitHub repo?
> (I'd like that, as it would make the set-up easier and hopefully less confusing, as there'd be only a single data-flow path)
Makes sense to me, but how does that interact with your post-commit hook that checks for validity of the submodule updates?
Cheers, Simon

On 2014-04-29 at 12:27:38 +0200, Simon Marlow wrote:
[...]
>>> I think that's fine. As Simon points out, we already have lagging repo functionality in the form of the submodule links, so the repo on git.haskell.org can be a pure mirror.
>> Just so I get this right, does "pure mirror" here mean that we don't want users to be able to push to the automatically mirrored repo on git.haskell.org at all, but rather the only way to get any commits into the git.haskell.org mirrored repo would be to push them via the GitHub repo?
>> (I'd like that, as it would make the set-up easier and hopefully less confusing, as there'd be only a single data-flow path)
> Makes sense to me, but how does that interact with your post-commit hook that checks for validity of the submodule updates?
(btw, it's actually a pre-receive/update script, as a post-commit hook runs too late to be able to reject ref-updates)
The submod-referential-check script would still work, as it would check that, at least at the time a push to ghc.git was done, the respective submodule commits were present at git.haskell.org.
I've asked the GitHub admins to disallow force-pushes on the new repos I created at github.com/haskell/ as a safety measure (they don't support disallowing branch-deletion though, so there's still a way to force-push by working around it via branch deletion+recreation, but I'd trust the users we give write-access to[1] not to abuse this loophole).
Moreover, I'd configure the sub-repos at git.haskell.org to never prune unreachable objects automatically as a short-term measure. This would allow us to manually recover any "lost" commits, and make them reachable again, even if github.com/haskell already forgot about them.
As for the mirroring itself, this may lag a little bit at first, until I get the scripting better: right now git.haskell.org would poll github.com every 10 minutes or so for new commits. Later on that would be improved by enabling a post-commit webhook on github to notify git.haskell.org about new commits to reduce the mirroring latency. I hope this is enough for now. Also, I'd like to share the automatic mirroring-workflow with other packages already living at github.com/haskell, such as containers or bytestring.
Long story short, the workflow for modifying a core-lib package formerly living at git.haskell.org would be to
1.) get the commit somehow into the new upstream at github.com[2]
2.) wait a little bit till the commit gets propagated automatically to git.haskell.org
3.) commit and push the submodule gitlink ref update in ghc.git to git.haskell.org
Cheers, hvr
[1]: I've created a 'GHC developers' team in the github.com/haskell org some time ago, which mirrors the users who have write access to git.haskell.org; I'd simply assign those to the new repos for now. So there should be no authorization-regressions.
[2]: We'll provide a convenient way to redirect pushes to git.haskell.org/packages/$PKG to its real writable upstream repo. I recently learned about Git's url.<base>.insteadOf / url.<base>.pushInsteadOf redirection feature: you'd only have to set the redirect rules once, and then you could 'git clone --recursive' simply using the canonical http://git.haskell.org URLs, but it would instead go to github.com for fetch/pushes as instructed. This may be far easier and more robust than having sync-all rewrite the pushurls. It would also make it easy to temporarily switch to github.com if git.h.o is down, w/o having to reconfigure all your ghc source trees, as those rules are a setting that can live in ~/.gitconfig.
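The redirect rules mentioned in footnote [2] could look roughly like the sketch below. The url.<base>.insteadOf and url.<base>.pushInsteadOf settings are standard Git configuration; everything else is an assumption made for illustration, including the function name redirect_pkg_to_github, the exact canonical URL shape (http://git.haskell.org/packages/PKG.git), and the choice of HTTPS for fetches and SSH for pushes.

```shell
# redirect_pkg_to_github PKG [CONFIG_FILE]
#
# Hypothetical helper: record rewrite rules for one package in a Git config
# file (default: ~/.gitconfig). The canonical URL shape and the
# github.com/haskell/PKG targets are assumptions.
redirect_pkg_to_github() {
    pkg=$1
    cfg=${2:-$HOME/.gitconfig}
    canonical="http://git.haskell.org/packages/$pkg.git"
    # Fetches of the canonical URL are sent to GitHub over HTTPS ...
    git config --file "$cfg" \
        "url.https://github.com/haskell/$pkg.git.insteadOf" "$canonical"
    # ... and pushes to the canonical URL are sent to GitHub over SSH.
    git config --file "$cfg" \
        "url.git@github.com:haskell/$pkg.git.pushInsteadOf" "$canonical"
}
```

Because the rules live in the user's own config file rather than in each clone, existing ghc source trees would not need to be touched, which is the property the footnote highlights.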

I would really like that as well.
My experience is it is rather easy to get users to put together a pull
request through github.
It is rather more like pulling teeth to get them to use git properly and
put together a traditional patch.
This would greatly open up the workflow for end users contributing things
like small documentation fixes and the like.
-Edward
On Tue, Apr 29, 2014 at 5:58 AM, Herbert Valerio Riedel
[...]

+1
On Tue, Apr 29, 2014 at 9:46 AM, Edward Kmett
[...]

In general I'd like to make the infrastructure reflect the story of who is managing these packages. Moreover, those who are maintaining a package, in this case the core-libraries committee, should have the final word on where their infrastructure lives.
Putting issue-tracking in a clearly-defined place (not mixed up with the GHC Trac) is a good idea. (But please can it be clear, somehow, where each issue tracker is?)
But I agree with Simon that complicating the workflow ought to have a benefit. What is the benefit in this case? We could instead
* Leave the repos where they are
* Move the repos but have GHC builds pull directly from the mother repo
Both seem simpler than adding a new lagging repo.
Having a lagging repo doesn't add anything, does it?
Indeed, does it *ever* help to have a lagging repo? The SHA hash for a submodule uniquely determines what that build depends on, so we don't need a whole repo for that!
(I'm hazy about how and when to make GHC's repo point to newer versions of the sub-module.)
Regardless of the decision here, can I appeal, once more, for clear documentation of the workflows that a GHC developer will encounter?
Thanks
Simon
| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Simon Marlow
| Sent: 28 April 2014 09:17
| To: Herbert Valerio Riedel; ghc-devs
| Cc: Austin Seipp
| Subject: Re: Relocating (some of) GHC's core-libraries to github.com/haskell
| [...]

On 2014-04-28 at 10:40:15 +0200, Simon Peyton Jones wrote:
> In general I'd like to make the infrastructure reflect the story of who is managing these packages. Moreover, those who are maintaining a package, in this case the core-libraries committee, should have the final word on where their infrastructure lives.
I've CC'ed Edward, in case he wants to chime in...
> Putting issue-tracking in a clearly-defined place (not mixed up with the GHC Trac) is a good idea. (But please can it be clear, somehow, where each issue tracker is?)
For one, I plan to have the cabal 'bug-reports' & 'homepage' URL fields reflect the new location as soon as possible once this relocation is done (I need to check with Duncan whether there's a working way to manually edit the cabal meta-data shown on Hackage other than uploading a patchlevel release of the affected packages).
> But I agree with Simon that complicating the workflow ought to have a benefit. What is the benefit in this case? We could instead
> * Leave the repos where they are
That's btw what's planned for haddock.git: have GitHub be merely a mirror of the mother repo at git.haskell.org; however, the workflow for merging pull-requests filed at GitHub requires being more careful (i.e. not using the "merge"-button in the GitHub web-gui, or any other GitHub feature modifying the Git repo), as any changes made to the GitHub copy of the git.haskell.org mother Git repo would get overwritten by the next automatic git.haskell.org->github.com mirroring. So the only downside I can think of is that you'd have to educate everyone working with the GitHub mirrors how to modify the repository.
> * Move the repos but have GHC builds pull directly from the mother repo
You'd still want to have an automatic mirror of that repo at git.haskell.org, as we need to traverse the sub-repo in order to validate submodule gitlink refs in ghc.git.
Also, you need to ensure that the upstream repo never loses the commits you reference from ghc.git (afaik there's a way to ask the GitHub admins to disable non-fast-forward updates for certain repos -- but that would mean that you should refrain from creating topic-branches in the mother-repo, as those would not be allowed to be deleted -- GitHub doesn't support branch-based ACLs)
> Both seem simpler than adding a new lagging repo.
> Having a lagging repo doesn't add anything, does it?
It adds the ability to add temporary local changes if you don't have full control over the upstream repo, or if you plan on re-basing your work-in-progress patchsets (which would be disallowed inside the GitHub upstream repo)
> Indeed, does it *ever* help to have a lagging repo? The SHA hash for a submodule uniquely determines what that build depends on, so we don't need a whole repo for that!
That's right, but as pointed out above, we need a local copy so that ghc.git's validation-script is able to verify the "foreign-key constraint" property on the referenced SHA hash.
> (I'm hazy about how and when to make GHC's repo point to newer versions of the sub-module.)
> Regardless of the decision here, can I appeal, once more, for clear documentation of the workflows that a GHC developer will encounter?
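The "foreign-key constraint" check on submodule gitlinks discussed in this message can be illustrated with a small sketch. This is not the actual submod-referential-check script, whose contents are not shown in this thread; the function names and the mirror layout (MIRROR_ROOT/<package>.git) are assumptions, and a real pre-receive/update hook would additionally have to read the pushed refs from its arguments or standard input.

```shell
# list_gitlinks GIT_DIR REV
# Print "<sha>:<path>" for every gitlink (tree entry with mode 160000) in REV.
list_gitlinks() {
    git --git-dir="$1" ls-tree -r "$2" |
        awk '$1 == "160000" { print $3 ":" $4 }'
}

# commit_present GIT_DIR SHA
# Succeed only if SHA names a commit object stored in the repository GIT_DIR.
commit_present() {
    git --git-dir="$1" cat-file -e "$2^{commit}" 2>/dev/null
}

# check_gitlinks GHC_GIT_DIR REV MIRROR_ROOT
# Assumption: the mirror of the submodule at path P is the bare repository
# MIRROR_ROOT/$(basename P).git. Paths containing whitespace are not handled.
check_gitlinks() {
    # Refuse to report success for a revision that cannot be resolved.
    git --git-dir="$1" rev-parse --verify --quiet "$2^{commit}" >/dev/null || return 2
    result=0
    for entry in $(list_gitlinks "$1" "$2"); do
        sha=${entry%%:*}
        path=${entry#*:}
        if ! commit_present "$3/$(basename "$path").git" "$sha"; then
            echo "gitlink $path: commit $sha not found in mirror" >&2
            result=1
        fi
    done
    return "$result"
}
```

The check only needs the commit objects to exist in the local mirror, which is why a mirror at git.haskell.org is still wanted even when the writable upstream lives elsewhere.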
participants (6)
- Andrew Farmer
- Edward Kmett
- Herbert Valerio Riedel
- Herbert Valerio Riedel
- Simon Marlow
- Simon Peyton Jones