HEADS-UP: new server-side validation git hook for submodule updates & call-for-help

Hello *,

I put in place a new server-side validation hook a few days ago, and since nobody seems to have complained yet, I assume it hasn't had any adverse effects so far :-)

It will only be triggered when Git submodule references are touched by a commit; you can find some preliminary (but incomplete) documentation and a sample session triggering a validation failure on purpose at

  https://ghc.haskell.org/trac/ghc/ticket/8251#comment:4

(this will be turned into a proper wiki page once #8251 is completed; there are some minor details wrt some corner cases that still need to be looked at)

So, this mostly addresses the server-side requirements for migrating to a proper git-submodule set-up for ghc.git.

The next steps, however, include taking care of the client-side work-flow for working with a fully "submoduled" ghc.git setup. Personally, I'm quite comfortable using direct git commands to manage such a construct, but I'm well aware not everyone is (as previous discussions here have shown). Also, as my time is rather limited, I'd like to ask interested parties to join in and help formulate the future client-side work-flow[1] and/or update (or rewrite) 'sync-all' to provide a seamless or at least smooth transition for those GHC devs who want to keep using 'sync-all' instead of using direct Git commands.

 [1]: There's some difference in how tracked upstream packages and GHC-HQ owned sub-repos are to be handled workflow-wise, to avoid ending up with a noisy ghc.git history.

      For instance, having ghc.git with submodules is not the same as having a huge monolithic ghc.git repository with all subrepos embedded. Specifically, it might not be sensible to propagate *every* single subrepo commit as a separate ghc.git submod-ref update; it may be better to do so in logical batches (N.B.: using submodules gives the additional ability to git bisect within subrepos instead of always having to bisect only at top-level). This is one example of things to discuss/consider when designing the new work-flow.

Cheers,
  hvr

Herbert,

I really appreciate the work you are doing here -- thank you.

As a client, though, I'm very ignorant about submodules, so I do need education about the work-flows that I should follow. If there are things I must or must not do, I need telling about them.

Much is taken care of by sync-all, which is great. If that continues to be the case, I'm happy!

Simon

| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Herbert Valerio Riedel
| Sent: 18 March 2014 10:59
| To: ghc-devs
| Subject: HEADS-UP: new server-side validation git hook for submodule updates & call-for-help
|
| [...]
|
| _______________________________________________
| ghc-devs mailing list
| ghc-devs@haskell.org
| http://www.haskell.org/mailman/listinfo/ghc-devs

Let's give some example workflows for working with submodules. Here's what I think a raw (i.e. no sync-all) update to base will look like. Please correct me if I'm wrong.

    # Step 1:
    cd ~/src/ghc/libraries/base
    # edit some_file
    git add some_file
    git commit -m "Commit to base repo"
    git push # push update to base to git.haskell.org

    # Step 2:
    cd ~/src/ghc
    git add libraries/base
    git commit -m "Have GHC use the new base version"
    git push # push update to ghc to git.haskell.org

Failure modes include:

* Forgetting step 2: the ghc repo will point to a slightly older base next time someone checks it out. Fixing things when in this state: just perform step 2.

* Forgetting `git push` in step 1: the ghc repo will point to a base commit that doesn't exist (except on some developer's machine). Fixing things when in this state: the developer who forgot to `git push` in step 1 needs to do that.

How could sync-all help us:

* sync-all push could push all repos, preventing failure case 2 above.

The second interesting workflow involves pulling new changes. This is what the raw (i.e. no sync-all) workflow will look like:

    cd ~/src/ghc
    git pull
    git submodule update

Failure modes include:

* Forgetting the `submodule update` and then doing e.g. `git commit -am "some compile commit"`, reverting the pointer to e.g. base to whatever older version the developer was using. No commits are lost (nothing changes in the base repo), but the ghc repo will point to an older commit.

How could sync-all help us:

* sync-all pull could always run `submodule update`.

The server-side check that Herbert added will make sure that this failure mode cannot happen, as you explicitly have to say in the commit message that you updated a submodule.

I think if base were folded into ghc.git, very few people would have to deal with submodules.
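
The stale-pointer failure mode from the pull workflow can also be spotted on the client before committing. What follows is a minimal sketch, not part of sync-all or of the server-side hook: it assumes the usual `git submodule status` output, in which a leading `+` marks a submodule whose checked-out commit differs from the one recorded in the superproject's index. The function name `out_of_sync_submodules` is made up for illustration, and submodule paths are assumed to contain no spaces.

```shell
# Hypothetical helper: reads `git submodule status` output on stdin and
# prints the paths of submodules whose checked-out commit differs from the
# commit recorded in the superproject's index (lines starting with "+").
# Assumption: submodule paths contain no spaces.
out_of_sync_submodules() {
    while IFS= read -r line; do
        case "$line" in
            +*)
                rest=${line#+}                # drop the "+" marker
                rest=${rest#* }               # drop the commit hash
                printf '%s\n' "${rest%% *}"   # keep the path, drop "(describe)"
                ;;
        esac
    done
}
```

Running `git submodule status | out_of_sync_submodules` in `~/src/ghc` before a `git commit -a` would list the submodules that commit is about to re-point. A `+` also appears after deliberately committing new work in a submodule, so the output is a prompt to look, not proof of a mistake.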

Hello Johan,

On 2014-03-18 at 19:17:55 +0100, Johan Tibell wrote:
> Let's give some example workflows for working with submodules. Here's what I think a raw (i.e. no sync-all) update to base will look like. Please correct me if I'm wrong.
>
>     # Step 1:
>     cd ~/src/ghc/libraries/base
>     # edit some_file
>     git add some_file
>     git commit -m "Commit to base repo"
>     git push # push update to base to git.haskell.org

'git push' w/o a refspec will only work if the HEAD isn't detached; otherwise you'd have to invoke something like 'git push origin HEAD:ghc-head'[1] (or have a tracked branch checked out).

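
The distinction drawn here can be sketched as a small shell function. This is only an illustration: the name `push_submodule_commit` is hypothetical, `ghc-head` is the branch name taken from the message above, and the remote is assumed to be called `origin`. The only Git behaviour relied upon is that `git symbolic-ref -q HEAD` exits non-zero when HEAD is detached.

```shell
# Hypothetical helper, to be run inside a submodule checkout: push the
# commit that is currently checked out.
# $1 = remote branch to update when HEAD is detached (default: ghc-head).
push_submodule_commit() {
    target_branch=${1:-ghc-head}
    # `git symbolic-ref -q HEAD` exits non-zero when HEAD is detached.
    if git symbolic-ref -q HEAD >/dev/null; then
        # On a branch: a plain push works if the branch tracks an upstream.
        git push
    else
        # Detached HEAD: name the destination branch explicitly.
        git push origin "HEAD:$target_branch"
    fi
}
```

For example, `cd ~/src/ghc/libraries/base && push_submodule_commit ghc-head` would take the explicit-refspec route in a freshly updated submodule, where HEAD is normally detached.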
>     # Step 2
>     cd ~/src/ghc
>     git add libraries/base
>     git commit -m "Have GHC use the new base version"
>     git push # push update to ghc to git.haskell.org
>
> Failure modes include:
>
> * Forgetting step 2: the ghc repo will point to a slightly older base next time someone checks it out. Fixing things when in this state: just perform step 2.

That brings up an interesting question (that was also mentioned on #ghc already): Are there cases when it is desirable to point to an older commit on purpose? (One use-case may be if you want to roll back ghc.git to some older commit to unbreak the build w/o touching the submodule repo itself.)

(somewhat related feature: "git submodule update --remote")

> * Forgetting `git push` in step 1: the ghc repo will point to a base commit that doesn't exist (except on some developer's machine). Fixing things when in this state: the developer who forgot to `git push` in step 1 needs to do that.

Actually, the new server-side hook will reject (for non-wip/ branches at least) a ghc.git commit which would result in a submod-ref pointing to a non-existing commit, so this one's covered already.

> How could sync-all help us:
>
> * sync-all push could push all repos, preventing failure case 2 above.

(As I wrote, this can't happen thanks to the new hook script.) However, see the man page for "git push --recurse-submodules".

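
As a rough illustration of that option: `git push --recurse-submodules=check` aborts the push if a submodule commit referenced by the pushed revisions is not available on a remote of that submodule, while `--recurse-submodules=on-demand` pushes the changed submodules first. The wrapper below is only a sketch; the name `push_with_submodules` is hypothetical and not something sync-all provides.

```shell
# Hypothetical wrapper around `git push --recurse-submodules`.
# $1 = "check"     : refuse to push if a referenced submodule commit is
#                    missing from the submodule's remote (default)
#      "on-demand" : push the changed submodules before the superproject
push_with_submodules() {
    mode=${1:-check}
    case "$mode" in
        check|on-demand)
            git push --recurse-submodules="$mode"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2
            ;;
    esac
}
```

Run from `~/src/ghc`, `push_with_submodules check` would catch a forgotten `git push` in a submodule on the client, before the server-side hook has to reject anything.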
> The second interesting workflow involves pulling new changes. This is what the raw (i.e. no sync-all) workflow will look like:
>
>     cd ~/src/ghc
>     git pull
>     git submodule update
>
> Failure modes include:
>
> * Forgetting the `submodule update` and then doing e.g. `git commit -am "some compile commit"`, reverting the pointer to e.g. base to whatever older version the developer was using. No commits are lost (nothing changes in the base repo), but the ghc repo will point to an older commit.
>
> How could sync-all help us:
>
> * sync-all pull could always run `submodule update`.
>
> The server-side check that Herbert added will make sure that the failure mode cannot happen, as you explicitly have to say in the commit message that you updated a submodule.
>
> I think if base was folded into ghc.git very few people would have to deal with submodules.

If 'base' remains tightly coupled to ghc internals, that might indeed be the easiest solution; I'm just not sure how the big base-split will be affected by a folded-into-ghc base. Also, supporting a sensible 'cabal get -s base' will require a bit more work (or we'd have to remove the ability for that again -- not that it is of much use anyway).

PS: I'm wondering if the next-gen 'sync-all' couldn't be simply realised by defining a set of git aliases[2]; e.g. it's rather common to have a 'git pullall' alias defined for combining the effect of 'git pull' and 'git submodule update' into one alias[3].

Cheers,
  hvr

 [1]: occurrences of 'ghc-head' will most likely be renamed to 'master', as that's more consistent with GHC HEAD being 'master' in ghc.git as well

 [2]: https://git.wiki.kernel.org/index.php/Aliases

 [3]: git config alias.pullall '!git pull && git submodule update --init --recursive'

On 18/03/2014 18:17, Johan Tibell wrote:
> Let's give some example workflows for working with submodules. Here's what I think a raw (i.e. no sync-all) update to base will look like. Please correct me if I'm wrong.
>
>     # Step 1:
>     cd ~/src/ghc/libraries/base
>     # edit some_file
>     git add some_file
>     git commit -m "Commit to base repo"
>     git push # push update to base to git.haskell.org

I believe this doesn't work, because the normal state for a submodule is "detached HEAD", so you can't commit to it because it isn't on a branch. You have to first "git checkout master", or "git checkout -b mybranch master".

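
A minimal sketch of that precaution follows. Git itself does accept a commit on a detached HEAD, but the resulting commit belongs to no branch, which makes it easy to lose and awkward to push; getting onto a branch first avoids that. The function name `ensure_on_branch` is hypothetical, and `master` is simply the branch named in the message above.

```shell
# Hypothetical helper, to be run inside the submodule before editing:
# make sure new commits will land on a branch.
# $1 = branch to switch to when HEAD is detached (default: master).
ensure_on_branch() {
    branch=${1:-master}
    # `git symbolic-ref -q HEAD` exits zero when HEAD points at a branch.
    if git symbolic-ref -q HEAD >/dev/null; then
        return 0                  # already on a branch, nothing to do
    fi
    git checkout "$branch"        # detached HEAD: switch to the named branch
}
```

Note that checking out the branch may move the working tree away from the commit that ghc.git currently pins, so it is worth confirming afterwards that the branch tip is the intended starting point.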
>     # Step 2
>     cd ~/src/ghc
>     git add libraries/base
>     git commit -m "Have GHC use the new base version"
>     git push # push update to ghc to git.haskell.org
>
> Failure modes include:
>
> * Forgetting step 2: the ghc repo will point to a slightly older base next time someone checks it out. Fixing things when in this state: just perform step 2.
>
> * Forgetting `git push` in step 1: the ghc repo will point to a base commit that doesn't exist (except on some developer's machine). Fixing things when in this state: the developer who forgot to `git push` in step 1 needs to do that.
>
> How could sync-all help us:
>
> * sync-all push could push all repos, preventing failure case 2 above.
>
> The second interesting workflow involves pulling new changes. This is what the raw (i.e. no sync-all) workflow will look like:
>
>     cd ~/src/ghc
>     git pull
>     git submodule update
>
> Failure modes include:
>
> * Forgetting the `submodule update` and then doing e.g. `git commit -am "some compile commit"`, reverting the pointer to e.g. base to whatever older version the developer was using. No commits are lost (nothing changes in the base repo), but the ghc repo will point to an older commit.

The other failure mode is that the submodule contains local changes that just got overwritten by the "git submodule update". Perhaps git is better about telling you when this is about to happen and/or failing in submodule update now? What about when the submodule is on a branch?

Cheers,
Simon
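
How `git submodule update` treats local changes depends on the Git version and the update mode, so the sketch below sidesteps the question: it refuses to update at all while any submodule has uncommitted modifications to tracked files. The function name `safe_submodule_update` is hypothetical; it relies only on `git submodule foreach`, which exposes the submodule name as `$name`, and on `git status --porcelain`.

```shell
# Hypothetical helper, to be run in the superproject (e.g. ~/src/ghc):
# run `git submodule update` only if no submodule has uncommitted
# changes to tracked files.
safe_submodule_update() {
    # Print the name of every submodule whose tracked files are modified.
    dirty=$(git submodule foreach --quiet \
        '[ -z "$(git status --porcelain --untracked-files=no)" ] || echo "$name"')
    if [ -n "$dirty" ]; then
        printf 'submodules with uncommitted changes:\n%s\n' "$dirty" >&2
        return 1
    fi
    git submodule update
}
```

For a submodule that is on a branch, `git submodule update` also accepts `--merge` and `--rebase`, which integrate the recorded commit into the current branch instead of detaching HEAD; whether either fits the GHC workflow is a separate question.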

Hello *,

On 2014-03-18 at 11:58:36 +0100, Herbert Valerio Riedel wrote:

[...]

> The next steps, however, include taking care of the client-side work-flow for working with a fully "submoduled" ghc.git setup.

[...]

After having discussed this with the current Haddock maintainers (i.e. Simon and Mateusz), who are volunteering to help iron out the Git workflow with submodules by having haddock.git turned into one, I plan to convert utils/haddock into a proper submodule by the end of this weekend (i.e. 2014-03-22/23).

This will allow us to gain experience with submodules for a single submodule while only very few developers are actively affected (as only few developers push directly to haddock.git these days) and write up Git-workflow-with-submodules documentation somewhere at

  https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions/Git

Details will follow when the conversion of haddock.git is actually implemented.

Cheers,
  hvr

On 2014-03-20 at 09:53:32 +0100, Herbert Valerio Riedel wrote:

[...]

> Details will follow when the conversion of haddock.git is actually implemented.

The conversion has been implemented as of

  http://git.haskell.org/ghc.git/commitdiff/34b072177b687c8fcc24f87293beae0752...

I've started writing up a bit on

  https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions/Git/Submodules

on how to cope with Git submodules (using mostly `git` commands for now; once we have sorted out which `git` command sequences to use, we can start thinking about a next-gen `sync-all` replacement).

Feel free to improve/extend the Wiki page!

Cheers,
  hvr

participants (5)
- Herbert Valerio Riedel
- Herbert Valerio Riedel
- Johan Tibell
- Simon Marlow
- Simon Peyton Jones