Re: ghc-complete, a repository of fingerprints, and continuous integration

Hello Simon!
On 2013-10-11 at 12:59:07 +0200, Simon Marlow wrote:
[...]
This is great. With a bit of extra tool support for this we could actually do without submodules and go back to individual repos. Checking out a GHC revision in the past could consist of querying your ghc-complete repo for the fingerprint and then running the fingerprint tool.
(unless you haven't guessed, I'm not a huge fan of submodules)
What problems do you see with Git submodules specifically? I'm asking because I believe that we might be able to address most of the concerns you have by tooling (e.g. by server-side hooks to enforce inter-repo invariants with respect to Git submodules, as well as client-side scripts to automate common developer tasks) so that the perceived pain of submodules can be reduced enough to make the change-function significantly more favorable for switching to Git submodules (+ folding some repos which are very tightly coupled/entangled to GHC into ghc.git itself, such as testsuite or nofib).
Cheers, hvr

Hi all,
On 2013-10-12 11:15, Herbert Valerio Riedel wrote:
Hello Simon!
On 2013-10-11 at 12:59:07 +0200, Simon Marlow wrote:
[...]
This is great. With a bit of extra tool support for this we could actually do without submodules and go back to individual repos. Checking out a GHC revision in the past could consist of querying your ghc-complete repo for the fingerprint and then running the fingerprint tool.
(unless you haven't guessed, I'm not a huge fan of submodules)
What problems do you see with Git submodules specifically? I'm asking because I believe that we might be able to address most of the concerns you have by tooling (e.g. by server-side hooks to enforce inter-repo invariants with respect to Git submodules, as well as client-side scripts to automate common developer tasks) so that the perceived pain of submodules can be reduced enough to make the change-function significantly more favorable for switching to Git submodules (+ folding some repos which are very tightly coupled/entangled to GHC into ghc.git itself, such as testsuite or nofib).
I think newcomers like me would dive into GHC development faster if it used Git submodules, since they are familiar, standard tooling for version pinning. That said, I'll let the experienced people decide. :)
Cheers, Arash
Cheers, hvr
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs

On 12/10/2013 11:15, Herbert Valerio Riedel wrote:
Hello Simon!
On 2013-10-11 at 12:59:07 +0200, Simon Marlow wrote:
[...]
This is great. With a bit of extra tool support for this we could actually do without submodules and go back to individual repos. Checking out a GHC revision in the past could consist of querying your ghc-complete repo for the fingerprint and then running the fingerprint tool.
(unless you haven't guessed, I'm not a huge fan of submodules)
What problems do you see with Git submodules specifically? I'm asking because I believe that we might be able to address most of the concerns you have by tooling (e.g. by server-side hooks to enforce inter-repo invariants with respect to Git submodules, as well as client-side scripts to automate common developer tasks) so that the perceived pain of submodules can be reduced enough to make the change-function significantly more favorable for switching to Git submodules (+ folding some repos which are very tightly coupled/entangled to GHC into ghc.git itself, such as testsuite or nofib).
If we can get the tooling right I would probably have no objections to submodules. Right now it's a bit of a pain though.
- extra steps in the workflow for modifying a library (git checkout master, etc.)
- possibility of losing local changes by git submodule update, if you have made local changes in your copy of the library. Perhaps what I want here is for "sync-all --rebase" to do "git submodule update --rebase".
- "sync-all new" doesn't work any more. I used to use this to see what patches were unpushed in my tree relative to upstream. I don't mind what the command is, but there needs to be an easy way to do this.
- Difficulties with having a local mirror of the GHC repos. Does "sync-all -r <root> remove update-url" work with submodules now? Does "./sync-all get" get the submodules from the same place as the GHC repo?
Cheers, Simon

Hello Simon,
On 2013-10-15 at 14:45:05 +0200, Simon Marlow wrote:
[...]
If we can get the tooling right I would probably have no objections to submodules. Right now it's a bit of a pain though.
- extra steps in the workflow for modifying a library (git checkout master, etc.)
Ok, this is the item that requires actual non-trivial scripting to support; I'll need to think about how best to address this issue.
How many extra steps in the workflow would be tolerable?
- possibility of losing local changes by git submodule update, if you have made local changes in your copy of the library. Perhaps what I want here is for "sync-all --rebase" to do "git submodule update --rebase".
Fwiw, local uncommitted changes are warned about by recent Git versions:
,----
| $ git submodule update
| error: Your local changes to the following files would be overwritten by checkout:
|   prologue.txt
| Please, commit your changes or stash them before you can switch branches.
| Aborting
| Unable to checkout '6ad8c0d27bcff28c80684a29b57d7a8dbf00caca' in submodule path 'libraries/bytestring'
`----
As for committed changes, with "git config submodule.<SUBMODULENAME>.update rebase" the --rebase flag can be made the default setting; this could be set up by 'sync-all' on initial checkouts; so if 'git submodule update --rebase' is what you want, this can be made the default mode easily even when not using 'sync-all'.
And finally, even if you happen to seemingly lose a commit due to a 'git submodule update', there's still the 'git reflog' safeguard which keeps a log of all HEAD updates and lets you recover recently "lost" commits.
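A minimal sketch of how that default could be set for every submodule at once. The helper name `rebase_config_cmds` is hypothetical, and it assumes submodule names contain no whitespace or shell metacharacters; it only prints the `git config` commands so they can be reviewed before anything is run.

```shell
# Hypothetical helper: read submodule names on stdin (one per line) and
# print the `git config` command that makes rebase the default update
# mode for each of them. Nothing is executed here.
rebase_config_cmds() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue   # skip empty lines
        printf 'git config submodule.%s.update rebase\n' "$name"
    done
}

# Possible usage inside a checkout with initialized submodules
# ($name is set by `git submodule foreach` for each submodule):
#   git submodule foreach --quiet 'echo "$name"' | rebase_config_cmds | sh
```

Printing the commands instead of running them keeps the step easy to inspect; a script such as 'sync-all' could do the equivalent on an initial checkout.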
- "sync-all new" doesn't work any more. I used to use this to see what patches were unpushed in my tree relative to upstream. I don't mind what the command is, but there needs to be an easy way to do this.
For this, Git itself already provides various commands:
1.) git submodule status
lists each submodule and reports if the submodule is "clean"; from the man-page:
| Each SHA-1 will be prefixed with - if the submodule is not
| initialized, + if the currently checked out submodule commit does not
| match the SHA-1 found in the index of the containing repository and U
| if the submodule has merge conflicts.
or
2.) git submodule summary
This sounds a bit like what you want: it provides more verbose information w.r.t. the added/missing commits between the submodule's HEAD and what is referenced by the main repository (if you already committed the submod-ref update in ghc.git, you could still generate a delta-summary wrt the remote ghc.git by using "git submodule summary origin")
and finally
3.) git submodule foreach git status
The generic 'git submodule foreach' allows you to iterate over all submodules and call any command you can think of within the submodule folder; in this case 'git status'.
In fact, many current 'sync-all' operations which are simple iterations can simply be replaced by an appropriate invocation of 'git submodule foreach'.
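As a rough illustration of the prefix convention quoted from the man-page, the sketch below filters `git submodule status` output down to the submodules that are not "clean". The function name `flag_submodules` is hypothetical, and paths containing " (" are assumed not to occur.

```shell
# Hypothetical filter: read `git submodule status` output on stdin and
# report only the entries whose first character is -, + or U.
flag_submodules() {
    while IFS= read -r line; do
        rest=${line#?}               # line without its first character
        flag=${line%"$rest"}         # the first character itself
        path=${rest#* }              # drop the SHA-1 and the space after it
        path=${path%%" ("*}          # drop a trailing " (describe output)"
        case $flag in
            -) echo "uninitialized: $path" ;;
            +) echo "differs from index: $path" ;;
            U) echo "merge conflict: $path" ;;
        esac
    done
}

# Possible usage:
#   git submodule status | flag_submodules
```

Clean submodules (prefixed with a space) produce no output, so an empty result means every submodule matches what the containing repository records.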
- Difficulties with having a local mirror of the GHC repos. Does "sync-all -r <root> remove update-url" work with submodules now? Does "./sync-all get" get the submodules from the same place as the GHC repo?
btw, the command I assume you meant to write is probably
sync-all -r <root> remote set-url
and there's been a commit some time ago for improving its behaviour on submodules:
http://git.haskell.org/ghc.git/commitdiff/0481e076f3cb4010894324cac71e947c66...
but it might not be perfect yet (I use the feature myself rather seldom, so I don't know how well it works right now); however it's definitely possible to make this work, and 'git submodule' does support rewriting the urls between 'git submodule init' and 'git submodule update' via per-submodule config variables, as well as after 'git submodule update' has run via 'git remote set-url [--push]' -- so basically, I know what needs to be done for this item in case it doesn't work already the way it should.
So in summary, I think the first issue is the one that's a bit more difficult to get right, as that's where the "tracking" semantics of Git submodules become more apparent in the native git tooling.
Cheers, hvr
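The URL rewriting between 'git submodule init' and 'git submodule update' could look roughly like the sketch below. Both function names and the example URLs are hypothetical; this is not how 'sync-all' implements it, only an illustration of the per-submodule `submodule.<name>.url` config variables, assuming submodule names without whitespace.

```shell
# Hypothetical helper: map URL from OLD_ROOT onto NEW_ROOT, keeping the
# repository path below the root; URLs outside OLD_ROOT are left unchanged.
mirror_url() {
    m_new=$1; m_old=$2; m_url=$3
    case $m_url in
        "$m_old"/*)
            m_rel=${m_url#"$m_old"/}
            printf '%s/%s\n' "$m_new" "$m_rel" ;;
        *)
            printf '%s\n' "$m_url" ;;
    esac
}

# Hypothetical helper: point every initialized submodule at the mirror.
# Intended to run after `git submodule init` and before
# `git submodule update`.
rewrite_submodule_urls() {
    r_new=$1; r_old=$2
    r_urls=$(git config --get-regexp '^submodule\..*\.url$') || return 0
    printf '%s\n' "$r_urls" | while read -r key url; do
        git config "$key" "$(mirror_url "$r_new" "$r_old" "$url")"
    done
}

# Possible usage (example roots are made up):
#   git submodule init
#   rewrite_submodule_urls /srv/mirror https://example.org/git
#   git submodule update
```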

On 15/10/2013 16:29, Herbert Valerio Riedel wrote:
Hello Simon,
On 2013-10-15 at 14:45:05 +0200, Simon Marlow wrote:
[...]
If we can get the tooling right I would probably have no objections to submodules. Right now it's a bit of a pain though.
- extra steps in the workflow for modifying a library (git checkout master, etc.)
Ok, this is the item that requires actual non-trivial scripting to support; I'll need to think about how to best address this issue
How many extra steps in the workflow would be tolerable?
One extra step is inevitable: the step where you commit an update to the submodule hash in the GHC repo. This will sometimes happen at the same time as a corresponding change to GHC, which is fine. It would be nice if this were the only extra step.
- possibility of losing local changes by git submodule update, if you have made local changes in your copy of the library. Perhaps what I want here is for "sync-all --rebase" to do "git submodule update --rebase".
Fwiw, local uncommitted changes are warned about by recent Git versions:
Yes, I'm aware of this. It is committed changes that are more problematic.
As for committed changes, with "git config submodule.<SUBMODULENAME>.update rebase" the --rebase flag can be made the default setting; this could be set up by 'sync-all' on initial checkouts; so if 'git submodule update --rebase' is what you want, this can be made the default mode easily even when not using 'sync-all'.
I think what we want is the following:
- "sync-all pull" does a "git submodule update --merge"
- "sync-all pull --rebase" does a "git submodule update --rebase"
This is close to the behaviour we have for non-submodule repos, so it should be less surprising for people, and crucially if there are any local committed changes in the submodule they will be either merged or rebased, and not just "lost" (yes I know they're in the reflog).
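The mapping proposed above can be sketched as a small dispatch function. The name `submodule_update_cmd` is hypothetical and this is only a shell illustration of the two cases, not a patch to 'sync-all'; it prints the command rather than running it.

```shell
# Hypothetical helper: given the option passed to "sync-all pull"
# (nothing or --rebase), print the submodule command it would run.
submodule_update_cmd() {
    case ${1:-} in
        "")       echo "git submodule update --merge" ;;
        --rebase) echo "git submodule update --rebase" ;;
        *)        echo "unknown option: $1" >&2; return 1 ;;
    esac
}

# Possible usage:
#   eval "$(submodule_update_cmd --rebase)"
```

In both cases local commits in a submodule are merged or rebased onto the recorded commit instead of being left behind by a plain checkout.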
For this, Git itself already provides various commands:
1.) git submodule status [..]
2.) git submodule summary [..]
3.) git submodule foreach git status
Yes, I'm aware there are ways to do this. The point I'm making is that it would be nice to have a single command that shows the unpushed patches in all repos, including the GHC repo itself. This is not a big deal at all. If it is two commands, that's not a disaster, so long as we document the workflow carefully on the wiki.
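One possible shape for such a single command, as a sketch only: the function name `unpushed_everywhere` is hypothetical, and "unpushed" is approximated as "reachable from HEAD but not from any remote-tracking branch", which assumes a reasonably recent fetch. The `GIT` variable exists purely so the commands can be previewed with `GIT=echo`.

```shell
# Hypothetical helper: list commits that are in HEAD but on no
# remote-tracking branch, first for the top-level repo, then for each
# submodule. Works with a detached HEAD, which is common in submodules.
unpushed_everywhere() {
    git_cmd=${GIT:-git}
    echo "== top-level repo"
    $git_cmd log --oneline HEAD --not --remotes
    $git_cmd submodule foreach --quiet \
        'echo "== $name"; git log --oneline HEAD --not --remotes'
}

# Possible usage from the top of a GHC checkout:
#   unpushed_everywhere
# Preview the commands without running git:
#   GIT=echo unpushed_everywhere
```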
- Difficulties with having a local mirror of the GHC repos. Does "sync-all -r <root> remove update-url" work with submodules now? Does "./sync-all get" get the submodules from the same place as the GHC repo?
btw, the command I assume you meant to write is probably
sync-all -r <root> remote set-url
and there's been a commit some time ago for improving its behaviour on submodules:
http://git.haskell.org/ghc.git/commitdiff/0481e076f3cb4010894324cac71e947c66...
but it might not be perfect yet (I use the feature myself rather seldom, so I don't know how well it works right now); however it's definitely possible to make this work, and 'git submodule' does support rewriting the urls between 'git submodule init' and 'git submodule update' via per-submodule config variables, as well as after 'git submodule update' has run via 'git remote set-url [--push]' -- so basically, I know what needs to be done for this item in case it doesn't work already the way it should.
Great - I'll test it and create a ticket if it doesn't work.
Cheers, Simon
participants (3)
- Arash Rouhani
- Herbert Valerio Riedel
- Simon Marlow