GitLab forks and submodules

Hi *,
so what do we do with submodules? If you point someone to a fork of ghc, say gitlab.haskell.org/foo/ghc, and they try to check it out, they will run into issues because foo didn't clone all the submodules. So how is one supposed to clone a forked ghc repository?
Cheers,
Moritz

Moritz Angermann writes:
> Hi *,
> so what do we do with submodules? If you point someone to a fork of ghc, say:
> gitlab.haskell.org/foo/ghc
Indeed, submodules have been a constant thorn in our side. We encounter this same issue during CI jobs on forks. To work around this we have a script (.gitlab-ci/fix-submodules.py) which tweaks the submodule paths to point to gitlab.haskell.org/ghc/ghc. Others are free to use this script locally; however, it is surely a hack.
I suppose we could just try changing the submodule upstream URLs to absolute URLs. This would make the (arguably more common) case of cloning and contributing without submodule changes easier, while making the case of contributing patches with submodule changes more difficult.
My usual solution is to just clone from ghc/ghc and then add a separate remote for my fork.
Cheers,
- Ben
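A minimal Python sketch of the kind of rewrite described above, for anyone who wants to apply it locally. It is not the actual .gitlab-ci/fix-submodules.py; it assumes relative submodule URLs in .gitmodules look like `../packages/<name>.git` and that replacing the leading `../` with `https://gitlab.haskell.org/ghc/` gives the upstream location.

```python
import re

# Assumption: upstream GHC lives at gitlab.haskell.org/ghc/ghc, so a URL such as
# "../packages/bytestring.git" refers to a sibling under gitlab.haskell.org/ghc/.
UPSTREAM_BASE = "https://gitlab.haskell.org/ghc/"

# Match "url = ../<rest>" lines (indentation preserved), but not "../../<rest>".
_RELATIVE_URL = re.compile(
    r"^([ \t]*url[ \t]*=[ \t]*)\.\./(?!\.\./)(.*)$", re.MULTILINE
)


def absolutize_gitmodules(text: str, base: str = UPSTREAM_BASE) -> str:
    """Return .gitmodules text with single-level '../' URLs made absolute.

    Absolute URLs, deeper relative URLs and all other lines are left untouched.
    """
    return _RELATIVE_URL.sub(lambda m: m.group(1) + base + m.group(2), text)
```

For example, an entry with `url = ../packages/bytestring.git` becomes `url = https://gitlab.haskell.org/ghc/packages/bytestring.git`. After writing the result back to .gitmodules, running `git submodule sync` should copy the new URLs into the local configuration.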

Would it be worth describing this workflow explicitly in our "How to use GitLab for GHC development" page?
S

Can’t we have absolute submodule paths? Wouldn’t that alleviate the issue? When we all had branches on ghc/ghc this was not an issue.
On 8 Jan 2019, at 5:24 AM, Ben Gamari wrote:
> Simon Peyton Jones via ghc-devs writes:
>> Would it be worth describing this workflow explicitly in our "How to use GitLab for GHC development" page?
> Yes, indeed it would. I have asked David, who is currently looking at revising our contributor documentation, to do so.
> Cheers,
> - Ben
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Moritz Angermann writes:
> Can’t we have absolute submodule paths? Wouldn’t that alleviate the issue?
Perhaps; I mentioned this possibility in my earlier response. It's not clear which trade-off is better overall, however.
> When we all had branches on ghc/ghc this was not an issue.
As I mention in the documentation, those with commit bits should feel free to push branches to ghc/ghc.
Cheers,
- Ben

> while making the case of contributing patches with submodule changes more
> difficult
I don't understand this; can you give an example of what absolute paths make
harder?
Looking at the wiki pages and scripts we need to make relative paths work for
everyone, I think it's clear that absolute paths would be better because CI
wouldn't need any scripts anymore and users would need no instructions to make
cloning forks work.
Ömer

Ömer Sinan Ağacan writes:
>> while making the case of contributing patches with submodule changes more difficult
> I don't understand this; can you give an example of what absolute paths make harder?
> Looking at the wiki pages and scripts we need to make relative paths work for everyone, I think it's clear that absolute paths would be better because CI wouldn't need any scripts anymore and users would need no instructions to make cloning forks work.
The problem comes when a user needs to modify a submodule. In this case they need the submodule to pull from their clone until their patch is accepted upstream. There is no single submodule path scheme that works in both this case and the (likely more common) case of a casual contributor wanting to make a change to only the GHC repository.
Cheers,
- Ben
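The trade-off comes from how git treats relative submodule URLs: they are resolved against the URL of the superproject's remote, so the same .gitmodules entry points to different places depending on where GHC was cloned from. Below is a simplified Python model of that resolution, in which each leading `../` drops one path component of the remote URL. It only handles `/`-separated URLs such as `https://...` (not scp-style `git@host:path` remotes), and the `../packages/bytestring.git` entry and `foo` fork are illustrative assumptions.

```python
def resolve_submodule_url(remote_url: str, relative_url: str) -> str:
    """Simplified model of git's relative submodule URL resolution.

    Each leading '../' strips the last path component of remote_url and
    each leading './' is skipped; whatever remains is appended.
    Only '/'-separated URLs (e.g. https://...) are handled here.
    """
    base = remote_url.rstrip("/")
    rest = relative_url
    while True:
        if rest.startswith("../"):
            rest = rest[3:]
            # Drop the last path component, e.g. ".../foo/ghc.git" -> ".../foo".
            base = base.rsplit("/", 1)[0]
        elif rest.startswith("./"):
            rest = rest[2:]
        else:
            break
    return base + "/" + rest
```

Resolving `../packages/bytestring.git` against `https://gitlab.haskell.org/ghc/ghc.git` gives `https://gitlab.haskell.org/ghc/packages/bytestring.git`, whereas against `https://gitlab.haskell.org/foo/ghc.git` it gives `https://gitlab.haskell.org/foo/packages/bytestring.git`, a location under the fork owner's namespace rather than the upstream one. This is the behaviour Ben relies on for contributors who change a submodule, and the reason plain clones of a fork fail to find unmodified submodules.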

> As I mention in the documentation, those with commit bits should feel free to push branches to ghc/ghc.
This is sometimes not ideal as it wastes GHC's CI resources. For example, I make
a lot of WIP commits to my work branches, and I don't want to keep CI machines
busy for those.
Ömer

You can specify `[skip ci]` in the commit message if you don't want to
run the pipeline. When you are done, just amend your commit with the
finalised note.
Gabor
On 1/8/19, Ömer Sinan Ağacan wrote:
> This is sometimes not ideal as it wastes GHC's CI resources. For example I make a lot of WIP commits to my work branches, and I don't want to keep CI machines busy for those.
> Ömer
> On Tue, 8 Jan 2019 at 04:53, Ben Gamari wrote:
>> As I mention in the documentation, those with commit bits should feel free to push branches to ghc/ghc.

I agree with Omer that we shouldn't encourage people to push wip branches
to ghc/ghc. It wastes resources and pollutes the repo with lots of branches
that will invariably not be deleted.
I would rather we use absolute paths in the submodule file as I have spent
far longer than I expected trying to get git to use the right submodule in
the past when operating on forks.
Matt
On Tue, 8 Jan 2019, 10:09 Gabor Greif wrote:
> You can specify `[skip ci]` in the commit message if you don't want to run the pipeline. When you are done, just amend your commit with the finalised note.

Depending on the patch, the CI feedback may be fundamental. E.g., some of the native code gen hackery I'm doing impacts a whole bunch of configurations I can't test locally.
We could also have a wip/no-ci prefix?
Either way, it's certainly true that we have finite resources and should endeavor to use them thoughtfully.
On Tue, Jan 8, 2019 at 5:32 AM Matthew Pickering <matthewtpickering@gmail.com> wrote:
> I agree with Omer that we shouldn't encourage people to push wip branches to ghc/ghc. It wastes resources and pollutes the repo with lots of branches that will invariably not be deleted.
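Both opt-outs discussed here boil down to a predicate over the branch name and the commit message. The minimal Python sketch below models them: the `[skip ci]` / `[ci skip]` check follows GitLab's documented behaviour, while the `wip/no-ci/` prefix is only the hypothetical convention suggested in this thread (its exact spelling is an assumption), not something GitLab or GHC's CI is known to support.

```python
import re

# GitLab documents that a pipeline is skipped when the commit message contains
# "[skip ci]" or "[ci skip]", in any capitalization.
_SKIP_MARKER = re.compile(r"\[(?:skip ci|ci skip)\]", re.IGNORECASE)

# Hypothetical opt-out branch prefix from the proposal above; not an existing convention.
NO_CI_PREFIX = "wip/no-ci/"


def should_run_pipeline(branch: str, commit_message: str) -> bool:
    """Return False when the commit message or the branch name opts out of CI."""
    if _SKIP_MARKER.search(commit_message):
        return False
    return not branch.startswith(NO_CI_PREFIX)
```

In this model `should_run_pipeline("wip/foo", "WIP [skip ci]")` is `False`, and amending the last commit to drop the marker, as suggested above, makes it `True` again for the final push.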

Alright, let me add an example that is really painful with submodules.
Say I have a custom ghc fork angerman/ghc, because I really don't want to overload CI with all my stupidity and I *know* I'd forget to mark every commit with [skip ci] or something. Now I need to modify a bunch of submodules as well, say:
- libraries/bytestring
- libraries/unix
And next I want to have someone else collaborate on this with me, either for testing or contributing or what not. So I'm going to give them the following commands to run:
    git clone --recursive https://gitlab.haskell.org/ghc/ghc
    (cd ghc && git remote add angerman https://gitlab.haskell.org/angerman/ghc)
    (cd ghc && git fetch --all)
    (cd ghc/libraries/bytestring && git remote add angerman https://github.com/angerman/bytestring && git fetch --all)
    (cd ghc/libraries/unix && git remote add angerman https://github.com/angerman/unix && git fetch --all)
    (cd ghc && git checkout angerman/awesome/sauce)
    (cd ghc && git submodule update --init --recursive)
instead of
    git clone --recursive https://gitlab.haskell.org/angerman/ghc --branch awesome/sauce
Of course that would require me to change the absolute paths for bytestring and unix in my repo. So maybe I only need 5 instead of 7 commands to remember to tell, and type, and ...
Cheers,
Moritz
On Jan 8, 2019, at 11:16 PM, Carter Schonwald wrote:
> We could also have a wip/no-ci prefix?
> Either way, it's certainly true that we have finite resources and should endeavor to use them thoughtfully.

Moritz Angermann writes:
> Now I need to modify a bunch of submodules as well, say:
> - libraries/bytestring
> - libraries/unix
> And next I want to have someone else collaborate on this with me, either for testing or contributing or what not.
> So I'm going to give them the following commands to run:
>     git clone --recursive https://gitlab.haskell.org/ghc/ghc
>     (cd ghc && git remote add angerman https://gitlab.haskell.org/angerman/ghc)
>     (cd ghc && git fetch --all)
>     (cd ghc/libraries/bytestring && git remote add angerman https://github.com/angerman/bytestring && git fetch --all)
>     (cd ghc/libraries/unix && git remote add angerman https://github.com/angerman/unix && git fetch --all)
>     (cd ghc && git checkout angerman/awesome/sauce)
>     (cd ghc && git submodule update --init --recursive)
If you pushed your bytestring and unix changes to your GitLab account then this wouldn't be necessary. The fact that we use relative paths would actually work to your advantage.
My current thinking is that the fix-submodules script run by CI should do the following for each submodule:
* If the branch has changed the submodule, then do nothing, leaving the submodule URL relative. This ensures that a user can push their submodule changes to their fork of the submodule on GitLab and things will "just work".
* If the branch has not changed the submodule, then rewrite the submodule URL to point to gitlab.haskell.org/ghc/packages/.... This ensures that CI will work for contributors making non-submodule changes in their GHC forks.
Cheers,
- Ben
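The per-submodule rule proposed here can be sketched in a few lines of Python. This assumes the caller already knows which submodule paths the branch changed (for instance, by comparing the submodule commits recorded on the branch with those on the target branch) and that relative URLs have the `../packages/<name>.git` form; `plan_submodule_urls` is a hypothetical helper, not part of the existing fix-submodules script.

```python
from typing import Dict, Set

# Assumed upstream location for unchanged submodules.
UPSTREAM_BASE = "https://gitlab.haskell.org/ghc/"


def plan_submodule_urls(urls: Dict[str, str], changed: Set[str]) -> Dict[str, str]:
    """Decide which URL CI should use for each submodule (hypothetical helper).

    urls maps a submodule path (e.g. "libraries/unix") to its URL from
    .gitmodules; changed holds the paths whose commit the branch modified.
    """
    plan = {}
    for path, url in urls.items():
        if path in changed or not url.startswith("../"):
            # Changed submodules keep their relative URL so it resolves against
            # the contributor's fork; URLs not starting with '../' are left alone.
            plan[path] = url
        else:
            # Unchanged submodules are pointed at the upstream copy.
            plan[path] = UPSTREAM_BASE + url[len("../"):]
    return plan
```

With `libraries/unix` changed and `libraries/bytestring` untouched, unix keeps `../packages/unix.git` while bytestring is pointed at `https://gitlab.haskell.org/ghc/packages/bytestring.git`.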

Sorry for reviving this thread, but this is causing so much trouble for me. I
want a fresh clone of a GHC fork. If I clone the fork it doesn't work for the
reasons mentioned in this thread; however, I just realized that it doesn't work
even if I clone gitlab/ghc/ghc and then add the fork as a new remote. Here's what
I do:
- Clone gitlab/ghc/ghc ("origin")
- Add gitlab/fork/ghc ("fork", the fork I want to build)
- git fetch --all
- git checkout fork/branch
- git submodule update --init
For whatever reason git tries to fetch submodules from "fork" instead of
"origin", and I couldn't find any way to tell git to use "origin" for submodules
instead. `git submodule sync` does not fix it. I also tried pulling submodules
before switching to the fork's branch, thinking that if the submodules were
initialized with the correct remote, git would fetch them from there after I
switched branches. The only way I could make this work is by replacing all
relative URLs in .gitmodules with absolute URLs using this Vim substitution:
:%s/\.\./https:\/\/gitlab.haskell.org\/ghc/g
The argument for relative submodules doesn't make sense to me. Is updating a
submodule remote so hard that we want to make it easy at the cost of making lots
of other tasks so much harder? To me it makes sense that if you want to work on
a submodule you need to update its remote to your fork.
Ömer

Ömer Sinan Ağacan writes:
>> As I mention in the documentation, those with commit bits should feel free to push branches to ghc/ghc.
> This is sometimes not ideal as it wastes GHC's CI resources. For example I make a lot of WIP commits to my work branches, and I don't want to keep CI machines busy for those.
This is precisely why we have the two-phase CI configuration. It ensures that obviously wrong changes waste no more than two Linux builds' worth of effort. We currently have plenty of Linux capacity, so I'm not terribly worried about this.
Cheers,
- Ben
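The cost bound described here can be illustrated with a small Python model of a two-stage pipeline. The job names and the size of the second stage are hypothetical; the only point is that a change failing the first stage consumes just the two first-stage Linux builds.

```python
from typing import Callable, List

# Hypothetical job names; the real GHC pipeline defines its own jobs.
STAGE_1 = ["linux-build-1", "linux-build-2"]
STAGE_2 = ["other-platform-1", "other-platform-2", "other-platform-3"]


def jobs_run(succeeds: Callable[[str], bool]) -> List[str]:
    """Return the jobs that run when stage 2 is gated on stage 1.

    This mirrors GitLab's default `when: on_success` behaviour, where a job
    starts only if all jobs in earlier stages succeeded.
    """
    ran = list(STAGE_1)  # stage-1 jobs always run (in parallel on GitLab)
    if all(succeeds(job) for job in STAGE_1):
        ran.extend(STAGE_2)
    return ran
```

`jobs_run(lambda job: False)` returns only the two stage-1 jobs, while a change that passes them runs all five.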
participants (8)
- Ben Gamari
- Ben Gamari
- Carter Schonwald
- Gabor Greif
- Matthew Pickering
- Moritz Angermann
- Simon Peyton Jones
- Ömer Sinan Ağacan