how to checkout proper submodules


Kazu Yamamoto

5 Jun 2013

1:07 a.m.

Hi,

Andreas and I found that the new IO manager is not working properly in the current GHC head. I'm sure that it worked well at least on May 7.

We need to narrow the range of commits, so I did:

% git checkout bb2795db36b36966697c228315ae20767c4a8753
% git submodule update

But this does not check out the proper submodules. For instance, libraries/base has newer commits. And of course, building fails.

Please tell us how to check out the proper submodules against a specific GHC tree.

--Kazu


Johan Tibell

5 Jun

1:46 a.m.

Unfortunately, we don't use submodules for all repos (e.g. base). This makes it very hard to accurately check out a previous state and bisect errors.

_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Nicolas Frisby

1:53 a.m.

Is the way forward then to manually bisect by timestamp? Perhaps there are scripts "out there" to assist with such a task.
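A minimal sketch of what such a helper script might look like, in Python. It assumes the history of each free-standing repository has already been exported as "hash timestamp" lines, for example with `git log --format='%H %ct'`. The function names `parse_log` and `latest_commit_before` and the sample hashes are hypothetical and not part of any GHC tooling. Note that commit timestamps record when a commit was made, not when it was pushed, so the result is only an approximation of the tree's state on a given day.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def parse_log(text: str) -> List[Tuple[str, int]]:
    """Parse '<hash> <unix-timestamp>' lines into (hash, timestamp) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, ts = line.split()
        pairs.append((sha, int(ts)))
    return pairs


def latest_commit_before(commits: List[Tuple[str, int]],
                         cutoff: datetime) -> Optional[str]:
    """Return the hash of the newest commit at or before `cutoff`.

    Returns None when every commit is newer than the cutoff.
    """
    limit = cutoff.timestamp()
    eligible = [(ts, sha) for sha, ts in commits if ts <= limit]
    if not eligible:
        return None
    return max(eligible)[1]


# Hypothetical history for libraries/base (the hashes are made up).
sample = """\
aaa111 1368000000
bbb222 1367900000
ccc333 1367000000
"""

# End of May 7, 2013 (UTC): the last day the IO manager was known to work.
cutoff = datetime(2013, 5, 7, 23, 59, 59, tzinfo=timezone.utc)
print(latest_commit_before(parse_log(sample), cutoff))
```

The chosen hash would then be passed to `git checkout` inside the corresponding repository, once per repository that is not a submodule.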

Mateusz Kowalczyk

2:01 a.m.

Is there a reason why some submodules are proper git repos and some aren't? The benefits of having git repos as submodules are hopefully clear, so I'm interested in why this isn't the case here.

-- 
Mateusz K.

Austin Seipp

2:05 a.m.

(Warning: incoming answer, followed by a rant.)

Base is not a submodule, meaning that there is essentially no way to automatically check it back out to the "exact same state" it was in, given some specified GHC commit - the commit IDs are not tracked.

At this point, you are basically on your own. You'll have to manually check out libraries/base to a specific commit that occurred 'around' the same time as the GHC commit. In this case, that means looking through whatever commits hit HEAD on May 7th:

$ cd libraries/base
$ git log --until="May 7th"

The resulting list will show you what happened up to May 7th. Take the latest commit in that list, and check out base to that revision. Any commits afterward happened on May 8th or later:

$ git checkout -b temporary-io-fix <commit>

You're going to need to do this for every module that is not tracked as a submodule. Most of the repositories are very low-activity. base & testsuite are going to be the annoying ones. You'll have to continue this 'manual bisection' by hand, with a very hefty dose of frustrating trial and error, in my experience.

There is a secondary alternative. GHC has a script called 'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to work around this deficiency (very poorly). This script basically dumps out a text file containing a key/value pair mapping every repository to its current HEAD commit. It can then take that text file and automatically do 'git checkout' for you in every repo. The idea is you can take fingerprints of the tree, save the results, and cleanly check out to some state later.

The GHC build bots run by Ben L.'s "Buildbox" library automatically run the 'fingerprint.py' script during nightly builds, from what I remember. It may be possible to just look in the ghc-builds archives and steal some fingerprints from the last month off one of the buildbots. I don't know who maintains the individual bots; perhaps you can ask the list. However, this will at best give you a 1-day level of granularity, rather than commit-level granularity, which is still rather unsatisfying.

------------- Answer over, rant begins. ---------------------

I know we had this discussion sometime recently, I think, but can someone *please* explain why we are in this situation of half submodules, half random-floating-git-repository-checkouts? It's terrible. I'm frankly surprised we've even been doing it this long, over a year or more? It is literally the worst of submodules and free-standing repositories put together, with none of the advantages of either.

Free-standing repos are attractive because they are just there, and you don't have to 'maintain' them (sort of). Submodules are attractive because they identify the critical points at which your repositories depend on each other. We have neither benefit right now, clearly.

In particular, this makes it impossible to use tools like 'git bisect', which is *incredibly* useful for just these exact cases. Hell, you can even make 'git bisect' work almost 100% automatically with a tiny bit of shell scripting.

http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-vari...

You could just instead have a script that built the compiler and ran the built compiler on your testcase after every bisection. Wouldn't it be *great* to have something like that Just Work? A tool like this could potentially boil down Kazu's bug almost automatically, for example, with little-to-no frustrating intervention.

And even now, looking at the repository listing of what is in libraries/ that are not submodules, I really see no reason why more - or even all - of them cannot be submodules. Is it a workflow issue of some sort? That's what I'm thinking at this point, but I also don't think it could be any worse than it is now.

Realistically, very few libraries GHC needs for bootstrapping seem to change that much. unix, integer-simple, haskeline and filepath, for example, change *extremely* infrequently, but all are free-standing. Why?

In the event they were submodules, would anything actually be lost? The maintainer - that is, not GHC HQ - would still 'own' the official repository. They can make changes to it. But if there is a necessity to pull that in for GHC (feature request, bug fix, random thing), it can be done by updating the submodule pointer to the new commit. But this must happen explicitly by a GHC committer. In the event they update the submodule pointer, they should also obviously make sure the build still works.

That means we have to update the submodule pointers ourselves if things change. That sucks, I guess, but really, aside from base and testsuite, the two most frequently changing repositories, is that *actually* going to cost us a lot of work?

And even if it does cost us work, I'll speak for myself: I will gladly pay for that work and do it all myself if it means I can actually bisect and actually roll back my tree to some point to fix things - without needing to prepare for it months in advance using hacks, like creating thousands of fingerprints by running fingerprint.py every day when people make commits (no, I haven't done this, but it could be done, and I really don't want to do it).

Long-term reproducible builds are, IMO, a must for any project. *Especially* a project of our size. *Especially* a compiler of all things. But as it stands, when you build GHC, you can probably reproduce *today's* results and *today's* bugs. Last month's results? Last year's? Finding the difference between those months ago and today? Good luck - you will need it.

-- 
Regards,
Austin - PGP: 4096R/0x91384671
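The fingerprint idea (record every repository's HEAD, restore it later) can be sketched in a few lines of Python. This is not the actual fingerprint.py: the `path|commit` line format, the function names, and the sample hashes are assumptions made purely for illustration, and `restore` defaults to a dry run that only returns the `git` commands it would execute.

```python
import subprocess
from typing import Dict, Iterable, List


def take_fingerprint(paths: Iterable[str]) -> Dict[str, str]:
    """Record the current HEAD commit of each repository path."""
    result = {}
    for path in paths:
        out = subprocess.run(["git", "-C", path, "rev-parse", "HEAD"],
                             check=True, capture_output=True, text=True)
        result[path] = out.stdout.strip()
    return result


def format_fingerprint(fingerprint: Dict[str, str]) -> str:
    """Serialise to 'path|commit' lines (an assumed, illustrative format)."""
    return "".join(f"{p}|{c}\n" for p, c in sorted(fingerprint.items()))


def parse_fingerprint(text: str) -> Dict[str, str]:
    """Parse 'path|commit' lines back into a dict, skipping blanks."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, commit = line.split("|", 1)
        mapping[path.strip()] = commit.strip()
    return mapping


def checkout_commands(fingerprint: Dict[str, str]) -> List[List[str]]:
    """Build one `git -C <path> checkout <commit>` command per repository."""
    return [["git", "-C", path, "checkout", commit]
            for path, commit in sorted(fingerprint.items())]


def restore(fingerprint: Dict[str, str], dry_run: bool = True) -> List[List[str]]:
    """Check out every repository to its recorded commit.

    With dry_run=True (the default) nothing is executed.
    """
    commands = checkout_commands(fingerprint)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical saved fingerprint (the hashes are made up).
saved = "libraries/base|abc123\ntestsuite|def456\n"
for cmd in restore(parse_fingerprint(saved)):
    print(" ".join(cmd))
```

Keeping the mapping as plain text means a fingerprint taken by a nightly build can be archived and replayed later, which is also why the granularity is limited to how often fingerprints are taken.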

Johan Tibell

5:12 a.m.

On Tue, Jun 4, 2013 at 7:05 PM, Austin Seipp wrote:

...

I know we had this discussion sometime recently I think, but can someone *please* explain why we are in this situation of half submodules, half random-floating-git-repository-checkouts? It's terrible. I'm frankly surprised we've even been doing it this long, over a year or more? It is literally the worst of submodules, and free-standing-repositories put together, with none of the advantages of either.

This is my understanding of what happened: we started out with only plain repos. This avoids some of the pitfalls of submodules, and we believed it was the least disruptive workflow (when switching from darcs) for the core contributors.

Eventually we needed GHC to track upstream releases of libraries (e.g. Cabal) instead of just tracking HEAD, which it did before. To achieve that, we switched the libraries that GHC just tracks (e.g. Cabal) to submodules. The libraries maintained by GHC HQ (e.g. base) were still kept as plain repos to avoid disrupting anyone's workflow.

The latest git release has improved submodule support somewhat, so if we now think the benefits of submodules outweigh the costs, we can discuss whether we want to change the policy. I don't want to make that decision for other GHC developers that spend much more time on GHC than I do (e.g. SPJ). Their productivity is more important than any inconveniences the lack of consistent use of submodules might cause me.

Austin Seipp

6:35 a.m.

I absolutely agree here, FWIW. We should only do this if there is a clear consensus on doing so and everyone doing active development is comfortable with it. And it's entirely possible submodules are inadequate for some reason that I'm not aware of which is a show-stopper.

However, the notion of impact-on-contributors cuts both ways. GHC has an extremely small team of hackers as it stands, and we are lucky to have *amazing* contributors like Kazu, Andreas, yourself, Simon & Simon, and numerous others help make GHC what it is. Much of this is volunteer work. But as the Haskell community grows, and we are at a loss of other full-time contributors like Simon Marlow, I think we are beginning to see the strain on GHC and its current contributors. So, it's important to evaluate what we're doing right and wrong. This feedback loop is always present even if seasoned contributors can live with it - but new contributors will definitely be impacted.

In this instance, I honestly find it disheartening that the answer to things like "getting older revisions of the source code in HEAD," or techniques like bisection, is basically "that doesn't work." The former is unfortunate, but the latter is pretty legitimately worrying. It would be one thing if this was a one-off occurrence of some odd developer workflow. But I have answered the fundamental question here (submodules vs free-floating clones) a handful of times myself at least, experienced the pain of the decision myself when doing rollbacks, and I'm sure other contributors can say the same.

GHC is already a large, industry-strength software project with years of work put behind it. The barrier to entry and contribution is not exactly small, but I think we've all done a good job. I'd love to see more people contributing. But I cannot help but find these discussions a bit sad, where contributors are impaired because regular/traditional development workflows like rollbacks are rendered useless - due to some odd source control discrepancy that nobody else on the planet seems to suffer from.

I guess the short version is basically that you're absolutely right: the time of Simon, Ian, and other high-profile contributors is *extremely* important. But I'd also rather not have people like Kazu potentially spend hours or even days doing what simple automation can achieve in what is literally a few keystrokes, and not only that - what is par for the course for other projects. This ultimately impacts the development cycles of *everybody*. And even if Kazu deals with it - what about the next person?

-- 
Regards,
Austin - PGP: 4096R/0x91384671

Simon Peyton-Jones

7:09 a.m.

For the avoidance of doubt, I totally support what Austin and Johan are saying:

I find the current setup confusing too.

I'm totally persuaded of the merits of git bisect etc.

I am the opposite of a git power-user (a git weedy-user?). I will be content to do whatever I'm told workflow-wise, provided I am told clearly in words of one syllable.

I *very strongly* want to reduce barriers to entry for would-be contributors, and this is clearly a barrier we could lower. Making Kazu, Austin, Johan, etc. more productive is massively valuable.

There may be some history to how we arrived at this point, but that should not constrain the future. We can change our workflow. I would want Ian and Simon to be thoroughly on board, but I regard the current setup as totally open to improvement. Please!

BTW, Ian has written it up quite carefully here: http://hackage.haskell.org/trac/ghc/wiki/Repositories, and the linked page http://hackage.haskell.org/trac/ghc/wiki/Repositories/Upstream.

Simon

Manuel M T Chakravarty

8:43 a.m.

I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.

Manuel

PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)

David Terei

9:10 a.m.

On 5 June 2013 01:43, Manuel M T Chakravarty wrote:

...

I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.

Manuel

PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)

I'd be all for this. We partially use the GitHub infrastructure since trac broke and I changed the emails to point to GitHub instead. I also often do code reviews with other devs on a personal GHC fork on GitHub before merging in.

I believe it would also help encourage more contributors (especially for libraries), but others have expressed disagreement with this point of view in the past and I don't have any data.

Either way, I'm glad git bisect may soon work. We'll finally be able to use the whole feature set of a version control tool :) (The other piece was the move from darcs -> git, which gave us a working annotate.)


Erik de Castro Lopo

9:30 a.m.

David Terei wrote:

...

Either way, I'm glad git bisect may soon work.

Having git bisect work on the GHC tree would be a plus!

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
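As a sketch of what automated bisection could look like once the whole tree can be checked out consistently: `git bisect run <cmd>` treats exit status 0 as "good", 125 as "cannot be tested, skip this commit", and any other status from 1 to 127 as "bad". The Python driver below maps a build step and a test step onto those codes. The function names and the `make` and `./run-testcase.sh` commands in the comment are placeholders, not real GHC build instructions.

```python
import subprocess
from typing import List

# Exit statuses understood by `git bisect run`.
GOOD = 0    # this commit does not show the bug
BAD = 1     # this commit shows the bug
SKIP = 125  # this commit cannot be tested (e.g. it does not build)


def classify(build_ok: bool, test_ok: bool) -> int:
    """Translate build and test outcomes into a bisect exit status."""
    if not build_ok:
        return SKIP
    return GOOD if test_ok else BAD


def succeeds(cmd: List[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def bisect_step(build_cmd: List[str], test_cmd: List[str]) -> int:
    """Build the tree, then run the test case only if the build worked."""
    build_ok = succeeds(build_cmd)
    test_ok = build_ok and succeeds(test_cmd)
    return classify(build_ok, test_ok)


# In a real bisect script the last line would be something like:
#     sys.exit(bisect_step(["make"], ["./run-testcase.sh"]))
# where both commands are placeholders for the project's own steps.
```

Saved as, say, `bisect_step.py` with that final line enabled, it would be driven by `git bisect start <bad> <good>` followed by `git bisect run python bisect_step.py`. Returning 125 for commits that fail to build keeps unrelated build breakage from being blamed for the bug.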

John Lato

9:30 a.m.

On Wed, Jun 5, 2013 at 5:10 PM, David Terei wrote:

...

On 5 June 2013 01:43, Manuel M T Chakravarty wrote:

...
I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.

Manuel

PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)

I'd be all for this. We partially use the GitHub infrastructure since trac broke and I changed the emails to point to GitHub instead. I also often do code reviews with other devs on a personal GHC fork on github before merging in.

I believe it would also help encourage more contributors (especially for libraries) but others have expressed disagreement with this point of view in the past and I'm not in hold of data.

I strongly suspect that fixing the original issue from this thread would do much more to encourage contributions. It certainly doesn't matter to me if ghc is on github or not, but I (as an extremely meager GHC hacker) find it near-impossible to maintain a usable repo if I want to do any sort of branching or checkouts. And while I hate git submodules with a passion, I agree with everyone who thus far has said that the current practice is even less usable (all the drawbacks and none of benefits).

Jan Stolarek

9:56 a.m.

For me the biggest plus of switching to submodules would be keeping GHC and testsuite in sync. If there are any reasons not to change in-tree library repos to submodules, then I would at least want testsuite to be changed to a submodule. I also use github for my daily work on GHC and being able to send patches via Pull Requests would make things easier. On the other hand it might be more difficult to attach files to a ticket (no such feature on Github AFAIK). Speaking of Github, perhaps we should put more stress on github folks to fix this: https://github.com/github/markup/issues/196 ? Jan

Vincent Hanquez

10:13 a.m.

On 06/05/2013 10:10 AM, David Terei wrote:

...

On 5 June 2013 01:43, Manuel M T Chakravarty wrote:

...
I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.

Manuel

PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)

I'd be all for this. We partially use the GitHub infrastructure since trac broke and I changed the emails to point to GitHub instead. I also often do code reviews with other devs on a personal GHC fork on github before merging in.

I believe it would also help encourage more contributors (especially for libraries) but others have expressed disagreement with this point of view in the past and I'm not in hold of data.

As a very recent new (try-to-be-)contributor, I'd like to weigh in, in favor of this.

IMHO, having to create a trac account, and submit patches by attachment (with the confusing trac UI) instead of just pushing to some repositories and issuing pull requests is quite suboptimal. I don't think it would scare anyone enough that they wouldn't contribute, but lowering the "entry cost" is always useful. -- Vincent

Manuel M T Chakravarty

10:56 a.m.

David Terei :

...

On 5 June 2013 01:43, Manuel M T Chakravarty wrote: I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.

Manuel

PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)

I'd be all for this. We partially use the GitHub infrastructure since trac broke and I changed the emails to point to GitHub instead. I also often do code reviews with other devs on a personal GHC fork on github before merging in.

I believe it would also help encourage more contributors (especially for libraries) but others have expressed disagreement with this point of view in the past and I'm not in hold of data.

For the compiler, the barriers to contribution are probably elsewhere, but for the libraries, I'm sure, it would lower the barrier to entry. For example, to fix some documentation, I personally would never bother to create a patch file and attach it to some Trac ticket (where I first have to create an account). In contrast, a pull request on GitHub is a matter of a few clicks.

Manuel

PS: Anybody who doubts this needs to post their GitHub account name, so we can check that they actually ever used GitHub properly ;)

Daniel Vainsencher

9:56 a.m.

Geoffrey Mainland

12:15 p.m.

I don't know much about subtrees, but that might be another possibility?

There are a lot of things to recommend moving to github. I do hate (non-empty) merge commits, though, so I'm not a fan of github's pull request mechanism.

Geoff

On 06/05/2013 09:43 AM, Manuel M T Chakravarty wrote:

...

I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.

Manuel

PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)

Simon Peyton-Jones :

...
For the avoidance of doubt, I totally support what Austin and Johan are saying:

I find the current setup confusing too.

I'm totally persuaded of the merits of git bisect etc.

I am the opposite of a git power-user (a git weedy-user?). I will be content to do whatever I'm told workflow-wise, provided I am told clearly in words of one syllable.

I *very strongly* want to reduce barriers to entry for would-be contributors, and this is clearly a barrier we could lower. Making Kazu, Austin, Johan, etc more productive is massively valuable.

There may be some history to how we arrived at this point, but that should not constrain for the future. We can change our workflow. I would want Ian and Simon to be thoroughly on board, but I regard the current setup as totally open to improvement. Please!

BTW, Ian has written it up quite carefully here: http://hackage.haskell.org/trac/ghc/wiki/Repositories, and the linked page http://hackage.haskell.org/trac/ghc/wiki/Repositories/Upstream.

Simon

| -----Original Message----- | From: ghc-devs-bounces@haskell.org [mailto:ghc-devs-bounces@haskell.org] | On Behalf Of Austin Seipp | Sent: 05 June 2013 07:35 | To: Johan Tibell | Cc: ghc-devs@haskell.org | Subject: Re: how to checkout proper submodules | | I absolutely agree here, FWIW. We should only do this if there is a | clear consensus on doing so and everyone doing active development is | comfortable with it. And it's entirely possible submodules are | inadequate for some reason that I'm not aware of which is a | show-stopper. | | However, the notion of impact-on-contributors cuts both ways. GHC has | an extremely small team of hackers as it stands, and we are lucky to | have *amazing* contributors like Kazu, Andreas, yourself, Simon & | Simon, and numerous others help make GHC what it is. Much of this is | volunteer work. But as the Haskell community grows, and we are at a | loss of other full-time contributors like Simon Marlow, I think we are | beginning to see the strain on GHC and its current contributors. So, | it's important to evaluate what we're doing right and wrong. This | feedback loop is always present even if seasoned contributors can live | with it - but new contributors will definitely be impacted. | | In this instance, I honestly find it disheartening that the answer to | things like "getting older revisions of the source code in HEAD," or | techniques like bisection is basically "that doesn't work." The second | is unfortunate, but the latter is pretty legitimately worrying. It | would be one thing if this was a one-off occurrence of some odd | developer-workflow. But I have answered the fundamental question here | (submodules vs free-floating clones) a handful of times myself at | least, experienced the pain of the decision myself when doing | rollbacks, and I'm sure other contributors can say the same. | | GHC is already a large, industry-strength software project with years | of work put behind it. 
The barrier to entry and contribution is not | exactly small, but I think we've all done a good job. I'd love to see | more people contributing. But I cannot help but find these discussions | a bit sad, where contributors are impaired due to regular/traditional | development workflows like rollbacks are rendered useless - due to | some odd source control discrepancy that nobody else on the planet | seems to suffer from. | | I guess the short version is basically that that you're absolutely | right: the time of Simon, Ian, and other high-profile contributors is | *extremely* important. But I'd also rather not have people like Kazu | potentially spend hours or even days doing what simple automation can | achieve in what is literally a few keystrokes, and not only that - par | for the course for other projects. This ultimately impacts the | development cycles of *everybody*. And even if Kazu deals with it - | what about the next person? | | On Wed, Jun 5, 2013 at 12:12 AM, Johan Tibell | wrote: | > The latest git release has improved submodules support some so if we now | > thing the benefits of submodules outweigh the costs we can discuss if we | > want to change to policy. I don't want to make that decision for other GHC | > developers that spend much more time on GHC than I (e.g. SPJ). Their | > productivity is more important than any inconveniences the lack of | > consistent use of submodules might cause me. | | | -- | Regards, | Austin - PGP: 4096R/0x91384671

Daniel Trstenjak

1:24 p.m.

Hi Geoffrey,

...

I don't know much about subtrees, but that might be another possibility?

The main point about subtrees is that you have just one repository, and you merge a directory of this repository with some other git repository using 'git subtree'.

Subtrees and submodules both try to handle the use case where you want to incorporate a third-party repository into your own repository and merge changes in both directions.

I think that subtrees are easier for the developer working on the repository, because there's only one repository, but it's a bit more hassle to merge the third-party repository. Submodules are harder for the developer, because there are multiple repositories, but merging the third-party repository might be a bit easier.

GHC devs might have other reasons for using submodules, because they want to separate things or they're afraid that the resulting single repository might get too big. But I think there should be good reasons for using submodules, because a lot of workflows (like branching) are such a hassle with submodules.

Greetings, Daniel
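The subtree workflow described above can be sketched roughly as follows. This is only an illustration: the prefix, URL and ref are whatever the caller passes in, nothing here reflects how GHC is actually set up, and 'git subtree' has to be installed for the commands to run. Setting GIT=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the subtree workflow: one repository, with one directory that is
# merged to and from a separate upstream repository.
# Override GIT (for example GIT=echo) to print the commands instead of running them.
GIT=${GIT:-git}

# Import an upstream repository into PREFIX as ordinary files of this repo.
subtree_import() {   # subtree_import PREFIX URL REF
    "$GIT" subtree add --prefix="$1" "$2" "$3"
}

# Merge newer upstream commits into PREFIX.
subtree_pull() {     # subtree_pull PREFIX URL REF
    "$GIT" subtree pull --prefix="$1" "$2" "$3"
}

# Send local commits that touched PREFIX back to the upstream repository.
subtree_push() {     # subtree_push PREFIX URL REF
    "$GIT" subtree push --prefix="$1" "$2" "$3"
}
```

Day-to-day commits and branches need no special commands in this model; only the three operations above involve the second repository.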

Nicolas Trangez

1:27 p.m.

On Wed, 2013-06-05 at 15:24 +0200, Daniel Trstenjak wrote:

...

because a lot of workflows (like branching) are such a hassle with submodules.

As my experience with submodules is positive (though limited), could you elaborate on the difficulties/hassle here? Thanks, Nicolas

Daniel Trstenjak

1:49 p.m.

Hi Nicolas, On Wed, Jun 05, 2013 at 03:27:09PM +0200, Nicolas Trangez wrote:

...

As my experience with submodules is positive (though limited), could you elaborate on the difficulties/hassle here?

If you would like to develop some kind of feature which involves changes on multiple repositories/submodules and you would like to do it in a branch, then you have to create a branch in each repository, commit separately in each repository and then merge back each repository into its master branch. Greetings, Daniel
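The per-repository branching described here can be sketched as below. It is a sketch under assumptions, not an established GHC workflow: the function names are made up for illustration, and GIT=echo turns the calls into a dry run.

```shell
#!/bin/sh
# Sketch: a feature branch that spans a superproject and its submodules.
# Override GIT (for example GIT=echo) to print the commands instead of running them.
GIT=${GIT:-git}

# Create BRANCH in the superproject and in every submodule.
branch_everywhere() {   # branch_everywhere BRANCH
    "$GIT" checkout -b "$1" &&
    "$GIT" submodule foreach --recursive "git checkout -b '$1'"
}

# After committing inside the submodules, record their new commits in the
# superproject so that the two stay in sync.
record_submodules() {   # record_submodules MESSAGE DIR...
    msg=$1; shift
    "$GIT" add -- "$@" &&
    "$GIT" commit -m "$msg"
}
```

The commits inside each submodule and the final merges into each master branch still have to be done repository by repository, which is the hassle being pointed out.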

Austin Seipp

2:41 p.m.

I'm back after sleep. A few points:

1) Subtree is - in my opinion - basically not an option. It has a nice workflow from the small amount of time I spent with it. But it's not installed by default with git, and it's unclear if it ever will be. Although subtree gives the appearance of a unified repository from my understanding, in practice all developers will probably need to touch multiple repositories for several reasons anyway (like testsuite and base.) That means the third-party merge is pretty much always going to happen for any non-sizeable work, and the person who *did* the work will be the one doing it, essentially amounting to basically everyone needing subtree in the long term. I may be wrong about this. If I am, please call me out on it. And there may be alternative workflows for patch-submitters to help this. But in general, I'd rather not have to tell GHC developers they probably need a special git build in the long haul.

2) I agree with John Lato. I think the immediate problem of fixing the submodule situation is a core issue, and GitHub can come later. Or at the very least, we should discuss GitHub in its own email thread. That's because while I see the problem of "our current setup is bad" as rather obvious and with a clear mitigation/fix, there *are* some legitimate complaints about GitHub that won't be resolved so easily. We should tackle each separately (remember: we have thousands of existing tickets, wiki pages, historical existing links, etc. All of these are pretty important in a lot of ways. It's not clear what the movement-strategy here is and it is definitely not going to be free, or painless.) This is definitely a more touchy issue, but I can see both sides.

3) Regarding Daniel Trstenjak's complaint: submodules from a workflow perspective may suck a little, but realistically we use their *exact* workflow anyway as it stands. We just don't get any of the benefits: in practice developers will make branches in each affected repo and push them and maintain them concurrently. Eventually they will be merged into master for each respective repository. This process will not change if we move entirely to submodules as you said.

Some extra food for thought:

1) We could now delete ./sync-all if this happened. It's almost 1000 lines of code dedicated to managing this stuff. Instead, we merely tell all hackers to clone with 'git clone --recursive' and voilà! After a git clone, you can immediately start building. That'd be great.

2) One thing this *does* complicate is that currently, some repositories are optional. Submodules effectively make them 100% non-optional. Now, normally, I would say all developers should have every relevant library anyway. In this case however, it is a tad bit annoying. On my ARM machines for example, DPH regularly fails late-in-build due to a bug in the (custom) linker, because dph requires stage2+ghci. But it also takes a long time to build DPH, so in practice I just remove it to save myself that time. Some others do the same. That said, I'm potentially in a small minority here, and I'd be willing to just deal with it in the meantime if we can do this (this is the *exception* and certainly not the rule.) Not that big a deal, and it can also be fixed later.

There are probably other things that I can't think of, but I'm sure you can all think of other stuff too. :)

On Wed, Jun 5, 2013 at 8:49 AM, Daniel Trstenjak wrote:

...

Hi Nicolas,

On Wed, Jun 05, 2013 at 03:27:09PM +0200, Nicolas Trangez wrote:

...
As my experience with submodules is positive (though limited), could you elaborate on the difficulties/hassle here?

If you would like to develop some kind of feature which involves changes on multiple repositories/submodules and you would like to do it in a branch, then you have to create a branch in each repository, commit separately in each repository and then merge back each repository into its master branch.

Greetings, Daniel


-- Regards, Austin - PGP: 4096R/0x91384671
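The recursive clone mentioned in the message above, and the "check out an old snapshot" case that started this thread, might look like the sketch below if every library were a submodule. Note that 'git clone' takes --recursive (also spelled --recurse-submodules), while --init is an option of 'git submodule update'. The function names are illustrative only, and GIT=echo gives a dry run.

```shell
#!/bin/sh
# Sketch: obtaining a consistent tree when everything is a submodule.
# Override GIT (for example GIT=echo) to print the commands instead of running them.
GIT=${GIT:-git}

# Clone a superproject together with all of its submodules.
clone_with_submodules() {   # clone_with_submodules URL DIR
    "$GIT" clone --recursive "$1" "$2"
}

# Move an existing checkout to COMMIT and bring every submodule to the
# commit that the superproject recorded at that point.
checkout_snapshot() {       # checkout_snapshot COMMIT
    "$GIT" checkout "$1" &&
    # Pick up submodule URL changes made since the old commit, if any.
    "$GIT" submodule sync --recursive &&
    "$GIT" submodule update --init --recursive
}
```

This only yields a consistent tree for repositories that are actually recorded as submodules; free-floating clones such as base at the time of this thread would stay at whatever commit they happen to be on.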

Jan Stolarek

3:04 p.m.

...

1) We could now delete ./sync-all if this happened. In that case I would vote for replacing sync-all with a script that aids in managing branches in multiple subrepos. I implemented such a script for myself in a very ad hoc way. Having something more robust would be great.

...

2) One thing this *does* complicate is that currently, some repositories are optional. (...) I believe this could be solved by changes in the build system, so that some components can be optional (yes, I also delete DPH to speed up building).

Janek

Daniel Trstenjak

3:20 p.m.

Hi Austin, On Wed, Jun 05, 2013 at 09:41:56AM -0500, Austin Seipp wrote:

...

But it's not installed by default with git, it's unclear if it ever will be.

I think subtree has been part of git since 1.7.x . I have just installed the default git package (git 1.8.1.2) of Ubuntu 13.04 and the subtree command is just there.

...

Although subtree gives the appearance of a unified repository from my understanding, in practice all developers will probably need to touch multiple repositories for several reasons anyway (like testsuite and base.) That means the third-party merge is pretty much always going to happen for any non-sizeable work, the person who *did* the work will be the one doing it, essentially amounting to basically everyone needing subtree in the long term.

Sorry that I'm not aware of the GHC development process, but why are the testsuite and base in separate repositories? Submodules are fine for tracking repositories, but if you're changing multiple submodules all the time, then it's a sign that you have a strong dependency between the repositories, so why not just have one repository?

...

2) One thing this *does* complicate is that currently, some repositories are optional. Submodules effectively make them 100% non-optional. Now, normally, I would say all developers should have every relevant library anyway. In this case however, it is a tad bit annoying. On my ARM machines for example, DPH regularly fails late-in-build due to a bug in the (custom) linker, because dph requires stage2+ghci. But it also takes a long time to build DPH, so in practice I just remove it to save myself that time. Some others do the same.

Isn't this more a build system issue, that you're able to specify what should/shouldn't be built, than a repository issue? Greetings, Daniel

Austin Seipp

3:47 p.m.

On Wed, Jun 5, 2013 at 10:20 AM, Daniel Trstenjak wrote:

...

I think subtree has been part of git since 1.7.x .

I have just installed the default git package (git 1.8.1.2) of Ubuntu 13.04 and the subtree command is just there.

It's *part* of mainline git, but it is not installed with git. It's part of git's "contrib" functionality package which requires that your package maintainer be gracious enough to include it and install it by default, which requires extra intervention at build-time. As a counter-example, my 'git' from Ubuntu 12.04 LTS machine has no subtree and there are no existing instances of it in any 'precise' repositories. I'm hesitant to require developers en masse to use it for reasons like this. (Frankly I also don't know how this would work out on windows. Like, I don't know how to get a git-build-with-subtree-for-windows, much less if it works on windows at all.)

...

Sorry that I'm not aware of the GHC development process, but why are the testsuite and base in separate repositories?

Because GHC does not technically 'own' them by the most strict definition. testsuite and base are also useful for other compilers, such as nhc98 (and indeed, nhc uses base itself.) The same can be said of nofib. As a result, there is a separation. Now, in practice everybody working on base is a GHC hacker pretty much, and ditto with testsuite/nofib. Regardless of all that, to change *this* part of the equation is a much, much bigger argument. One I don't intend to wage at the moment.

...

Submodules are fine for tracking repositories, but if you're changing multiple submodules all the time, then it's a sign that you have a strong dependency between the repositories, so why not just have one repository?

I would agree. In practice many of the submodules are touched extremely rarely - one change every several months. Sometimes, no changes at all between entire releases spanning a year. testsuite and base are definitely the exception, but they are also what most people spend their time with in terms of hacking (pareto in action; 80% of people's work, 20% of the code.) But again, to change this is a far larger argument with historical implications, and implications beyond GHC. Malcolm would certainly have input as he maintains nhc. (In the past, from my understanding, nhc etc were more prevalent. But over time we've moved more and more to GHC, and 'cruft' has arguably lingered.) I think folding base and testsuite into GHC 'for good' is a separate discussion entirely.

...

Isn't this more a build system issue, that you're able to specify what should/shouldn't be built, than a repository issue?

Yes. It is not insurmountable, my point is more it's an immediate loss for some small reasons, but really nothing more than a minor annoyance. It's just something to remind people of, should we make the change.

...

Greetings, Daniel


-- Regards, Austin - PGP: 4096R/0x91384671

Daniel Trstenjak

4:08 p.m.

Hi Austin, On Wed, Jun 05, 2013 at 10:47:27AM -0500, Austin Seipp wrote:

...

It's *part* of mainline git, but it is not installed with git. It's part of git's "contrib" functionality package which requires that your package maintainer be gracious enough to include it and install it by default, which requires extra intervention at build-time.

As a counter-example, my 'git' from Ubuntu 12.04 LTS machine has no subtree and there are no existing instances of it in any 'precise' repositories. I'm hesitant to require developers en masse to use it for reasons like this.

(Frankly I also don't know how this would work out on windows. Like, I don't know how to get a git-build-with-subtree-for-windows, much less if it works on windows at all.)

git-subtree is just a bash script, so it's more of an installing than a building issue. The windows version of git already has to support bash by using the msys environment.

...

Because GHC does not technically 'own' them by the most strict definition. testsuite and base are also useful for other compilers, such as nhc98 (and indeed, nhc uses base itself.) The same can be said of nofib. As a result, there is a separation.

Just to make it clear, testsuite and base could still be contained in separate repositories, regardless if GHC would use git-subtree or not. It's just the question how these repositories are incorporated into GHC. Greetings, Daniel

Jan Stolarek

6:02 p.m.

I think that testsuite should be included in the main GHC repo. I don't recall any other project that has its tests placed in a separate repository. The nhc argument doesn't convince me - after all, most tests that are added nowadays are GHC specific. Janek

Ryan Newton

22 Aug 22 Aug

7:04 p.m.

Hi all,

I just reread this thread again. Is this one of these situations where *almost everyone agrees, but the fix just didn't happen*?

In particular, there is still no formal relationship between versions of the compiler and versions of the testsuite that tests it -- that seems odd! Can we please make testsuite *at least* a sub-module? If we count this long email thread as rough consensus, is it just waiting on someone of sufficient authority typing a "git submodule add" command (and tweaking sync-all accordingly)?

Also, Jan's suggestion sounded good -- that once all child repos are git submodules then sync-all can be replaced with something that helps out with git submodule branching, as it helps out with multi-repo branching now (a little bit).

Best, -Ryan

On Wed, Jun 5, 2013 at 2:02 PM, Jan Stolarek wrote:

...

I think that testsuite should be included in the main GHC repo. I don't recall any other project that has its tests placed in a separate repository. The nhc argument doesn't convince me - after all, most tests that are added nowadays are GHC specific.

Janek
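The "git submodule add" step raised in this message could be sketched as follows. This assumes a clean superproject in which the target directory is not already tracked or occupied by a free-floating clone. The URL and directory are supplied by the caller, and nothing here is taken from the actual GHC repository layout. GIT=echo gives a dry run.

```shell
#!/bin/sh
# Sketch: pin an existing repository as a submodule of the superproject.
# Override GIT (for example GIT=echo) to print the commands instead of running them.
GIT=${GIT:-git}

# Register the repository at URL as a submodule in DIR and commit the result.
# "git submodule add" stages both .gitmodules and the new submodule entry,
# so a plain commit records the pinned version.
add_submodule() {   # add_submodule URL DIR
    "$GIT" submodule add "$1" "$2" &&
    "$GIT" commit -m "Add $2 as a submodule"
}
```

From that commit onward, every superproject commit names an exact commit of the submodule, which is the formal compiler-to-testsuite relationship being asked for.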


Austin Seipp

7:31 p.m.

Simon and I discussed this a little today. I think there are several legitimate points made throughout the threads here, but the problem is clear: consistent builds are difficult, if not legitimately impossible. That's a very big problem.

Right now, it is far too late into the release cycle to do anything drastic I'm afraid. Once we branch, we can feasibly start making good changes in this direction.

One problem however is that we don't even have a clear writeup over what all the relevant points are (aside from this + all the ranting I did elsewhere, which is loosely in my head still.) Earlier today, I preemptively created this page, but have not jotted down any of my notes: http://ghc.haskell.org/trac/ghc/wiki/GitSubmoduleProblem

For a short recap, here is what I think:

1) Several repositories should really just become part of GHC's repository. I'd argue that includes testsuite, nofib, and several others (integer-gmp/integer-simple, hpc, etc.) They don't need to be submodules and making them so is unnecessary complexity, when they can realistically never be used with anything else. This cuts down on something like 10 repositories, IIRC.

2) Several more should become submodules, where 'more' = the libraries under the new Core Libraries Committee. They will be taking over several of the other free floating repositories that are not currently submodules. We no longer will 'own' them, as it is.

3) 'base' and 'ghc-prim' are up for more debate it seems. Roman wants them in particular for haskell-suite, but really he only wants a repository to work with from what I remember. I'm not sure what to do here. Making them a submodule is realistic, but I'm honestly a little afraid of submodules for a package which is so highly traffic'd by developers (another reason I don't want e.g. testsuite as a submodule, either.)

The first two points alone should help a lot in making builds more reliable and reproducible, but it will require changes in the development workflow. In particular, it's much easier to lose work with submodules - especially for those among us who are not Git masters. So we should take the time to clearly explain all of this. But 1 & 2 should cover a large part of the current setup, and most repos are very low traffic.

Also, I'd like to take the time to have a discussion with Edward Kmett (who I have CC'd) about point 2 to make sure we're on the same page here. But I haven't done this yet.

Point 3 seems to really be the most contentious, since a few other things come with it. Should we give up on 'base' being usable by other compilers? Historically that's why it's separate. But really it's easy to write code against 'base' that will never work with another compiler anyway. But maybe that can be fixed. And will the base split - also slated for post 7.8 - also change the ownership of significant parts of the library, based on how it is implemented? There were several things floating around this.

Regardless of point 3 and all that, something should and will be done soon. I'll put this up on the wiki later when I have time. We just need a directly spelled out plan of attack.

On Thu, Aug 22, 2013 at 2:04 PM, Ryan Newton wrote:

...

Hi all,

I just reread this thread again. Is this one of these situations where *almost everyone agrees, but the fix just didn't happen*?

In particular, there is still no formal relationship between versions of the compiler and versions of the testsuite that tests it -- that seems odd! Can we please make testsuite *at least* a sub-module? If we count this long email thread as rough consensus, is it just waiting on someone of sufficient authority typing a "git submodule add" command (and tweaking sync-all accordingly)?

Also, Jan's suggestion sounded good -- that once all child repos are git submodules then sync-all can be replaced with something that helps out with git submodule branching, as it helps out with multi-repo branching now (a little bit).

Best, -Ryan

On Wed, Jun 5, 2013 at 2:02 PM, Jan Stolarek wrote:

...
I think that testsuite should be included in the main GHC repo. I don't recall any other project that has its tests placed in a separate repository. The nhc argument doesn't convince me - after all, most tests that are added nowadays are GHC specific.

Janek


-- Regards, Austin - PGP: 4096R/0x91384671

Simon Peyton-Jones

8:14 p.m.

There was a long discussion about this a couple of months ago. It did not reach a conclusion, but it is merely parked, not abandoned. I hope that you can all pick it up again after the release.

Simon

From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Austin Seipp
Sent: 22 August 2013 20:31
To: Ryan Newton
Cc: ghc-devs@haskell.org; Edward Kmett
Subject: Re: how to checkout proper submodules

Simon and I discussed this a little today. I think there are several legitimate points made throughout the threads here, but the problem is clear: consistent builds are difficult, if not legitimately impossible. That's a very big problem.

Right now, it is far too late in the release cycle to do anything drastic I'm afraid. Once we branch, we can feasibly start making good changes in this direction. One problem however is that we don't even have a clear writeup over what all the relevant points are (aside from this + all the ranting I did elsewhere, which is loosely in my head still.) Earlier today, I preemptively created this page, but have not jotted down any of my notes: http://ghc.haskell.org/trac/ghc/wiki/GitSubmoduleProblem

For a short recap, here is what I think:

1) Several repositories should really just become part of GHC's repository. I'd argue that includes testsuite, nofib, and several others (integer-gmp/integer-simple, hpc, etc.) They don't need to be submodules and making them so is unnecessary complexity, when they can realistically never be used with anything else. This cuts down on something like 10 repositories, IIRC.

2) Several more should become submodules, where 'more' = the libraries under the new Core Libraries Committee. They will be taking over several of the other free floating repositories that are not currently submodules. We no longer will 'own' them, as it is.

3) 'base' and 'ghc-prim' are up for more debate it seems. Roman wants them in particular for haskell-suite, but really he only wants a repository to work with from what I remember. I'm not sure what to do here. Making them a submodule is realistic, but I'm honestly a little afraid of submodules for a package which is so highly traffic'd by developers (another reason I don't want e.g. testsuite as a submodule, either.)

The first two points alone should help a lot in making builds more reliable and reproducible, but it will require changes in the development workflow. In particular, it's much easier to lose work with submodules - especially for those among us who are not Git masters. So we should take the time to clearly explain all of this. But 1 & 2 should cover a large part of the current setup, and most repos are very low traffic. Also, I'd like to take the time to have a discussion with Edward Kmett (who I have CC'd) about point 2 to make sure we're on the same page here. But I haven't done this yet.

Point 3 seems to really be the most contentious, since a few other things come with it. Should we give up on 'base' being usable by other compilers? Historically that's why it's separate. But really it's easy to write code against 'base' that will never work with another compiler anyway. But maybe that can be fixed. And will the base split - also slated for post 7.8 - also change the ownership of significant parts of the library, based on how it is implemented? There were several things floating around this.

Regardless of point 3 and all that, something should and will be done soon. I'll put this up on the wiki later when I have time. We just need a directly spelled out plan of attack.

On Thu, Aug 22, 2013 at 2:04 PM, Ryan Newton <rrnewton@gmail.com> wrote:

Hi all,

I just reread this thread again. Is this one of these situations where almost everyone agrees, but the fix just didn't happen?

In particular, there is still no formal relationship between versions of the compiler and versions of the testsuite that tests it -- that seems odd! Can we please make testsuite at least a sub-module? If we count this long email thread as rough consensus, is it just waiting on someone of sufficient authority typing a "git submodule add" command (and tweaking sync-all accordingly)?

Also, Jan's suggestion sounded good -- that once all child repos are git submodules then sync-all can be replaced with something that helps out with git submodule branching, as it helps out with multi-repo branching now (a little bit).

Best, -Ryan

On Wed, Jun 5, 2013 at 2:02 PM, Jan Stolarek <jan.stolarek@p.lodz.pl> wrote:

I think that testsuite should be included in the main GHC repo. I don't recall any other project that has its tests placed in a separate repository. The nhc argument doesn't convince me - after all, most tests that are added nowadays are GHC specific.

Janek

-- Regards, Austin - PGP: 4096R/0x91384671
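For readers unfamiliar with the step Ryan mentions, a rough sketch of what "typing a git submodule add command" could involve is shown here. The function name `add_as_submodule` and the URL in the usage line are made up for illustration; the matching changes to sync-all that Ryan alludes to are not covered.

```shell
# add_as_submodule URL PATH
# Hypothetical helper: register the repository at URL as a submodule at PATH
# and record that in the superproject with a commit.
add_as_submodule() {
    url=$1
    path=$2
    # Writes .gitmodules and stages the submodule's current commit id.
    git submodule add "$url" "$path" &&
    # From this commit on, the superproject pins PATH to an exact commit.
    git commit --quiet -m "Track $path as a submodule"
}
```

It might be called as `add_as_submodule http://example.org/testsuite.git testsuite`, with the real repository URL substituted.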

Ryan Newton

9:02 p.m.

Ok, resuming after release makes sense.

Regarding whether it reached a conclusion: What struck me about this particular discussion was the *lack* of disagreement (relative to say, the records debate). It seemed like no one was arguing for the status quo and just about everyone agreed that moving to all-submodules is better than the current mix.

Still, one could argue that making an improvement is premature if (1) there is significant transition cost to make the change, or (2) it puts you on some kind of local optimum that makes it harder to get to a higher peak. Yet in the case of all-submodules vs. ugly-mix, the transition cost is very low, and it doesn't preclude any future improvements. (For example, it is completely reasonable to later decide to copy certain modules into the tree rather than using submodules.)

But maybe I'm under-estimating the severity of the anti-submodule grumbling... that is, I may not be accurately distinguishing the "submodules have their annoyances but they are the lesser evil" opinion from "I will adamantly oppose adding any more submodules".

Best, -Ryan

On Thu, Aug 22, 2013 at 4:14 PM, Simon Peyton-Jones wrote:

...

Malcolm Wallace

8 Jun

11 a.m.

On 5 Jun 2013, at 16:47, Austin Seipp wrote:

...

testsuite and base are also useful for other compilers, such as nhc98 (and indeed, nhc uses base itself.)

Useful, perhaps, but not actually used in practice. Since the base library repo moved from darcs to git, I think that ghc is the only compiler that uses it. (Maybe the jhc, uhc, or Helium people could refute that though.) For a long, long time, the close coupling between ghc and the base library has been obvious. I have long since given up trying to pretend that base is portable - it is not. It is ghc-specific. I don't think it should be. That is a crazy architecture. But it is the way it is. Maybe it is time for everyone else to stop pretending too. Regards, Malcolm

Nicolas Trangez

6 Jun

7:59 a.m.

Daniel, On Wed, 2013-06-05 at 15:49 +0200, Daniel Trstenjak wrote:

...

Hi Nicolas,

On Wed, Jun 05, 2013 at 03:27:09PM +0200, Nicolas Trangez wrote:

...
As my experience with submodules is positive (though limited), could you elaborate on the difficulties/hassle here?

If you would like to develop some kind of feature which involves changes on multiple repositories/submodules and you would like to do it in a branch, then you have to create a branch in each repository, commit separately in each repository and then merge back each repository into its master branch.

Right, thanks for the explanation. This might indeed be somewhat inconvenient. On the other hand, the current situation (with sync-all etc) doesn't seem very different from a workflow perspective, except for being unable to easily run bisect :-) Nicolas

Kazu Yamamoto

1:42 a.m.

...

There are a lot of things to recommend moving to github. I do hate (non-empty) merge commits, though, so I'm not a fan of github's pull request mechanism.

Please read "A successful Git branching model" to see why fast-forward merges are not used much these days.

Git flow: http://nvie.com/posts/a-successful-git-branching-model/

Another related article is here:

Github flow: http://scottchacon.com/2011/08/31/github-flow.html

--Kazu

Geoffrey Mainland

7:35 a.m.

On 06/06/2013 02:42 AM, Kazu Yamamoto (山本和彦) wrote:

...

...
There are a lot of things to recommend moving to github. I do hate (non-empty) merge commits, though, so I'm not a fan of github's pull request mechanism.

Please read "A successful Git branching model" to see why fast-forward merges are not used much these days.

Git flow: http://nvie.com/posts/a-successful-git-branching-model/

Another related article is here:

Github flow: http://scottchacon.com/2011/08/31/github-flow.html

--Kazu

I have read both of these before. GHC does not use the git flow model advocated in those two articles.

The choice to rebase private feature branches is orthogonal to the choice to use --no-ff when merging feature branches. I am of the opinion that major feature branches should be rebased *and* that they should then be merged with --no-ff. However, GHC's history is a mess. A small fix does not require a feature branch, and yet GHC's history has many, many small changes that have all been merged instead of rebased.

The postings you cite don't really take a stand on rebasing private feature branches. Here are a few that do.

Git merge vs. rebase: http://mislav.uniqpath.com/2013/02/merge-vs-rebase/

A Rebase Workflow for Git: http://randyfay.com/content/rebase-workflow-git

I realize that this is a religious issue. However, perhaps it is less controversial for me to claim that the GHC history is a mess. Can we easily do something about that by making a minimally intrusive change to our workflow?

Geoff
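The "rebase, then merge with --no-ff" workflow described above can be sketched as follows. This is only an illustration: the function name `land_topic` is hypothetical, and `master` as the default target branch is an assumption.

```shell
# land_topic TOPIC [TARGET]
# Hypothetical helper: replay a private topic branch on top of TARGET, then
# merge it with an explicit merge commit so the branch stays visible in history.
land_topic() {
    topic=$1
    target=${2:-master}
    git checkout --quiet "$topic" &&
    # Rewrite the topic branch so it sits directly on top of the target.
    git rebase --quiet "$target" &&
    git checkout --quiet "$target" &&
    # --no-ff forces a merge commit even though a fast-forward is possible.
    git merge --quiet --no-ff -m "Merge branch '$topic'" "$topic"
}
```

Because the topic branch has just been rebased, the merge commit has an empty diff of its own; it only groups the topic's commits.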

Kazu Yamamoto

10 Jun

7:53 a.m.

Hi Geoffrey,

...

I am of the opinion that major feature branches should be rebased *and* that they should then be merged with --no-ff.

I totally agree with you. :-) --Kazu

Daniel Trstenjak

6 Jun

8:59 a.m.

Hi Kazu, On Thu, Jun 06, 2013 at 10:42:03AM +0900, Kazu Yamamoto wrote:

...

Please read "A successful Git branching model" to see why fast-forward merges are not used much these days.

I think you have to differentiate between merging a feature branch into the master branch and merging a local branch with a remote branch, as happens when you just call git pull/push on the master branch.

A fast-forward when merging a feature branch loses information, because you can no longer see which commits were involved in developing the feature.

In the second case, merging a local with a remote branch, you gain no information from the merge commits; they just mess up your history. Therefore I'm using 'git pull --rebase' to prevent the creation of these merge commits.

Greetings, Daniel

Kazu Yamamoto

10 Jun

7:59 a.m.

Hi,

...

I think you have to differentiate between the case of merging a feature branch into the master branch and the case of merging a local with a remote branch, like just calling git pull/push on the master branch.

I just wanted to say that a fast-forward merge loses information about which sequence of commits was a topic branch. As far as I'm concerned, I rebase my topic branch by myself before I send a pull request.

...

Therefore I'm using 'git pull --rebase' to prevent the creation of these merge commits.

I think this is a good practice on the puller's side. :-) --Kazu

Niklas Larsson

5 Jun

10:32 a.m.

When I was fiddling with having to roll back everything to a known good state, I patched sync-all to check out all the repos to the state they were in on a certain date. It's pretty naive, but it should be usable for doing manual bisecting at least. I can't find the old mailing list archives, so I attach the patch here.

Niklas

2013/6/5 Austin Seipp

...

(Warning: incoming answer, followed by a rant.)

Base is not a submodule, meaning that there is essentially no way to automatically check it back out to the "exact same state" it was in, given some specified GHC commit - the commit IDs are not tracked.

At this point, you are basically on your own. You'll have to manually checkout libraries/base to a specific commit that occurred 'around' the same time as the GHC commit. In this case, that means looking through whatever commits hit HEAD on May 7th:

$ cd libraries/base
$ git log --until="May 7th"

The resulting list will show you what happened up to May 7th. Take the latest commit in that list, and check out base to that revision. Any commits afterward happened on May 8th or later:

$ git checkout -b temporary-io-fix <commit>

You're going to need to do this for every module that is not tracked as a submodule. Most of the repositories are very low-activity. base & testsuite are going to be the annoying ones.
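The by-date checkout described above could be scripted roughly as below. The helper names `commit_before` and `checkout_before` are hypothetical, the branch is assumed to be `master` unless given, and `git -C` needs Git 1.8.5 or later. Since `--before` filters on commit dates, which are not guaranteed to be ordered, the result is only an approximation of the tree's state on that day.

```shell
# commit_before REPO DATE [BRANCH]
# Hypothetical helper: print the newest commit on BRANCH dated at or before DATE.
commit_before() {
    repo=$1
    when=$2
    branch=${3:-master}
    git -C "$repo" rev-list -n 1 --before="$when" "$branch"
}

# checkout_before REPO DATE [BRANCH]
# Check REPO out (detached HEAD) at the commit found by commit_before.
checkout_before() {
    repo=$1
    when=$2
    branch=${3:-master}
    commit=$(commit_before "$repo" "$when" "$branch") || return 1
    if [ -z "$commit" ]; then
        echo "no commit before $when in $repo" >&2
        return 1
    fi
    git -C "$repo" checkout --quiet --detach "$commit"
}
```

For example, `checkout_before libraries/base 2013-05-07` followed by the same call for each other free-standing repository.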

You'll have to continue this 'manual bisection' by hand, with a very hefty dose of frustrating trial-and-error, in my experience.

There is a secondary alternative. GHC has a script called 'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to work around this deficiency (very poorly.) This script basically dumps out a text file, containing a key/value pair mapping every repository to its current HEAD commit. It can then take that text file and automatically do 'git checkout' for you in every repo. The idea is you can take fingerprints of the tree, save the results, and cleanly check out to some state later.
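The fingerprint idea can be illustrated with a few lines of shell. This is not fingerprint.py and does not use its file format; the function names and the "directory, space, commit" line format are assumptions made for the sketch, and directory names containing spaces are not handled.

```shell
# save_fingerprint REPO...
# Print one "<dir> <commit>" line for the current HEAD of each repository.
save_fingerprint() {
    for repo in "$@"; do
        printf '%s %s\n' "$repo" "$(git -C "$repo" rev-parse HEAD)"
    done
}

# restore_fingerprint
# Read "<dir> <commit>" lines on stdin and check each repository out
# (detached HEAD) at the recorded commit.
restore_fingerprint() {
    while read -r repo commit; do
        git -C "$repo" checkout --quiet --detach "$commit" || return 1
    done
}
```

Usage would be along the lines of `save_fingerprint . libraries/base testsuite > tree.fp` on a known-good day, and `restore_fingerprint < tree.fp` later.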

The GHC build bots run by Ben L.'s "Buildbox" library automatically run the 'fingerprint.py' script during nightly builds, from what I remember. It may be possible to just look in the ghc-builds archives, and steal some fingerprints from the last month off one of the buildbots. I don't know who maintains the individual bots; perhaps you can ask the list. However, this will at best give you a 1-day level of granularity, rather than commit level granularity, which is still rather unsatisfying.

------------- Answer over, rant begins. ---------------------

I know we had this discussion sometime recently I think, but can someone *please* explain why we are in this situation of half submodules, half random-floating-git-repository-checkouts? It's terrible. I'm frankly surprised we've even been doing it this long, over a year or more? It is literally the worst of submodules, and free-standing-repositories put together, with none of the advantages of either.

Free-standing repos are attractive because they are just there, and you don't have to 'maintain' them (sort of.) Submodules are attractive because they identify the critical points in which your repositories depend on each other. We have neither benefit right now, clearly.

In particular, this makes it impossible to use tools like 'git bisect' which is *incredibly* useful for just these exact cases. Hell, you can even make 'git bisect' work almost 100% automatically with a tiny bit of shell scripting.

http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-vari...

You could just instead have a script that built the compiler, and ran the built compiler on your testcase, after every bisection. Wouldn't it be *great* to have something like that Just Work? A tool like this could potentially boil down Kazu's bug almost automatically for example, with little-to-no frustrating intervention.
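A minimal sketch of such a script, assuming the whole tree were tracked so that each bisection step checks out a consistent state. The name `bisect_step` and the build and test commands passed to it are placeholders, not something GHC ships. Only the exit-status convention comes from `git bisect run` itself: 0 means good, 125 means skip, and any other status from 1 to 127 means bad.

```shell
# bisect_step BUILD_CMD TEST_CMD
# Hypothetical helper: translate "did it build?" and "did the test pass?"
# into the exit status that `git bisect run` understands.
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A commit that does not build cannot be judged: ask bisect to skip it.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125
    # The test case fails: this commit is "bad".
    sh -c "$test_cmd" >/dev/null 2>&1 || return 1
    # Built and passed: this commit is "good".
    return 0
}
```

A script ending in `bisect_step 'make' './run-testcase.sh'` (both commands are placeholders) would then be driven by `git bisect start <bad> <good>` followed by `git bisect run` with that script.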

And even now, looking at the repository listing of what is in libraries/, that are not submodules, I really see no reason why more - or even all - of them cannot be submodules. Is it a workflow issue of some sort? That's what I'm thinking at this point, but I also don't think it could be any worse than it is now.

Realistically, very few libraries GHC needs for bootstrapping seem to change that much. unix, integer-simple, haskeline and filepath for example change *extremely* infrequently, but all are free-standing. Why? In the event they were submodules, would anything actually be lost?

The maintainer - that is, not GHC HQ - would still 'own' the official repository. They can make changes to it. But if there is a necessity to pull that in for GHC (feature request, bug fix, random thing) it can be done by updating the submodule pointer to the new commit. But this must happen explicitly by a GHC committer. In the event they update the submodule pointer, they should also obviously make sure the build still works.
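Updating a submodule pointer, as described above, amounts to checking the submodule out at the wanted commit and committing that change in the superproject. A rough sketch follows; the function name `bump_submodule` is hypothetical, and it assumes it is run from the top of the GHC tree with nothing else staged.

```shell
# bump_submodule PATH COMMIT
# Hypothetical helper: move the submodule at PATH to COMMIT and record the
# new pointer in the superproject.
bump_submodule() {
    path=$1
    commit=$2
    # Get the maintainer's latest history, then pick the exact commit.
    git -C "$path" fetch --quiet &&
    git -C "$path" checkout --quiet --detach "$commit" &&
    # Staging the submodule path records its new commit id.
    git add "$path" &&
    git commit --quiet -m "Update $path submodule to $commit"
}
```

Checking that the build still works would happen between the checkout and the commit; that step is left out here.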

That means we have to update the submodule pointers ourselves if things change. That sucks I guess, but really, aside from base and testsuite, the two most frequently changing repositories, is that *actually* going to cost us a lot of work?

And even if it does cost us work, I'll speak for myself: I will gladly pay for that work and do it all myself if it means I can actually bisect and actually roll back my tree to some point to fix things - without needing to prepare for it months in advance using hacks. Like creating thousands of fingerprints, using fingerprint.py every day when people make commits (no, I haven't done this, but it could be done, and I really don't want to do it.)

Long-term reproducible builds are, IMO, a must for any project. *Especially* a project of our size. *Especially* a compiler of all things. But as it stands, when you build GHC, you can probably reproduce *today's* results and *today's* bugs. Last month's results? Last year's? Finding the difference between those months ago and today? Good luck - you will need it.

On Tue, Jun 4, 2013 at 8:07 PM, Kazu Yamamoto wrote:

...
Hi,

Andreas and I found that the new IO manager is not working properly in the current GHC head. I'm sure that it worked well at least on May 7.

We need to narrow the range of commits, so I did:

% git checkout bb2795db36b36966697c228315ae20767c4a8753
% git submodule update

But this does not checkout proper submodules. For instance, libraries/base has newer commits. And of course, building fails.

Please tell us how to checkout proper submodules against a specific GHC tree.

--Kazu

-- Regards, Austin - PGP: 4096R/0x91384671

Geoffrey Mainland

12:11 p.m.

I very much support moving to all-submodules. In fact, I argued for all-submodules when we made the half-submodules transition last year. Being able to easily check out a consistent and complete source code tree in a repeatable way is extremely important.

Checking out by date "works" if you have dated history in your git reflog. For example, see:

http://stackoverflow.com/questions/6990484/git-checkout-by-date

In general, git commits are *not* time ordered, so asking for the version at a particular time is not well-defined across different working repositories.

The GHC HQ buildbots dump fingerprints in a form that is usable directly with fingerprint.py. You can get these fingerprints from the ghc-builds@ archive. Unfortunately there was a large gap after MSR moved buildings where our builds did not run, but things are more or less working now. I believe Ben's buildbot package dumps fingerprints in a form that needs to be massaged before fingerprint.py can deal with it.

Geoff

On 06/05/2013 11:32 AM, Niklas Larsson wrote:

...

Ian Lynagh

3:59 p.m.

On Tue, Jun 04, 2013 at 09:05:58PM -0500, Austin Seipp wrote:

...

I know we had this discussion sometime recently I think, but can someone *please* explain why we are in this situation of half submodules, half random-floating-git-repository-checkouts?

Submodules are very handy for libraries that someone else maintains: We can make a local change to the library when we need something fixed, and then, when upstream has a fix too, we can jump straight to their fix without having to do any merging.

However, submodules have various disadvantages, e.g. http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt...

The main one for me is that it's fairly easy to lose local changes when using submodules. This is relatively unimportant for the libraries that someone else maintains, as we don't often make any local changes to lose. Even so, I've lost changes on a couple of occasions.

So the reason we entered this state is that we didn't think the advantages outweighed the disadvantages for the other repositories.

Thanks Ian

-- Ian Lynagh, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

Simon Marlow

6 Jun

8:44 p.m.

On 05/06/13 16:59, Ian Lynagh wrote:

...

On Tue, Jun 04, 2013 at 09:05:58PM -0500, Austin Seipp wrote:

...
I know we had this discussion sometime recently I think, but can someone *please* explain why we are in this situation of half submodules, half random-floating-git-repository-checkouts?

Submodules are very handy for libraries that someone else maintains: We can make a local change to the library when we need something fixed, and then, when upstream has a fix too, we can jump straight to their fix without having to do any merging.

However, submodules have various disadvantages, e.g. http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt...

The main one for me is that it's fairly easy to lose local changes when using submodules. This is relatively unimportant for the libraries that someone else maintains, as we don't often make any local changes to lose. Even so, I've lost changes on a couple of occasions.

Drive-by-comment: 'sync-all new' doesn't work since we switched to submodules. If someone could fix that I'd be very grateful (or alternatively tell me what workflow you use to figure out what patches you have in your local repos that aren't upstream).

Another thing that annoys me about submodules is that I like to keep a local mirror of the GHC repos on my computer. When I clone from it, the submodules all come from darcs.haskell.org instead of my local mirror. I know how to fix this by hand, but it's sync-all's job to get this right (it does for the other repos).

Cheers, Simon

...

So the reason we entered this state is that we didn't think the advantages outweighed the disadvantages for the other repositories.

Thanks Ian

Geoffrey Mainland

8 Jun

7:38 a.m.

On 06/06/2013 09:44 PM, Simon Marlow wrote:

...

On 05/06/13 16:59, Ian Lynagh wrote:

...
On Tue, Jun 04, 2013 at 09:05:58PM -0500, Austin Seipp wrote:

...
I know we had this discussion sometime recently I think, but can someone *please* explain why we are in this situation of half submodules, half random-floating-git-repository-checkouts?

Submodules are very handy for libraries that someone else maintains: We can make a local change to the library when we need something fixed, and then, when upstream has a fix too, we can jump straight to their fix without having to do any merging.

However, submodules have various disadvantages, e.g.

http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt...

...
The main one for me is that it's fairly easy to lose local changes when using submodules. This is relatively unimportant for the libraries that someone else maintains, as we don't often make any local changes to lose. Even so, I've lost changes on a couple of occasions.

Drive-by-comment: 'sync-all new' doesn't work since we switched to submodules. If someone could fix that I'd be very grateful (or alternatively tell me what workflow you use to figure out what patches you have in your local repos that aren't upstream).

Another thing that annoys me about submodules is that I like to keep a local mirror of the GHC repos on my computer. When I clone from it, the submodules all come from darcs.haskell.org instead of my local mirror. I know how to fix this by hand, but it's sync-all's job to get this right (it does for the other repos).

Cheers, Simon

Yes, I have hit this problem too. It's the cause of many of the nightly build failures at GHC HQ. Does anyone know how to get git-submodule to use a mirror? There is the --reference option to 'git submodule update', but I think it still needs a network connection. Geoff

...

...
So the reason we entered this state is that we didn't think the advantages outweighed the disadvantages for the other repositories.

Thanks Ian

Simon Marlow

10 Jun

12:20 p.m.

On 08/06/13 08:38, Geoffrey Mainland wrote:

...

On 06/06/2013 09:44 PM, Simon Marlow wrote:

...
On 05/06/13 16:59, Ian Lynagh wrote:

...
On Tue, Jun 04, 2013 at 09:05:58PM -0500, Austin Seipp wrote:

...
I know we had this discussion sometime recently I think, but can someone *please* explain why we are in this situation of half submodules, half random-floating-git-repository-checkouts?

Submodules are very handy for libraries that someone else maintains: We can make a local change to the library when we need something fixed, and then, when upstream has a fix too, we can jump straight to their fix without having to do any merging.

However, submodules have various disadvantages, e.g.

http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt...

...
...
The main one for me is that it's fairly easy to lose local changes when using submodules. This is relatively unimportant for the libraries that someone else maintains, as we don't often make any local changes to lose. Even so, I've lost changes on a couple of occasions.

Drive-by-comment: 'sync-all new' doesn't work since we switched to submodules. If someone could fix that I'd be very grateful (or alternatively tell me what workflow you use to figure out what patches you have in your local repos that aren't upstream).

Another thing that annoys me about submodules is that I like to keep a local mirror of the GHC repos on my computer. When I clone from it, the submodules all come from darcs.haskell.org instead of my local mirror. I know how to fix this by hand, but it's sync-all's job to get this right (it does for the other repos).

Cheers, Simon

Yes, I have hit this problem too. It's the cause of many of the nightly build failures at GHC HQ.

Does anyone know how to get git-submodule to use a mirror? There is the --reference option to 'git submodule update', but I think it still needs a network connection.

IIRC, you have to manually edit the .git/config file at the correct time (after git submodule init, but before the pull). But sync-all doesn't stop between these two steps, so it's a bit more fiddly.

Cheers, Simon
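The manual edit described above can be scripted with 'git config' rather than by opening .git/config in an editor. What follows is a minimal sketch, not the way sync-all does it. It takes "the pull" to mean the step that fetches the submodules (git submodule update). The mirror path /srv/ghc-mirror and the helper names mirror_url and redirect_submodules are hypothetical, and the mirror is assumed to use the same directory layout as darcs.haskell.org.

```shell
#!/bin/sh
# Hypothetical locations: adjust both to your own setup.
UPSTREAM_PREFIX="http://darcs.haskell.org"
MIRROR_PREFIX="/srv/ghc-mirror"

# Map an upstream URL to the matching location in the local mirror.
# URLs that do not start with UPSTREAM_PREFIX are printed unchanged.
mirror_url() {
    case "$1" in
        "$UPSTREAM_PREFIX"/*)
            rest=${1#"$UPSTREAM_PREFIX"}
            printf '%s\n' "$MIRROR_PREFIX$rest"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Run inside a fresh clone of the superproject.
redirect_submodules() {
    # Step 1: copy the submodule URLs from .gitmodules into .git/config.
    git submodule init || return 1

    # Step 2: rewrite each recorded URL before anything is fetched.
    # Each output line is "<key> <url>"; submodule names that contain
    # spaces are not handled by this simple loop.
    git config --get-regexp '^submodule\..*\.url$' |
    while read -r key url; do
        git config "$key" "$(mirror_url "$url")"
    done

    # Step 3: clone and check out the submodules from the rewritten URLs.
    git submodule update
}
```

After cloning the GHC tree from the mirror, calling redirect_submodules from the top of the clone performs the three steps in order, so nothing is fetched between the init and the URL rewrite.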

...

Geoff

...
...
So the reason we entered this state is that we didn't think the advantages outweighed the disadvantages for the other repositories.

Thanks Ian

Kazu Yamamoto

17 Jun

6:55 a.m.

Hi,

We were mistaken in thinking that the new IO manager was not working properly. This is our fault. We have confirmed that it is working well. Sorry for bothering you, guys.

Anyway, I believe we need a way to check out proper submodules, as many others said.

--Kazu

...

Hi,

Andreas and I found that the new IO manager is not working properly in the current GHC head. I'm sure that it worked well at least on May 7.

We need to narrow the range of commits, so I did:

% git checkout bb2795db36b36966697c228315ae20767c4a8753
% git submodule update

But this does not check out the proper submodules. For instance, libraries/base has newer commits. And of course, building fails.

Please tell us how to check out the proper submodules against a specific GHC tree.

--Kazu

43 comments

participants (21)

Austin Seipp
Daniel Trstenjak
Daniel Vainsencher
David Terei
Erik de Castro Lopo
Geoffrey Mainland
Ian Lynagh
Jan Stolarek
Johan Tibell
John Lato
Kazu Yamamoto
Malcolm Wallace
Manuel M T Chakravarty
Mateusz Kowalczyk
Nicolas Frisby
Nicolas Trangez
Niklas Larsson
Ryan Newton
Simon Marlow
Simon Peyton-Jones
Vincent Hanquez