Proposal: better library management ideas (was: how to checkout proper submodules)

So, there seems to be a fairly clear majority in favor of doing something, I think. The question, then, is what.
I'm fairly convinced from Ian's response earlier that submodules *can* be dangerous if you're using a lot of high-traffic packages; especially the ability to trample each other might be bad. I could see this happening for base, for example, if two people are working on large features and do not coordinate a merge. Git's own merge facility doesn't suffer nearly as badly from this problem, and we can figure out how we want that to happen later.
However, it seems like every high-volume package is, for better or worse, intimately tied to GHC. These packages are also the most problematic to roll back 'in sync' with GHC. As Geoffrey mentioned, this also becomes even MORE impossible if you use merges without fast-forwards or rebases, because dates no longer correlate accurately. These packages include base and testsuite. Probably nofib as well.
In some sense, I agree with Malcolm that 'base' being GHC only is maybe unfortunate. But maybe it's not (I'll talk more about this later), and maybe in the meantime we shouldn't lie to ourselves.
So first off, I'd like to propose something which seems, to me, the best approach if we want to avoid developer pain with as many wins as possible in the long run. I hope this doesn't sound actively radical or anything, but it's going to totally sound actively radical (though I don't think it is):
--> Let's just put base and testsuite inside the GHC repository directly. No submodules, no floating repos. Just put it directly inside and make a super commit, I guess. GHC becomes the de facto repository. And hey, why not nofib?
I know, I know. People really want to split the maintenance burdens I guess, and ideologically the Haskell community is all about clean separation but, please? All of GHC HQ are the de facto maintainers of this stuff anyway.
And as Jan mentioned, testsuite is really *so* crucial GHC should have it inline. The testsuite is perhaps the most important of all.
There are other candidates for this treatment too, really. For example, why are template-haskell, ghc-prim, and hpc split out? GHC is the only thing that supports them. template-haskell is especially super-intrusive of an extension to support, and arguably hpc as well. integer-simple and integer-gmp follow the exact same story. Same with hoopl and dph. They're all ours. We own them. Just put them all inside GHC and be done with it. Having active fragmentation in the VCS is not necessary when there need be none. These packages de-facto ship with GHC and are very tied to it.
I think people might be really opposed to a mega repository or something, but honestly? There's less maintenance, and cross-package changes can work correctly and be tracked correctly in terms of history. It's less work for maintainers. It's less to explain and, frankly, less to mess up. All of this I think is a huge win.
OK, so the radical idea is out there. Let's look at some numbers. I think ultimately anything will be a bit painful, because...
$ cd ~/ghc/ghc-work
$ grep -v "\#" ./packages | head --lines="-1" | wc -l
39
There are 39 sub packages which GHC requires (the -1 is because GHC itself is listed as the final entry). These aren't all libraries, of course. But that's a massive number of dependencies really, so managing them is a pain. How many are submodules already?
$ grep -v "\#" packages | head --lines="-1" | awk '{print $3}' | grep "^-" | wc -l
14
So there are 14 submodules, and 25 packages that are free floating. This is a very, very large number of dependent packages. I guess that's just the price we pay.
Let's say that hypothetically, we fold all those packages I mentioned into GHC (base, testsuite, nofib, template-haskell, integer-simple, integer-gmp, hpc, ghc-prim). That leaves 14 submodules and 17 floaters.
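For anyone who wants to see how the two counts above fall out of the pipeline, here is a minimal sketch run against a made-up packages file. The entries and the column names in the header comment are assumptions for illustration only, not the real file; the one thing taken from the commands above is that a "-" in the third column is what gets counted as a submodule.

```shell
#!/bin/sh
# Sketch only: this 'packages' file is invented for illustration and is
# NOT the real GHC file. Column names in the header are an assumption.
set -e
tmp=$(mktemp -d)
cat > "$tmp/packages" <<'EOF'
# localpath          tag  remotepath           upstreamurl
utils/hsc2hs         -    hsc2hs.git           -
libraries/base       -    packages/base.git    -
libraries/Cabal      -    -                    https://example.invalid/cabal.git
libraries/binary     -    -                    https://example.invalid/binary.git
.                    -    ghc.git              -
EOF

# Drop comment lines, then drop the last line (the GHC repository itself).
# sed '$d' is a portable spelling of GNU head --lines=-1.
entries=$(grep -v '^#' "$tmp/packages" | sed '$d')

total=$(printf '%s\n' "$entries" | wc -l | tr -d ' ')
# As in the pipeline above: a "-" in the third column is counted as a submodule.
submodules=$(printf '%s\n' "$entries" | awk '$3 == "-"' | wc -l | tr -d ' ')
floaters=$((total - submodules))

echo "total=$total submodules=$submodules floaters=$floaters"
```

On the sample file this reports 4 entries, 2 submodules and 2 floaters; against a real checkout the same three numbers would correspond to the 39, 14 and 25 quoted above.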
I actually believe that most of the submodules right now are a fairly good trade-off, because as designed they've all got upstreams. That's good. But what about things that are *not* submodules? Let's look at the commits over all the floaters in 1 year. The command is:
$ git log --since="1 year ago" --format=oneline . | wc -l
* ghc-tarballs: 1
* hsc2hs: 11
* haddock: 147
* array: 10
* base: 306
* deepseq: 6
* directory: 19
* filepath: 3
* ghc-prim: 9
* haskell98: 11
* haskell2010: 7
* hoopl: 13
* hpc: 13
* integer-gmp: 29
* integer-simple: 8
* old-time: 5
* old-locale: 3
* process: 40
* template-haskell: 19
* unix: 32
* testsuite: 825
* nofib: 50
* parallel: 5
* stm: 31
* dph: 95
Remember, a lot of the commits in several of these repositories are somewhat closely tied to GHC commits. Testsuite especially, so the numbers lie a little. But *now* let's take out all the ones we wanted to fold in.
* ghc-tarballs: 1
* hsc2hs: 11
* haddock: 147
* array: 10
* deepseq: 6
* directory: 19
* filepath: 3
* haskell98: 11
* haskell2010: 7
* old-time: 5
* old-locale: 3
* process: 40
* unix: 32
* parallel: 5
* stm: 31
These are all incredibly low traffic with the exception of haddock, which I was generous and listed anyway (even though I shouldn't, because it uses the GHC API). stm/parallel are also pretty generous I'd say.
Now let's think about this. Most of these could be converted to submodules with very little loss, possibly. They are not very actively touched in the process of most development cycles, and after looking at a lot of the changes, it's unlikely you'll hit many merge conflicts or weird situations. And even if you do, it's probably not going to happen *often*.
It's even possible a lot of these could also become upstreams with separate maintainers. A lot of these are not necessarily dependent on GHC in theory or practice: unix, process, deepseq, array, directory, filepath, etc. Someone could maintain them and developers work with them.
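The per-repository commit counts listed above came from running git log in each checkout. A loop along the following lines would produce that kind of table. This is only a sketch: the repositories "alpha" and "beta" are throwaway ones created on the spot so it runs anywhere, and the helper names make_repo and count_recent are invented for the example.

```shell
#!/bin/sh
# Sketch: count commits from the last year in several repositories.
# "alpha" and "beta" are throwaway repositories, not real GHC packages.
set -e
tmp=$(mktemp -d)

# make_repo NAME N: create a repository with N empty commits.
make_repo() {
    git init -q "$tmp/$1"
    i=0
    while [ "$i" -lt "$2" ]; do
        git -C "$tmp/$1" -c user.name=demo -c user.email=demo@example.invalid \
            commit -q --allow-empty -m "commit $i"
        i=$((i + 1))
    done
}

# count_recent DIR: number of commits in DIR made within the last year.
count_recent() {
    git -C "$1" log --since="1 year ago" --format=oneline | wc -l | tr -d ' '
}

make_repo alpha 1
make_repo beta 3

for name in alpha beta; do
    printf '* %s: %s\n' "$name" "$(count_recent "$tmp/$name")"
done
```

In a real tree the loop would iterate over the floating checkouts instead of the two sample names.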
Would anyone want to be a maintainer? (I heard some people clamoring for GitHub. Become a maintainer and you can host it where you want :P)
Or we could also fold them in too - mega repository style - and just say GHC HQ is the de-facto maintainer, as it is now. If someone wants to step up, we can split it out later.
That would just leave 14 sub repositories which are pretty well taken care of with upstreams. Maybe a few more if some people come onboard and can maintain things. This would reduce our problems a lot I feel. Other things like ./sync-all could change to support branching and other basic multi-repo facilities as Jan said, and that's not totally unreasonable either I think. It's about making the normal case easy.
We're often concerned with things being at the right granularity and sharing stuff maybe, but I think the trend is pretty frighteningly clear at this point in time - GHC is the de facto implementation of Haskell, and the number of maintainers isn't especially high. And maintaining it is a lot of work (it's truly a World Class™ programming language implementation, after all.) And having 39 repositories is scary. If that's the case, I'd say we should optimize where it counts and minimize our own burden and make it easy to track our changes, and make our workflows as simple as possible.
Yes, hypothetically a competitor can come along and give us a run for our money and maybe they'll want to use base and the testsuite and all that other stuff and we'll own it and whatnot. And duplication of work etc etc. And that'll be sad. Or not. And they'll do their complete own thing and run with it. UHC has its own base and testsuite, as does JHC for example. Perhaps sharing things like that is the exception, not the rule or regular occurrence.
Ultimately a software project is as much about ideals, and what we believe is worth working on with our time - just as it is about what code you're writing or using right now.
Perhaps we should not hinge our development strategies on these tactics any longer when the pattern seems to be darn clear.
This proposal is fairly radical. It would require the agreement of almost every single developer, because several of us have varying degrees of ownership over parts of the source that concern all of them. But like I said, it seems the majority would agree something should change, and I don't think we should give up finding it, so let's just see where our ideas take us. And I think the wins would be enormous.
I also appreciate you all dealing with the novels I've written over the past few days.
-- Regards, Austin - PGP: 4096R/0x91384671

Hi Austin,
I apologize for not having read the full email yet (I'm in a hurry right
now), but...
* Austin Seipp
--> Let's just put base and testsuite inside the GHC repository directly. No submodules, no floating repos. Just put it directly inside and make a super commit, I guess. GHC becomes the de facto repository. And hey, why not nofib?
I know, I know. People really want to split the maintenance burdens I guess, and ideologically the Haskell community is all about clean separation but, please? All of GHC HQ are the de facto maintainers of this stuff anyway. And as Jan mentioned, testsuite is really *so* crucial GHC should have it inline. The testsuite is perhaps the most important of all.
There are other candidates for this treatment too, really. For example, why is template-haskell, ghc-prim, and hpc split out? GHC is the only thing that supports them. template-haskell is especially super-intrusive of an extension to support, and arguably hpc as well. integer-simple and integer-gmp follow the exact same story. Same with hoopl and dph. They're all ours. We own them. Just put them all inside GHC and be done with it. Having active fragmentation in the VCS is not necessary when there need be none. These packages de-facto ship with GHC and are very tied to it.
I'm a strong -1 on this. As one example, we have forks of base and ghc-prim for Haskell suite:
https://github.com/haskell-suite/base
https://github.com/haskell-suite/ghc-prim
which would be much more complicated if these were not independent repositories.
But more generally, I think there's still hope that the core packages will be made portable - I'm referring to Joachim Breitner's work on splitting the base.
Roman

Hi Austin,
I admire your talent for writing emails ;-)
As you wrote in your email, I'm totally for including testsuite into GHC, because it is essentially part of GHC and it doesn't make sense to have a version of testsuite not corresponding to a version of GHC. As you pointed out, the same argument can be used for other packages, but still there is one thing I don't like about that idea. What if an average Haskeller wants to improve one of the libraries, e.g. by adding comments or fixing a minor bug? If we have a super-repo, that person would need to check out everything, which is discouraging.
Another, separate issue here is that such a person needs to register either to ghc-devs or trac to send a patch. Using GitHub would be helpful here, though I agree with Geoffrey about merge commits - we'd have to think of something here. Also, the fact that GHC HQ is maintaining all of the mentioned packages doesn't mean that they need to be stored in one repo, at least not in git (this would make more sense to me with SVN, where you can check out a subdirectory).
Still, I strongly agree that something should be done about the current setup. I'm not a git guru so I cannot fully foresee what the consequences of turning everything into submodules would be, but I think that it cannot be worse than it is now, right?
Jan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On Sun, Jun 9, 2013 at 3:47 AM, Jan Stolarek wrote:
I admire your talent for writing emails ;-)
You can be honest and just call them what they are: horribly written novellas.
As you wrote in your email I'm totally for including testsuite into GHC, because it is essentially part of GHC and it doesn't make sense to have a version of testsuite not corresponding to a version of GHC. As you pointed out the same argument can be used for other packages, but still there is one thing I don't like about that idea. What if an average haskeller wants to improve one of the libraries e.g. by adding comments or fixing a minor bug? If we have a super-repo that person would need to check out everything, which is discouraging.
This is a good point I hadn't considered, but it's less of a worry for some packages than others. For example, base, ghc-prim and template-haskell are so intimately tied into GHC that reinstalling them is either impossible or a bad idea. To change them, you must build your own GHC anyway (either from source, or HEAD.) And if you're using a Haskell Platform compiler, clearly you'd have no luck with the git repository anyway (due to their strong interdependence.)
But again, I'm totally OK with a lot of these other repositories being submodules. For example, process, unix, deepseq, filepath, directory. Those don't need to be folded in. Lots of them could have their own maintainers with separate upstreams. They're touched infrequently enough that traffic concerns aren't as big a deal. I just want the most high-traffic repositories dealt with, because in practice these are the *most* critical and the most interdependent. That in turn leads to the most problems.
Another, separate issue here is that such a person needs to either register to ghc-devs or trac to send a patch. Using github would be helpful here, though I agree with Geoffrey about merge commits - we'd have to think of something here. Also, the fact that GHC HQ is maintaining all of the mentioned packages doesn't mean that they need to be stored in one repo, at least not in git (this would make more sense to me with SVN where you can checkout a subdirectory).
Not necessarily, the 'owners' of the packages are still the libraries committee. People can propose changes there as they have always done. It just so happens most of the 'libraries' maintained packages are de-facto maintained by GHC people. You're right not all of them need to be folded in. But I think several of them should be, and these are the ones that hurt the most. (Plus, my radical proposal can't be considered totally, completely radical unless I propose something which would - of course - be shot down.)
Still, I strongly agree that something should be done about current setup. I'm not a git guru so I cannot fully foresee what would be the consequences of turning everything into submodules, but I think that it cannot be worse than it is now, right?
For some submodules, it could certainly be worse. Please see Ian's link in the prior discussion concerning submodules - for high-traffic repositories, some of the concerns are disconcerting.
-- Regards, Austin - PGP: 4096R/0x91384671

You can be honest and just call them what they are: horribly written novellas.
Actually, I was thinking that instead of posting to the list you might consider publishing your emails as papers at workshops or symposia ;)
for high-traffic repositories, some of the concerns are disconcerting.
But the high-traffic repositories (base, testsuite) are already submodules, right? For me the major problem of the current setup is that we cannot use one of the most important features of a VCS, i.e. going back in time. The only solutions to this problem that I am aware of are folding in all the libraries that GHC depends on or turning them into submodules.
I just had this moment of enlightenment that the question of including a repo as a submodule (or folding it into the GHC tree) is not a matter of traffic, but a matter of that library's implementation. If it uses a GHC-specific API then it goes in, because it is tightly coupled. If it is implemented in standard Haskell then it can stay out, because changes to the compiler should not affect it. This is a pretty simple criterion for identifying the libraries that we should be concerned with (perhaps this is obvious, but it only occurred to me now). So a high-traffic repo that does not depend on non-standard features of GHC could still be kept as an in-tree repo, without affecting the ability to go back in time.
Jan
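One rough way to apply this criterion mechanically is to scan a package's sources for signs of GHC-specific code. The sketch below is a heuristic under stated assumptions, not an agreed rule: it treats an import of a GHC.* module or a test of __GLASGOW_HASKELL__ as "coupled", and the two sample packages and the classify helper are invented for the example.

```shell
#!/bin/sh
# Heuristic sketch: classify a package as tied to GHC or not.
# The two sample packages below are invented for illustration.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/coupled/src" "$tmp/portable/src"
cat > "$tmp/coupled/src/Fast.hs" <<'EOF'
{-# LANGUAGE MagicHash #-}
module Fast where
import GHC.Exts (Int(I#))
EOF
cat > "$tmp/portable/src/Plain.hs" <<'EOF'
module Plain where
import Data.List (sort)
EOF

# classify DIR: print "ghc-coupled" if any .hs file under DIR imports a
# GHC.* module or mentions __GLASGOW_HASKELL__, otherwise "standalone".
classify() {
    if find "$1" -name '*.hs' \
        -exec grep -lE '^import +(qualified +)?GHC\.|__GLASGOW_HASKELL__' {} + \
        | grep -q .; then
        echo "ghc-coupled"
    else
        echo "standalone"
    fi
}

for pkg in coupled portable; do
    printf '%s: %s\n' "$pkg" "$(classify "$tmp/$pkg")"
done
```

A check like this would miss other forms of coupling (language extensions, primops reached through re-exports), so it could only ever be a first pass before a human decision.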

Oh, and I've been made aware that git 1.7 and later can check out a subdirectory of a repo - this partially invalidates my previous argument. I say partially, because it is a bit more difficult than dealing with a library that has its own repo, and it seems that some potential contributors might not be aware of this feature (like me this morning).
Janek
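The message does not name the feature, but it is presumably git's sparse checkout (an assumption on my part). A minimal sketch of what that looks like follows. The repository layout is invented, with a tiny local repository standing in for a "mega repository" so the example runs without network access.

```shell
#!/bin/sh
# Sketch: check out only libraries/base from a larger repository using
# sparse checkout. The repository contents are invented for illustration.
set -e
work=$(mktemp -d)

# 1. Build a tiny stand-in for a "mega repository".
git init -q "$work/upstream"
mkdir -p "$work/upstream/compiler" "$work/upstream/libraries/base"
echo "module Main where" > "$work/upstream/compiler/Main.hs"
echo "module Prelude where" > "$work/upstream/libraries/base/Prelude.hs"
git -C "$work/upstream" add .
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "initial import"

# 2. Clone it without populating the working tree.
git clone -q --no-checkout "$work/upstream" "$work/partial"

# 3. Enable sparse checkout and list the one directory we want.
git -C "$work/partial" config core.sparseCheckout true
mkdir -p "$work/partial/.git/info"
echo "libraries/base/" > "$work/partial/.git/info/sparse-checkout"

# 4. Populate the working tree; paths outside the pattern are skipped.
git -C "$work/partial" read-tree -mu HEAD

ls "$work/partial"
```

Note that the clone still fetches the whole history; only the working tree is narrowed, which fits the point that this is a bit more work than cloning a small standalone repository.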

Hi Roman,
On Sun, Jun 9, 2013 at 1:44 AM, Roman Cheplyaka wrote:
I'm a strong -1 on this. As one example, we have forks of base and ghc-prim for Haskell suite:
https://github.com/haskell-suite/base
https://github.com/haskell-suite/ghc-prim
which would be much more complicated if these were not independent repositories.
I hate being that person but, if the purpose of these forks is to work around specific bugs in HSE and/or fix problems with name resolution of GHC-specific terms, which sort of seems to be the case from the log, I don't think hacking base & co. is a long term solution. It could potentially need infinite ongoing maintenance. I went down this road with LHC too.
And my gut feeling is that hacking ghc-prim out-of-band feels so amazingly wrong I'm frankly not sure if "I need to fork it" can actually warrant a huge amount of sympathy, to the point of keeping the repository separate for that 1 fork in existence (granted, ghc-prim is still pretty low traffic. But base is not.)
If you DO need help from GHC, is there really nothing we could easily and reasonably do to further assist you? I think asking for specific, principled solutions on our part is not out of the question here.
Are there any other forks of base people have for any particular reason? What reasons are those?
But more generally, I think there's still hope that the core packages will be made portable — I'm referring to Joachim Breitner's work on splitting the base.
To be clear, packages and their numbers aren't *really* the problem. It's repositories. The numbers just make this slightly worse. Adding packages and adding repositories both add overhead. Adding repositories adds a significantly *larger* amount of complexity, all things considered. The only honest, legitimate way to reduce that complexity is to fold in repositories. But this means that we have to give something up, too.
If base were to get split into 5 packages or 8 packages, that's potentially fine by me, even welcomed. What I don't want is 5 more repositories that are all intimately tied to GHC's build and features, which a majority of GHC-specific work will be driven towards, and which over time we then must manage and synchronize heavily. That's just a massive amount of work.
Just looking at Joachim's fork of base on github, I already have some reservations about its current implementation. Like, base-float still exports GHC-specific namespaces. Every package still has a lot of GHC-specific code, as opposed to some isolated substrate that we provide and base-* packages interface with. So we're going to maintain all of that, it's the sad truth. And if Joachim's patch were merged tomorrow somehow, I think that frankly so much of it would still be under GHC control, my argument would still stand. It would still be one repository. We would still own it. It makes base more granular, but this has almost nothing to do with our real problems. Fixing all of that where we're not *actually* in control of it is a ton of work. The current patches just don't solve that I think.
And this was last discussed in February? So what's the timeline here? Clearly we're not even done with the API discussion at all. So, 6 months? A year? Who knows? "When it's done"? I'm not sure most of us want to wait that long, especially considering the need to track down bugs and have accurate historical logs is a fairly frequent occurrence.
Roman
-- Regards, Austin - PGP: 4096R/0x91384671

On 09/06/13 17:51, Ian Lynagh wrote:
On Sun, Jun 09, 2013 at 11:15:37AM -0500, Austin Seipp wrote:
I'm referring to Joachim Breitner's work on splitting the base.
So what's the timeline here?
As soon as possible after 7.8 is branched.
Has there been a decision somewhere on what to do? The wiki page sets out the parameters of the design, but doesn't have any conclusions that I could see. Splitting base has the potential to be extremely destabilising, I want to make sure that we're getting appreciable benefits in exchange.
Cheers,
Simon

Simon asks:
| >>> I'm referring to Joachim Breitner's work on
| >>> splitting the base.
| >>
| >> So what's the timeline here?
| >
| > As soon as possible after 7.8 is branched.
|
| Has there been a decision somewhere on what to do? The wiki page sets
| out the parameters of the design, but doesn't have any conclusions that
| I could see. Splitting base has the potential to be extremely
| destabilising, I want to make sure that we're getting appreciable
| benefits in exchange.
No, no decision has been taken so far as I know. Happily, I think the new core-libraries committee is planning to think actively about this question.
Edward: are you going to have a public mailing list (that anyone can join) for most discussion, plus a private committee-only one for occasional use? Or are you going to use libraries@ for the former?
Simon

* Austin Seipp
Hi Roman,
On Sun, Jun 9, 2013 at 1:44 AM, Roman Cheplyaka wrote:
I'm a strong -1 on this. As one example, we have forks of base and ghc-prim for Haskell suite:
https://github.com/haskell-suite/base
https://github.com/haskell-suite/ghc-prim
which would be much more complicated if these were not independent repositories.
I hate being that person but, if the purpose of these forks is to work around specific bugs in HSE and/or fix problems with name resolution of GHC-specific terms, which sort of seems to be the case from the log, I don't think hacking base & co. is a long term solution. It could potentially need infinite ongoing maintenance. I went down this road with LHC too.
It is only partly to work around bugs in HSE. The second part is to work around bugs and quirks in base itself. There are places where CPP wouldn't produce meaningful code unless __GLASGOW_HASKELL__ is defined, for example.
Even ignoring those obvious bugs for a minute, currently a large part of base is defined under the GHC.* hierarchy and isn't available unless __GLASGOW_HASKELL__ is defined.
But okay, let's suppose that at some point everything is fixed and we don't have to *fork* base. We still would like to use it! Should we fetch the whole GHC tree in order to get its development version?
And my gut feeling is that hacking ghc-prim out-of-band feels so amazingly wrong I'm frankly not sure if "I need to fork it" can actually warrant a huge amount of sympathy, to the point of keeping the repository separate for that 1 fork in existence (granted, ghc-prim is still pretty low traffic. But base is not.)
It *is* wrong, but who is to blame that a big part of Prelude comes from there, including all logical operations and classes Eq and Ord?
If you DO need help from GHC, is there really nothing we could easily and reasonably do to further assist you? I think asking for specific, principled solutions on our part is not out of the question here.
The best help would be to make and keep base relatively portable and not to introduce superfluous conditional compilation. (I realise that a lot of that has just accumulated historically, but now is a good time to get rid of it.) It is a ton of work, and I'm very happy when I see people like Joachim trying to do something in that direction. Right now I'm only asking not to make their work even harder by moving base under the ghc repository.
But more generally, I think there's still hope that the core packages will be made portable — I'm referring to Joachim Breitner's work on splitting the base.
To be clear, packages and their numbers aren't *really* the problem.
What I'm trying to say here is that there's hope for a portable base. Maybe not in the form of split base - I don't know. But it's the direction we should be moving anyways.
And usurping base by GHC is a move in the opposite direction.
Roman

On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka wrote:
What I'm trying to say here is that there's hope for a portable base. Maybe not in the form of split base — I don't know. But it's the direction we should be moving anyways.
And usurping base by GHC is a move in the opposite direction.
Maybe that's a good thing? The current situation doesn't really seem to be working. Keeping base separate negatively impacts workflow of GHC devs (as evidenced by these threads), just to support something that other compilers don't use anyway. Maybe it would be easier to fold base back into ghc and try again, perhaps after some code cleanup? Having base in ghc may provide more motivation to separate it properly.

* John Lato
On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka wrote:
What I'm trying to say here is that there's hope for a portable base. Maybe not in the form of split base - I don't know. But it's the direction we should be moving anyways.
And usurping base by GHC is a move in the opposite direction.
Maybe that's a good thing? The current situation doesn't really seem to be working. Keeping base separate negatively impacts workflow of GHC devs (as evidenced by these threads), just to support something that other compilers don't use anyway. Maybe it would be easier to fold base back into ghc and try again, perhaps after some code cleanup? Having base in ghc may provide more motivation to separate it properly.
After base is in GHC, separating it again will be only harder, not easier. Or do you have a specific plan in mind?
Roman

On Mon, Jun 10, 2013 at 1:32 PM, Roman Cheplyaka wrote:
* John Lato [2013-06-10 07:59:55+0800]
On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka wrote:
What I'm trying to say here is that there's hope for a portable base. Maybe not in the form of split base - I don't know. But it's the direction we should be moving anyways.
And usurping base by GHC is a move in the opposite direction.
Maybe that's a good thing? The current situation doesn't really seem to be working. Keeping base separate negatively impacts workflow of GHC devs (as evidenced by these threads), just to support something that other compilers don't use anyway. Maybe it would be easier to fold base back into ghc and try again, perhaps after some code cleanup? Having base in ghc may provide more motivation to separate it properly.
After base is in GHC, separating it again will be only harder, not easier. Or do you have a specific plan in mind?
It's more about motivation. It seems to me right now base is in a halfway state. People think that moving it further away from ghc is The Right Thing To Do, but nobody is feeling enough pain to be sufficiently motivated to do it. If we apply pain, then someone will be motivated to do it properly. And if nobody steps up, maybe having a platform-agnostic base isn't really very important.

I forget who said it, but it's true that we have uncritically assumed that
* One package = one repository
But I now realise that there's no need for that. We could certainly have one repo with multiple packages.
What are the motivations for having a separate repository? Are these two the main ones?
* Sense of "ownership" by the maintainer. (My package isn't merely a barnacle on the side of GHC.)
* Ability to release new versions un-synchronised with GHC releases
And neither really holds for the GHC-maintained packages.
One merit of splitting up 'base' will be that a chunk of it can go in the "independent" sector, leaving a smaller rump that is intimately coupled to GHC. But we don't need to await that glorious day before getting on with the debate this thread is so constructively having.
Again: I am a non-expert. I will be happy to fall in with whatever you git experts decide, provided (a) you have some measure of agreement that it's a step forward and (b) you tell me clearly what my workflows should be.
Simon
From: ghc-devs-bounces@haskell.org [mailto:ghc-devs-bounces@haskell.org] On Behalf Of John Lato
Sent: 10 June 2013 01:00
To: Roman Cheplyaka
Cc: ghc-devs@haskell.org
Subject: Re: Proposal: better library management ideas (was: how to checkout proper submodules)
On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka

My motivation for having a separate repository is that I can check it
out and work on it without having to check out the whole GHC.
At the moment base and ghc-prim are used by the name-resolving compiler
http://haskell-suite.github.io/haskell-names/
Roman
On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka <roma@ro-che.info> wrote:
What I'm trying to say here is that there's hope for a portable base. Maybe not in the form of split base - I don't know. But it's the direction we should be moving anyways.
And usurping base by GHC is a move in the opposite direction.
Maybe that's a good thing? The current situation doesn't really seem to be working. Keeping base separate negatively impacts workflow of GHC devs (as evidenced by these threads), just to support something that other compilers don't use anyway. Maybe it would be easier to fold base back into ghc and try again, perhaps after some code cleanup? Having base in ghc may provide more motivation to separate it properly.

On Mon, Jun 10, 2013 at 10:54:06AM +0300, Roman Cheplyaka wrote:
My motivation for having a separate repository is that I can check it out and work on it without having to check out the whole GHC.
With git-subtree you can have both. A separate repository for easy forking of e.g. base and just one repository for GHC with a sub directory for base.
At work we're sharing a quite big library between two development teams. There's a separate repository for this library, which is used for synchronization between both projects. Each project has its own repository with a sub directory containing the library and git-subtree is used to merge this sub directory with the library repository.
Most developers don't even have to care that there's a separate repository for the library, they're just working with the one project repository.
From time to time - perhaps once a week - the changes in the projects get merged back into the library repository.
git-submodules is a burden for every developer, git-subtree is "just" a burden for the developer doing the merges with the external repository.
The git-subtree script is more or less just a nice wrapper around the subtree merge strategy of git-merge. It uses only the available git commands.
Greetings, Daniel
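The subtree merge that git-subtree wraps can be sketched with core git commands alone, which may be useful where the git-subtree script is not available. This is a minimal sketch under stated assumptions, not a proposed GHC workflow: the repository names `lib` and `project`, the file names, and the `libraries/base` prefix are placeholders invented for illustration.

```shell
#!/bin/sh
# Sketch only: share one library between a standalone repository and a
# sub directory of a project repository, using the subtree merge of git-merge.
# All repository, file and directory names here are hypothetical.
subtree_demo() (
    set -e
    # Throwaway identity so the commits work in a clean environment.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

    # 1. The standalone library repository.
    git init -q lib
    (cd lib && echo v1 > Lib.hs && git add Lib.hs && git commit -q -m 'lib: v1')
    branch=$(git -C lib symbolic-ref --short HEAD)

    # 2. The project repository that will carry the library in a sub directory.
    git init -q project
    cd project
    echo main > Main.hs
    git add Main.hs
    git commit -q -m 'project: initial'

    # 3. Import the library history under libraries/base.
    #    (--allow-unrelated-histories needs a reasonably recent git.)
    git remote add lib ../lib
    git fetch -q lib
    git merge -q -s ours --no-commit --allow-unrelated-histories "lib/$branch"
    git read-tree --prefix=libraries/base/ -u "lib/$branch"
    git commit -q -m 'Import lib under libraries/base'

    # 4. The library moves on; merge the update into the sub directory.
    (cd ../lib && echo v2 > Lib.hs && git commit -q -a -m 'lib: v2')
    git fetch -q lib
    git merge -q --no-edit -X subtree=libraries/base "lib/$branch"
)
```

With git-subtree installed, steps 3 and 4 correspond roughly to `git subtree add --prefix=libraries/base <repository> <ref>` and `git subtree pull --prefix=libraries/base <repository> <ref>`.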

I mentioned git-subtree as a possible alternative earlier in the thread. One of the primary objections at the time was that the subtree command is not installed by default in, e.g., the Ubuntu git package.
Merging base and/or testsuite into the ghc repository wouldn't solve the primary issue, which is that we can't reproduce a full source code tree without resorting to the fingerprints script, and even then we can't bisect.
Side note: the fingerprint script *didn't even work* for almost a year after it was introduced; see commit 73ce2e70.
I think there are three realistic choices about how we should resolve this issue. Our choice affects the decision about whether or not base and/or testsuite should be merged into the ghc repository, so I think the merger discussion should be tabled for the time being.
1) Leave everything as-is. We live with a mix of submodules and fingerprints.
2) Use submodules.
3) Use subtrees.
I don't think there is a realistic fourth option, e.g., use a mix of subtrees and submodules, but I may be wrong.
So, if we can agree that these are the three realistic alternatives, I volunteer to flesh out the wiki so it lists the pros and cons of each choice. If there are other sane paths forward besides these three, please let us know!
Geoff

On Mon, Jun 10, 2013 at 11:23:13AM +0100, Geoffrey Mainland wrote:
Side note: the fingerprint script *didn't even work* for almost a year after it was introduced; see commit 73ce2e70.
Which implies that wanting to go back in time is rare, so making it easy should be given low weight when considering the options?
3) Use subtrees.
Is this possible with subtrees?:
* Initially ghc's Cabal repo is at the same commit as upstream
* We make a local commit 123 in Cabal to fix some bug
* Cabal upstream makes a commit 456 to fix the same bug differently
* We jump to commit 456, in such a way that we don't end up merging with our 123 commit every time we pull from Cabal in the future
Thanks
Ian
--
Ian Lynagh, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

On Mon, 2013-06-10 at 11:45 +0100, Ian Lynagh wrote:
Side note: the fingerprint script *didn't even work* for almost a year after it was introduced; see commit 73ce2e70.
Which implies that wanting to go back in time is rare, so making it easy should be given low weight when considering the options?
If 'git bisect' would work (out of the box) on the GHC repo, going back in time would certainly be a more common operation.
Nicolas

On 06/10/2013 11:49 AM, Nicolas Trangez wrote:
If 'git bisect' would work (out of the box) on the GHC repo, going back in time would certainly be a more common operation.
I agree. Going back in time is really, really hard with fingerprints because you have to get the fingerprint files somewhere, and they don't always exist. Also, it could be the case that people used the fingerprint files to "bisect" but didn't notice they weren't quite right because the fingerprints were "close enough." OK for bug-finding, terrible for reproducible builds.
Many people on the list have been quite vocal about wanting to be able to bisect. *I* have wanted to be able to bisect many, many times, but I don't because it's such a pain.
I also want to be able to tell people how to build branches of ghc that I am working on, e.g., the simd and th-new branches. That means having to store a fingerprint file somewhere public and keep it in sync with my tree. I would much rather just tell them to check out the foo branch of ghc and be done with it.
Geoff
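Once every commit of the top-level repository pins the exact state of its libraries, an automated bisection needs little more than a test command. A minimal sketch on a throwaway repository follows; the repository name, the files `status` and `counter`, and the `grep` check are invented stand-ins for a real build and test. In a tree that uses submodules, the test command would presumably begin with `git submodule update --init --recursive` so that the libraries follow the commit under test.

```shell
#!/bin/sh
# Sketch only: automated bisection over a linear history of eight commits,
# where commit 6 introduces the (simulated) breakage.
bisect_demo() (
    set -e
    # Throwaway identity so the commits work in a clean environment.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

    git init -q repo
    cd repo
    for i in 1 2 3 4 5 6 7 8; do
        # 'status' stands in for "does the build work at this commit?"
        if [ "$i" -ge 6 ]; then echo bad > status; else echo good > status; fi
        echo "$i" > counter
        git add status counter
        git commit -q -m "commit $i"
    done

    # Mark HEAD as bad and the root commit as good, then let git drive.
    git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
    # Exit status 0 means "good", 1 means "bad" for the commit under test.
    git bisect run sh -c 'grep -q good status' >/dev/null

    # refs/bisect/bad now names the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    git bisect reset >/dev/null 2>&1
)
```

Called in an empty directory, `bisect_demo` prints the subject of the first bad commit it finds.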

Hi Ian,
On Mon, Jun 10, 2013 at 11:45:22AM +0100, Ian Lynagh wrote:
Is this possible with subtrees?:
* Initially ghc's Cabal repo is at the same commit as upstream
* We make a local commit 123 in Cabal to fix some bug
* Cabal upstream makes a commit 456 to fix the same bug differently
* We jump to commit 456, in such a way that we don't end up merging with our 123 commit every time we pull from Cabal in the future
Yes.
Every repository that's added by git-subtree to your repository is represented as a separate branch. So everything that applies to the merging of branches also applies to the merging by git-subtree.
Greetings, Daniel

On Mon, Jun 10, 2013 at 01:13:37PM +0200, Daniel Trstenjak wrote:
Yes.
Every repository that's added by git-subtree to your repository is represented as a separate branch. So everything that applies to the merging of branches also applies to the merging by git-subtree.
I didn't follow that. Here's an example of what happens with just a plain git repo, with no branches, submodules or subrepos involved:
-----8<----------8<----------8<----------8<-----
upstream$ git init
upstream$ echo content > file
upstream$ git add file
upstream$ git commit -a -m initial
$ git clone upstream ghc
$ cd ghc
ghc$ echo fix1 > file
ghc$ git commit -a -m fix1
upstream$ echo fix2 > file
upstream$ git commit -a -m fix2
ghc$ git pull --no-edit -X theirs
upstream$ echo feature1 > file
upstream$ git commit -a -m feature1
ghc$ git pull --no-edit -X theirs
upstream$ echo feature2 > file
upstream$ git commit -a -m feature2
ghc$ git pull --no-edit -X theirs
-----8<----------8<----------8<----------8<-----
At the end of this, you'll see that the ghc repo has a number of merge commits. I guess they may not cause any actual problems, but it's certainly nicer not having them (which is what using submodules gives us).
Thanks
Ian
--
Ian Lynagh, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

Hi Ian,
I guess they may not cause any actual problems, but it's certainly nicer not having them (which is what using submodules gives us).
I don't quite understand how you should get rid of these merge commits by using submodules, because at the end every submodule is just a git repository and behaves in the same way as every other git repository for merges.
You can get rid of these merge commits by using the '--rebase' option of git-pull.
I put your git command lines into the attached script 'ghc_git_test'. Now you can get your version and the version using '--rebase' by calling:
    mkdir your_version rebase_version
    cd your_version
    ghc_git_test -X theirs
    cd ../rebase_version
    ghc_git_test --rebase -X ours
You will certainly ask why it's 'ours' instead of 'theirs' for the rebase case, well, that's one of the quite counterintuitive things in the git user interface.
Greetings, Daniel
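The attached script itself is not part of this archive. What follows is a hypothetical reconstruction of what such a script might look like, written as a shell function that replays Ian's command lines and passes its arguments through to git-pull. Two things are assumptions added here and not taken from the thread: the throwaway commit identity, and the `pull.rebase false` setting, which recent git versions need before they will merge divergent branches on a plain pull (a `--rebase` given on the command line still takes precedence).

```shell
#!/bin/sh
# Hypothetical reconstruction of a 'ghc_git_test'-style script, not the
# original attachment. Run it inside an empty directory.
# Usage: ghc_git_test [git-pull options...]
ghc_git_test() (
    set -e
    # Throwaway identity so the commits work in a clean environment.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

    # The "upstream" repository with one initial commit.
    git init -q upstream
    (cd upstream && echo content > file && git add file && git commit -q -m initial)

    # The "ghc" clone with a local fix that upstream does not have.
    git clone -q upstream ghc
    # Assumption: merge by default; '--rebase' in "$@" overrides this.
    (cd ghc && git config pull.rebase false)
    (cd ghc && echo fix1 > file && git commit -q -a -m fix1)

    # Upstream keeps changing the same file; ghc pulls after every change.
    for change in fix2 feature1 feature2; do
        (cd upstream && echo "$change" > file && git commit -q -a -m "$change")
        (cd ghc && git pull -q --no-edit "$@")
    done

    # Show the resulting history of the ghc clone.
    (cd ghc && git log --oneline --graph)
)
```

Run as `ghc_git_test -X theirs` in an empty directory, this should reproduce the merge commits Ian describes, one per pull. As for the swapped option in the rebase variant: during a rebase the upstream commits play the role of "ours" and the local commits being replayed are "theirs", which is why `-X ours` is the counterpart of `-X theirs` there.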

On Wed, Jun 12, 2013 at 12:54:38AM +0200, Daniel Trstenjak wrote:
I guess [the merge commits] may not cause any actual problems, but it's certainly nicer not having them (which is what using submodules gives us).
Just to clarify, my problem isn't so much that there are merge commits (although it would still be nicer if there weren't), but that it is hard to see whether we are in the same state as upstream, or to see what the differences between us and upstream are.
I don't quite understand how you should get rid of these merge commits by using submodules,
With submodules we can do
    cd libraries/Cabal
    git reset --hard <an upstream commit id>
    cd ..
    git commit -a
and we will jump to that commit, without needing to merge it with the commit that we were at before.
You can get rid of these merge commits by using the '--rebase' option of git-pull.
We can't rebase, as these patches are in everyone else's GHC tree.
Thanks
Ian
--
Ian Lynagh, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

On 06/12/2013 12:37 PM, Ian Lynagh wrote:
We can't rebase, as these patches are in everyone else's GHC tree.
Only if you have pushed the "ghc" tree. If it is only local, then rebasing is just fine. And, I would argue, desirable.
For the record, I am in favor of moving everything to submodules.
Geoff

Hmm, okay, if you're saying that this workflow works and is not very
painful, then I withdraw my objection.
Thanks,
Roman
* Daniel Trstenjak
On Mon, Jun 10, 2013 at 10:54:06AM +0300, Roman Cheplyaka wrote:
My motivation for having a separate repository is that I can check it out and work on it without having to check out the whole GHC.
With git-subtree you can have both. A separate repository for easy forking of e.g. base and just one repository for GHC with a sub directory for base.
participants (10): Austin Seipp, Daniel Trstenjak, Geoffrey Mainland, Ian Lynagh, Jan Stolarek, John Lato, Nicolas Trangez, Roman Cheplyaka, Simon Marlow, Simon Peyton-Jones