
| It's for technical reasons, and the strongest one being: GitHub doesn't
| allow us to establish strong invariants regarding submodule gitlink
| referential integrity for submodules (which I implemented a couple years ago
| for git.haskell.org).
Interesting. It'd be good to document what the technical reasons are. For example I don’t know what the strong invariants are.
A good place to describe them might be the Repositories pages
https://ghc.haskell.org/trac/ghc/wiki/Repositories
Many thanks
Simon
| -----Original Message-----
| From: Herbert Valerio Riedel [mailto:hvriedel@gmail.com]
| Sent: 18 December 2017 11:13
| To: Simon Peyton Jones

2017-12-18 17:01 GMT+01:00 Simon Peyton Jones via ghc-devs <ghc-devs@haskell.org>:
| It's for technical reasons, and the strongest one being: GitHub doesn't allow us to establish strong invariants regarding submodule gitlink referential integrity for submodules (which I implemented a couple years ago for git.haskell.org).
Interesting. It'd be good to document what the technical reasons are. For example I don’t know what the strong invariants are. [...]
Me neither. :-] Looking at the repositories Wiki page, it seems to be related to the fact that GitHub doesn't offer git hooks, which are used to check the invariants.
This leads to another question: Is it *really* necessary to have the invariant checks implemented as a git hook? If you use any kind of continuous integration, which GHC obviously does, one can move the checks to e.g. CircleCI (or whatever CI is used).
This is a tradeoff: Doing it that way, you catch incorrect commits a little bit later, but it makes the overall arcane repository magic quite a bit simpler, probably removing the need for mirroring. This seems to be a good tradeoff, but of course I might be missing some details here.
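To make the CI-side variant of the check concrete, here is a minimal sketch in Python. It assumes the invariant is roughly "every submodule gitlink in a commit points at a commit that exists in the corresponding submodule repository"; the thread itself does not spell the invariant out. The names `Gitlink`, `parse_gitlinks`, `dangling_gitlinks` and the `known_commits` mapping are hypothetical. Only the `git ls-tree -r` line format (gitlinks appear with mode `160000` and type `commit`) comes from git itself.

```python
from typing import Dict, List, NamedTuple, Set


class Gitlink(NamedTuple):
    path: str  # submodule path inside the superproject
    sha: str   # commit id the superproject records for that path


def parse_gitlinks(ls_tree_output: str) -> List[Gitlink]:
    """Extract gitlinks from the output of `git ls-tree -r <commit>`.

    Each line has the form "<mode> <type> <sha>\t<path>".
    Submodule entries have mode 160000 and type "commit".
    """
    links = []
    for line in ls_tree_output.splitlines():
        meta, _, path = line.partition("\t")
        fields = meta.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        mode, objtype, sha = fields
        if mode == "160000" and objtype == "commit":
            links.append(Gitlink(path, sha))
    return links


def dangling_gitlinks(
    links: List[Gitlink], known_commits: Dict[str, Set[str]]
) -> List[Gitlink]:
    """Return the gitlinks whose commit is unknown to the submodule repo.

    `known_commits` maps a submodule path to the set of commit ids present
    in that submodule's repository (an assumption: a CI job would have to
    collect this, for example by fetching each submodule).
    """
    return [l for l in links if l.sha not in known_commits.get(l.path, set())]
```

A CI job built this way would fail the build when `dangling_gitlinks` returns a non-empty list. Unlike a server-side hook, it runs after the push has already been accepted, which is exactly the tradeoff described above.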

Hi,
On 2017-12-19 at 08:31:06 +0100, Sven Panne wrote:
This is a tradeoff: Doing it that way, you catch incorrect commits a little bit later, but it makes the overall arcane repository magic quite a bit simpler, probably removing the need for mirroring.
We'd need mirroring anyway, as we want to keep control over our infrastructure and not have to trust a 3rd party infrastructure to safely handle our family jewels: GHC's source tree.
Also, catching bad commits "a bit later" is just asking for trouble -- by the time they're caught the git repos have already lost their invariant and it's a big mess to recover; the invariant I devised and whose validation I implemented 4 years ago has served us pretty well, and has ensured that we never glitched into incorrectness; I'm also not sure why it's being suggested to switch to a less principled and more fragile scheme now.
As a Haskell programmer, I'd rather err on the side of correctness for mission critical things, and shifting checks we can (and already do) perform statically to CI feels to me like embracing `-fdefer-type-errors`... :-)
Cheers, HVR
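For comparison, the hook-side approach can be sketched as follows. This is a minimal illustration in Python, not the hook actually deployed on git.haskell.org, whose implementation is not shown in this thread. A git `pre-receive` hook receives one line per updated ref on standard input, in the form `<old-sha> <new-sha> <ref-name>`, and a non-zero exit status rejects the whole push. The names `RefUpdate`, `parse_updates`, `hook_exit_code` and the `commit_is_ok` callback are hypothetical.

```python
from typing import Callable, List, NamedTuple

ZERO_SHA = "0" * 40  # git uses an all-zero id for "ref does not exist"


class RefUpdate(NamedTuple):
    old: str
    new: str
    ref: str


def parse_updates(stdin_text: str) -> List[RefUpdate]:
    """Parse pre-receive input: one "<old> <new> <ref>" line per updated ref."""
    updates = []
    for line in stdin_text.splitlines():
        if line.strip():
            old, new, ref = line.split()
            updates.append(RefUpdate(old, new, ref))
    return updates


def hook_exit_code(
    updates: List[RefUpdate], commit_is_ok: Callable[[str], bool]
) -> int:
    """Return 0 to accept the push, 1 to reject it.

    `commit_is_ok` stands in for the real invariant check (assumption).
    For brevity only the new tip of each ref is validated; a real hook
    would need to examine every commit introduced by the push.
    """
    for update in updates:
        if update.new == ZERO_SHA:
            continue  # ref deletion: nothing to validate
        if not commit_is_ok(update.new):
            return 1
    return 0
```

In a real hook script the result would be passed to `sys.exit(hook_exit_code(parse_updates(sys.stdin.read()), check))`. Because the server refuses the push outright, a commit violating the invariant never becomes part of the repository's history.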

2017-12-19 9:50 GMT+01:00 Herbert Valerio Riedel
We'd need mirroring anyway, as we want to keep control over our infrastructure and not have to trust a 3rd party infrastructure to safely handle our family jewels: GHC's source tree.
I think this is a question of perspective: Having the master repository on GitHub doesn't mean you are in immediate danger or lose your "family jewels". IMHO it's quite the contrary: I'm e.g. sure that in case that something goes wrong with GitHub, there is far more manpower behind it to fix that than for any self-hosted repository. And you can of course have some mirror of your GitHub repo in case of e.g. an earthquake/meteor/... in the San Francisco area... ;-)
It seems to me that there is some hostility towards GitHub in GHC HQ, but I don't really understand why. GitHub serves other similar projects quite well, e.g. Rust, and I can't see why we should be special.
Also, catching bad commits "a bit later" is just asking for trouble -- by the time they're caught the git repos have already lost their invariant and it's a big mess to recover;
This is by no means different than saying: "I want to run 'validate' in the commit hook, otherwise it's a big mess." We don't do this for obvious reasons, and what is the "big mess" if there is some incorrect submodule reference for a short time span? How is that different from somebody introducing e.g. a subtle compiler bug in a commit?
the invariant I devised and whose validation I implemented 4 years ago has served us pretty well, and has ensured that we never glitched into incorrectness; I'm also not sure why it's being suggested to switch to a less principled and more fragile scheme now. [...]
Because the whole repository structure is overly complicated and simply hosting everything on GitHub would simplify things. Again: I'm well aware that there are tradeoffs involved, but I would really appreciate simplifications. I have the impression that the entry barrier to GHC development has become larger and larger over the years, partly because of very non-standard tooling, partly because of the increasingly arcane repository organization. There are reasons that other projects like Rust attract far more developers... :-/ </GrumpyMode>

It seems to me that there is some hostility towards GitHub in GHC HQ, but I don't really understand why. GitHub serves other similar projects quite well, e.g. Rust, and I can't see why we should be special.
Speaking for myself, I have no hostility towards GitHub, and there is no GHC-HQ bias against it that I know of. If it serves the purpose better, we should use it. Indeed that’s why I asked my original question. I agree with your point that data may actually be safer in GitHub than in our own repo. (And there is nothing to stop a belt-and-braces mirror backup system.)
The issue is: does GitHub serve the purpose better? We have frequently debated this multi-dimensional question. And we should continue to do so: the answers may change over time (GitHub’s facilities are not static; and its increasing dominance is itself a cultural familiarity factor that simply was not the case five years ago).
Simon
From: Sven Panne [mailto:svenpanne@gmail.com]
Sent: 19 December 2017 09:30
To: Herbert Valerio Riedel

On Tue, Dec 19, 2017, 09:48 Simon Peyton Jones via ghc-devs <ghc-devs@haskell.org> wrote:
It seems to me that there is some hostility towards GitHub in GHC HQ, but I don't really understand why. GitHub serves other similar projects quite well, e.g. Rust, and I can't see why we should be special.
Speaking for myself, I have no hostility towards GitHub, and there is no GHC-HQ bias against it that I know of. If it serves the purpose better, we should use it. Indeed that’s why I asked my original question. I agree with your point that data may actually be safer in GitHub than in our own repo. (And there is nothing to stop a belt-and-braces mirror backup system.)
These are just a few of the times github has been down in 2017:
http://currentlydown.com/github.com
compared to haskell.org:
http://currentlydown.com/haskell.org
Other third parties such as gitlab.com have suffered catastrophic data failures, and by the very virtue of being free they don't owe you anything. I have nothing against github for small projects. I have nothing but hate for it for large ones. And I don't see that changing any time soon, as everything they do seems to be half baked and the bare minimum.
The issue is: does GitHub serve the purpose better? We have frequently debated this multi-dimensional question. And we should continue to do so: the answers may change over time (GitHub’s facilities are not static; and its increasing dominance is itself a cultural familiarity factor that simply was not the case five years ago).
As is often the case in computing history: Dominance does not mean best nor fit for purpose. Switching to these cloud-based CIs was supposed to solve all our issues. And to this day none of them are working, notwithstanding the massive amount of effort wasted to get them to work.
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Dominance does not mean best nor fit for purpose.
I could not agree more. Dominance leads to familiarity, and that /is/ valuable. And dominance suggests that it is fit for purpose for a large group. But the question is: what is fit for our purposes?
I think that is all that Herbert was getting at, and it’s the right question. I’m making no assumptions about the answer, just saying that we should have no built-in bias (for or against) cloud solutions.
(And perhaps you are right to question my suggestion that a cloud repo is more reliable than a home-grown one. I have no data.)
Simon
From: Phyx [mailto:lonetiger@gmail.com]
Sent: 19 December 2017 10:08
To: Simon Peyton Jones

2017-12-19 11:07 GMT+01:00 Phyx
These are just a few of the times github has been down in 2017 http://currentlydown.com/github.com compared to haskell.org http://currentlydown.com/haskell.org [...]
I can't see any data for haskell.org on that page, apart from the fact that it is up right now. Furthermore, I very much question the data on currentlydown.com: According to it, Google, Facebook, YouTube, Yahoo! and Amazon were down on March 25th for roughly an hour. A much more probable explanation: currentlydown.com had problems, not the five of the biggest sites in the world. This undermines the trust in the rest of the outage reports a bit...

Cool, then let's turn to media reports, such as
https://techcrunch.com/2017/07/31/github-goes-down-and-takes-developer-produ...
Do you have one for git.haskell.org going down?

2017-12-19 12:47 GMT+01:00 Phyx
Cool, then let's turn to media reports then such as https://techcrunch.com/2017/07/31/github-goes-down-and-takes-developer-productivity-with-it/ do you have one for git.haskell.org going down?
Of course this question is a classic example of "the absence of evidence is not the evidence of absence" fallacy, but anyway:
* https://www.reddit.com/r/haskell/comments/4gppm8/ann_hackagehaskellorg_is_do...
* http://blog.haskell.org/post/4/outages_and_improvements.../
* Search ghc-devs@ for posts regarding Phabricator updates, server moves, problems with arc... (not exactly all downtimes, but in effect the incidents are the same)
I am not saying that the haskell.org infrastructure is bad, far from it, but it would be an illusion to think that it has a much higher effective uptime than GitHub.
Furthermore: I don't think that the argument should revolve around uptime. We have a distributed version control system where people can happily work for an extended time span without *any* network at all, and the GHC source repository is not a financial application which would cause the loss of millions of dollars per minute if it's temporarily unavailable. The arguments should be about simplicity, ease of use, etc.
Anyway, for my part the discussion is over, there *is* more or less open hostility towards GitHub/more standardized environments here. Is it an instance of the common "not invented here" syndrome or general mistrust in any kind of organization? I don't know... :-/

Sven Panne
2017-12-19 12:47 GMT+01:00 Phyx:
Cool, then let's turn to media reports then such as https://techcrunch.com/2017/07/31/github-goes-down-and-takes-developer-productivity-with-it/ do you have one for git.haskell.org going down?
Of course this question is a classic example of "the absence of evidence is not the evidence of absence" fallacy, but anyway:
* https://www.reddit.com/r/haskell/comments/4gppm8/ann_hackagehaskellorg_is_do...
* http://blog.haskell.org/post/4/outages_and_improvements.../
* Search ghc-devs@ for posts regarding Phabricator updates, server moves, problems with arc... (not exactly all downtimes, but in effect the incidents are the same)
I am not saying that the haskell.org infrastructure is bad, far from it, but it would be an illusion to think that it has a much higher effective uptime than GitHub. Furthermore: I don't think that the argument should revolve around uptime. We have a distributed version control system where people can happily work for an extended time span without *any* network at all, and the GHC source repository is not a financial application which would cause the loss of millions of dollars per minute if it's temporarily unavailable. The arguments should be about simplicity, ease of use, etc.
Anyway, for my part the discussion is over, there *is* more or less open hostility towards GitHub/more standardized environments here. Is it an instance of the common "not invented here" syndrome or general mistrust in any kind of organization? I don't know... :-/
I'm not sure that it's either of these; rather I think GHC is simply a large project with a rather different set of needs from most smaller FOSS projects. It is not at all uncommon for large projects to have their own infrastructure: LLVM, GCC, golang, GNOME, KDE, the Linux kernel, blender, firefox, FreeBSD, and many others all use their own infrastructure for code review, issue tracking, code hosting or all three. We are quite far from being alone in this regard.
For what it's worth, I'm not necessarily opposed to moving hosting of GHC's repositories to GitHub, GitLab or nearly any other hosting solution assuming that a few things can be ensured:
* Trac notifications continue to work
* Commits containing bad submodule references don't make it into the tree
* We have a means of controlling who can push to which branch namespaces
* We don't need to manually synchronize contributor keys to/from Phabricator
Note that moving code review or issue tracking to GitHub is a much different question and I think there are good reasons to be skeptical of such proposals, especially in the case of the issue tracking.
Regardless, I suspect this is a discussion best had on the devops list.
Cheers,
- Ben
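The requirement about branch namespaces can be illustrated with a small default-deny policy table. This is only a sketch in Python under stated assumptions: the ref patterns, the group names and the `may_push` function are all hypothetical and do not describe GHC's actual access rules or any particular hosting service's API.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical policy: ref pattern -> groups allowed to push there.
# Patterns are checked in insertion order; the first match decides.
PUSH_POLICY: Dict[str, List[str]] = {
    "refs/heads/master": ["core"],
    "refs/heads/ghc-*": ["core", "release"],
    "refs/heads/wip/*": ["core", "release", "contributor"],
}


def may_push(ref: str, groups: List[str],
             policy: Dict[str, List[str]] = PUSH_POLICY) -> bool:
    """Decide whether a user in `groups` may update `ref`.

    Refs that match no pattern are rejected (default deny).
    """
    for pattern, allowed in policy.items():
        if fnmatchcase(ref, pattern):
            return any(group in allowed for group in groups)
    return False
```

A self-hosted server can enforce such a table in a hook. On a hosted service the same effect depends on whatever branch-protection features that service offers, which is one of the things that would have to be verified before a move.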

I think what Sven is getting at here, and I do have to say I concur, is that there is a bit of NIH (Not Invented Here) syndrome in parts of the Haskell community. I think part of it is just inertia and the desire to keep things the same, because that is easier and more familiar.
One aspect that complicates this discussion significantly is that GHC dev has developed certain workarounds and ways of doing things, where third-party infrastructure seems lacking in features, because it doesn't support all these quirks. However, it turns out that if we are only prepared to change our workflow and processes to align with modern software development practices, many of these "features" aren't actually necessary. We have seen quite a bit of that in the CI discussion.
I am not writing this to blame anything or anybody. I think it is a normal part of a healthy process of change. However, it complicates the discussion as people get hung up on individual technicalities, such as this or that feature being missing, without considering the big picture.
Generally, I think a worthwhile golden rule in ops is that custom infrastructure is bad. It creates extra work, technical debt, and failure points. So, IMHO the default ought to be to use 3rd-party infrastructure (like GitHub) and only augment that where absolutely necessary. This will simply leave us with more time to write Haskell code in GHC instead of building, maintaining, and supporting GHC infrastructure.
Cheers,
Manuel

On Tue, Dec 19, 2017, 09:32 Sven Panne wrote:
2017-12-19 9:50 GMT+01:00 Herbert Valerio Riedel:
We'd need mirroring anyway, as we want to keep control over our infrastructure and not have to trust a 3rd party infrastructure to safely handle our family jewels: GHC's source tree.
I think this is a question of perspective: Having the master repository on GitHub doesn't mean you are in immediate danger or lose your "family jewels". IMHO it's quite the contrary: I'm e.g. sure that in case that something goes wrong with GitHub, there is far more manpower behind it to fix that than for any self-hosted repository. And you can of course have some mirror of your GitHub repo in case of e.g. an earthquake/meteor/... in the San Francisco area... ;-)
It seems to me that there is some hostility towards GitHub in GHC HQ, but I don't really understand why. GitHub serves other similar projects quite well, e.g. Rust, and I can't see why we should be special.
Rust and Roslyn, which also uses github, have both essentially replicated phabricator features on top of github to make things manageable. People often ignore this when making the off-hand remark that Rust uses github. This https://github.com/rust-lang-nursery/rust-forge/blob/master/infrastructure.m... is part of what rust, which has the backing of a major sponsor, has to maintain to even start handling github. And I'd point out that we have all of those just built into phabricator.
And of all the tools I've used, github has by far the worst interface for code reviews. Its handling of rebases, which wipes all existing review comments when you push (collapsing them into oblivion), is very problematic. I'm not even sure they fixed the bug where pushing a later PR with the same branch name as an existing PR permanently removes all review comments from the older PR.
We're not special, we just don't want to trade a superior tool for a more popular but inferior one. Aside from being popular, does github objectively have one redeeming feature?

On Tue, Dec 19, 2017 at 4:30 AM, Sven Panne
I think this is a question of perspective: Having the master repository on GitHub doesn't mean you are in immediate danger or lose your "family jewels". IMHO it's quite the contrary: I'm e.g. sure that in case that something goes wrong with GitHub, there is far more manpower behind it to fix that than for any self-hosted repository. And you can of course have some mirror of your GitHub repo in case of e.g. an earthquake/meteor/... in the San Francisco area... ;-)
You're also assuming github doesn't suddenly pull a SourceForge (or a Gitorious for that matter). Business cares not what it steamrolls in the name of profit. I fail to understand why, with multiple examples of the folly of this belief out there, people are still willing to bet on *this* company being *different* from all others and absolutely safe to trust.
--
brandon s allbery kf8nh, sine nomine associates
allbery.b@gmail.com, ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net
participants (7)
- Ben Gamari
- Brandon Allbery
- Herbert Valerio Riedel
- Manuel M T Chakravarty
- Phyx
- Simon Peyton Jones
- Sven Panne