Re: [Haskell-cafe] Improvements to package hosting and security

This is a valid concern, one that I should have addressed explicitly
in the proposal. Git is fairly well supported on Windows these days
and installs easily. It could conceivably be included as part of
MinGHC. There are many alternatives, but I doubt we'll need them:
statically linking a C implementation (libgit2 or another), or a
simple native implementation of the git protocol (the protocol is
quite straightforward and is documented) and basic disk format.
The same is true about GnuPG, via gpg4win, though note that under this
proposal GnuPG wouldn't be a requirement for `cabal update` to work.
It would just be an additional optional dependency, which you'll want
to have installed if you want to protect yourself from the attacks
listed in the proposal.
By the way, one side note about this Git proposal: it sidesteps the
discussion around how to add SSL support to cabal-install entirely.
Since Git understands (among others) HTTPS natively, we can
outsource our support for that to Git. In any case SSL is no longer
a necessity for protecting against MITM (the commit signing
takes care of that), only a nice-to-have for privacy.
On 28 April 2015 at 19:46,
To me, the elephant in the room is how the dependency on Git will be handled. I'm not a Windows user, but how much more painful will it be to set up a Haskell environment on Windows with a new dependency on Git? Will users need to install it separately, or do you suggest embedding Git into the relevant tools? Should the Haskell Platform bundle it? What about MinGHC? Oh, and I guess the same question can be asked about GnuPG.
I personally use a Mac and Homebrew, so it's pretty easy for me to install those dependencies, and I'm sure the same is true on Linux. But also, not everyone uses Homebrew (in fact, I'm sure most programmers on Macs don't use it), so it's also worth considering whether the requisite tools should be embedded in the "GHC for Mac OS X" distribution.
On Linux this probably isn't an issue because pretty much everyone has a decent dependency-tracking package manager.
I don't know if you care personally about these issues, but I think any proposal which introduces new dependencies to the core development environment of Haskell should take it into consideration. Very few people have Git and GPG already installed, and I think the new-user experience should be considered, and I'm surprised nobody has mentioned it in this entire thread (unless I missed it).
-- radix (Christopher Armstrong)
P.S. I'm very excited to see this work, including the emphasis on using the well-researched TUF. Thanks to you and other people working on this. :)
On Tuesday, April 28, 2015 at 5:07:56 AM UTC-5, Mathieu Boespflug wrote:
Hi all,
last week, I found some time to write up a very simple proposal that addresses the following goals simultaneously:
- maintain a difficult to forge public audit log of Hackage updates;
- make downloads from Hackage mirrors just as trustworthy as downloading from Hackage itself;
- guarantee that `cabal update` is always pulling the freshest package index (called "snapshots" in the proposal), and detect when this might not be the case;
- implement the first half of TUF (namely the index signing part discussed in Duncan's blog post, not the author package signing part) with fewer metadata files and in a way that reuses existing tooling;
- get low-implementation-cost, straightforward and incremental `cabal update`.
After a preliminary review from a few colleagues and friends in the community, here is the proposal, in the form of a Commercial Haskell wiki page:
https://github.com/commercialhaskell/commercialhaskell/wiki/Git-backed-Hacka...
The design constraints here are:
- stay backwards compatible where the cost for doing so is low.
- reuse existing tooling and mechanisms, especially when it comes to key management, snapshot identity, and distributing signatures.
- Focus on the above 5 goals only, because they happen to all be solvable by changing a single piece of mechanism. But strive to reuse whatever mechanism others are proposing to solve other goals (e.g. certification of provenance using author package signing, as Chris Done has already proposed).
To that effect, the tl;dr is that I'm proposing that we just use Git for maintaining the Hackage package index, that we use Git for synchronizing this locally, and that we use Git commit signatures for implementing the first half of TUF. The Git tooling currently assumes GnuPG keys for signatures, so I'm proposing that we use GnuPG keys for signing, and that we manage key revocation and any trust delegation between keys using GnuPG and its existing infrastructure.
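As a rough illustration of the tl;dr above, here is a minimal Python sketch of the Git invocations a Git-backed `cabal update` might issue. It only builds the command lines. The repository URL and branch name are placeholders, and the way cabal-install would actually drive Git is an assumption, not something the proposal specifies. The subcommands themselves (`clone`, `fetch`, `verify-commit`, `merge --ff-only`) are standard Git.

```python
from typing import List

# Hypothetical values for illustration only; the proposal does not fix them.
INDEX_URL = "https://example.org/hackage-index.git"
BRANCH = "master"


def update_commands(index_dir: str, already_cloned: bool, verify: bool) -> List[List[str]]:
    """Return the git invocations for one index update, in order."""
    if not already_cloned:
        # First run: fetch the whole index history.
        cmds = [["git", "clone", INDEX_URL, index_dir]]
        tip = "HEAD"
    else:
        # Incremental update: only objects missing locally are transferred.
        cmds = [["git", "-C", index_dir, "fetch", "origin", BRANCH]]
        tip = "FETCH_HEAD"
    if verify:
        # Optional step: check the GnuPG signature on the tip commit.
        # This is the only step that needs gpg to be installed.
        cmds.append(["git", "-C", index_dir, "verify-commit", tip])
    if already_cloned:
        # Only ever move forward; a non-fast-forward history makes this fail.
        cmds.append(["git", "-C", index_dir, "merge", "--ff-only", "FETCH_HEAD"])
    return cmds
```

Keeping signature verification behind a flag mirrors the point made in the thread that GnuPG would be an optional dependency rather than a requirement for `cabal update` to work.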
I estimate the total effort necessary here to be the equivalent of 5-6 full-time days overall. However, I have not pooled the necessary resources to carry that out yet. I'd like to get feedback first before going ahead with this, but in the meantime,
** if there are any volunteers that would like to signal their intent to help with the implementation effort then please add your name at the bottom of the wiki page. **
Best,
Mathieu
On 18 April 2015 at 20:11, Michael Snoyman wrote:
On Sat, Apr 18, 2015 at 12:20 AM Bardur Arantsson wrote:
On 17-04-2015 10:17, Michael Snoyman wrote:
This is a great idea, thank you both for raising it. I was discussing something similar with others in a text chat earlier this morning. I've gone ahead and put together a page to cover this discussion:
https://github.com/commercialhaskell/commercialhaskell/blob/master/proposal/...
The document definitely needs more work, this is just meant to get the ball rolling. As usual with the commercialhaskell repo, if anyone wants edit access, just request it on the issue tracker. Or most likely, send a PR and you'll get a commit bit almost magically ;)
Thank you. Just to make sure that I understand -- is this page only meant to cover the original "strawman proposal" at the start of this thread, or...?
Maybe you intend for this to be extended in a detailed way under the "Long-term solutions" heading?
I was imagining a wiki page which could perhaps start out by collecting all the currently identified possible threats in a table, and then all "participants" could perhaps fill in how their suggestion addresses those threats (or tell us why we shouldn't care about this particular threat). Of course other non-threat considerations might be relevant to add to such a table, such as: how prevalent is the software/idea we're basing this on? does this have any prior implementation (e.g. the append-to-tar and expect that web servers will behave sanely thing)? etc.
(I realize that I'm asking for a lot of work, but I think it's going to be necessary, at least if there's going to be consensus and not just a de-facto "winner".)
Hi Bardur,
I don't think I have any different intention for this page than you've identified. In fact, I thought that I had clearly said exactly what you described when I said:
There are various ideas at play already. The bullets are not intended to be full representations of the proposals, but rather high level summaries. We should continue to expand this page with more details going forward.
If this is unclear somehow, please tell me. But my intention absolutely is that many people can edit this page to add their ideas and we can flesh out a complete solution.
Michael
_______________________________________________ Haskell-Cafe mailing list Haskel...@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Hi, Mathieu Boespflug wrote:
This is a valid concern. One that I should have addressed explicitly in the proposal. Git is fairly well supported on Windows these days and installs easily. It could conceivably be included as part of MinGHC. There are many alternatives, but I doubt we'll need them: statically linking a C implementation (libgit2 or another), or a simple native implementation of the git protocol (the protocol is quite straightforward and is documented) and basic disk format.
I did not read your proposal, but if it entails that new Haskell users on Windows need to manually install git before they can use `cabal install something` for the first time, I think that would be bad.
For programming beginners (think B.Sc. students in a field other than computer science that take an "intro to programming" class), every installation that requires manual configuration is a hassle. Making the cabal executable find the git executable on the path would potentially require manual configuration to set up the search path. I believe that both ghc and git binary packages for Windows package MSYS (or maybe something similar, not sure), so there is also some potential for a cabal+ghc+git installation to confuse which bundled copy of MSYS to use.
Some of these "intro to programming" classes consist mostly of object-oriented programming, with a bit of FP thrown in. If helping the students to set up a Haskell environment on their laptops takes one lab session or one week of office hours, that's a significant cut from the FP learning time. If students fail their homework because they fail to install the Haskell environment, that takes a significant cut of their FP learning motivation. If instructors only teach plain Haskell without ever using cabal, this gives the impression that Haskell only works for classroom problems, because there seem to be no libraries.
I'm aware that programming beginners are not the main target for a programming language infrastructure, but we shouldn't forget about their first-use experience completely, either.
I'm not even sure whether "statically linking a C implementation" is any better. How would it support `cabal install cabal-install` on Windows, in practice?
Tillmann
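The search-path concern raised in this message can be made concrete with a small Python sketch of how a tool might locate `git`: look in a directory shipped with the installer first, then fall back to the user's `PATH`. The bundled directory is a hypothetical parameter; nothing in the thread says how cabal-install or MinGHC would actually resolve the executable.

```python
import shutil
from typing import Optional


def find_git(bundled_dir: Optional[str] = None) -> Optional[str]:
    """Return a path to a git executable, or None if none is found."""
    if bundled_dir is not None:
        # Search only the bundled directory first, so a user-installed git
        # elsewhere on PATH cannot shadow the bundled copy.
        found = shutil.which("git", path=bundled_dir)
        if found is not None:
            return found
    # Fall back to whatever the user's PATH provides.
    return shutil.which("git")
```

If the function returns `None`, the tool can print an actionable message instead of failing on a missing command, which is the kind of first-use experience the message above is asking for.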

I think the idea is that package signing is not a requirement, but that git is a requirement for package signing. So users can still get the behavior that they get today, without git.
Tom

Hi,
[I decided to drop haskell-infrastructure@community.galois.com from the CC list because for my last message in this thread, I got some noise about moderation].
amindfv@gmail.com wrote:
I think the idea is that package signing is not a requirement, but that git is a requirement for package signing. So users can still get the behavior that they get today, without git.
So there would be `cabal update --unsigned` and `cabal update --signed` and the former doesn't need git?
I skimmed the proposal at
https://github.com/commercialhaskell/commercialhaskell/wiki/Git-backed-Hacka...
and did not find this information there. Instead, I found this snippet:
Especially in developing countries, it would be a real liability for Haskell if the first step before doing anything is having to download a 1GB Git archive. Especially considering that given the current growth curve, the Git repository with all content imported will likely be hitting 2GB by this time next year, and so on.
This sounds as if for all Haskell users, "the first step before doing anything" would have to be to use git.
Tillmann
PS. BTW, check out this Stack Overflow question to understand why installing and configuring git will be hard for some Haskell users on Windows:
http://stackoverflow.com/questions/30000688/windows-loading-haskell-source-c...

There's actually a really easy solution to "have Git installed": bundle it with MinGHC.
Another alternative would be to use one of the many libraries out there that can talk the Git wire protocols. In fact, if anyone is worried about the standard Git tool not being secure enough (either due to C code or some other reason), we could have a Haskell-based Git implementation that focuses on security. There would still be big advantages to using the Git protocol in that case, such as a well understood protocol to work against and easy interop with existing tools.
That said, I think bundling the necessary Git tooling with MinGHC is an easy win.

Hi,
Michael Snoyman wrote:
That said, I think bundling the necessary Git tooling with MinGHC is an easy win.
Agreed. I mostly want to lobby for actually bundling it (properly, see below) instead of merely hand-waving about how easy it is to install git on Windows.
There's actually a really easy solution to "have Git installed": bundle it with MinGHC
This solution is certainly possible, but I'm not so sure whether it is *really easy*. From my perspective, MinGHC+git should be able to coexist on a system with some other system that bundles git, say, FOO+git, and/or just a copy of git that the user installed. (Otherwise, installing git after installing MinGHC+git would break MinGHC+git which would be unfortunate, wouldn't it?)
Now how should the various copies of git interact?
- should they share a configuration file?
- should they use the same shell?
- should they ever call each other?
I'm imagining a search-path-tweaking nightmare to get this to work. For example, what if a user sets up `git bisect` to call `cabal update` (as part of some larger script) which in turn would call `git whatever` to update the index. Presumably, that should be different copies of git involved.
But maybe cabal would need only some low-level git stuff which doesn't interact with user configuration or use the shell at all? That would make things easier.
I'm not sure how valid my concerns here are, but I'm not convinced by "Git is fairly well supported on Windows these days and installs easily."
Tillmann

We've implemented a fair amount of the Git-facing stuff here already (see the stackage-update package). It just needs to clone from an https Git repo on GitHub and fetch from the same repo. Signature verification is also possible, but not necessary to get an improvement over the current state of affairs. So I don't think interacting with user configs will be a blocker here.
That said, I agree with the main thrust of your email: seeing is believing. Let's add Git and GPG to MinGHC and see where that puts us. As usual, more hands help get these things done quicker, so if someone wants to get involved, let me know. But I expect this improvement to happen some time this month.

On 5 May 2015 at 12:04, Tillmann Rendel
Hi,
Michael Snoyman wrote:
That said, I think bundling the necessary Git tooling with MinGHC is an easy win.
Agreed. I mostly want to lobby for actually bundling it (properly, see below) instead of merely hand-waving about how easy it is to install git on Windows.
Or maybe just provide ghc for [msys2](https://github.com/msys2)?
/M
-- Magnus Therning OpenPGP: 0xAB4DFBA4 email: magnus@therning.org jabber: magnus@therning.org twitter: magthe http://therning.org/magnus

Hi Tillmann,
we should certainly have all Haskell users in mind in this discussion,
including beginners, and of course including Windows users.
At the end of the day, MinGHC is the recommended way to get Haskell on
Windows. It is what haskell.org points to. It includes msys, which is
also the main prereq for git on windows. Adding git would grow the
archive size by about 5MB (adding Perl not required) to an archive
that is 125MB in size. So I don't see Git being a problem on Windows.
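For scale, the figures quoted in the paragraph above (about 5MB added to a 125MB archive) work out to a relative growth of a few percent. A tiny Python check, using only those two numbers from the message:

```python
def growth_percent(base_mb: float, added_mb: float) -> float:
    """Relative size increase, in percent, from adding `added_mb` to `base_mb`."""
    return 100.0 * added_mb / base_mb


# Figures from the message: 5MB of Git added to a 125MB MinGHC archive.
print(growth_percent(125, 5))  # prints 4.0
```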
The more general point here is whether leveraging (arguably standard)
third-party commands and/or C code in order to keep our maintenance
burden low, and pick up many robust features for free to boot, is a
good approach. I believe that it is. Our infrastructure and tooling is
cracking at the seams as it is (cabal-install mysteriously dropping
HTTP connections and corrupting .cabal files when behind a corporate
firewall, updates to hackage-server inadvertently reversing the order
of revisions, low availability of Hackage, ...). Leveraging Git would
solve all mentioned problems, plus give us incremental updates for
free, plus give us package index signing for little effort.
To me that sounds like a pretty big win for 99% of users (including
Windows + OS X), at a *lower* extra maintenance cost for the community
today since we would need to maintain less code. The added reliability
and availability should ultimately benefit beginner users, who can
find it pretty confounding when following instructions just doesn't
work because of a temporary glitch, when they've never seen it work before.
Now, we are hearing of use cases where an extra 57MB (including perl,
which should actually be optional) unpacked is a liability for
minimalistic server images. I fully expect other niche use cases that
would prefer a different technical solution. But if we use a de facto
standard format for histories and for distributing signatures, then we
can support multiple ways of accessing and manipulating it, including
via a custom haskell (or existing C) git downloader if users of the
niche use cases deem the cost worth it.
Also, one can envision specialized mirrors where appropriate for
certain niche use cases. Why should the "canonical" (or upstream)
source for the package index be served via Git then? Because I believe
it makes the common case code path on Hackage simpler, and the end
user tools people use in the common case simpler, including for
advanced features like index signing, which we get nearly for free
once we switch to Git. Keep the common case simple, make niche cases
possible without complicating the common case.
That's what the Julia and the OCaml folks have been betting on, and
having tried out their tools just recently, I'd say they've ended up with
tooling that's arguably quite a bit more user friendly than what we have.
On 2 May 2015 at 12:43, Tillmann Rendel
Hi,
[I decided to drop haskell-infrastructure@community.galois.com from the CC list because for my last message in this thread, I got some noise about moderation].
amindfv@gmail.com wrote:
I think the idea is that package signing is not a requirement, but that git is a requirement for package signing. So users can still get the behavior that they get today, without git.
So there would be `cabal update --unsigned` and `cabal update --signed` and the former doesn't need git?
I skimmed the the proposal at
https://github.com/commercialhaskell/commercialhaskell/wiki/Git-backed-Hacka...
and did not find this information there. Instead, I found this snippet:
Especially in developing countries, it would be a real liability for Haskell if the first step before doing anything is having to download a 1GB Git archive. Especially considering that given the current growth curve, the Git repository with all content imported will likely be hitting 2GB by this time next year, and so on.
This sounds as if for all Haskell users, "the first step before doing anything" would have to be to use git.
Tillmann
PS. BTW, check out this Stack Overflow question to understand why installing and configuring git will be hard for some Haskell users on Windows:
http://stackoverflow.com/questions/30000688/windows-loading-haskell-source-c...

Mathieu Boespflug-2 wrote
Now, we are hearing of use cases where an extra 57MB (including perl, which should actually be optional) unpacked is a liability for minimalistic server images.
Debian's git packaging requires perl. Maybe https://hackage.haskell.org/package/gitlib would be a suitable alternative to the full package?

On May 3, 2015 at 5:14:58 AM, Mathieu Boespflug (mboes@tweag.net) wrote:
The more general point here is whether leveraging (arguably standard) third-party commands and/or C code in order to keep our maintenance burden low, and pick up many robust features for free to boot, is a good approach. I believe that it is. Our infrastructure and tooling is cracking at the seams as it is (cabal-install mysteriously dropping HTTP connections and corrupting .cabal files when behind a corporate firewall, updates to hackage-server inadvertently reversing the order of revisions, low availability of Hackage, ...). Leveraging Git would solve all mentioned problems, plus give us incremental updates for free, plus give us package index signing for little effort.
This seems to me to be mixing apples and oranges and pears and artichokes. The primary reason for Hackage downtime in the past was instability of our Hetzner box. Migrating to Rackspace did wonders for us. Regardless, choice of hardware/webhost is orthogonal to git.
You then list a logic bug in a version of hackage-server. Logic bugs, as I’m sure you’re aware, can be introduced anywhere that code is written. No proposal put forward involves not writing and running code, so while we can work on better regression test suites, code-review procedures, etc., this has very little to do with adoption of git (especially as I understand the reverse in revision order was a bug on _display_ which this proposal doesn’t address at all).
Finally, you discuss cabal-install having trouble behind firewalls. I agree with this being a problem, and I want us to work on this. However, git is again not a magic bullet. I’ve had firewalls where I run into trouble with git too, or with mercurial, or where certain website/firewall combinations meant, mysteriously, that curl would work but not wget, or vice versa. I think the plans to expand the choice of transports for cabal-install will improve things in this regard, and could in fact lay the basis for adding git as an additional transport as well.
In summary: You point to real problems that have occurred (of them, only the first [firewalls] is an ongoing issue). There are many other problems you did not point to, but that are also problems, and remain problems. Moving to git as a transport could potentially address some problems, with certain other tradeoffs in terms of other tooling choices we would have to make. However, I don’t think that migrating to git will solve any of the problems you mentioned above in your parenthetical. It _does_ help with the incremental fetch issue (though there are other ways to do that), and it _is_ a way to tackle the index signing issue, though I’m not sure that it is the best way (in particular, given the difficulty of configuring git _with keys_ on windows).
Cheers, Gershom

It takes apples, oranges, pears and artichokes and then some to stay healthy and keep the doctor away. That's why Git, in any capacity, isn't a silver bullet to solve all problems. However, let's break down how the envisioned setup helps with the problems I mentioned:
- cabal-install mysteriously dropping HTTP connections and corrupting .cabal files: this particular firewall that I've seen is used by hundreds of developers in the company without it silently truncating requests on anything else but Cabal updates. Investigations so far point to a bad interaction between Network.HTTP and lazy bytestrings, see http://www.hawaga.org.uk/tmp/bug-cabal-zlib-http-lazy.html (no bug report just yet). Reusing the same download mechanism that hundreds of others are already using in the company means we are not at risk of a firewall triggering an obscure latent race condition in the way cabal-install retrieves HTTP responses. It means if there is a real problem with the firewall, it won't just be for the local Haskellian outpost who are trying to sell Haskell to their boss, but for everyone, and therefore fixed.
- the reversing revisions issue was NOT just a display issue: it completely broke Stackage Nightly builds that day, which just calls `cabal update` under the hood: https://github.com/haskell/hackage-server/issues/305. Other users of Hackage in that time window also experienced the issue. It's an issue that caused massive breakage in a lot of places. Notice how PkgInfo_v2 is a data structure that is entirely redundant with what Git would provide already, so need not be serialized to disk, have migrations written for it, etc, nor perhaps exist at all. Further, Git would have made it quite impossible to distribute what amounts to a rewritten and inadvertently tampered with history (because the clients would have noticed and refused to fast forward). Fewer pieces of state managed independently + less code = more reliable service.
- low availability of hackage: indeed, hosting issues have been a major culprit here. But the point is, with the package database maintained as a Git repo separate from hackage-server, the repo can be served from any (highly available) Git provider, such as Github or Bitbucket, and continue to be served to clients even if the Hackage front page is down for whatever reason. Hosting we don't have to manage ourselves is hosting we don't have to keep humming. Of course no service guarantees 100% uptime, so mirrors are a key additional (or alternative) ingredient here. Efficient, low-latency and reliable mirroring is certainly possible by other means, but mirroring a history of changes is exactly what Git was designed for, and what it does well. Why reinvent that?
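To make the "refused to fast forward" point concrete, here is a minimal Python sketch. It is not Git's actual implementation: it assumes a toy history given as a mapping from commit id to parent ids, and the names `is_ancestor`, `accept_update` and the commit ids are purely illustrative. The idea is that a client accepts a new index tip only if its current tip is an ancestor of it, so a rewritten history is rejected rather than silently adopted.

```python
from collections import deque


def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    seen = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, ()))
    return False


def accept_update(parents, local_tip, remote_tip):
    """Fast-forward-only policy: accept only if history was extended, not rewritten."""
    return is_ancestor(parents, local_tip, remote_tip)


# Hypothetical index history: a <- b <- c, plus a rewritten line a <- x.
history = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
```

With this toy history, a client sitting at `b` would accept `c` (history extended) but refuse `x` (history diverged from what the client already has).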
However, I don’t think that migrating to git will solve any of the problems you mentioned above in your parenthetical. It _does_ help with the incremental fetch issue (though there are other ways to do that), and it _is_ a way to tackle the index signing issue, though I’m not sure that it is the best way (in particular, given the difficulty of configuring git _with keys_ on windows).
That's an interesting concern, though without knowing more, this is
not an actionable issue. What difficulties? If MinGHC packaged
Git+gpg4win, what would the issue be?
Best,
Mathieu

On May 4, 2015 at 4:42:05 AM, Mathieu Boespflug (mboes@tweag.net) wrote:
- cabal-install mysteriously dropping HTTP connections and corrupting .cabal files: this particular firewall that I've seen is used by hundreds of developers in the company without it silently truncating requests on anything else but Cabal updates. Investigations so far point to a bad interaction between Network.HTTP and lazy bytestrings, see http://www.hawaga.org.uk/tmp/bug-cabal-zlib-http-lazy.html (no bug report just yet). Reusing the same download mechanism that hundreds of others are already using in the company means we are not at risk of a firewall triggering an obscure latent race condition in the way cabal-install retrieves HTTP responses. It means if there is a real problem with the firewall, it won't just be for the local Haskellian outpost who are trying to sell Haskell to their boss, but for everyone, and therefore fixed.
Yes, in this particular case, clearly using git is a transport that works and using HTTP is a transport that doesn’t. But as you note, this appears to be a problem with the firewall, not the HTTP library. You’re right that moving to a transport used more widely would help this problem. But, so would moving to curl apparently. In any case, as I wrote, the best way to address this is to make ourselves more generally flexible in our transport layer -- and the way to do this is not to swap the HTTP library simply for git, but to open up our choices more broadly. Which is precisely the plan already under discussion with regards to Cabal. Git is no magic bullet here. It is just “anything besides the current thing that happens to trigger a specific bug in a specific firewall.”
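As an illustration of what "more generally flexible in our transport layer" could look like, here is a small Python sketch. It is not cabal-install's design: the function `fetch_with_fallback`, the transport names, the stub fetchers and the URL are all hypothetical. Each transport is tried in turn, and the first one that succeeds wins.

```python
from typing import Callable, Iterable, List, Tuple

# A transport is a (name, fetch function) pair; both are illustrative assumptions.
Transport = Tuple[str, Callable[[str], bytes]]


class AllTransportsFailed(Exception):
    """Raised when every configured transport has failed."""


def fetch_with_fallback(url: str, transports: Iterable[Transport]) -> Tuple[str, bytes]:
    """Try each transport in order; return the name and payload of the first success."""
    errors: List[str] = []
    for name, fetch in transports:
        try:
            return name, fetch(url)
        except Exception as exc:  # record the failure and move on to the next transport
            errors.append(f"{name}: {exc}")
    raise AllTransportsFailed("; ".join(errors))


# Stand-in transports: one fails the way a misbehaving firewall might, one works.
def broken_http(url: str) -> bytes:
    raise IOError("response truncated")


def stub_curl(url: str) -> bytes:
    return b"index contents"
```

Calling `fetch_with_fallback("https://example.invalid/index", [("builtin-http", broken_http), ("curl", stub_curl)])` falls through to the second transport. A git-based fetcher would simply be one more entry in the list.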
- the reversing revisions issue was NOT just a display issue: it completely broke Stackage Nightly builds that day, which just calls `cabal update` under the hood: https://github.com/haskell/hackage-server/issues/305. Other users of Hackage in that time window also experienced the issue. It's an issue that caused massive breakage in a lot of places. Notice how PkgInfo_v2 is a data structure that is entirely redundant with what Git would provide already, so need not be serialized to disk, have migrations written for it, etc, nor perhaps exist at all. Further, Git would have made it quite impossible to distribute what amounts to a rewritten and inadvertently tampered with history (because the clients would have noticed and refused to fast forward). Fewer pieces of state managed independently + less code = more reliable service.
Ah I see -- they were flipped in the migration, not just in the display of the data. Regardless -- there will always be a layer between our data storage -- be it git, acid-state, database, anything else -- and the programmatic use we make of that data. No matter what we do to that storage layer, the intermediate layer will need to turn that into a programmatic representation, and then the frontend services will need to display/make use of it. No matter what, there is always room for such bugs. You might say “but the server couldn’t cause such a bug in this system!” That’s silly -- the deserialization from that storage layer will just take place later then -- at each client. And they could cause such a bug. So yes, the literal place the bug was found is in code that would be different under a different storage layer. But there’s absolutely nothing in switching storage layers that rules out such bugs.
And furthermore, in the migration you propose, which involves taking all our data, pushing it into an entirely new representation, and then rewriting the entire hackage-server to talk to this new representation at all stages, and writing cabal-install to do the same -- I promise that this would necessarily create a _whole lot_ of bugs.
Again, there may be reasons to do this (I’m dubious) -- but let’s not overstate them to sell the case.
Hosting we don't have to manage ourselves is hosting we don't have to keep humming. Of course no service guarantees 100% uptime, so mirrors are a key additional (or alternative) ingredient here. Efficient, low-latency and reliable mirroring is certainly possible by other means, but mirroring a history of changes is exactly what Git was designed for, and what it does well. Why reinvent that?
In the last case here, you say that mirroring is easier with git? But don’t we already have mirroring now? And haven’t we had it for some time? The work underway, to my knowledge, is only to make mirroring more secure (as a related consequence of making hackage in general more secure). So this seems a silly thing to raise.
However, I don’t think that migrating to git will solve any of the problems you mentioned above in your parenthetical. It _does_ help with the incremental fetch issue (though there are other ways to do that), and it _is_ a way to tackle the index signing issue, though I’m not sure that it is the best way (in particular, given the difficulty of configuring git _with keys_ on windows).
That's an interesting concern, though without knowing more, this is not an actionable issue. What difficulties? If MinGHC packaged Git+gpg4win, what would the issue be?
I can give you an example I ran into with MinGHC already -- I had a preexisting cygwin install on my machine, and tried to install MinGHC. This mixed msys paths with cygwin paths and everything mismatched and was horrible until I ripped out those msys paths. But now, of course, my new GHC can’t find the libraries to build against for e.g. doing a network reinstall, which was the entire point of the exercise.
By analogy, many windows users may have an existing git, and some may have an existing gpg. These may come from windows binaries (in a few flavors -- direct, wrapped via tortoise, etc), from cygwin, or perhaps from another existing msys install.
Now they’re going to get multiple copies of these programs on their system with potentially conflicting paths, settings, etc? (Same goes for gpg, but not git on mac). And since we won’t have guarantees that everyone will have git, we’ll need to maintain existing transports anyway, so this only gives us a very partial solution...
I know there are some neat ideas in what you’re pushing for. But I feel like you’re overlooking all the potential issues -- and also just underestimating the amount of work it would take to cut everything over to a new storage layer, on both front and backend, while keeping the set of existing features intact.
--Gershom

One more point I realized -- switching to git as a transport _for the
package index_ isn't a general purpose solution to the transport problem.
Users also need a transport to download cabalized packages, and also to
upload them. (And, whenever we get distributed build-reports finished, to
upload those too, I suppose.) To my knowledge, the idea on the table
doesn't solve that?
-g
participants (7)
- amindfv@gmail.com
- Gershom B
- Jeremy
- Magnus Therning
- Mathieu Boespflug
- Michael Snoyman
- Tillmann Rendel