Improvements to package hosting and security

Many of you saw the blog post Mathieu wrote[1] about having more composable community infrastructure, which in particular focused on improvements to Hackage. I've been discussing some of these ideas with both Mathieu and others in the community working along similar lines. I've also separately spent some time speaking with Chris about package signing[2]. Through those discussions, it's become apparent to me that there are in fact two core pieces of functionality we rely on Hackage for today:

* A centralized location for accessing package metadata (i.e., the cabal files) and the package contents themselves (i.e., the sdist tarballs)
* A central authority for deciding who is allowed to make releases of packages and revisions to cabal files

In my opinion, fixing the first problem is very straightforward to do today using existing tools. FP Complete already hosts a full Hackage mirror[3] backed by S3, for instance, and mirroring the metadata to a Git repository as well is not a difficult technical challenge. This is the core of what Mathieu was proposing as far as composable infrastructure, corresponding to next actions 1 and 3 at the end of his blog post (step 2, modifying Hackage, is not a prerequisite). In my opinion, such a system would far surpass our current infrastructure in usability, reliability, and extensibility, and could be rolled out in a few days at most.

However, that second point, the central authority, is the more interesting one. As it stands, our entire package ecosystem is placing a huge level of trust in Hackage, without any serious way to vet what's going on there. Attack vectors abound, e.g.:

* Man-in-the-middle attacks: as we are all painfully aware, cabal-install does not support HTTPS, so a MITM attack on downloads from Hackage is trivial
* A breach of the Hackage Server codebase would allow anyone to upload nefarious code[4]
* Any kind of system-level vulnerability could allow an attacker to compromise the server in the same way

Chris's package signing work addresses most of these vulnerabilities by adding a layer of cryptographic signatures on top of Hackage as the central authority. I'd like to propose taking this a step further: removing Hackage as the central authority, and instead relying entirely on cryptographic signatures to release new packages.

I wrote up a strawman proposal last week[5] which clearly needs work to be a realistic option. My question is: are people interested in moving forward on this? If there's no interest, and everyone is satisfied with continuing with the current Hackage-central-authority model, then we can proceed with building reliable and secure services around Hackage. But if others, like me, would like to see a more secure system built from the ground up, please say so and let's continue that conversation.

[1] https://www.fpcomplete.com/blog/2015/03/composable-community-infrastructure
[2] https://github.com/commercialhaskell/commercialhaskell/wiki/Package-signing-...
[3] https://www.fpcomplete.com/blog/2015/03/hackage-mirror
[4] I don't think this is just a theoretical possibility for some point in the future. I have reported an easily triggerable DoS attack on the current Hackage Server codebase, which has been unresolved for 1.5 months now.
[5] https://gist.github.com/snoyberg/732aa47a5dd3864051b9
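As an illustration of the client-side check such a scheme implies, here is a minimal Haskell sketch. The detached-signature file names and the choice to shell out to the gpg binary are assumptions for illustration, not part of the proposal:

    import System.Exit (ExitCode (..))
    import System.Process (readProcessWithExitCode)

    -- Check a detached GPG signature over an sdist tarball by shelling
    -- out to gpg. A real client would also confirm *which* key signed,
    -- against the package's authorized-uploader list.
    verifySdist :: FilePath -> FilePath -> IO Bool
    verifySdist sigFile tarFile = do
      (code, _out, _err) <-
        readProcessWithExitCode "gpg" ["--verify", sigFile, tarFile] ""
      pure (code == ExitSuccess)

    main :: IO ()
    main = do
      ok <- verifySdist "foo-1.0.tar.gz.sig" "foo-1.0.tar.gz"
      putStrLn (if ok then "signature OK" else "verification FAILED")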

Also, since it's relevant, here's a GitHub repo with all of the cabal files
from Hackage which (thanks to a cron job and Travis CI) automatically
updates every 30 minutes:
https://github.com/commercialhaskell/all-cabal-files

Without adding much to the discussion myself, I just want to drop this link here: http://www.cs.arizona.edu/stork/packagemanagersecurity/. It addresses some interesting issues concerning package repositories.

Anyhow, I personally think the current state of Hackage (not even HTTPS) is unacceptable, and I'm really excited that people seem to be working on this.
-- Cheers, Arian

On Mon, Apr 13, 2015 at 10:02:45AM +0000, Michael Snoyman wrote:
I wrote up a strawman proposal last week[5] which clearly needs work to be a realistic option.

I finished reading the proposal; the only minor remark I have is on this sentence:

"Each signature may be revoked using standard GPG revocation."

It is the /key/ being revoked really, not the single signature (in our case it would mean revoking every package version or revision signed by that key). This in turn highlights the need for a well-defined process for handling key transitions (a task left to the individual implementors).

A distributed and secure Hackage sounds like a dream; I really hope this comes to life!

On Mon, Apr 13, 2015 at 3:21 PM Francesco Ariis wrote:
It is the /key/ being revoked really, not the single signature (in our case it would mean revoking every package version or revision signed by that key). This in turn highlights the need for a well-defined process for handling key transitions.

I think I was just wrong at that part of the proposal: it wouldn't be standard GPG revocation since, as you point out, that's for revoking a key. We'd need a custom revocation mechanism to make this work.

As to your more general point: there was an added layer of indirection that I considered but didn't write up, and which I happen to like. The idea is that all of the authorization lists would work off an identifier (e.g., an email address). We would then have a separate mapping between email addresses and GPG public keys, which would follow the same signature scheme as all the other files in the repo. The downside is that this redoes the basic GPG keysigning mechanism to some extent, but it does address key transitions more easily.

Another possibility would be to encode the release date of each package/version and package/version/revision, and use that date when checking the validity of keys. That way, old signatures remain valid in perpetuity. I'll admit to my relative lack of experience with GPG; there may well be a built-in mechanism for addressing this kind of situation which would be better to follow.
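A rough sketch of that date-based validity idea, with an assumed KeyWindow type standing in for the identifier-to-key mapping (none of this is from the proposal itself):

    import Data.Time (Day, fromGregorian)

    -- The window during which a key was authorized for an uploader
    -- identifier; Nothing means the authorization is still in force.
    data KeyWindow = KeyWindow
      { kwFingerprint :: String
      , kwValidFrom   :: Day
      , kwValidTo     :: Maybe Day
      }

    -- A signature stays valid forever as long as the key was
    -- authorized on the date the release (or revision) was made.
    validAtRelease :: Day -> KeyWindow -> Bool
    validAtRelease released kw =
         kwValidFrom kw <= released
      && maybe True (released <=) (kwValidTo kw)

    -- Example: a key rotated out in June 2015 still validates an
    -- April 2015 release.
    example :: Bool
    example = validAtRelease (fromGregorian 2015 4 13)
      (KeyWindow "0xDEADBEEF" (fromGregorian 2014 1 1)
                 (Just (fromGregorian 2015 6 1)))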

This proposal looks great. The one thing I am failing to understand (and I recognize the proposal is in its early stages) is how to ensure redundancy in the system. As far as I can tell, much of the proposal discusses the centralized authority of the system (i.e., ensuring secure distribution) and only references (with little detail) the distributed store. For instance, say I host a package on a personal server and one day I decide to shut that server down; is this package now lost forever? I do see the line "backup download links to S3", but this implies that someone is willing to pay for S3 storage for all of the packages.

Are there plans to adopt a P2P-like model or something similar to support some form of replication? Public resources like this seem to come and go, so it would be nice to avoid some of the problems associated with high churn in the network. That said, there is an obvious cost to replication. Likewise, the central authority would have to be updated with new, relevant locations to find each file (as currently proposed).

In any case, as I said before, the proposal looks great! I am looking forward to this.

I purposely didn't get into those details in this document, as it can be layered on top of the setup I described here. The way I'd answer it is twofold:

* FP Complete already hosts all packages on S3, and we intend to continue hosting all of them there in the future.
* People in the community are welcome (and encouraged) to make redundant copies of packages, and then add hash-to-URL mappings to the main repo giving those redundant copies as additional download locations.

In that sense, the FP Complete S3 copy would simply be one of potentially many redundant copies that could exist.
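A sketch of how a client might consume such hash-to-URL mappings, trying mirrors in order and accepting the first download whose digest matches; the use of http-conduit and cryptohash-sha256 here is an assumption for illustration:

    import qualified Crypto.Hash.SHA256 as SHA256
    import qualified Data.ByteString as B
    import qualified Data.ByteString.Lazy as L
    import Control.Exception (SomeException, try)
    import Network.HTTP.Conduit (simpleHttp)

    -- Try each mirror in turn; accept the first download whose SHA256
    -- digest matches the hash recorded in the (signed) repo metadata.
    fetchByHash :: B.ByteString -> [String] -> IO (Maybe L.ByteString)
    fetchByHash _ [] = pure Nothing
    fetchByHash expected (url:rest) = do
      result <- try (simpleHttp url)
      case (result :: Either SomeException L.ByteString) of
        Right body | SHA256.hashlazy body == expected -> pure (Just body)
        _ -> fetchByHash expected rest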

What security guarantees do we get from this proposal that are not present in Chris's package signing work? Part of the goal of the package signing is that we no longer need to trust Hackage: if it is compromised and packages are tampered with, then anyone using the signing tools should automatically reject the compromised packages.

Right now I think the answer is that this provides a security model for revisions: it limits what can be done and formalizes the trust in this process in a cryptographic way, whereas with Chris's work there is no concept of a (trusted) revision and a new package must be released?

Yes, I think you've summarized the security aspects of this nicely. There are also the reliability and availability guarantees we get from a distributed system, but those are outside the realm of security (unless you're talking about denial of service).

Is it possible to separate out the concept of trusted revisions from a distributed Hackage (into two separate proposals), then? If Hackage wanted to, it could implement trusted revisions. Or some other (distributed or non-distributed) package service could implement it, as long as the installer tool knows to check for revisions there; perhaps this could be added to Chris's signing tooling.

It would be a fundamental shift away from how Hackage does things today. I think the necessary steps would be:

1. Hackage ships all revisions of cabal files somehow (personally, I think it should be doing this anyway).
2. We maintain a list of trustees who are allowed to edit metadata. The signing work already has to recapture that kind of information for allowed uploaders, since Hackage doesn't collect GPG keys.
3. Every time a revision is made, the person making it signs the new revision.

I'm open to other ideas; this is just what came to mind first.

Michael
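A sketch of what step 3 could look like, with a detached signature per revision; the Revision type, the file layout, and the gpg invocation are all assumptions for illustration:

    import System.Exit (ExitCode (..))
    import System.Process (readProcessWithExitCode)

    -- One metadata revision: the revised cabal file plus a detached
    -- GPG signature made by the trustee who performed the edit.
    data Revision = Revision
      { revCabalFile :: FilePath  -- e.g. foo/1.0/2/foo.cabal
      , revSignature :: FilePath  -- detached signature over that file
      , revEditor    :: String    -- key fingerprint of the editor
      }

    -- Accept a revision only if the editor is on the trustee list and
    -- gpg verifies the signature. A real check would also parse gpg's
    -- status output to confirm the signing key matches revEditor.
    acceptRevision :: [String] -> Revision -> IO Bool
    acceptRevision trustees rev
      | revEditor rev `notElem` trustees = pure False
      | otherwise = do
          (code, _, _) <- readProcessWithExitCode
            "gpg" ["--verify", revSignature rev, revCabalFile rev] ""
          pure (code == ExitSuccess)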

Perhaps this is not really doable, but I was thinking there should be a proposal for a specification of trusted revisions. These are integration details for Hackage, just as the current proposal includes some implementation details for a distributed package service. I actually think the easiest way to make revisions secure with Hackage is to precisely limit what can be revised: if one can only change an upper bound of an existing dependency, that greatly limits the attack vectors.
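To illustrate how mechanical that restriction could be, here is a sketch over a deliberately simplified dependency representation (real cabal version ranges are richer than pairs of bounds):

    import qualified Data.Map as Map
    import Data.Map (Map)

    -- Simplified view of a package's dependencies:
    -- package name -> (lower bound, upper bound) as version digit lists.
    type Deps = Map String ([Int], [Int])

    -- Under this restriction a revision may only change upper bounds:
    -- the set of dependencies and every lower bound must be unchanged.
    onlyUpperBoundsChanged :: Deps -> Deps -> Bool
    onlyUpperBoundsChanged old new =
         Map.keys old == Map.keys new
      && and (Map.elems (Map.intersectionWith sameLower old new))
      where
        sameLower (lo1, _) (lo2, _) = lo1 == lo2

    -- Example: bumping text's upper bound is accepted; adding a brand
    -- new dependency would be rejected.
    example :: Bool
    example = onlyUpperBoundsChanged
      (Map.fromList [("text", ([1,1], [1,2]))])
      (Map.fromList [("text", ([1,1], [1,3]))])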

Hi folks,

As I mentioned previously on the commercialhaskell list, we're working on Hackage security for the IHG at the moment. We've finally written up the design for that as a blog post:

http://www.well-typed.com/blog/2015/04/improving-hackage-security

It includes a section at the end comparing it in general terms to this proposal (specifically Chris's part on package signing). The design is basically "The Update Framework" for Hackage; our current implementation effort for the IHG covers the first part of that design.

http://theupdateframework.com/

I think TUF addresses many of the concerns that have been raised in this thread, e.g. about threat models, what signatures actually mean, etc. It also covers the question of making the "who's allowed to upload what" information transparent, with proper cryptographic evidence (albeit that's in the second part of the design). So if collectively we can also implement the second part of TUF for Hackage, then I think we can address these issues properly.

Other things worth noting:

* This will finally allow us to have untrusted public mirrors, which is the traditional approach to improving repository reliability.
* We're incorporating an existing design for incremental updates of the package index to significantly improve "cabal update" times.

I'll chip in elsewhere in this thread with more details about how TUF (or our adaptation of it for Hackage) solves some of the problems raised here.

Duncan
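For readers unfamiliar with TUF, its central mechanism is role-based threshold signing. Schematically (this is a paraphrase of the general TUF idea, not Well-Typed's implementation, and verify is an assumed primitive standing in for real signature checking):

    import qualified Data.ByteString as B

    -- A TUF role: the public keys delegated to it and how many of
    -- them must sign a piece of metadata before clients trust it.
    data Role = Role
      { roleKeys      :: [B.ByteString]
      , roleThreshold :: Int
      }

    -- Metadata is trusted only if at least roleThreshold distinct
    -- delegated keys have produced a valid signature over it.
    thresholdValid
      :: (B.ByteString -> B.ByteString -> B.ByteString -> Bool)
         -- verify: public key -> signature -> message -> ok?
      -> Role
      -> [B.ByteString]   -- signatures attached to the metadata
      -> B.ByteString     -- the metadata itself
      -> Bool
    thresholdValid verify role sigs meta =
      length signers >= roleThreshold role
      where
        signers = [ k | k <- roleKeys role
                      , any (\s -> verify k s meta) sigs ]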

Thanks for responding, I intend to go read up on TUF and your blog post
now. One question:
* We're incorporating an existing design for incremental updates
of the package index to significantly improve "cabal update"
times.
Can you give any details about what you're planning here? I put together a
Git repo already that has all of the cabal files from Hackage and which
updates every 30 minutes, and it seems that, instead of reinventing
anything, simply using `git pull` would be the right solution here:
https://github.com/commercialhaskell/all-cabal-files

On Thu, 2015-04-16 at 09:52 +0000, Michael Snoyman wrote:
Thanks for responding, I intend to go read up on TUF and your blog post now. One question:
* We're incorporating an existing design for incremental updates of the package index to significantly improve "cabal update" times.
Can you give any details about what you're planning here?
Sure, it's partially explained in the blog post.
I put together a Git repo already that has all of the cabal files from Hackage and which updates every 30 minutes, and it seems that, instead of reinventing anything, simply using `git pull` would be the right solution here:
It's great that we can mirror to lots of different formats so easily :-). I see that we now have two Hackage mirror tools, one for mirroring to a hackage-server instance and one for S3. The bit I think is missing is mirroring to a simple directory-based archive, e.g. to be served by a normal HTTP server.
From the blog post:
The trick is that the tar format was originally designed to be append-only (for tape drives), so if the server simply updates the index in an append-only way then clients only need to download the tail (with appropriate checks and a fallback to a full update). Effectively the index becomes an append-only transaction log of all the package metadata changes. This is also fully backwards compatible.

The extra detail is that we can use HTTP range requests. These are supported on pretty much all dumb/passive HTTP servers, so it's still possible to host a Hackage archive on a filesystem or ordinary web server (this has always been a design goal of the repository format).

We use an HTTP range request to get the tail of the tarball, so we only have to download the data that has been added since the client last fetched the index. This is obviously much, much smaller than the whole index. For safety (and indeed security) the final tarball content is checked to make sure it matches up with what is expected.

Resetting and changing files earlier in the tarball is still possible: if the content check fails then we have to revert to downloading the whole index from scratch. In practice we would not expect this to happen except when completely blowing away a repository and starting again.

The advantage of this approach compared to others like rsync or git is that it's fully compatible with the existing format and existing clients. It's also in the typical case a smaller download than rsync, and probably similar to or smaller than git. It also doesn't need much new from the clients: they just need the same tar, zlib, and HTTP features as they have now (e.g. in cabal-install), and we don't have to distribute rsync/git/etc. binaries on other platforms (e.g. Windows).

That said, I have no problem whatsoever with there being git- or rsync-based mirrors. Indeed the central Hackage server could provide an rsync point for easy setup of public mirrors (including the package files).

--
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/
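A minimal sketch of the client side of this range-request scheme, written against the http-client API; the function name and the caller's responsibility to verify the result and fall back are assumptions, not code from the blog post:

    import qualified Data.ByteString.Char8 as BC
    import qualified Data.ByteString.Lazy as L
    import Network.HTTP.Client
      (httpLbs, newManager, parseRequest, requestHeaders, responseBody)
    import Network.HTTP.Client.TLS (tlsManagerSettings)
    import Network.HTTP.Types.Header (hRange)

    -- Fetch only the bytes of the index tarball beyond what we have
    -- locally. The caller appends the result to its local copy, checks
    -- the combined content against the expected hash, and falls back
    -- to a full download (e.g. on a non-206 response or failed check).
    fetchIndexTail :: String -> Integer -> IO L.ByteString
    fetchIndexTail url haveBytes = do
      manager <- newManager tlsManagerSettings
      req0 <- parseRequest url
      let range = BC.pack ("bytes=" ++ show haveBytes ++ "-")
          req = req0 { requestHeaders =
                         (hRange, range) : requestHeaders req0 }
      responseBody <$> httpLbs req manager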

On Thu, Apr 16, 2015 at 1:12 PM Duncan Coutts wrote: [--snip--]
I don't like this approach at all. There are many tools out there that do a good job of dealing with incremental updates. Instead of using any of those, the idea is to create a brand new approach, implement it in both Hackage Server and cabal-install (two projects that already have a massive bug deficit), and roll it out hoping for the best.

There's no explanation here as to how you'll deal with things like cabal file revisions, which are very common these days and seem to necessitate redownloading the entire database in your proposal.

Here's my proposal: use Git. If Git isn't available on the host, then revert to the current codepath and download the index. We can roll that out in an hour of work and everyone gets the benefits, without the detriments of creating a new incremental update framework.

Also: it seems like your biggest complaint about Git is "distributing Git." Making Git an optional upgrade is one way of solving that. Another approach is: don't use the official Git command line tool, but one of the many other implementations out there that implement the necessary subset of functionality. I'd guess writing that functionality from scratch in Cabal would be a comparable amount of code to what you're proposing.

Comments on package signing to be continued later, I haven't finished reading it yet.

Michael

On Thu, 2015-04-16 at 10:32 +0000, Michael Snoyman wrote:
[--snip--]
I don't like this approach at all. There are many tools out there that do a good job of dealing with incremental updates. Instead of using any of those, the idea is to create a brand new approach, implement it in both Hackage Server and cabal-install (two projects that already have a massive bug deficit), and roll it out hoping for the best.
I looked at other incremental HTTP update approaches that would be compatible with the existing format and work with passive http servers. There's one rsync-like thing over http but the update sizes for our case would be considerably larger than this very simple "get the tail, check the secure hash is still right". This approach is minimally disruptive, compatible with the existing format and clients.
There's no explanation here as to how you'll deal with things like cabal file revisions, which are very common these days and seem to necessitate redownloading the entire database in your proposal.
The tarball becomes append only. The tar format works in this way; updated files are simply appended. (This is how incremental backups to tape drives worked in the old days, using the tar format). So no, cabal file revisions will be handled just fine, as will other updates to other metadata. Indeed we get the full transaction history.
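For illustration, here is what an append-only revision could look like using the Haskell tar package (Codec.Archive.Tar); the file and directory names are made up:

```
import qualified Codec.Archive.Tar as Tar

-- Append a revised .cabal file as a new entry at the end of an existing
-- index tarball. Readers treat the last entry for a given path as
-- current, so nothing earlier in the archive needs to change.
appendRevision :: IO ()
appendRevision =
  Tar.append "00-index.tar"        -- existing archive (must already exist)
             "staging"             -- base directory holding the new files
             ["foo/1.0/foo.cabal"] -- paths relative to the base directory
```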
Here's my proposal: use Git. If Git isn't available on the host, then revert to the current codepath and download the index. We can roll that out in an hour of work and everyone gets the benefits, without the detriments of creating a new incremental update framework.
I was not proposing to change the repository format significantly (and only in a backwards compatible way). The existing format is pretty simple, using standard old well understood formats and protocols with wide tool support.

The incremental update is fairly unobtrusive. Passive http servers don't need to know about it, and clients that don't know about it can just download the whole index as they do now.

The security extensions for TUF are also compatible with the existing format and clients.

-- Duncan Coutts, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

On Thu, Apr 16, 2015 at 1:57 PM Duncan Coutts wrote: [--snip--]
The theme you seem to be creating here is "compatible with current format." You didn't say it directly, but you've strongly implied that, somehow, Git isn't compatible with existing tooling. Let me make clear that that is, in fact, false[1]:

```
#!/bin/bash

set -e
set -x

DIR=$HOME/.cabal/packages/hackage.haskell.org
TAR=$DIR/00-index.tar
TARGZ=$TAR.gz

git pull
mkdir -p "$DIR"
rm -f $TAR $TARGZ
git archive --format=tar -o "$TAR" master
gzip -k "$TAR"
```

I wrote this in 5 minutes. My official proposal is to add code to `cabal` which does the following:

1. Check for the presence of the `git` executable. If not present, download the current tarball
2. Check for existence of ~/.cabal/all-cabal-files (or similar). If present, run `git pull` inside of it. If absent, clone it
3. Run the equivalent of the above shell script to produce the 00-index.tar file (not sure if the .gz is also used by cabal)

This seems like such a drastically simpler solution than using byte ranges, modifying Hackage to produce tarballs in an append-only manner, and setting up cabal-install to stitch together and check various pieces of a downloaded file. I was actually planning on proposing this some time next week. Can you tell me the downsides of using Git here, which seems to fit all the benefits you touted of:
pretty simple, using standard old well understood formats and protocols with wide tool support.
Unless Git at 10 years old isn't old enough yet.

Michael

[1] https://github.com/commercialhaskell/all-cabal-files/commit/133cd026f8a1f99d...
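For concreteness, here is a rough Haskell sketch of those three steps as they might look inside cabal: the repository is the all-cabal-files mirror mentioned earlier, while the function names, paths, and fallback are hypothetical and error handling is elided:

```
import System.Directory (doesDirectoryExist, findExecutable)
import System.Process (callProcess)

updateIndex :: FilePath -> IO ()
updateIndex repoDir = do
  mgit <- findExecutable "git"
  case mgit of
    -- Step 1: no git executable, use the existing download codepath.
    Nothing -> downloadFullTarball
    Just _ -> do
      -- Step 2: pull if the clone already exists, otherwise clone it.
      haveClone <- doesDirectoryExist repoDir
      if haveClone
        then callProcess "git" ["-C", repoDir, "pull"]
        else callProcess "git"
               [ "clone"
               , "https://github.com/commercialhaskell/all-cabal-files.git"
               , repoDir ]
      -- Step 3: regenerate 00-index.tar from the checked-out files
      -- (written inside repoDir because of the -C flag).
      callProcess "git" ["-C", repoDir, "archive", "--format=tar"
                        , "-o", "00-index.tar", "master"]

-- Placeholder for the current full-download behaviour.
downloadFullTarball :: IO ()
downloadFullTarball = putStrLn "falling back to full 00-index.tar.gz download"
```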

On Thu, 2015-04-16 at 11:18 +0000, Michael Snoyman wrote:
[--snip--]
The theme you seem to be creating here is "compatible with current format." You didn't say it directly, but you've strongly implied that, somehow, Git isn't compatible with existing tooling. Let me make clear that that is, in fact, false[1]:
Sure, one can use git or rsync or other methods to transfer the set of files that makes up a repository or repository index. The point is, existing clients expect both this format and this (http) protocol.

There's a number of other minor arguments to be made here about what's simpler and more backwards compatible, but here are two more significant and positive arguments:

1. This incremental update approach works well with the TUF security design
2. This approach to transferring the repository index and files has a much lower security attack surface

For 1, the basic TUF approach is based on a simple http server serving a set of files. Because we are implementing TUF for Hackage we picked this update method to go with it. It's really not exotic, the HTTP spec says about byte range requests: "Range supports efficient recovery from partially failed transfers, and supports efficient partial retrieval of large entities." We're doing an efficient partial retrieval of a large entity.

For 2, Mathieu elsewhere in this thread pointed to an academic paper about attacks on package repositories and update systems. A surprising number of these are attacks on the download mechanism itself, before you even get to trying to verify individual package signatures. If you read the TUF papers you see that they also list these attacks and address them in various ways. One of them is that the download mechanism needs to know in advance the size (and content hash) of entities it is going to download.

Also, we should strive to minimise the amount of complex unaudited code that has to run before we get to checking the signature of the package index (or individual package tarballs). In the TUF design, the only code that runs before verification is downloading two files over HTTP (one that's known to be very small, and the other we already know the length and signed content hash). If we're being paranoid we shouldn't even run any decompression before signature verification. With our implementation the C code that runs before signature verification is either none, or just zlib decompression if we want to do on-the-fly http transport compression, but that's optional if we don't want to trust zlib's security record (though it's extremely widely used).

By contrast, if we use rsync or git then there's a massive amount of unaudited C code that is running with your user credentials prior to signature verification. In addition it is likely vulnerable to endless data and slow download attacks (see the papers).

-- Duncan Coutts, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
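As a concrete illustration of Duncan's point about knowing sizes and hashes in advance, here is a minimal sketch of the "verify before you parse" discipline: check the expected length and content hash against the downloaded bytes before handing them to any decompression or tar-parsing code. The expected values would come from signed metadata; here they are plain parameters, and the hashing uses the cryptonite package:

```
import Crypto.Hash (Digest, SHA256, hashlazy)
import qualified Data.ByteString.Lazy as BL

verifyDownload :: Int -> String -> BL.ByteString -> Either String BL.ByteString
verifyDownload expectedLen expectedSha bytes
  | BL.length bytes /= fromIntegral expectedLen =
      Left "length mismatch: truncated or padded download"
  | show (hashlazy bytes :: Digest SHA256) /= expectedSha =
      Left "hash mismatch: content does not match signed metadata"
  | otherwise = Right bytes
```

A real client would enforce the length limit while streaming, so an endless-data attack is cut off as it happens rather than detected after the fact.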

On Thu, Apr 16, 2015 at 2:58 PM Duncan Coutts wrote: [--snip--]
I never claimed nor intended to imply that range requests are non-standard. In fact, I'm quite familiar with them, given that I implemented that feature of Warp myself! What I *am* claiming as non-standard is using range requests to implement an incremental update protocol of a tar file. Is there any prior art to this working correctly? Do you know that web servers will do what you need and serve the byte offsets from the uncompressed tar file instead of the compressed tar.gz? Where are you getting the signatures from, and how does this interact with 00-index.tar.gz files served by non-Hackage systems?

On the security front: it seems that we have two options here:

1. Use a widely used piece of software (Git), likely already in use by the vast majority of people reading this mailing list, relied on by countless companies and individuals, holding source code for the kernel of likely every mail server between my fingertips and the people reading this email, to distribute incremental updates. And as an aside: that software has built in support for securely signing commits and verifying those signatures.

2. Write brand new code deep inside two Haskell codebases with little scrutiny to implement a download/update protocol that (to my knowledge) has never been tested anywhere else in the world.

Have I misrepresented the two options at all? I get that you've been working on this TUF-based system in private for a while, and are probably heavily invested already in the solutions you came up with in private. But I'm finding it very difficult to see the reasoning to reinventing wheels that don't need reinventing.

Michael

On 16-04-2015 14:18, Michael Snoyman wrote: [--snip--]
I never claimed nor intended to imply that range requests are non-standard. In fact, I'm quite familiar with them, given that I implemented that feature of Warp myself! What I *am* claiming as non-standard is using range requests to implement an incremental update protocol of a tar file. Is there any prior art to this working correctly? Do you know that web servers will do what you need and serve the byte offsets from the uncompressed tar file instead of the compressed tar.gz?
Why would HTTP servers serve anything other than the raw contents of the file? You usually need special configuration for that sort of thing, e.g. mapping based on requested content type. (Which the client should always supply correctly, regardless.) "Dumb" HTTP servers certainly don't do anything weird here. [--snip--]
On the security front: it seems that we have two options here:
1. Use a widely used piece of software (Git), likely already in use by the vast majority of people reading this mailing list, relied on by countless companies and individuals, holding source code for the kernel of likely every mail server between my fingertips and the people reading this email, to distribute incremental updates. And as an aside: that software has built in support for securely signing commits and verifying those signatures.
I think the point that was being made was that it might not have been hardened sufficiently against malicious servers (being much more complicated than a HTTP client, for good reasons). I honestly don't know how much such hardening it has received, but I doubt that it's anywhere close to HTTP clients in general. (As to the HTTP client Cabal uses, I wouldn't know.) [--snip--]
I get that you've been working on this TUF-based system in private for a while, and are probably heavily invested already in the solutions you came up with in private. But I'm finding it very difficult to see the reasoning to reinventing wheels that don't need reinventing.
That's pretty... uncharitable. Especially given that you also have a horse in this race. (Especially, also considering that your proposal *doesn't* address some of the vulnerabilities mitigated by the TUF work.) Regards,

On Thu, Apr 16, 2015 at 4:36 PM Bardur Arantsson wrote: [--snip--]
Why would HTTP servers serve anything other than the raw contents of the file? You usually need special configuration for that sort of thing, e.g. mapping based on requested content type. (Which the client should always supply correctly, regardless.)
"Dumb" HTTP servers certainly don't do anything weird here.
There actually is a weird point to browsers and servers around pre-gzipped contents, which is what I was trying to get at (but didn't do a clear enough job of explaining). There's some ambiguity when sending compressed tarballs as to whether the browser should decompress, for instance. http-client had to implement a workaround for this specifically: https://www.stackage.org/haddock/nightly-2015-04-16/http-client-0.4.11.1/Net...
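For reference, one of the knobs http-client exposes for this: each Request carries a decompress predicate, keyed on the response content type, that decides whether a gzip-encoded body is transparently unpacked. A small sketch (the URL is just an example):

```
import Network.HTTP.Client

-- Fetch a .tar.gz without the client second-guessing the encoding:
-- browserDecompress mimics browser behaviour and leaves tarballs alone,
-- so we receive the compressed bytes exactly as served.
fetchRawIndex :: Manager -> IO ()
fetchRawIndex mgr = do
  req <- parseUrl "http://hackage.haskell.org/00-index.tar.gz"
  resp <- httpLbs req { decompress = browserDecompress } mgr
  print (responseStatus resp)
```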
[--snip--]
I think the point that was being made was that it might not have been hardened sufficiently against malicious servers (being much more complicated than a HTTP client, for good reasons). I honestly don't know how much such hardening it has received, but I doubt that it's anywhere close to HTTP clients in general. (As to the HTTP client Cabal uses, I wouldn't know.)
AFAIK, neither of these proposals as they stand has anything to do with security against a malicious server. In both cases, we need to simply trust the server to be sending the right data. Using some kind of signing mechanism is a mitigation against that, such as the GPG signatures I added to all-cabal-files. HTTPS from Hackage would help prevent MITM attacks, and having the 00-index file be cryptographically signed would be another (though I don't know what Duncan has planned here).
[--snip--]
I get that you've been working on this TUF-based system in private for a while, and are probably heavily invested already in the solutions you came up with in private. But I'm finding it very difficult to see the reasoning to reinventing wheels that don't need reinventing.
That's pretty... uncharitable. Especially given that you also have a horse in this race.
(Especially, also considering that your proposal *doesn't* address some of the vulnerabilities mitigated by the TUF work.)
I actually really don't have a horse in this race. It seems like a lot of people missed this from the first email I sent, so to repeat myself:
I wrote up a strawman proposal last week[5] which clearly needs work to be a realistic option. My question is: are people interested in moving forward on this? If there's no interest, and everyone is satisfied with continuing with the current Hackage-central-authority, then we can proceed with having reliable and secure services built around Hackage. But if others- like me- would like to see a more secure system built from the ground up, please say so and let's continue that conversation.
My "horse in the race" is a security model that's not around putting all trust in a single entity. Other than that, I'm not invested in any specific direction. Using TUF sounds like a promising idea, but- as I raised in the other thread- I have my concerns. All of that said: the discussion here is about efficient incremental downloads, not package signing. For some reason those two points are getting conflated here. Michael

On 17-04-2015 05:34, Michael Snoyman wrote:
AFAIK, neither of these proposals as they stand has anything to do with security against a malicious server. In both cases, we need to simply trust the server to be sending the right data. Using some kind of signing mechanism is a mitigation against that, such as the GPG signatures I added to all-cabal-files. HTTPS from Hackage would help prevent MITM attacks, and having the 00-index file be cryptographically signed would be another (though I don't know what Duncan has planned here).
Well, TUF (at least if fully implemented) can certainly limit the amount of damage that a malicious (read: compromised) server can do. Obviously it can't magically make a malicious server behave like a non-malicious one, but it does prevent e.g. the "serve stale data" trick or Slowloris-for-clients*.

(*) By clients knowing up-front, in a secure manner, how much data there is to download.
[--snip--]
I get that you've been working on this TUF-based system in private for a while, and are probably heavily invested already in the solutions you came up with in private. But I'm finding it very difficult to see the reasoning to reinventing wheels that don't need reinventing.
All of that said: the discussion here is about efficient incremental downloads, not package signing. For some reason those two points are getting conflated here.
I think you might not have been very clear about stating that you were limiting your comments in this subthread to apply only to said mechanism. (Or at least I didn't notice any such statement, but then I might well have missed it.)

Another point is: It's often not very useful to talk about things in complete isolation when discussing security systems, since there may be non-trivial interplay between the parts -- though TUF tries to limit the amount of interplay (to limit complexity/understandability). Not necessarily a major concern in this particular subsystem, but see (*).

Regards,

On 17-04-2015 05:34, Michael Snoyman wrote:
I wrote up a strawman proposal last week[5] which clearly needs work to be a realistic option. My question is: are people interested in moving forward on this? If there's no interest, and everyone is satisfied with continuing with the current Hackage-central-authority, then we can proceed with having reliable and secure services built around Hackage. But if others- like me- would like to see a more secure system built from the ground up, please say so and let's continue that conversation.
You say "more secure". Against what? What's the threat model? (Again, sorry if I missed it, it's been a long thread.) Yes, I'd definitely like a more "secure system" against many/all of the threats idenfied in e.g. TUF (perhaps even more, if realistic), but it's hard to evaluate a proposal without an explicitly spelled out threat model. This where adopting bits of TUF seems a lot more appealing than a home-brewed model, at least if we can remain confident that those bits actually mitigates the threats that we want covered. Regards,

On Fri, Apr 17, 2015 at 7:51 AM Bardur Arantsson wrote: [--snip--]
Instead of copy-pasting bits and pieces of my initial email until the whole thing makes sense, I'll just link to the initial email, which lists some of the security vulnerabilities and gives my disclaimers about my proposal just being a strawman: https://groups.google.com/d/msg/commercialhaskell/PTbC0p_YFvk/8XqS8wDxgqEJ

Note that I never intended that list to be exhaustive at all! The point is to see if others have security concerns along these lines as well, which seems to be the case. In this thread others and I have raised a number of other security threats. TUF raises even additional threats.

I've asked Duncan[1] about how TUF would address some specific concerns I raised (such as Hackage server being compromised), but I haven't heard a response. My guess is that TUF will end up being a necessary but insufficient part of a solution here, but I unfortunately don't know enough about Well Typed's intended implementation to say more than that.

Michael

[1] Both in the mailing list and on Reddit: http://www.reddit.com/r/haskell/comments/32sezy/ongoing_work_to_improve_hack...

On 17-04-2015 07:04, Michael Snoyman wrote:
https://groups.google.com/d/msg/commercialhaskell/PTbC0p_YFvk/8XqS8wDxgqEJ
Note that I never intended that list to be exhaustive at all! The point is to see if others have security concerns along these lines as well, which seems to be the case.
Ok, that's fair enough. And: yes! :) FWIW, I think what people have been asking for is exactly *details*, so that the proposal can be evaluated properly. (I realize that this is a non-trivial amount of work.) For example, a good start would be to evaluate your strawman proposal against the TUF criteria and see where it needs to be fleshed out/beefed up, etc.
[--snip--]
I'm reminded of SPJ's usual request for a wiki page *with details* discussing pros/cons of all the proposals for new GHC features. Might it be time to start such a page? (Of course this is not meant to imply any particular *rush* per se, but this is obviously becoming a growing concern in the community.)

Regards,

I'm reminded of SPJ's usual request for a wiki page *with details* discussing pros/cons of all the proposals for new GHC features. Might it be time to start such a page? (Of course this is not meant to imply any particular *rush* per se, but this is obviously becoming a growing concern in the community.)
I think it must be the first step. Otherwise, it's hard to evaluate the proposals. It would be great if both designs could be compared side by side. I'd suggest to create a file in the commercial haskell repo (so that authors of both designs (and others) could freely edit it) with a list of things that people care about, which should be as specific as possible. For example:

                | FPComplete  | Well-Typed  |
---------------------------------------------------------------
Design document | https://... | https://... |

Does this design protect from these attacks?

               | FPComplete           | Well-Typed           |
--------------------------------------------------------------
Attack1        | yes                  | no                   |
Attack1Comment | because of so and so | because of so and so |
Attack2        | no                   | yes                  |
Attack2Comment | because of so and so | because of so and so |
Attack3        | no                   | no                   |
Attack3Comment | because of so and so | because of so and so |
...

Features:

  |                                | FPComplete | Well-Typed |
--|----------------------------------------------------------
1 | Allows for third-party mirrors | yes        | yes        |
2 | Comment regarding 1            | ...        | ...        |

Estimated effort:

  |                                       | FPComplete | Well-Typed |
--|-----------------------------------------------------------------
1 | Tools required                        | git, ...   | ...        |
2 | Tools that need to be changed         | ...        | ...        |
3 | Time required for 2 (hours)           | ...        | ...        |
4 | Size of changes required for 2 (LOC)  | ...        | ...        |

Possibly with comments, too.

On Fri, Apr 17, 2015 at 9:56 AM Nikita Karetnikov wrote: [--snip--]
This is a great idea, thank you both for raising it. I was discussing something similar with others in a text chat earlier this morning. I've gone ahead and put together a page to cover this discussion: https://github.com/commercialhaskell/commercialhaskell/blob/master/proposal/... The document definitely needs more work, this is just meant to get the ball rolling. As usual with the commercialhaskell repo, if anyone wants edit access, just request it on the issue tracker. Or most likely, send a PR and you'll get a commit bit almost magically ;) Michael

On 17-04-2015 10:17, Michael Snoyman wrote:
[--snip--]
Thank you. Just to make sure that I understand -- is this page only meant to cover the original "strawman proposal" at the start of this thread, or...? Maybe you intend for this to be extended in a detailed way under the "Long-term solutions" heading?

I was imagining a wiki page which could perhaps start out by collecting all the currently identified possible threats in a table, and then all "participants" could perhaps fill in how their suggestion addresses those threats (or tell us why we shouldn't care about this particular threat). Of course other non-threat considerations might be relevant to add to such a table, such as: how prevalent is the software/idea we're basing this on? does this have any prior implementation (e.g. the append-to-tar and expect that web servers will behave sanely thing)? etc.

(I realize that I'm asking for a lot of work, but I think it's going to be necessary, at least if there's going to be consensus and not just a de-facto "winner".)

Regards,

On Sat, Apr 18, 2015 at 12:20 AM Bardur Arantsson wrote: [--snip--]
Hi Bardur, I don't think I have any different intention for this page than you've identified. In fact, I thought that I had clearly said exactly what you described when I said:
There are various ideas at play already. The bullets are not intended to be full representations of the proposals, but rather high level summaries. We should continue to expand this page with more details going forward.
If this is unclear somehow, please tell me. But my intention absolutely is that many people can edit this page to add their ideas and we can flesh out a complete solution. Michael

Hi all,
last week, I found some time to write up a very simple proposal that
addresses the following goals simultaneously:
- maintain a difficult to forge public audit log of Hackage updates;
- make downloads from Hackage mirrors just as trustworthy as
downloading from Hackage itself;
- guarantee that `cabal update` is always pulling the freshest package
index (called "snapshots" in the proposal), and detect when this might
not be the case;
- implement the first half of TUF (namely the index signing part
discussed in Duncan's blog post, not the author package signing part)
with fewer metadata files and in a way that reuses existing tooling;
- get low-implementation-cost, straightforward and incremental `cabal update`.
After a preliminary review from a few colleagues and friends in the
community, here is the proposal, in the form of a Commercial Haskell
wiki page:
https://github.com/commercialhaskell/commercialhaskell/wiki/Git-backed-Hacka...
The design constraints here are:
- stay backwards compatible where the cost for doing so is low.
- reuse existing tooling and mechanisms, especially when it comes to
key management, snapshot identity, and distributing signatures.
- Focus on the above 5 goals only, because they happen to all be
solvable by changing a single piece of mechanism. But strive to reuse
whatever mechanism others are proposing to solve other goals (e.g.
certification of provenance using author package signing, as Chris
Done has already proposed).
To that effect, the tl;dr is that I'm proposing that we just use Git
for maintaining the Hackage package index, that we use Git for
synchronizing this locally, and that we use Git commit signatures for
implementing the first half of TUF. The Git tooling currently assumes
GnuPG keys for signatures, so I'm proposing that we use GnuPG keys for
signing, and that we manage key revocation and any trust delegation
between keys using GnuPG and its existing infrastructure.
I estimate the total effort necessary here to be the equivalent of 5-6
full time days overall. However, I have not pooled the necessary
resources to carry that out yet. I'd like to get feedback first before
going ahead with this, but in the meantime,
** if there are any volunteers that would like to signal their intent
to help with the implementation effort then please add your name at
the bottom of the wiki page. **
Best,
Mathieu
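As a very rough sketch of the client-side sync step under this proposal, assuming the index lives in a Git repository whose commits are GPG-signed and the signing key is already trusted in the local keyring (function and directory names are illustrative):

```
import Control.Exception (SomeException, try)
import System.Process (callProcess)

-- Update the local index clone, refusing the merge unless the tip
-- commit being pulled carries a valid GPG signature.
syncSignedIndex :: FilePath -> IO ()
syncSignedIndex repoDir = do
  r <- try (callProcess "git"
             ["-C", repoDir, "pull", "--verify-signatures"])
  case r :: Either SomeException () of
    Right () -> putStrLn "index updated; signature verified"
    Left _   -> putStrLn "verification failed; keeping previous index"
```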

Minor update. Some of your points about checking signatures before
unpacking made me curious about what Git had to offer in these
circumstances. For those like me who were unaware of the functionality, it
turns out that Git has the option to reject non-signed commits, just run:
git pull --verify-signatures
I've set up the Travis job that pulls from Hackage to sign its commits with
the GPG key I've attached to this email (fingerprint E595 AD42 14AF A6BB
1552 0B23 E40D 74D6 D6CF 60FD).

On Thu, Apr 16, 2015 at 03:28:10PM +0000, Michael Snoyman wrote:
[--snip--]
Nice one!

One thing I, as a developer of a tool that consumes the Hackage index[1], would like to see is a bit more meta data, in particular

- alternative download URLs for the source
- hashes of the source (probably needs to be per URL)

I thought I saw something about this in the thread, but going through it again I can't seem to find it. Would this sort of thing also be included in "improvements to package hosting"?

/M

[1]: http://hackage.haskell.org/package/cblrepo

--
Magnus Therning OpenPGP: 0xAB4DFBA4
email: magnus@therning.org jabber: magnus@therning.org
twitter: magthe http://therning.org/magnus

There's a big difference between making something easy to use and making it productive. -- Adam Bosworth

On Fri, Apr 17, 2015 at 1:01 AM Magnus Therning wrote: [--snip--]
One thing I, as a developer of a tool that consumes the Hackage index[1], would like to see is a bit more meta data, in particular
- alternative download URLs for the source
- hashes of the source (probably needs to be per URL)
I thought I saw something about this in the thread, but going through it again I can't seem to find it. Would this sort of thing also be included in "improvements to package hosting"?
My strawman proposal did include the idea of identifying a package via its hash, and then providing redundant URLs for download (some of those URLs possibly being non-HTTP, such as a special URL to refer to contents within a Git repository). But as I keep saying, that was a strawman proposal, not to be taken as a final design.

That said, simply adding that information to the 00-index file seems like an easy win. The hashes, at the very least, would fit in well.

Michael
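For illustration only, the extra per-release metadata being discussed might be modelled along these lines; every type and field name here is invented, not a settled format.

import Data.ByteString (ByteString)

-- One release is identified by a content hash, with any number of
-- redundant places to fetch it from.
data ReleaseMeta = ReleaseMeta
  { releaseSha256  :: ByteString      -- hash of the sdist tarball
  , releaseSources :: [MirrorSource]  -- alternative download locations
  }

data MirrorSource
  = HttpUrl String        -- a plain http(s) tarball URL
  | GitRef  String String -- repository URL plus the tag holding the release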

On 17 April 2015 at 05:25, Michael Snoyman wrote:
My strawman proposal did include the idea of identifying a package via its hash, and then providing redundant URLs for download (some of those URLs possibly being non-HTTP, such as a special URL to refer to contents within a Git repository). But as I keep saying, that was a strawman proposal, not to be taken as a final design.
That said, simply adding that information to the 00-index file seems like an easy win. The hashes, at the very least, would fit in well.
I knew I'd seen it somewhere :) Yes, the addition of more metadata is an easy win and can be done before the dust has settled on the issue of how to achieve trust :)

One thing I personally think is nice about OCaml's opam is that its package database is in a git repo (on GitHub) and adding packages is a matter of submitting a patch. I'd very much like to see a future where I can get a package onto Hackage by:

1. cloning the Hackage package git repo
2. adding and committing a .cabal file and metadata about where my package can be found, e.g. something like
   url="GIT=http://github.com/myname/mypkg.git;TAG=v1.0.2"
   sha512="..."
3. submitting a pull request

/M

-- Magnus Therning OpenPGP: 0xAB4DFBA4 email: magnus@therning.org jabber: magnus@therning.org twitter: magthe http://therning.org/magnus

I'd like to step back from the technical discussion here for a moment and expand a bit on a point at the end of my previous email, which is really about process.

After I first published a blog post about service architectures and package distribution, a recent interest of mine, I was very surprised and happy to hear that several parties had not only already been thinking about these very topics but moreover already had various small prototypes lying around. This was also the case for *secure* package distribution. What puzzled me, however, is that this came in the form of multiple private messages from multiple sources, sometimes referring to said parties only vaguely and without identifying them. A similar story occurred when folks first raised package signing some years ago.

Be it on robust identification of the provenance of packages, distribution of packages and their metadata, more robust sandboxes, or any other topic that touches upon our core infrastructure and tooling, it would be really great if people made themselves known and came forth with a) the requirements they seek to work against, b) their ideas to solve them, and c) the resources they need or are themselves willing to bring to bear.

It ultimately hurts the community when people repeatedly say things to the effect of, "yep, I hear you, interesting topic, I have a really cool solution to all of what you're saying - will be done Real Soon Now(tm)", or are happy to share details but only within a limited circle of cognoscenti. Because the net result is that other interested parties either unknowingly duplicate effort, or stall thinking that others are tackling the issue, sometimes for years.

I know that the IHG has been interested in more secure package distribution for a very long time now, so it's really great that Duncan and Austin have now ("finally") taken the time to write up their current plan, moreover with a discussion of how it addresses a specific threat model, and to make it known to the rest of the community that they have secured partial funding from the IHG. I know there are other efforts out there; it would be great if they all came out of the woodwork. And in the future, if we could all be mindful to *publish* proposals and intents *upfront* when it comes to our shared community infrastructure and community tooling (rather than months or years later). I believe that's what is at the core of an *open* process for community developments.

Ok, end of meta point, I for one am keen to dive back into the technical points that have been brought up in this thread already. :)

On April 16, 2015 at 8:39:40 AM, Mathieu Boespflug (mboes@tweag.net) wrote:
It ultimately hurts the community when people repeatedly say things to the effect of, "yep, I hear you, interesting topic, I have a really cool solution to all of what you're saying - will be done Real Soon Now(tm)", or are happy to share details but only within a limited circle of cognoscenti. Because the net result is that other interested parties either unknowingly duplicate effort, or stall thinking that others are tackling the issue, sometimes for years.
I think this is a valid concern. Let me make a suggestion as to why this does not happen as much as we might like as well (other than not-enough-time which is always a common reason). Knowing a little about different people’s style of working on open source projects, I have observed that some people are keen to throw out lots of ideas and blog while their projects are in the very early stages of formation. Sometimes this leads to useful discussions, sometimes it leads to lots of premature bikeshedding. But, often, other people don’t feel comfortable throwing out what they know are rough and unfinished thoughts to the world. They would rather either polish the proposal more fully, or would like to have a sufficient proof-of-concept that they feel confident the idea is actually tractable. I do not mean to suggest one or the other style is “better” — just that these are different ways that people are comfortable working, and they are hardwired rather deeply into their habits.

In a single commercial development environment, these things are relatively more straightforward to mediate, because project plans are often set top down, and there are in fact people whose job it is to amalgamate information between different developers and teams. In an open source community things are necessarily looser. There are going to be a range of such styles and approaches, and while it is sort of a pain to negotiate between all of them, I don’t really see an alternative.

So let me pose the opposite thing too: if there is a set of concerns/ideas involving core infrastructure and possible future plans, it would be good to reach out to the people most involved with that work and check if they have any projects underway but perhaps not widely announced that you might want to be aware of. I know that it feels it would be better to have more frequent updates on what projects are kicking around and what timetables. But contrariwise, it also feels it would be better to have more people investigate more as they start to pursue such projects.

Also, it is good to have different proposals on the table, so that we can compare them and stack up what they do and don’t solve more clearly. So, to an extent, I welcome duplication of proposals as long as the discussion doesn’t fragment too far. And it is also good to have a few proofs-of-concept floating about to help pin down the issues better. All this is also very much in the open source spirit.

One idea I have been thinking about, is a Birds of a Feather meeting at the upcoming ICFP in Vancouver focused just on Haskell Open-Source Infrastructure. That way a variety of people with a range of different ideas/projects/etc. could all get together in one room and share what they’re worried about and what they’re working on and what they’re maybe vaguely contemplating on working on. It’s great to see so much interest from so many quarters in various systems and improvements. Now to try and facilitate a bit more (loose) coordination between these endeavors!

Cheers, Gershom

P.S. as a general point to bystanders in this conversation — it seems to me one of the best ways to help the pace of “big ticket” cabal/hackage-server work would be to take a look at their outstanding lists of tracker issues and see if you feel comfortable jumping in on the smaller stuff. The more we can keep the little stuff under control, the better for the developers as a whole to start to implement more sweeping changes.

Thank you for that Gershom. I think everything that you're saying in that last email is very much on the mark. Multiple proposals is certainly a good thing for diversity, a good thing to help take our infrastructure in a good direction, and a good thing to help it evolve over time.

It's true that most of us are volunteer contributors, working on improving infrastructure only so long as it's fun. So it's not always easy to ask for more upfront clarity and pitch perfect coordination. Then again, as a community we make more progress faster when a little bit of process is followed. While a million different tools or libraries to do the same thing can coexist just fine, with infrastructure that's much more difficult. A single global view of all code that people choose to contribute as open source is much healthier than a fragmented set of sub-communities each working with their own infrastructure. So the degree of coordination required to make infrastructure evolve is much higher.

To this end, I'd like to strongly encourage all interested parties to publish into the open proposals covering one or both of the topics that are currently hot infrastructure topics in the community:

1. reliable and efficient distribution of package metadata, package content, and incremental updates thereof.
2. robust and convenient checking of the provenance of a package version, and policies for rejecting such package versions as potentially unsafe.

These two topics overlap of course, so as has been the case so far, often folks will be addressing both simultaneously. I submit that it would be most helpful if these proposals were structured as follows:

* Requirements addressed by the proposal (including *threat model* where relevant)
* Technical details
* Ideally, some indication of the resources needed and a timeline.

I know that the last point is of particular interest to commercial users, who like predictability in order to decide whether or not they need to be chipping in their own meagre resources to make the proposal happen and happen soon. But to some extent so does everyone else: no one likes to see the same discussions drag on for 2+ years. Openness really helps here - if things end up dragging out, others can pick up the baton where it was left lying.

So far we have at least 2 proposals that cover at least the first two sections above:

* Chris Done's package signing proposal: https://github.com/commercialhaskell/commercialhaskell/wiki/Package-signing-...
* Duncan Coutts and Austin Seipp's proposal for improving Hackage security: http://www.well-typed.com/blog/2015/04/improving-hackage-security/

There are other draft (or "strawman") proposals (including one of mine) floating around out there, mentioned earlier in this thread. And then some (including prototype implementations) that I can say I and others have engaged with via private communication, but it would really help this discussion move forward if they were public.
One idea I have been thinking about, is a Birds of a Feather meeting at the upcoming ICFP in Vancouver focused just on Haskell Open-Source Infrastructure.
I think that's a great idea.
Best,
Mathieu

On Thu, 2015-04-16 at 14:39 +0200, Mathieu Boespflug wrote:
I'd like to step back from the technical discussion here for a moment and expand a bit on a point at the end of my previous email, which is really about process.
I should apologise for not publishing our design earlier. To be fair, I did mention several times on the commercialhaskell mailing list earlier this year that we were working on an index signing based approach.

Early on in the design process we did not appreciate how much TUF overlaps with a GPG-author-signing based approach; we had thought they were much more orthogonal.

My other excuse is that I was on holiday while much of the recent design discussion on Chris's and your proposals was going on.

And finally, writing up comprehensible explanations is tricky and time consuming.

But ultimately these are just excuses. We do always intend to do things openly in a collaborative way, Cabal and Hackage development is certainly open in that way, and we certainly never hold things back as closed source. In this case Austin and I have been doing intensive design work, and it was easier for us to do that between ourselves initially, given that we're doing it on work time. I accept that we should have got this out earlier, especially since it turns out the other designs do have some overlap in terms of goals and guarantees.
Ok, end of meta point, I for one am keen to dive back into the technical points that have been brought up in this thread already. :)
Incidentally, having read your post on splitting things up a bit when I got back from holiday, I agree there are certainly valid complaints there. I'm not at all averse to factoring the hackage-server implementation slightly differently, perhaps so that the core index and package serving is handled by a smaller component (e.g. a dumb http server). For 3rd party services, the goal has always been for the hackage-server impl to provide all of its data in useful formats. No doubt that can be improved. Pull requests gratefully accepted.

I see this security stuff as a big deal for reliability, because it will allow us to use public untrusted mirrors. That's why it's important to cover every package. That, and perhaps a bit of refactoring of the hackage server, should give us a very reliable system.

-- Duncan Coutts, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
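As a rough sketch of the "dumb http server" idea above: the core index- and package-serving component really could be this small. This assumes the warp and wai-app-static packages, and the directory layout is invented; it is an illustration of the factoring, not a proposed implementation.

import Network.Wai.Application.Static (defaultFileServerSettings, staticApp)
import Network.Wai.Handler.Warp (run)

-- Serve the package index and sdist tarballs straight from disk;
-- uploads, search, signing, etc. live in separate services.
main :: IO ()
main = run 8080 (staticApp (defaultFileServerSettings "/srv/hackage-archive"))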

A couple quick points:
- The IHG proposal is partly motivated by staying backwards-compatible, but I think we shouldn't put a premium on this. Non-HTTPS versions of cabal should, IMO, be deprecated once there's an alternative, so most people will need to upgrade anyway. We can run an "old-hackage" instance for those who can't.
- To the extent that it doesn't become an impedance mismatch, deduplicating effort with revision control systems (e.g. git) seems very desirable:
a) It can reduce maintainers' work -- e.g., it's probably a long road getting maintainers to all sign their packages and others' revisions, but the majority are already doing this with git.
b) It's easier to trust -- the amount of vetting and hardening is orders of magnitude more, and Cabal/cabal-install is already a complex machine
c) It's already been built -- including (conservatively) hundreds of security corner-cases that would need to be built/maintained/trusted (see "it's easier to trust")
Nix is an example of a package manager that very successfully defers part of its workload to git and other version control systems.
Tom

Incidentally, having read your post on splitting things up a bit when I got back from holiday, I agree there are certainly valid complaints there. I'm not at all averse to factoring the hackage-server implementation slightly differently, perhaps so that the core index and package serving is handled by a smaller component (e.g. a dumb http server). For 3rd party services, the goal has always been for the hackage-server impl to provide all of its data in useful formats. No doubt that can be improved. Pull requests gratefully accepted.
Awesome. Sounds like we're in broad agreement.
I see this security stuff as a big deal for the reliability because it will allow us to use public untrusted mirrors. That's why it's important to cover every package. That and perhaps a bit of refactoring of the hackage server should give us a very reliable system.
Indeed - availability through both reliability and redundancy. I still have some catching up to do on the technical content of your proposal and others - let me comment on that later. But either way I can certainly agree with the goal of reducing the size of the trusted base while simultaneously expanding the number of points of distribution. In the meantime, mirrors already exist (e.g. http://hackage.fpcomplete.com/), but as you say, they need to be trusted, in addition to having to trust Hackage. Thanks again for your detailed blog post and the context it provides.

Best,
Mathieu

Hi All,
I have only just found the time to read through this discussion. I thought perhaps I would offer a few thoughts.

It seems that we are all in agreement that the security of Hackage/Cabal is a problem: insecure transmission and no way to verify package authorship. This is something which I feel we must address.

Where I work we have a lot of compliance to consider, and a few products require us to provide varying degrees of assurance about the code we link against. This usually leads to the decision to use a third-party piece of kit. Going forward, we will need to replace these systems with our own solutions. I would prefer to use Haskell, but swapping out Hackage/Cabal due to security concerns is undesirable from my point of view, and the lack of package security will be a show-stopper for senior management.

Using git and S3 as Michael suggests seems a good solution to me. To my mind, the increased transparency and the ability to mirror both S3 and the git package metadata offer a number of desirable features.

Regarding the use of git: I don't think that we need to implement our own solution, and depending on git is not an issue: most of our CI uses git anyway.

A final point I feel I must raise is that it seems FP Complete are going to be footing the bill for the S3 hosting. Long term, this seems unfair to FP Complete. Is this something that haskell.org could take on? Or at the very least, could some other mechanism be found to pay for or offset the cost long term?

Kind Regards,
- B.

Storage for every package ever released on Hackage in the history of Haskell totals less than 30 cents per month on S3 (probably closer to 10 cents). Even if the S3 host saw very high usage, bandwidth would cost at most 50-90 dollars per month (and that is likely an overestimate by at least 10x). A small engineering team probably spends more than that on coffee per month.
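A back-of-envelope check of those figures, using assumed numbers (S3 standard storage was roughly $0.03/GB-month at the time, and all Hackage sdists together are on the order of 10 GB; neither figure is measured here):

-- Assumed figures, not measurements.
gigabytesStored, dollarsPerGBMonth, monthlyStorageCost :: Double
gigabytesStored    = 10
dollarsPerGBMonth  = 0.03
monthlyStorageCost = gigabytesStored * dollarsPerGBMonth  -- ~ $0.30/month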
haskell.org hosting and infrastructure is largely donated by various organizations. If you can get AWS to top the Rackspace+Dreamhost infrastructure sponsorship, Gershom and others would probably love to hear about it.
participants (14)
- amindfv@gmail.com
- Arian van Putten
- Bardur Arantsson
- Blake Rain
- Carter Schonwald
- Dennis J. McWherter, Jr.
- Duncan Coutts
- Francesco Ariis
- Gershom B
- Greg Weber
- Magnus Therning
- Mathieu Boespflug
- Michael Snoyman
- Nikita Karetnikov