
Forgot the list.
On Fri, Feb 1, 2013 at 10:21 AM, Alexander Kjeldaas <alexander.kjeldaas@gmail.com> wrote:
Trying to avoid the wrath of Ketil, I'll refrain from suggesting that we do anything. I'll just explain why git is good at this, and not arbitrary. :-)
Most systems I know of that verify *anything* use merkle trees, or something very similar. http://en.wikipedia.org/wiki/Hash_tree
For example, the TPM chip on your motherboard, which is used to ensure the integrity of the Google Chromebook and Windows BitLocker: http://en.wikipedia.org/wiki/Trusted_Platform_Module (simplified example: in secure memory it stores H1=hash(microcode), then H2=hash(H1 || BIOS), then H3=hash(H2 || MBR), then H4=hash(H3 || kernel), ...).
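The simplified TPM chain above can be sketched in a few lines of Python. This is only an illustration of the "hash of previous hash plus new data" idea: the choice of SHA-256, the function names `extend` and `measure_boot`, and the placeholder byte strings are all assumptions, and a real TPM extends its registers somewhat differently.

```python
import hashlib
from typing import List


def extend(previous: bytes, component: bytes) -> bytes:
    """Return hash(previous || component), one link of the chain."""
    return hashlib.sha256(previous + component).digest()


def measure_boot(components: List[bytes]) -> List[bytes]:
    """Return the chain H1, H2, ... for a non-empty list of components."""
    chain = [hashlib.sha256(components[0]).digest()]  # H1 = hash(microcode)
    for component in components[1:]:
        chain.append(extend(chain[-1], component))    # Hn = hash(Hn-1 || next)
    return chain


# Placeholder byte strings stand in for the real firmware images (assumption).
chain = measure_boot([b"microcode", b"BIOS", b"MBR", b"kernel"])
tampered = measure_boot([b"microcode", b"evil BIOS", b"MBR", b"kernel"])

# Changing the BIOS changes H2 and every hash after it, but not H1.
print(chain[0] == tampered[0], chain[3] == tampered[3])
```

Because each value depends on everything measured before it, comparing only the final hash against a known-good value is enough to notice a change anywhere earlier in the chain.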
Or the integrity of the bitcoin currency. https://en.bitcoin.it/wiki/Protocol_specification#Merkle_Trees
So these are pretty different systems, but it all boils down to computing cryptographically secure hashes over a previous hash + new data to ensure the integrity of the new combined data. Given only one verified hash in such a system, no part of the data, nor its history of mutation, can be forged. "History" can mean which software runs on your computer (TPM), which transactions are valid (Bitcoin), or which commits have been done in an SCM (git, mercurial).
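To make the "one verified hash protects the whole history" claim concrete, here is a minimal sketch of a linear commit history. It is not how git or mercurial actually lay out their objects; `commit_id`, `build_history`, `verify` and the use of SHA-256 are hypothetical choices for illustration only.

```python
import hashlib
from typing import List, Optional


def commit_id(parent: Optional[str], content: bytes) -> str:
    """Hash the parent id together with the new content."""
    parent_bytes = (parent or "").encode("ascii")
    return hashlib.sha256(parent_bytes + b"\0" + content).hexdigest()


def build_history(contents: List[bytes]) -> Optional[str]:
    """Return the id of the last commit in a linear history (None if empty)."""
    head = None
    for content in contents:
        head = commit_id(head, content)
    return head


def verify(contents: List[bytes], trusted_head: str) -> bool:
    """Recompute the chain and compare it with the single trusted hash."""
    return build_history(contents) == trusted_head


history = [b"initial import", b"fix bug", b"release 1.0"]
trusted = build_history(history)  # assume this one value was verified out of band

forged = [b"initial import with backdoor", b"fix bug", b"release 1.0"]
reordered = [b"fix bug", b"initial import", b"release 1.0"]

print(verify(history, trusted), verify(forged, trusted), verify(reordered, trusted))
```

Only the head hash needs to be trusted (for example via a signature); altering or reordering any earlier entry yields a different head, so the check fails.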
So git is not magical, it is just a practical implementation of something that works. Any other *general* solution will be based on similar basic principles. Mercurial does this and there is a GPG extension for it.
Bazaar does not use a SHA1-based content addressable storage, so while a signed commit signs the tree, it does not represent the history (no "hash of hash", only "hash" if you look at it as a merkle tree), but it does chain commits. To verify a tree + history, *all* commits must be signed, which is fragile IMO.
Regarding Darcs, my understanding is that it deliberately rejects hashing the tree, so it is not clear to me how to verify tree+history. Patches can be signed, but as long as patches are independent, there is no "hash of hash" component which makes it difficult to see how one can verify the tree. My understanding of darcs is very limited though.
But to be *practical* the rest of the workflow should be secure as well, so you need:
1. A way to distribute the merkle tree (git pull/clone/push). Distribution of the data that is to be signed is required for security, because otherwise the representation of the data itself (web view or 'git diff') can be compromised. Signatures have no meaning if you cannot trust that you know what you sign.
2. A way to sign a change to the merkle tree (git commit -S, git tag -s etc).
3. A way to have multiple signatures on a given hash (i.e. commit, or tag, or whatever it is called in a particular merkle tree implementation). This is required to avoid catastrophic "owning" of core developers. If required, I do think that multiple signatures can be emulated by a structured set of commits that have single signatures though.
4. A way to reliably do code reviews on the changes to the data (git diff). This is really the same as 1). We cannot reliably do 'git diff' unless the developers do it on their own equipment, thus the system must be distributed.
5. Given the requirement for a distributed merkle tree, some merge strategy is needed. It is thus practical, though not required, to have good support for this. (Btw, even the bitcoin hash chain has a merge strategy: the tree with the most compute power will win, and others are forced to "rebase" their transactions on that tree.)
So my choice of git is not arbitrary. The way git works is pretty fundamental to verifying the integrity of stuff.
Though having looked through the other options, mercurial might be a better fit since it is supported on Windows.
Trying to solve this problem from scratch might not be such a good idea, because it might be very close to a reimplementation of git or mercurial. Or maybe it is a good idea for someone who has some time on their hands. Just be aware that the requirements for verifying anything are very close to what existing distributed SCM systems do.
Alexander
On Fri, Feb 1, 2013 at 3:32 AM, Kevin Quick wrote:
Git has the ability to solve all of this.
...
2. Uploads to hackage either happen through commits to the git repository, or an old-style upload to hackage automatically creates a new anonymous branch in the git repository.
3. The git repository is authoritative. Signing releases, code reviews etc. all happens through the git repositories. This gives us all the flexibility of a git-style trust model.
...
5. Who owns which package names can be held in a separate meta-tree git repository, and can have consensus requirements on commits.
6. This special meta-tree can also contain suggested verification keys for commits to the other hackage git trees. It can even contain keys that protect Haskell namespaces in general, so that no hackage package can overwrite a protected Haskell namespace.
7. For backward compatibility, the meta-tree can sign simple hashes of already existing packages on hackage.
...
1. There could be some git magic script that downloads the signed git tag objects only (small data set). Then another script would generate a git-compatible SHA1 of the extracted tarball, given that the tarball was fetched from hackage.
2. Or cabal-install could fetch directly from git repositories and use standard git verification.
3. Or a trusted machine creates tarballs from the git repositories, signs them and uploads them to hackage.
Without details of git's trust/verification model, it's difficult to see how this particular SCM tool provides the trust capabilities being discussed any better than a more focused solution. Additionally, the use of git is also difficult for many Windows users (80MB installed footprint, last I tried). git has a much broader solution space than simply ensuring the integrity of package downloads, especially when "there could be some git magic script" that is still not identified and appears to have the same insecurities as the package download/upload itself.
Instead of using the "git" solution and looking for problems to solve with it, IMHO we should work from a clearly defined problem to a class of solutions described in general terms, and then determine which specific tools represent an instance of that solution class.
-- -KQ
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe