Re: [Haskell-cafe] Ticking time bomb

1 Feb 2013


      Forgot the list.


On Fri, Feb 1, 2013 at 10:21 AM, Alexander Kjeldaas <
alexander.kjeldaas@gmail.com> wrote:
...
Trying to avoid the wrath of Ketil I'll refrain from suggesting to do
anything, I'll just explain why git is good at this, and not arbitrary. :-)
Most systems I know of that verify *anything* use Merkle trees, or
something very similar.
http://en.wikipedia.org/wiki/Hash_tree
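To make the idea concrete, here is a minimal sketch of computing a Merkle
root in Python.  The choice of SHA-256 and the rule of duplicating the last
node on odd-sized levels are assumptions made for illustration (real
systems differ in both), and merkle_root is a hypothetical helper name.

```python
import hashlib
from typing import Sequence


def h(data: bytes) -> bytes:
    """Hash function used for both leaves and inner nodes (assumed SHA-256)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Return the root hash of a binary hash tree built over the leaves."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: pair an odd node out with a copy of itself.
            level.append(level[-1])
        # Each parent is the hash of its two children concatenated.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"a", b"b", b"c", b"d"]
root = merkle_root(blocks)
# Changing any single leaf changes the root.
tampered = merkle_root([b"a", b"b", b"X", b"d"])
```

Anyone holding a verified copy of root can detect the change, because
tampered will not match it.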
For example, the TPM chip on your motherboard, which is used to ensure
the integrity of the Google Chromebook and of Windows BitLocker:
http://en.wikipedia.org/wiki/Trusted_Platform_Module
(simplified example: in secure memory it stores H1=hash(microcode), then
H2=hash(H1 || BIOS), then H3=hash(H2 || MBR), then H4=hash(H3 || kernel),
...).
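The simplified chain above can be sketched in a few lines of Python.  This
follows the simplified formula from this mail, not the real TPM interface;
SHA-256, the function names and the byte strings standing in for the boot
components are all assumptions made for illustration.

```python
import hashlib


def extend(prev: bytes, component: bytes) -> bytes:
    """H_next = hash(H_prev || component), as in the simplified example."""
    return hashlib.sha256(prev + component).digest()


def measure_boot(stages):
    """Fold the boot stages into a single final hash."""
    first, *rest = stages
    digest = hashlib.sha256(first).digest()  # H1 = hash(microcode)
    for stage in rest:
        digest = extend(digest, stage)       # H2, H3, H4, ...
    return digest


# Placeholder byte strings stand in for the real component images.
good = measure_boot([b"microcode", b"BIOS", b"MBR", b"kernel"])
bad = measure_boot([b"microcode", b"BIOS", b"evil MBR", b"kernel"])
```

Because every later hash depends on all earlier ones, good and bad differ
even though only the MBR stage was changed.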
Or the integrity of the bitcoin currency.
https://en.bitcoin.it/wiki/Protocol_specification#Merkle_Trees
So these are pretty different systems, but it all boils down to computing
cryptographically secure hashes over a previous hash + new data to ensure
the integrity of the new combined data.  Given only one verified hash in
such a system, neither the data nor its history of mutation can be forged
without detection.  "History" can mean which software runs on your computer
(TPM), which transactions are valid (Bitcoin), or which commits have been
made in an SCM (git, mercurial).
So git is not magical, it is just a practical implementation of something
that works.  Any other *general* solution will be based on similar basic
principles.  Mercurial does this and there is a GPG extension for it.
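As a rough illustration of the "hash of hash" structure in git, the sketch
below computes git-style object ids: the SHA1 of a "<type> <size>\0" header
followed by the object body.  A commit body names its tree and its parent
commits by id, so a commit id covers the whole history behind it.  The
helper names, the author line and the commit messages are made up for the
example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA1 over '<kind> <size>' + NUL + body, the way git names objects."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id: str, parent_ids, message: str) -> bytes:
    """Build a minimal commit body; the identity line is hypothetical."""
    ident = "A U Thor <author@example.com> 1359700000 +0000"
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


EMPTY_TREE = git_object_id("tree", b"")

first = git_object_id("commit", commit_body(EMPTY_TREE, [], "first"))
second = git_object_id("commit", commit_body(EMPTY_TREE, [first], "second"))

# Rewriting the first commit gives it a new id, which in turn changes the
# id of every commit built on top of it.
forged_first = git_object_id("commit", commit_body(EMPTY_TREE, [], "forged"))
second_on_forged = git_object_id(
    "commit", commit_body(EMPTY_TREE, [forged_first], "second"))
```

A signature over second therefore also vouches for first, which is why a
single signed tag or commit is enough to pin down tree + history.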
Bazaar does not use SHA1-based content-addressable storage, so while a
signed commit signs the tree, it does not cover the history (no "hash
of hash", only "hash", if you look at it as a Merkle tree), but it does
chain commits.  To verify a tree + history, *all* commits must be signed,
which is fragile IMO.
Regarding darcs, my understanding is that it deliberately rejects hashing
the tree, so it is not clear to me how to verify tree + history.  Patches
can be signed, but as long as patches are independent there is no "hash of
hash" component, which makes it difficult to see how one can verify the
tree.  My understanding of darcs is very limited though.
But to be *practical* the rest of the workflow should be secure as well,
so you need:
1. A way to distribute the Merkle tree (git pull/clone/push).
    Distribution of the data that is to be signed is required for
security, because otherwise the representation of the data itself (web view
or 'git diff') can be compromised.  Signatures have no meaning if you
cannot trust that you know what you sign.
2. A way to sign a change to the Merkle tree (git commit -S, git tag -s etc)
3. A way to have multiple signatures on a given hash (i.e. commit, or tag,
or whatever it is called in a particular Merkle tree implementation).
    This is required to avoid a catastrophe when a single core developer
gets "owned".  If required, I do think that multiple signatures can be
emulated by a structured set of commits that have single signatures though.
4. A way to reliably do code reviews on the changes to the data (git diff)
    This is really the same as 1).  We cannot reliably do 'git diff'
unless the developers do it on their own equipment, thus the system must be
distributed.
5. Given the requirement for a distributed Merkle tree, some merge
strategy is needed.  It is thus practical, though not required, to have
good support for this.
    (Btw, even the bitcoin hash chain has a merge strategy - the tree with
the most compute power will win, and others are forced to "rebase" their
transactions on that tree)
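The "multiple signatures on a given hash" requirement in the list above can
be sketched as a simple threshold check.  Note the assumption: HMAC with a
per-developer secret stands in for real public-key signatures (GPG in the
git case) only so that the sketch runs with the standard library.  HMAC is
not a public-key signature, and a real verifier would hold public keys
rather than secrets.  All names and keys here are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, commit_id: bytes) -> bytes:
    """Stand-in for a developer's signature over a commit or tag hash."""
    return hmac.new(secret, commit_id, hashlib.sha256).digest()


def count_valid(commit_id: bytes, signatures, keyring) -> int:
    """Count distinct known developers with a valid signature."""
    valid = set()
    for name, sig in signatures:
        secret = keyring.get(name)
        if secret is not None and hmac.compare_digest(
                sig, sign(secret, commit_id)):
            valid.add(name)  # a set, so repeated signatures count once
    return len(valid)


def accepted(commit_id: bytes, signatures, keyring, threshold: int) -> bool:
    """Accept the hash only if at least `threshold` developers signed it."""
    return count_valid(commit_id, signatures, keyring) >= threshold


keyring = {"alice": b"k1", "bob": b"k2", "carol": b"k3"}
release = hashlib.sha1(b"release 1.0").digest()

two_signers = [("alice", sign(b"k1", release)), ("bob", sign(b"k2", release))]
one_stolen_key = [("alice", sign(b"k1", release))] * 2
```

With a threshold of 2, two_signers is accepted while one_stolen_key is not,
so compromising a single developer is not enough to publish a release.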
So my choice of git is not arbitrary.  The way git works is pretty
fundamental to verifying the integrity of stuff.
Though having looked through the other options, mercurial might be a
better fit since it is supported on Windows.
Trying to solve this problem from scratch might not be such a good idea,
because it might be very close to a reimplementation of git or mercurial.
Or maybe it is a good idea for someone who has some time on their hands.
Just be aware that the requirements for verifying anything are very close
to what existing distributed SCM systems do.
Alexander
On Fri, Feb 1, 2013 at 3:32 AM, Kevin Quick wrote:
...
Git has the ability to solve all of this.
...
...
2. Uploads to hackage either happen through commits to the git
...
repository,
or an old-style upload to hackage automatically creates a new anonymous
branch in the git repository.
3. The git repository is authoritative.  Signing releases, code reviews
etc.
all happens through the git repositories.  This gives us all the
flexibility of a git-style trust model.
...
5. Who owns which package names can be held in a separate meta-tree git
...
repository, and can have consensus requirements on commits.
6. This special meta-tree can also contain suggested verification keys
for
commits to the other hackage git trees.  It can even contain keys that
protect Haskell namespaces in general, so that no hackage package can
overwrite a protected Haskell namespace.
7. As backward compatibility, the meta-tree can sign simple hashes of
already existing packages on hackage.
...
1. There could be some git magic script that downloads the signed git tag
...
objects only (small data set).  Then another script would generate a
git-compatible SHA1 of the extracted tarball, given that the tarball was
fetched from hackage.
2. Or cabal-install could fetch directly from git repositories and use
standard git verification.
3. Or a trusted machine creates tarballs from the git repositories, signs
them and uploads them to hackage.
Without details of git's trust/verification model, it's difficult to see
how this particular SCM tool provides the trust capabilities being
discussed any better than a more focused solution.  Additionally, the use
of git is also difficult for many Windows users (80MB installed footprint,
last I tried).  git has a much broader solution space than simply ensuring
the integrity of package downloads, especially when "there could be some
git magic script" that is still not identified and appears to have the same
insecurities as the package download/upload itself.
Instead of using the "git" solution and looking for problems to solve
with it, IMHO we should work from a clearly defined problem to a solution
in general terms (our solution class), and then determine which specific
tools are instances of that solution class.
--
-KQ
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe