Adding Content-Addressable Storage to GHC

18 Mar 2020

Hi all,

Is there any ongoing effort or design work to add CAS (content-addressable storage) to GHC, as in Unison? <https://www.unisonweb.org/docs/tour/>

== The idea ==

In summary, the idea is that top-level declarations can be addressed by a hash of their contents. Recursive definitions are transformed into worker/wrapper form, so that a definition no longer has to refer to its own (not yet known) hash.
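To make the idea concrete, here is a minimal sketch over a toy expression type, not GHC's AST. Everything in it is an assumption made for illustration: the `Expr` type, `serialize`, `hashDecl`, and the FNV-1a-style `fnv1a`, which only stands in for a real hash such as SHA512. It also does not implement the worker/wrapper transformation; it swaps in a simpler device and replaces a self-reference with a fixed placeholder token before hashing.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

type Hash = Word64

-- FNV-1a-style hash over characters: a toy stand-in for SHA512.
fnv1a :: String -> Hash
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Hypothetical toy AST, far simpler than anything GHC uses.
data Expr
  = Lit Integer
  | Var String      -- locally bound variable
  | Ref String      -- reference to a top-level declaration, by name
  | App Expr Expr
  | Lam String Expr

-- Hashes of declarations that have already been hashed.
type Env = Map.Map String Hash

-- Serialize a declaration body. References to other declarations are
-- written as their hashes, and a reference to the declaration itself
-- becomes a fixed placeholder, so the result never mentions its own name.
serialize :: String -> Env -> Expr -> String
serialize self env = go
  where
    go (Lit n) = "L" ++ show n
    go (Var v) = "V" ++ v
    go (Ref n)
      | n == self = "<self>"
      | otherwise = case Map.lookup n env of
          Just h  -> "#" ++ show h
          Nothing -> "?" ++ n   -- unknown name: fall back to the name
    go (App f x) = "(" ++ go f ++ " " ++ go x ++ ")"
    go (Lam v b) = "\\" ++ v ++ "." ++ go b

hashDecl :: Env -> String -> Expr -> Hash
hashDecl env name body = fnv1a (serialize name env body)

main :: IO ()
main = do
  let recursive name = Lam "n" (App (Ref name) (Var "n"))
  -- Same recursive shape under two different names: prints True.
  print (hashDecl Map.empty "loop" (recursive "loop")
           == hashDecl Map.empty "spin" (recursive "spin"))
  -- Different contents: prints False.
  print (hashDecl Map.empty "one" (Lit 1) == hashDecl Map.empty "one" (Lit 2))
```

The sketch does not normalise local variable names, so two alpha-equivalent bodies would still hash differently.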

== Why I want this ==

There are lots of advantages to this, but the one that excites me the most is that we can move to running tests, especially property tests, at compile time.

The main downside to running tests at compile time, as seen with Template Haskell, is that you re-run the tests every time the module is recompiled, which slows down your dev cycle. However, if your tests are keyed on CAS hashes, then those hashes are only invalidated when individual declarations actually change. This means the re-running of tests becomes granular at the declaration level. When a single test completes, successfully or not, you can cache the result and look it up next time, using e.g. the SHA512 of the expression evaluated.
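A minimal sketch of such a cache, assuming a test is just an `IO` action and its key is a hash of the test expression's source text. The names `Outcome`, `Cache`, and `runCached` are hypothetical, and `fnv1a` is a toy stand-in for SHA512:

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

type Hash = Word64

-- FNV-1a-style hash over characters: a toy stand-in for SHA512.
fnv1a :: String -> Hash
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

data Outcome = Passed | Failed String
  deriving (Eq, Show)

-- Results of earlier runs, keyed by the hash of the tested expression.
type Cache = Map.Map Hash Outcome

-- Run the test only if its key is not cached. Failures are cached too,
-- matching "either successfully or not" in the text.
runCached :: Hash -> IO Outcome -> Cache -> IO (Outcome, Cache)
runCached key test cache =
  case Map.lookup key cache of
    Just outcome -> pure (outcome, cache)
    Nothing -> do
      outcome <- test
      pure (outcome, Map.insert key outcome cache)

main :: IO ()
main = do
  runs <- newIORef (0 :: Int)
  let key = fnv1a "reverse (reverse xs) == xs"
      test = do
        modifyIORef' runs (+ 1)   -- count real executions
        let xs = [1, 2, 3 :: Int]
        pure (if reverse (reverse xs) == xs then Passed else Failed "round trip differs")
  (r1, cache1) <- runCached key test Map.empty
  (r2, _)      <- runCached key test cache1
  print (r1, r2)             -- both results are Passed
  readIORef runs >>= print   -- the test body ran once: prints 1
```

A real implementation would have to persist the cache between compiler runs; here it only lives in memory.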

You could therefore change a single function in a library and re-run only the tests that are actually affected, rather than every test in the module or, as is more typical, ALL the tests in a test suite just because one thing changed.
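The granularity described here follows if a declaration's hash also covers the hashes of the declarations it depends on. The sketch below assumes that scheme. `Decls`, `declHash`, `staleTests`, and the sample declarations are hypothetical, declaration bodies are plain strings, the dependency graph is assumed acyclic, and `fnv1a` is a toy stand-in for SHA512:

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

type Hash = Word64

-- FNV-1a-style hash over characters: a toy stand-in for SHA512.
fnv1a :: String -> Hash
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Declaration name -> (source text, names of declarations it uses).
type Decls = Map.Map String (String, [String])

-- A declaration's hash covers its own text plus its dependencies' hashes,
-- so a change propagates to everything that uses the changed declaration.
-- Names not in the map are treated as external and hashed by name only.
declHash :: Decls -> String -> Hash
declHash decls name =
  case Map.lookup name decls of
    Nothing -> fnv1a ("external:" ++ name)
    Just (body, deps) ->
      fnv1a (body ++ concatMap (\d -> "#" ++ show (declHash decls d)) deps)

-- Tests, given as (test name, declaration under test), whose target hash
-- differs between the old and new versions of the code.
staleTests :: Decls -> Decls -> [(String, String)] -> [String]
staleTests old new tests =
  [t | (t, target) <- tests, declHash old target /= declHash new target]

oldDecls :: Decls
oldDecls = Map.fromList
  [ ("parse",     ("parse v1",     []))
  , ("render",    ("render v1",    []))
  , ("roundTrip", ("roundTrip v1", ["parse", "render"]))
  , ("size",      ("size v1",      []))
  ]

-- Only `parse` is edited.
newDecls :: Decls
newDecls = Map.insert "parse" ("parse v2", []) oldDecls

tests :: [(String, String)]
tests =
  [ ("test_parse", "parse"), ("test_render", "render")
  , ("test_roundTrip", "roundTrip"), ("test_size", "size") ]

main :: IO ()
main = print (staleTests oldDecls newDecls tests)
```

Barring a hash collision, this should list only test_parse and test_roundTrip: the edited declaration and the one that depends on it. The tests for render and size keep their cached results.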

Coupling tests to the code they exercise in this way also keeps the two from drifting apart.

== Implementation approaches ==

There are various ways to implement this with varying degrees of satisfaction:

1. Use TH: reify declarations, inspect the AST, and produce a SHA512. Mix in ambient values such as the GHC version, instances in scope, extensions, GHC options, etc. With TH you can probably only achieve an imperfect hash, because I doubt that all of the relevant information is available to TH.

Names that come from external packages could be treated as CAS'd at the scope of the package's installed hash. Ideally, you could have granularity into other packages. But it's not a necessity if you just want caching for your current development package.

2. Use a source plugin. A source plugin can already access all of GHC's context information, so this might get closer to a perfect hash.

3. Add it to GHC directly. Exposing an `expressionSHA512 :: Exp -> ByteString` could be one imaginary way to access this information. With such a function you could implement caching of fine-grained tests.
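Whichever approach is taken, the ambient values mentioned under approach 1 would need to be folded into the hash. A rough sketch of that step, assuming the context can be captured as plain strings: the `Ambient` record and `fingerprint` are hypothetical names, not part of TH or the GHC API, instances in scope are left out for brevity, and `fnv1a` is a toy stand-in for SHA512.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', intercalate, sort)
import Data.Word (Word64)

type Hash = Word64

-- FNV-1a-style hash over characters: a toy stand-in for SHA512.
fnv1a :: String -> Hash
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Hypothetical record of the context a declaration is compiled in.
data Ambient = Ambient
  { ghcVersion :: String
  , extensions :: [String]   -- enabled extensions, treated as a set
  , ghcOptions :: [String]   -- kept in order, since flag order can matter
  }

-- Hash the declaration source together with its ambient context.
-- Fields are separated by NUL so that adjacent fields cannot run together.
fingerprint :: Ambient -> String -> Hash
fingerprint amb declSource = fnv1a (intercalate "\0" fields)
  where
    fields =
      [ghcVersion amb]
        ++ sort (extensions amb)
        ++ ["--"]
        ++ ghcOptions amb
        ++ ["--", declSource]

main :: IO ()
main = do
  let ctx = Ambient "8.8.3" ["LambdaCase", "BangPatterns"] ["-O1"]
      src = "double x = x + x"
  -- Listing the same extensions in another order: prints True.
  print (fingerprint ctx src
           == fingerprint (ctx { extensions = ["BangPatterns", "LambdaCase"] }) src)
  -- A different compiler version invalidates the hash: prints False.
  print (fingerprint ctx src == fingerprint (ctx { ghcVersion = "8.8.2" }) src)
```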

A related discussion is deterministic builds: https://gitlab.haskell.org/ghc/ghc/wikis/deterministic-builds

Anyone else exploring this?

Cheers,

Chris

Chris Done
