Fingerprinting Haskell Objects

7 Oct 2014

Hello everybody,

I have a little question I wanted to run by the folks here. I've run into
it several times over the past few years and would love to lock down a good
answer.

What's the best way to "fingerprint" a Haskell object into, say, a
ByteString, so that the fingerprint can be used as a lookup key in a
database (for example) and trusted to remain constant over time, even as
the underlying libraries evolve?

Here's a simple example:

   - Say I'm building a manual index on top of a key-value store (redis,
   dynamodb, etc.)

   - I want my keys to be arbitrary tuples (or similar records) that may
   contain various fields in them

   - I would like to avoid ad-hoc, hand-written MyTuple -> ByteString and
   ByteString -> MyTuple conversions. However, Generic derivations,
   template-haskell, etc. are acceptable

   - Note that the fingerprint, which is used as a lookup key in the
   database, has to remain stable. If it changes by even a single bit over
   time for the same MyTuple, the key-value store will NOT be able to find
   the index associated with that MyTuple at this later time
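
To make the stability requirement concrete, here is a minimal sketch of what
a pinned byte layout plus a "golden" check could look like. It is hand-written
purely for illustration (which is exactly what the third point wants to
avoid), and everything in it is an assumption rather than part of the
original question: the fields of MyTuple, the version byte, and the choice of
big-endian integers with a 32-bit length prefix. It uses only
Data.ByteString.Builder from the bytestring package.

```haskell
import qualified Data.ByteString         as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy    as BL
import           Data.Int                (Int64)
import           Data.Word               (Word8)

-- Hypothetical key type standing in for MyTuple; the fields are made up.
data MyTuple = MyTuple
  { tupleUserId :: Int64
  , tupleTag    :: BS.ByteString
  } deriving (Eq, Show)

-- Assumed convention: the first byte names the layout, so that a future
-- layout can coexist with keys written under this one.
keyVersion :: Word8
keyVersion = 1

-- Every byte is chosen here, not by a serialization library:
-- version byte, big-endian Int64, 32-bit big-endian length, raw bytes.
fingerprint :: MyTuple -> BS.ByteString
fingerprint (MyTuple uid tag) =
  BL.toStrict . B.toLazyByteString $
       B.word8 keyVersion
    <> B.int64BE uid
    <> B.word32BE (fromIntegral (BS.length tag))
    <> B.byteString tag

-- Golden check: the expected bytes are written out literally. If this ever
-- evaluates to False, keys already stored in the database are unreachable.
goldenHolds :: Bool
goldenHolds =
  BS.unpack (fingerprint (MyTuple 1 (BS.pack [97, 98])))
    == [1, 0,0,0,0,0,0,0,1, 0,0,0,2, 97,98]
```

Running a check like goldenHolds in the test suite would at least turn a
silent change in the key bytes into a failing build, whichever encoding
ends up producing them.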

Here are some ideas (and related concepts) I've considered and used over
the years:

   - Hand-write a "Prism' MyTuple ByteString". This works, but is tedious
   and error-prone.

   - Use Serialize/Binary and trust that the encode/decode pair will
   produce results consistently in 5 years (dangerous territory!)

   - Use SafeCopy, which is great for ensuring timeless decoding of the
   *value* in the index, but can we be sure that the fingerprint (MyTuple ->
   ByteString) conversion is stable? What if SafeCopy authors one day
   decide to encode tuples differently? They would write the migrations to
   transparently handle legacy code for *values*, but not for *keys*. Also
   notice here how migrations help with the ByteString -> MyTuple leg, but do
   not ensure MyTuple -> ByteString produces the same ByteString over time.

   - Hashable would've been nice, but there is NO guarantee of persistent
   results, even across multiple runs of the same code
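
For comparison with the options above, one more possibility can be sketched:
keep the Generic derivation, but have the application own the class and its
primitive instances, so no third-party library decides the bytes. This is
only a sketch under stated assumptions, not a recommendation from the thread.
StableKey, GStableKey, putKey and the fields of MyTuple are hypothetical
names; the layout (big-endian Int64, length-prefixed ByteString, fields
concatenated in declaration order) is an arbitrary choice. It covers only the
MyTuple -> ByteString leg, and it still relies on GHC.Generics continuing to
present fields in declaration order.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE TypeOperators     #-}

import qualified Data.ByteString         as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy    as BL
import           Data.Int                (Int64)
import           GHC.Generics

-- Hypothetical class: the application owns every primitive layout.
class StableKey a where
  putKey :: a -> B.Builder
  default putKey :: (Generic a, GStableKey (Rep a)) => a -> B.Builder
  putKey = gPutKey . from

-- Primitive layouts, fixed once and never changed.
instance StableKey Int64 where
  putKey = B.int64BE

instance StableKey BS.ByteString where
  putKey bs = B.word32BE (fromIntegral (BS.length bs)) <> B.byteString bs

-- Pairs go through the generic default below.
instance (StableKey a, StableKey b) => StableKey (a, b)

-- Generic walk over product types only. There is deliberately no instance
-- for sums (:+:), so multi-constructor types are rejected at compile time.
class GStableKey f where
  gPutKey :: f p -> B.Builder

instance GStableKey U1 where
  gPutKey U1 = mempty

instance StableKey c => GStableKey (K1 i c) where
  gPutKey (K1 x) = putKey x

-- Metadata (type, constructor and field names) is ignored on purpose.
instance GStableKey f => GStableKey (M1 i m f) where
  gPutKey (M1 x) = gPutKey x

instance (GStableKey f, GStableKey g) => GStableKey (f :*: g) where
  gPutKey (l :*: r) = gPutKey l <> gPutKey r

-- Hypothetical key record; no hand-written conversion is needed.
data MyTuple = MyTuple
  { userId :: Int64
  , tag    :: BS.ByteString
  } deriving (Eq, Show, Generic)

instance StableKey MyTuple

fingerprint :: StableKey a => a -> BS.ByteString
fingerprint = BL.toStrict . B.toLazyByteString . putKey
```

Because metadata is ignored, renaming a field or constructor leaves the key
bytes untouched, while reordering or adding fields changes them. It also
means a MyTuple and a plain (Int64, ByteString) pair with the same contents
produce identical bytes, so a per-type tag would be needed if several key
types share one keyspace.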

What would be your preferred solution?

Thank you,
Oz

Ozgun Ataman
