
I have a Map. It's a lovely Map, with keys and values and everything. It's not _that_ large; a few tens of MB at most. Unfortunately, I need to persist it somewhat reliably.

I'd somewhat like to avoid having to use an external database (obviously a key/value store like Riak would work, but that's a major dependency to impose on the system), so I'm wondering if there is a low-tech way to do this.

I can control concurrent access to the file (or whatever), and the file system is robust. So that part is fine. I just need to externalize the map.

I'm wondering if just using cereal or so would be sufficient (there is a Serialize instance, of course), or whether I should be using some acid-state thing, or a Haskell binding to gdbm, or sqlite, or...

Any suggestions?

AfC
Sydney

Maybe acid-state [1]?

[1] http://hackage.haskell.org/package/acid-state-0.12.1/docs/Data-Acid.html

On Wed, Dec 18, 2013 at 7:58 PM, Andrew Cowie <andrew@operationaldynamics.com> wrote:
> I have a Map. It's a lovely Map, with keys and values and everything. It's not _that_ large. Few 10s of MB at most. Unfortunately, I need to persist it somewhat reliably.
>
> I'd somewhat like to avoid having to use an external database (obviously a key/value store like Riak would work, but that's a major dependency to impose on the system) so I'm wondering if there is a low tech way to do this.
>
> I can control concurrent access to the file (or whatever), and the file system is robust. So that part is fine. I just need to externalize the map.
>
> I'm wondering if just using cereal or so would be sufficient (there is a Serialize instance, of course), or whether I should be using some acid-state thing, or a Haskell binding to gdbm, or sqlite, or...
>
> Any suggestions?
>
> AfC
> Sydney
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
--
Clark.
Key ID: 0x78099922
Fingerprint: B292 493C 51AE F3AB D016 DD04 E5E3 C36F 5534 F907

Hi.
On 19 December 2013 00:58, Andrew Cowie wrote:

> I'm wondering if just using cereal or so would be sufficient (there is a Serialize instance, of course), or whether I should be using some acid-state thing, or a Haskell binding to gdbm, or sqlite, or...
I've used cereal without any problems for a similar purpose before. I'd say go for it: it is very easy to use and pretty fast as well.

Ozgur
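For what it's worth, the whole load/modify/store round trip is only a few lines. A minimal sketch, specialised to `Map String Int` for concreteness; `saveMap` and `loadMap` are names invented here, and `show`/`read` stand in for cereal's `encode`/`decode` so the sketch needs nothing beyond base, containers, and directory (cereal's functions slot into the same two places for a compact binary format):

```haskell
import qualified Data.Map as Map
import System.Directory (renameFile)

-- Persist the map by writing its list of pairs. With cereal this
-- line would instead be: BS.writeFile tmp (Data.Serialize.encode m)
saveMap :: FilePath -> Map.Map String Int -> IO ()
saveMap path m = do
  let tmp = path ++ ".tmp"
  writeFile tmp (show (Map.toList m))
  renameFile tmp path  -- rename is atomic on POSIX file systems

-- Read it back; with cereal this would use Data.Serialize.decode.
loadMap :: FilePath -> IO (Map.Map String Int)
loadMap path = Map.fromList . read <$> readFile path
```

Writing to a temporary file and renaming over the target means a crash mid-write can't leave a half-written map behind, which lines up with wanting "somewhat reliable" persistence.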

On 19 December 2013 11:58, Andrew Cowie wrote:

> I have a Map. It's a lovely Map, with keys and values and everything. It's not _that_ large. Few 10s of MB at most. Unfortunately, I need to persist it somewhat reliably.
>
> I'd somewhat like to avoid having to use an external database (obviously a key/value store like Riak would work, but that's a major dependency to impose on the system) so I'm wondering if there is a low tech way to do this.
>
> I can control concurrent access to the file (or whatever), and the file system is robust. So that part is fine. I just need to externalize the map.
>
> I'm wondering if just using cereal or so would be sufficient (there is a Serialize instance, of course), or whether I should be using some acid-state thing, or a Haskell binding to gdbm, or sqlite, or...
>
> Any suggestions?
Do you need concurrent access to the on-disk map, or will you just load, modify, then store explicitly? Do you care about data corruption, e.g. if your program/hardware fails during writing?

I'd probably just use sqlite via something like esqueleto, because it's easy to inspect the stored data outside of your program.

Conrad.

Hey Conrad,

On Thu, 2013-12-19 at 14:24 +1100, Conrad Parker wrote:
> Do you need concurrent access to the on-disk map, or will you just load, modify, then store explicitly?
The latter. There will be occasional updates to the map, but I can signal the workers using it to reload and/or restart them periodically.
> Do you care about data corruption, e.g. if your program/hardware fails during writing?
My backing store gives me atomic writes, so that's not a problem here [that makes lots of problems go away; remarkable, really].
> I'd probably just use sqlite via something like esqueleto, because it's easy to inspect the stored data outside of your program.
Yeah, if external inspection were necessary, that'd definitely be a good reason to go that way. The report from Ozgur that just serializing out a Map structure was workable is encouraging, though. I'll start with that.

AfC
Sydney

Andrew Cowie writes:

> Yeah, if external inspection were necessary that'd definitely be a good reason to go that way for sure. The report from Ozgur that just serializing out a Map structure was workable is encouraging, though. I'll start with that.
Pardon the digression, but I'd just like to appreciate this way of thinking. There's a rant by Bob Martin [1] that concludes:
"We are heading into an interesting time. A time when the prohibition against different data storage mechanisms has been lifted, and we are free to experiment with many novel new approaches. But as we play with our CouchDBs and our Mongos and BigTables, remember this: The database is just a detail that you don’t need to figure out right away."
A project I'm working on uses a persistent append-only list, which is currently implemented like this, almost verbatim:

    async . forever $
      atomically (readTChan queue) >>= BL.appendFile path . Aeson.encode

(where BL is Data.ByteString.Lazy, since Aeson.encode produces a lazy ByteString and the log is append-only).

Files are trivial to back up and generally easy to work with. Since it's just JSON, I can grep it and mess with it easily with command-line tools. And since the writing is done in a separate thread reading from a queue, I don't need to worry about locking.

I think this will be alright for a good while, and when the project outgrows it, I'll just migrate to some other solution. Probably acid-state, because the version migration stuff seems really useful.

[1]: Bob Martin's rant "No DB", http://blog.8thlight.com/uncle-bob/2012/05/15/NODB.html

--
Mikael Brockman
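A self-contained variant of that pattern can be sketched with only what ships with GHC: Chan and forkIO standing in for TChan and async, and `show` standing in for `Aeson.encode`. The names `appendEntry` and `startWriter` are invented for this sketch:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever, void)

-- Append one entry per line; 'show' stands in for Aeson.encode here.
appendEntry :: Show a => FilePath -> a -> IO ()
appendEntry path x = appendFile path (show x ++ "\n")

-- Drain the queue in a background thread. Producers elsewhere just
-- 'writeChan queue' and never touch the file, so no file locking is
-- needed: a single thread owns all writes.
startWriter :: Show a => FilePath -> Chan a -> IO ()
startWriter path queue =
  void (forkIO (forever (readChan queue >>= appendEntry path)))
```

The single-writer design is what makes the locking question disappear; everything else is interchangeable detail (queue type, serialization format).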

Thanks for the reference. I agree with the rant word for word.

I use tcache: http://hackage.haskell.org/package/TCache. It is a cache with access and updates in the STM monad, and each element can have its own persistence, defined by the programmer. So one element can be the result of a web service request (from AWS, for example), another can come from a database, and a third from anywhere else; the three can participate in the same STM transaction in memory and, if modified, update their respective storages. These are the kinds of things that are not possible in conventional databases.

It is easy to create an almost perfect product if you establish the rules of perfection and sit at the center of the development process; that is what the SQL databases did for a long time. The DBs stayed in the protective womb of the back office, with a few queries per second, consistent with themselves and with nothing else. Now things have changed. We need their transactions working for us close to fresh application data at full speed, not in the back office. We need our data spread across different locations; we have no other option. We need to synchronize and integrate more than ever, so we need software and developers that can figure out what the data is about by looking at it, so the schema must be implicit in the data, and so on.
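The transactional composition described above can be sketched with plain TVars standing in for tcache's persistent references. This deliberately uses only the stm package, not tcache's actual API, so treat it as an illustration of the transactional shape rather than of tcache itself:

```haskell
import Control.Concurrent.STM

-- Two "stores" that in tcache could each have their own backing
-- persistence (a web service, a database, ...). Here they are plain
-- TVars, which is enough to show the composition: both updates
-- commit together or not at all.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
  modifyTVar' from (subtract n)
  modifyTVar' to   (+ n)
```

In tcache, each reference would additionally be flushed to its own storage when the transaction commits; the atomicity story within memory is the same.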
2013/12/19 Mikael Brockman
> Andrew Cowie writes:
>
>> Yeah, if external inspection were necessary that'd definitely be a good reason to go that way for sure. The report from Ozgur that just serializing out a Map structure was workable is encouraging, though. I'll start with that.
> Pardon the digression, but I'd just like to appreciate this way of thinking. There's a rant by Bob Martin [1] that concludes:
>
> "We are heading into an interesting time. A time when the prohibition against different data storage mechanisms has been lifted, and we are free to experiment with many novel new approaches. But as we play with our CouchDBs and our Mongos and BigTables, remember this: The database is just a detail that you don’t need to figure out right away."
>
> A project I'm working on uses a persistent append-only list, which is currently implemented like this, almost verbatim:
>
>     async . forever $ atomically (readTChan queue) >>= writeFile path . Aeson.encode
>
> Files are trivial to back up and generally easy to work with. Since it's just JSON, I can grep and mess with it easily with command-line tools. And since the writing is done in a separate thread reading from a queue, I don't need to worry about locking.
>
> I think this will be alright for a good while, and when the project outgrows it, I'll just migrate to some other solution. Probably acid-state, because the version migration stuff seems really useful.
>
> [1]: Bob Martin's rant "No DB", http://blog.8thlight.com/uncle-bob/2012/05/15/NODB.html
>
> --
> Mikael Brockman
-- Alberto.
participants (6)
- Alberto G. Corona
- Andrew Cowie
- Clark Gaebel
- Conrad Parker
- Mikael Brockman
- Ozgur Akgun