Map with incremental serialization

Hello everyone, I'm not sure if something like this already exists so I figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

There's "acid-state", which provides a log-based persistence model. I.e.,
it persists the modifications you make to the data structure.
2014-10-10 23:09 GMT+04:00 Britt Mathis
Hello everyone, I'm not sure if something like this already exists so I figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.
You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject. Nicolas
2014-10-10 23:09 GMT+04:00 Britt Mathis
: Hello everyone, I'm not sure if something like this already exists so I figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I was told about acid-state on irc, but I wasn't sure if it did the
incremental part - it looks like it will be perfect, thank you. And I will
definitely check out that paper as well.
On Oct 10, 2014 3:37 PM, "Nicolas Trangez"
On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.
You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.
Nicolas
2014-10-10 23:09 GMT+04:00 Britt Mathis
: Hello everyone, I'm not sure if something like this already exists so
I
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I recommend looking into Data.Acid.Remote if you plan on using acid-state.
Will allow you to inspect your state using ghci.
http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...
On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis
I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez"
wrote: On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.
You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.
Nicolas
2014-10-10 23:09 GMT+04:00 Britt Mathis
: Hello everyone, I'm not sure if something like this already exists
so I
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Cell: 1.630.740.8204

Someone on irc mentioned acid-state remote (it may have even been you), I
will definitely be using it. My only worry is that I only have 512MB of Ram
to work with atm, which should be fine until I have several thousand users.
I see on the acid state website that I would need to integrate with
something else to provide the ability to have a data structure larger than
memory (I may be missing or misunderstanding something), would it be
trivial to add this in later or should I be worried about it now?
On Oct 10, 2014 4:10 PM, "David Johnson"
I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.
http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...
On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis
wrote: I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez"
wrote: On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.
You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.
Nicolas
2014-10-10 23:09 GMT+04:00 Britt Mathis
: Hello everyone, I'm not sure if something like this already exists
so I
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Cell: 1.630.740.8204

I don´t know the new versions, but acid-state is not incremental. it writes
everithing.
the package TCache is incremental. it uses HasTables instead of a map. It
works in the STM monad. And it is incremental with configurable persistence.
https://hackage.haskell.org/package/TCache
2014-10-10 22:15 GMT+02:00 Britt Mathis
Someone on irc mentioned acid-state remote (it may have even been you), I will definitely be using it. My only worry is that I only have 512MB of Ram to work with atm, which should be fine until I have several thousand users. I see on the acid state website that I would need to integrate with something else to provide the ability to have a data structure larger than memory (I may be missing or misunderstanding something), would it be trivial to add this in later or should I be worried about it now? On Oct 10, 2014 4:10 PM, "David Johnson"
wrote: I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.
http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...
On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis
wrote: I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez"
wrote: On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.
You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.
Nicolas
2014-10-10 23:09 GMT+04:00 Britt Mathis
: Hello everyone, I'm not sure if something like this already exists
so I
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Cell: 1.630.740.8204
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Alberto.

Britt,
If you use the remote module you will have the ability to create multiple
states that can reside on multiple machines, so you won't be limited by the
RAM capacity of just one machine, and you can then use RPC calls to query
them from a web server or command line app.
In order to minimize your memory footprint and keep things fast you could
use an IntMap to store your users (i.e. IntMap User). You could also
install ekg on the server that broadcasts your state to get a good idea of
just how large your state is. Since acid-state transactions are pure they
complement parallelism nicely as well.
Alberto,
Acid state has two logs on disk, one for events and another for
checkpoints. Checkpoints represent the entire serialized state. Events are
incremental changes on the state. Writing checkpoints is an expensive
operation (since the entire state will be written to disk from the MVar),
and shouldn't be done very often.
Note on usability:
If you have long event files, and you are constantly opening and closing
your state you will most likely experience poor performance. Since
acid-state will attempt to read the entire event log to recreate the state
in memory. Someone can correct me if I'm wrong.
On Fri, Oct 10, 2014 at 3:15 PM, Britt Mathis
Someone on irc mentioned acid-state remote (it may have even been you), I will definitely be using it. My only worry is that I only have 512MB of Ram to work with atm, which should be fine until I have several thousand users. I see on the acid state website that I would need to integrate with something else to provide the ability to have a data structure larger than memory (I may be missing or misunderstanding something), would it be trivial to add this in later or should I be worried about it now? On Oct 10, 2014 4:10 PM, "David Johnson"
wrote: I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.
http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...
On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis
wrote: I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez"
wrote: On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.
You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.
Nicolas
2014-10-10 23:09 GMT+04:00 Britt Mathis
: Hello everyone, I'm not sure if something like this already exists
so I
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Cell: 1.630.740.8204
-- Cell: 1.630.740.8204

"Acid state has two logs on disk, one for events and another for
checkpoints. " - does this mean that I choose which to do (for
instance, I want the state to be saved every 10 minutes or so but
obviously just the events), or does acid-state handle this internally?
Essentially the app should (in a perfect world) never go offline, but
I need something stored on disk to handle the event that it crashes,
etc. I want the functionality that a map provides (index by a
username, get a user, etc.). I know that a traditional database would
give me the fault tolerance I want, but I don't need all of the
features of a traditional database like postgre. All I need is some
data structure that stores key value pairs, but also writes the state
to disk at some interval of time (10 minutes, etc.). I could obviously
just dump the entire thing every time but this is going to be a ton of
overhead I would like to avoid.
Let's say the map is currently [("foo", "bar"), ("baz", "quux")] and
this is mirrored on disk. If I add another pair ("spam", "eggs") to
the map, at the moment this is only in memory. At some time in the
future, I need to replicate this addition in the structure on disk.
There are two main ways I could do this (that I know of) - I could
write all three pairs, or I could only add ("spam", "eggs") to the
structure on disk somehow. This is the behavior I want.
On Fri, Oct 10, 2014 at 4:52 PM, David Johnson
Britt,
If you use the remote module you will have the ability to create multiple states that can reside on multiple machines, so you won't be limited by the RAM capacity of just one machine, and you can then use RPC calls to query them from a web server or command line app.
In order to minimize your memory footprint and keep things fast you could use an IntMap to store your users (i.e. IntMap User). You could also install ekg on the server that broadcasts your state to get a good idea of just how large your state is. Since acid-state transactions are pure they complement parallelism nicely as well.
Alberto,
Acid state has two logs on disk, one for events and another for checkpoints. Checkpoints represent the entire serialized state. Events are incremental changes on the state. Writing checkpoints is an expensive operation (since the entire state will be written to disk from the MVar), and shouldn't be done very often.
Note on usability:
If you have long event files, and you are constantly opening and closing your state you will most likely experience poor performance. Since acid-state will attempt to read the entire event log to recreate the state in memory. Someone can correct me if I'm wrong.
On Fri, Oct 10, 2014 at 3:15 PM, Britt Mathis
wrote: Someone on irc mentioned acid-state remote (it may have even been you), I will definitely be using it. My only worry is that I only have 512MB of Ram to work with atm, which should be fine until I have several thousand users. I see on the acid state website that I would need to integrate with something else to provide the ability to have a data structure larger than memory (I may be missing or misunderstanding something), would it be trivial to add this in later or should I be worried about it now?
On Oct 10, 2014 4:10 PM, "David Johnson"
wrote: I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.
http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...
On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis
wrote: I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well.
On Oct 10, 2014 3:37 PM, "Nicolas Trangez"
wrote: On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.
You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.
Nicolas
2014-10-10 23:09 GMT+04:00 Britt Mathis
: > Hello everyone, I'm not sure if something like this already exists > so I > figured I would ask here first before I reinvent the wheel. What I > need is > a data structure that behaves like Data.Map, but is serialized to > disk > incrementally so I only have the overhead of writing what has > changed every > time. > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Cell: 1.630.740.8204
-- Cell: 1.630.740.8204

On Sat, Oct 11, 2014 at 10:06 AM, Britt Mathis
"Acid state has two logs on disk, one for events and another for checkpoints. " - does this mean that I choose which to do (for instance, I want the state to be saved every 10 minutes or so but obviously just the events), or does acid-state handle this internally?
It appends events to a journal automatically, but only saves checkpoints when you explicitly tell it to. Usually checkpointing happens when the server shuts down (saveCheckpointAndClose) but you probably want to trigger it manually every night or so. This process runs in the background, so your application can stay online while the checkpoint is saved. Note that saving checkpoints is only an optimization; acid-state can reconstruct your data from the event log alone. Having a recent checkpoint does makes startup/recovery faster though, and saves space on disk. Hackage is a very prominent user of acid-state. You might like to look into how they do things. Chris

Another option would be to use something like sqlite or lmdb as a library.
On Oct 11, 2014 4:12 AM, "Chris Wong"
On Sat, Oct 11, 2014 at 10:06 AM, Britt Mathis
wrote: "Acid state has two logs on disk, one for events and another for checkpoints. " - does this mean that I choose which to do (for instance, I want the state to be saved every 10 minutes or so but obviously just the events), or does acid-state handle this internally?
It appends events to a journal automatically, but only saves checkpoints when you explicitly tell it to.
Usually checkpointing happens when the server shuts down (saveCheckpointAndClose) but you probably want to trigger it manually every night or so. This process runs in the background, so your application can stay online while the checkpoint is saved.
Note that saving checkpoints is only an optimization; acid-state can reconstruct your data from the event log alone. Having a recent checkpoint does makes startup/recovery faster though, and saves space on disk.
Hackage is a very prominent user of acid-state. You might like to look into how they do things.
Chris _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (7)
-
Alberto G. Corona
-
Britt Mathis
-
Carter Schonwald
-
Chris Wong
-
David Johnson
-
Nicolas Trangez
-
Nikita Volkov