Map with incremental serialization

older
Pretty Printing Libraries with an...

Britt Mathis

10 Oct 2014 10 Oct '14

7:09 p.m.

Hello everyone, I'm not sure if something like this already exists so I figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

Attachments:

attachment.html (text/html — 324 bytes)

Show replies by date

Nikita Volkov

10 Oct 10 Oct

7:29 p.m.

There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure. 2014-10-10 23:09 GMT+04:00 Britt Mathis :

...

Hello everyone, I'm not sure if something like this already exists so I figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Nicolas Trangez

7:37 p.m.

On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:

...

There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.

You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject. Nicolas

...

2014-10-10 23:09 GMT+04:00 Britt Mathis :

...
Hello everyone, I'm not sure if something like this already exists so I figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Britt Mathis

7:55 p.m.

I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez" wrote:

...

On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:

...
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.

You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.

Nicolas

...
2014-10-10 23:09 GMT+04:00 Britt Mathis :

...
Hello everyone, I'm not sure if something like this already exists so

I

...
...
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

David Johnson

8:10 p.m.

I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci. http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h... On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis wrote:

...

I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez" wrote:

...
On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:

...
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.

You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.

Nicolas

...
2014-10-10 23:09 GMT+04:00 Britt Mathis :

...
Hello everyone, I'm not sure if something like this already exists

so I

...
...
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Cell: 1.630.740.8204

Britt Mathis

8:15 p.m.

Someone on irc mentioned acid-state remote (it may have even been you), I will definitely be using it. My only worry is that I only have 512MB of Ram to work with atm, which should be fine until I have several thousand users. I see on the acid state website that I would need to integrate with something else to provide the ability to have a data structure larger than memory (I may be missing or misunderstanding something), would it be trivial to add this in later or should I be worried about it now? On Oct 10, 2014 4:10 PM, "David Johnson" wrote:

...

I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.

http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...

On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis wrote:

...
I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez" wrote:

...
On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:

...
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.

You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.

Nicolas

...
2014-10-10 23:09 GMT+04:00 Britt Mathis :

...
Hello everyone, I'm not sure if something like this already exists

so I

...
...
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Cell: 1.630.740.8204

Alberto G. Corona

8:36 p.m.

I don´t know the new versions, but acid-state is not incremental. it writes everithing. the package TCache is incremental. it uses HasTables instead of a map. It works in the STM monad. And it is incremental with configurable persistence. https://hackage.haskell.org/package/TCache 2014-10-10 22:15 GMT+02:00 Britt Mathis :

...

Someone on irc mentioned acid-state remote (it may have even been you), I will definitely be using it. My only worry is that I only have 512MB of Ram to work with atm, which should be fine until I have several thousand users. I see on the acid state website that I would need to integrate with something else to provide the ability to have a data structure larger than memory (I may be missing or misunderstanding something), would it be trivial to add this in later or should I be worried about it now? On Oct 10, 2014 4:10 PM, "David Johnson" wrote:

...
I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.

http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...

On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis wrote:

...
I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez" wrote:

...
On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:

...
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.

You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.

Nicolas

...
2014-10-10 23:09 GMT+04:00 Britt Mathis :

...
Hello everyone, I'm not sure if something like this already exists

so I

...
...
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Cell: 1.630.740.8204

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Alberto.

David Johnson

8:52 p.m.

Britt, If you use the remote module you will have the ability to create multiple states that can reside on multiple machines, so you won't be limited by the RAM capacity of just one machine, and you can then use RPC calls to query them from a web server or command line app. In order to minimize your memory footprint and keep things fast you could use an IntMap to store your users (i.e. IntMap User). You could also install ekg on the server that broadcasts your state to get a good idea of just how large your state is. Since acid-state transactions are pure they complement parallelism nicely as well. Alberto, Acid state has two logs on disk, one for events and another for checkpoints. Checkpoints represent the entire serialized state. Events are incremental changes on the state. Writing checkpoints is an expensive operation (since the entire state will be written to disk from the MVar), and shouldn't be done very often. Note on usability: If you have long event files, and you are constantly opening and closing your state you will most likely experience poor performance. Since acid-state will attempt to read the entire event log to recreate the state in memory. Someone can correct me if I'm wrong. On Fri, Oct 10, 2014 at 3:15 PM, Britt Mathis wrote:

...

Someone on irc mentioned acid-state remote (it may have even been you), I will definitely be using it. My only worry is that I only have 512MB of Ram to work with atm, which should be fine until I have several thousand users. I see on the acid state website that I would need to integrate with something else to provide the ability to have a data structure larger than memory (I may be missing or misunderstanding something), would it be trivial to add this in later or should I be worried about it now? On Oct 10, 2014 4:10 PM, "David Johnson" wrote:

...
I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.

http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...

On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis wrote:

...
I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well. On Oct 10, 2014 3:37 PM, "Nicolas Trangez" wrote:

...
On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:

...
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.

You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.

Nicolas

...
2014-10-10 23:09 GMT+04:00 Britt Mathis :

...
Hello everyone, I'm not sure if something like this already exists

so I

...
...
figured I would ask here first before I reinvent the wheel. What I need is a data structure that behaves like Data.Map, but is serialized to disk incrementally so I only have the overhead of writing what has changed every time.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Cell: 1.630.740.8204

-- Cell: 1.630.740.8204

Britt Mathis

9:06 p.m.

"Acid state has two logs on disk, one for events and another for checkpoints. " - does this mean that I choose which to do (for instance, I want the state to be saved every 10 minutes or so but obviously just the events), or does acid-state handle this internally? Essentially the app should (in a perfect world) never go offline, but I need something stored on disk to handle the event that it crashes, etc. I want the functionality that a map provides (index by a username, get a user, etc.). I know that a traditional database would give me the fault tolerance I want, but I don't need all of the features of a traditional database like postgre. All I need is some data structure that stores key value pairs, but also writes the state to disk at some interval of time (10 minutes, etc.). I could obviously just dump the entire thing every time but this is going to be a ton of overhead I would like to avoid. Let's say the map is currently [("foo", "bar"), ("baz", "quux")] and this is mirrored on disk. If I add another pair ("spam", "eggs") to the map, at the moment this is only in memory. At some time in the future, I need to replicate this addition in the structure on disk. There are two main ways I could do this (that I know of) - I could write all three pairs, or I could only add ("spam", "eggs") to the structure on disk somehow. This is the behavior I want. On Fri, Oct 10, 2014 at 4:52 PM, David Johnson wrote:

...

Britt,

If you use the remote module you will have the ability to create multiple states that can reside on multiple machines, so you won't be limited by the RAM capacity of just one machine, and you can then use RPC calls to query them from a web server or command line app.

In order to minimize your memory footprint and keep things fast you could use an IntMap to store your users (i.e. IntMap User). You could also install ekg on the server that broadcasts your state to get a good idea of just how large your state is. Since acid-state transactions are pure they complement parallelism nicely as well.

Alberto,

Acid state has two logs on disk, one for events and another for checkpoints. Checkpoints represent the entire serialized state. Events are incremental changes on the state. Writing checkpoints is an expensive operation (since the entire state will be written to disk from the MVar), and shouldn't be done very often.

Note on usability:

If you have long event files, and you are constantly opening and closing your state you will most likely experience poor performance. Since acid-state will attempt to read the entire event log to recreate the state in memory. Someone can correct me if I'm wrong.

On Fri, Oct 10, 2014 at 3:15 PM, Britt Mathis wrote:

...
Someone on irc mentioned acid-state remote (it may have even been you), I will definitely be using it. My only worry is that I only have 512MB of Ram to work with atm, which should be fine until I have several thousand users. I see on the acid state website that I would need to integrate with something else to provide the ability to have a data structure larger than memory (I may be missing or misunderstanding something), would it be trivial to add this in later or should I be worried about it now?

On Oct 10, 2014 4:10 PM, "David Johnson" wrote:

...
I recommend looking into Data.Acid.Remote if you plan on using acid-state. Will allow you to inspect your state using ghci.

http://hackage.haskell.org/package/acid-state-0.12.2/docs/Data-Acid-Remote.h...

On Fri, Oct 10, 2014 at 2:55 PM, Britt Mathis wrote:

...
I was told about acid-state on irc, but I wasn't sure if it did the incremental part - it looks like it will be perfect, thank you. And I will definitely check out that paper as well.

On Oct 10, 2014 3:37 PM, "Nicolas Trangez" wrote:

...
On Fri, 2014-10-10 at 23:29 +0400, Nikita Volkov wrote:

...
There's "acid-state", which provides a log-based persistence model. I.e., it persists the modifications you make to the data structure.

You might also be interested in the paper 'Generic Storage in Haskell' by Sebastiaan Visser, and his thesis about the same subject.

Nicolas

...
2014-10-10 23:09 GMT+04:00 Britt Mathis :

> Hello everyone, I'm not sure if something like this already exists > so I > figured I would ask here first before I reinvent the wheel. What I > need is > a data structure that behaves like Data.Map, but is serialized to > disk > incrementally so I only have the overhead of writing what has > changed every > time. > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Cell: 1.630.740.8204

-- Cell: 1.630.740.8204

Chris Wong

11 Oct 11 Oct

8:12 a.m.

On Sat, Oct 11, 2014 at 10:06 AM, Britt Mathis wrote:

...

"Acid state has two logs on disk, one for events and another for checkpoints. " - does this mean that I choose which to do (for instance, I want the state to be saved every 10 minutes or so but obviously just the events), or does acid-state handle this internally?

It appends events to a journal automatically, but only saves checkpoints when you explicitly tell it to. Usually checkpointing happens when the server shuts down (saveCheckpointAndClose) but you probably want to trigger it manually every night or so. This process runs in the background, so your application can stay online while the checkpoint is saved. Note that saving checkpoints is only an optimization; acid-state can reconstruct your data from the event log alone. Having a recent checkpoint does makes startup/recovery faster though, and saves space on disk. Hackage is a very prominent user of acid-state. You might like to look into how they do things. Chris

Carter Schonwald

2:36 p.m.

Another option would be to use something like sqlite or lmdb as a library. On Oct 11, 2014 4:12 AM, "Chris Wong" wrote:

...

On Sat, Oct 11, 2014 at 10:06 AM, Britt Mathis wrote:

...
"Acid state has two logs on disk, one for events and another for checkpoints. " - does this mean that I choose which to do (for instance, I want the state to be saved every 10 minutes or so but obviously just the events), or does acid-state handle this internally?

It appends events to a journal automatically, but only saves checkpoints when you explicitly tell it to.

Usually checkpointing happens when the server shuts down (saveCheckpointAndClose) but you probably want to trigger it manually every night or so. This process runs in the background, so your application can stay online while the checkpoint is saved.

Note that saving checkpoints is only an optimization; acid-state can reconstruct your data from the event log alone. Having a recent checkpoint does makes startup/recovery faster though, and saves space on disk.

Hackage is a very prominent user of acid-state. You might like to look into how they do things.

Chris _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

4104

Age (days ago)

4105

Last active (days ago)

List overview

Download

10 comments

7 participants

participants (7)

Alberto G. Corona
Britt Mathis
Carter Schonwald
Chris Wong
David Johnson
Nicolas Trangez
Nikita Volkov