DB vs read/show for persisting large data

Hi,
It has been on my todo list for some time now: I'd like to write a GTD tool that has dependency-tracking support. Haskell seems like a good choice for this. I was wondering whether there have been any past attempts at this?
One thing that has been bothering me is the persistence of data. Should I use sqlite (or some other DB), or should I use Haskell's read/show functions to read from and write to a file? I am slightly inclined towards NOT using a DB because I want to implement all the business logic in Haskell, and I want to avoid having to generate SQL.
It would be great if I could get some feedback on the "read/show" approach: is this even a viable option?
Regards,
Kashyap

On Wed, Dec 14, 2011 at 3:31 PM, C K Kashyap wrote:
[snip]
Definitely *don't* use read/show: if you make any updates to your data structures, all old files will be lost.
I would recommend either using some standard file format (JSON, YAML, or even XML if you like) or using a database. If you want to avoid writing SQL, Persistent [1] may be a good fit.
Michael
[1] http://www.yesodweb.com/book/persistent
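A minimal sketch of the failure mode described above, using only base. The `Task` type, its fields, and the sample strings are hypothetical; `oldFile` stands for what a derived `show` would have written before the `done` field was added.

```haskell
import Text.Read (readMaybe)

-- Current version of a hypothetical record; the `done` field was added
-- after some files had already been written with the older definition.
data Task = Task { title :: String, done :: Bool }
  deriving (Eq, Read, Show)

-- What the derived `show` produced before `done` existed.
oldFile :: String
oldFile = "Task {title = \"Write report\"}"

-- What the derived `show` produces for the current definition.
newFile :: String
newFile = show (Task "Write report" False)

main :: IO ()
main = do
  -- The derived Read instance expects every field, so old data is rejected.
  print (readMaybe oldFile :: Maybe Task)
  -- Data written by the current definition round-trips.
  print (readMaybe newFile :: Maybe Task)
```

The old file is still on disk, but the derived `Read` instance for the new definition can no longer parse it, which is the sense in which the data is "lost".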

Excerpts from Michael Snoyman's message of Wed Dec 14 14:34:30 +0100 2011:
> Definitely *don't* use read/show: if you make any updates to your data structures, all old files will be lost.
Well, you can work around it:
  data MyDataV1 = MyDataV1 { name :: String } deriving (Read, Show)
Then you make an update:
  data MyDataV2 = MyDataV2 { name :: String, age :: Int } deriving (Read, Show)
Then you can do:
  let v1 = tryReadDataToMaybe contents :: Maybe MyDataV1
      v2 = tryReadDataToMaybe contents :: Maybe MyDataV2
      realData = v2 <|> fmap upgrade v1
But you already see that you start writing boilerplate code. It can be done for simple data structures, but it will soon become a nightmare if you have complex data.
If you use a version control system you don't lose your data; it will just be "hard to update".
For prototyping, deriving Binary or Read/Show instances is a nice way to get started. Serialization to JSON/XML can be implemented later, when you eventually change your data format anyway.
So it depends on your task. If you want to use read/show etc., you have to think about file locking and such.
Have a look at the "derive" package (on Hackage), which can derive more instances than just Read/Show (e.g. JSON).
You can still use a sqlite database and use it as binary storage. It depends on whether all your data fits into memory.
Marc Weber
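The workaround above can be sketched as a complete program using only base. This is one possible reading, not Marc's exact code: `readMaybe` stands in for the hypothetical `tryReadDataToMaybe`, the default age of 0 in `upgrade` is an assumed policy, and the V1 field is renamed to `nameV1` only so that both versions can live in one module.

```haskell
import Control.Applicative ((<|>))
import Text.Read (readMaybe)

-- Old on-disk format.
data MyDataV1 = MyDataV1 { nameV1 :: String }
  deriving (Read, Show)

-- Current format with an added field.
data MyDataV2 = MyDataV2 { name :: String, age :: Int }
  deriving (Eq, Read, Show)

-- Assumed policy: records written before `age` existed get age 0.
upgrade :: MyDataV1 -> MyDataV2
upgrade (MyDataV1 n) = MyDataV2 { name = n, age = 0 }

-- Try the newest format first, then fall back to the old one and upgrade.
load :: String -> Maybe MyDataV2
load s = readMaybe s <|> fmap upgrade (readMaybe s)

main :: IO ()
main = do
  print (load (show (MyDataV1 "Ann")))    -- old file, upgraded on load
  print (load (show (MyDataV2 "Bob" 30))) -- current file, read directly
  print (load "garbage")                  -- neither format matches
```

Every further version adds another type, another `upgrade` step, and another fallback in `load`, which is the boilerplate the message warns about.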

On 14/12/11 13:59, Marc Weber wrote:
> But you already see that you start writing boilerplate code. It can be done for easy data structures .. But it soon will be a nightmare if you have complex data.
[snip]
I ran into this very nightmare in one project, and was recommended safecopy [0] by someone on the #haskell IRC channel. I've not (yet) used it but it looks very nice!
[0] http://hackage.haskell.org/package/safecopy
Claude
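The general idea of versioned data with explicit migrations can be illustrated with base alone. The sketch below is NOT safecopy's API: the types, the version tag on the first line, and the `save`/`load` helpers are all hypothetical, chosen only to show why storing a version number next to the data makes upgrades mechanical.

```haskell
import Text.Read (readMaybe)

-- Two hypothetical versions of the same record.
data TaskV1 = TaskV1 String
  deriving (Read, Show)

data TaskV2 = TaskV2 String Bool
  deriving (Eq, Read, Show)

-- One migration step from the previous version to the current one.
migrateV1 :: TaskV1 -> TaskV2
migrateV1 (TaskV1 t) = TaskV2 t False

-- Always write the current version, with its version tag on the first line.
save :: TaskV2 -> String
save t = "2\n" ++ show t

-- Dispatch on the stored tag and migrate old data forward.
load :: String -> Maybe TaskV2
load s = case lines s of
  ["1", body] -> migrateV1 <$> readMaybe body
  ["2", body] -> readMaybe body
  _           -> Nothing

main :: IO ()
main = do
  print (load (save (TaskV2 "Write report" True)))     -- current format
  print (load ("1\n" ++ show (TaskV1 "Write report"))) -- old format, migrated
```

Compared with trying each format in turn, an explicit tag means the loader never has to guess which version it is looking at.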

On Wed, Dec 14, 2011 at 4:22 PM, Claude Heiland-Allen wrote:
> I ran into this very nightmare in one project, and was recommended safecopy [0] by someone on the #haskell IRC channel. I've not (yet) used it but it looks very nice!
That also happens to be one of the best module descriptions I've ever read. Michael

On 14 December 2011 15:22, Claude Heiland-Allen wrote:
> I ran into this very nightmare in one project, and was recommended safecopy [0] by someone on the #haskell IRC channel. I've not (yet) used it but it looks very nice!
Or better yet, use acid-state, which is built on top of safecopy: http://acid-state.seize.it/
Bas

Thank you so much. I am going to try out acid-state. I've been shying away from Template Haskell, but from the looks of it, acid-state/safecopy can do what I am looking for.
Regards,
Kashyap
On Thu, Dec 15, 2011 at 12:13 AM, Bas van Dijk wrote:
> Or better yet, use acid-state, which is built on top of safecopy.

If what bothers you is writing SQL code (which I could easily understand), you may want to check out persistent. It uses Template Haskell to generate the necessary marshalling code and table definitions for you, so you only handle Haskell datatypes.
(^^ Michael just outposted [1] me.)
For JSON serialization, aeson (normal or enumerator-based flavour) might be nice.
For brutal binary serialization, you may like binary or cereal (I don't know the advantages and disadvantages of either, except that the last time I checked, cereal only handled strict bytestrings).
For XML I don't know, since I use it as little as I can.
[1] I don't know if there is such a word. Sorry, I'm French.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I got mixed up with something else: forget about the enumerator-based version of aeson; it does not exist.

2011/12/14 Yves Parès wrote:
> For brutal binary serialization, you may like binary or cereal (I don't know the dis/advantages of both, except that the last time I checked, cereal only handled strict bytestrings).
BTW, if we can cope with strict bytestrings (that is, if the input is not too big), would you recommend binary or cereal?
What would you use to auto-derive the Binary/Serialize classes? The 'derive' package? The problem is that it has a lot of dependencies you may not need if you just want serialization, plus it relies on TH, so it increases both compilation time and executable size.
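One way to get a `Binary` instance without Template Haskell, assuming a current GHC and a current release of the binary package (this option is not mentioned in the thread and may not have been available when it was written), is to derive `Generic` and declare an empty instance. The `Task` type is hypothetical.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical record to serialize.
data Task = Task { title :: String, done :: Bool }
  deriving (Eq, Show, Generic)

-- Empty instance body: `put` and `get` fall back to the generic defaults.
instance Binary Task

main :: IO ()
main = do
  let task  = Task "Write report" False
      bytes = encode task            -- lazy ByteString
  -- Decoding the encoded bytes should give back the original value.
  print (decode bytes == task)
```

Note that a plain binary encoding has the same versioning problem as read/show: changing the type changes the byte layout, so old files need an explicit migration path.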

On 14 December 2011 15:02, Yves Parès wrote:
> The 'derive' package? The problem is that it has a lot of dependencies you maybe don't need if you just want serialization, plus it relies on TH so it grows both compilation time and executable size.
Well, you can use the standalone executable to generate the code for the instances and ship only that generated code with your package. No added dependencies at all.
Ozgur

participants (8)
- Bas van Dijk
- C K Kashyap
- Claude Heiland-Allen
- Marc Weber
- Michael Snoyman
- Ozgur Akgun
- Yitzchak Gale
- Yves Parès