Consensus about databases / serialization

Peter Verswyvelen

2 Jan 2008

11:50 a.m.

As I'm a self-made man, I never really studied relational databases in detail. My intuition told me that the "relational" part was not really suitable for the 3D data, 2D images, animation curves, state machines, and other data I encountered in the videogame and animation business. I could always get away with files, and for the applications I needed to deploy, plugging in a couple of extra gigabytes of RAM and serializing the "object" state to disk was more practical, cheaper and faster.

However, a couple of years ago I started studying computer science (I seem to do the theory after the practice, weird behavior ;-) at the Open University, and one of the exams I took was about databases. Initially this course convinced me that databases are actually very nice, but the course ended with a topic on object-oriented databases, which were designed to make storing data like "3D models, graphs, networks, and complex data structures" more practical. Duh.

Since then, I have deployed a few commercial applications for customers using databases, which worked fine for the typical "simple/flat" database data. I hated embedding a dynamic untyped language like SQL, as much as I hated embedding code in HTML or XML. IMHO it feels UGLY and unsafe. Regarding the other popular data storage format - XML - I did use that a lot, but it seems like going back to the stone ages, when hierarchical stores/databases got invented (and ditched?)

Now, initially after an introduction to Microsoft's LINQ, and recently having read a very brief overview of HAppS, it seems I'm not the only one with those "feelings".

Ouch, this introduction got way too long, sorry about that ;-)

Finally some practical questions:

· regarding Haskell and databases, the page http://haskell.org/haskellwiki/Libraries_and_tools/Database_interfaces describes a few, but which are the ones that are stable and practical? Any user experiences?

· HAppS is not listed in the page above, because it does not use databases? Is HAppS reliable or experimental, and does it scale well? Any success stories?

· regarding Haskell and serialization, I don't think that implementing Read/Show is a good way to do real serialization, so what other options exist? I could find some libraries at http://hackage.haskell.org/packages/archive/pkg-list.html#cat:Data, but again which are the most practical and stable? When programming in C++/MFC and C#/.NET, I tended to develop my own serialization frameworks because I used that for many things, like logging commands to disk, performing undo/redo, intra- and inter-process cut/copy/paste, save/load, etc.

· regarding serialization, I'm kinda curious how ADTs and even GADTs are stored in and retrieved from a relational database? I guess it could be done using BLOBs and serialization to ByteStrings, so bypassing a lot of the database table structures?

· If I would want to experiment with say HAppS, the way I understand it, I would first have to study "Scrap your boilerplate" and Template Haskell, and maybe some other language features? I'm still new to Haskell, and the road to understanding all language elements and extensions is very long, so sequentially learning it would be insane I guess. I have no practical experience with TH, but I spent a long time trying to do "aspect oriented programming" in C# without success, so TH looks uber to me.

Thanks a lot and best wishes for 2008!

Peter


Salvatore Insalaco

2 Jan

12:20 p.m.

...

· regarding Haskell and databases, the page http://haskell.org/haskellwiki/Libraries_and_tools/Database_interfaces describes a few, but which are the ones that are stable and practical? Any user experiences?

During my experiments I found Takusen (http://darcs.haskell.org/takusen/) and HDBC (http://software.complete.org/hdbc) very useful, although I liked Takusen's interface more.

...

· regarding Haskell and serialization, I don't think that implementing Read/Show is a good way for real serialization, so what other options exist?

I would suggest Data.Binary (http://code.haskell.org/binary/), which performs very well and is well supported. There are ways to generate instances of Binary automatically. I like the "Derive" approach most (http://www.cs.york.ac.uk/fp/darcs/derive/derive.htm), as it uses Template Haskell and does not require separate pre-processing.
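To make the Data.Binary suggestion concrete, here is a minimal sketch of a hand-written Binary instance for an algebraic data type. The `Shape` type and its constructor tags are invented for illustration; a tool such as Derive would generate something of a similar form, though not necessarily this exact code.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getWord8)
import Data.Binary.Put (putWord8)

-- Hypothetical type standing in for "3D data, animation curves, ..." from the thread.
data Shape
  = Circle Double
  | Rect Double Double
  | Group [Shape]
  deriving (Eq, Show)

-- Each constructor is written as a one-byte tag followed by its fields.
instance Binary Shape where
  put (Circle r) = putWord8 0 >> put r
  put (Rect w h) = putWord8 1 >> put w >> put h
  put (Group ss) = putWord8 2 >> put ss

  get = do
    tag <- getWord8
    case tag of
      0 -> Circle <$> get
      1 -> Rect <$> get <*> get
      2 -> Group <$> get
      _ -> fail ("unknown Shape tag: " ++ show tag)

sample :: Shape
sample = Group [Circle 1.5, Rect 2 3, Group []]

main :: IO ()
main = do
  let bytes = encode sample            -- lazy ByteString
      back  = decode bytes :: Shape    -- decode throws if the input is malformed
  print (back == sample)
```

The resulting ByteString can be written to a file, or stored as a BLOB if the data has to live in a relational database.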

Cristian Baboi

2:18 p.m.

I recommend you read "Extending the database relational model to capture more meaning" by E.F. Codd.

Jeff Polakow

2:54 p.m.

Hello, I use HDBC for ODBC database access, and HAppS as a web server. I am fairly happy with both. Here are some further thoughts...

...

Finally some practical questions:

· regarding Haskell and databases, the page http://haskell.org/haskellwiki/Libraries_and_tools/Database_interfaces describes a few, but which are the ones that are stable and practical? Any user experiences?

HDBC is fairly stable (although its ODBC driver crashes GHC 6.8 on Windows). I think HSQL is similarly stable. Takusen offers a slightly higher-level interface and some performance guarantees; it is a nice system but lacks support for ODBC (supposedly this is in the works). HaskellDB is probably the ideal database access system for Haskell; however, the distribution was in bad shape (no documentation, hard to compile, etc.) when I last looked, maybe 6 months ago.
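For readers who have not seen HDBC, a minimal sketch of its style follows. It assumes the HDBC and HDBC-sqlite3 packages and an in-memory SQLite database rather than the ODBC driver discussed above; the `asset` table and its contents are made up for the example.

```haskell
import Database.HDBC (commit, disconnect, fromSql, quickQuery', run, toSql)
import Database.HDBC.Sqlite3 (connectSqlite3)

-- Hypothetical example: create a small table, insert two rows, and
-- return the names of all assets larger than 100 KB.
largeAssets :: IO [String]
largeAssets = do
  conn <- connectSqlite3 ":memory:"
  _ <- run conn "CREATE TABLE asset (name TEXT, size_kb INTEGER)" []
  -- Values are passed as parameters, never spliced into the SQL string.
  _ <- run conn "INSERT INTO asset VALUES (?, ?)" [toSql "model.obj", toSql (2048 :: Int)]
  _ <- run conn "INSERT INTO asset VALUES (?, ?)" [toSql "icon.png", toSql (12 :: Int)]
  commit conn
  -- quickQuery' is the strict variant, so all rows are read before disconnecting.
  rows <- quickQuery' conn "SELECT name FROM asset WHERE size_kb > ?" [toSql (100 :: Int)]
  disconnect conn
  return [fromSql name | [name] <- rows]

main :: IO ()
main = largeAssets >>= mapM_ putStrLn
```

The SQL is still an untyped string here, which is the concern raised in the original post; libraries such as HaskellDB aim to address that at a higher level.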

...

· HAppS is not listed in the page above, because it does not use databases? Is HAppS reliable or experimental, and does it scale well? Any success stories?

HAppS is a general server framework for Haskell. HAppS is very appealing because it allows you to dynamically create pages directly with Haskell. HAppS encourages storing your server state in memory, but it is easy to read in state on the fly from external sources. The only caveat with HAppS is that the system has been in active development for the past few months and is just starting (I hope) to settle down; thus useful documentation/examples are hard to find, but the HAppS developers are pretty good at replying to help requests on the HAppS IRC and the HAppS mailing list. I am currently using an old (and stable) version of HAppS but expect to upgrade to the latest version soon.

...

· If I would want to experiment with say HAppS, the way I understand it, I would first have to study "Scrap your boilerplate" and Template Haskell, and maybe some other language features? I'm still new to Haskell, and the road to understanding all language elements and extensions is very long, so sequentially learning it would be insane I guess. I have no practical experience with TH, but I spent a long time trying to do "aspect oriented programming" in C# without success, so TH looks uber to me.

While HAppS does use SYB and TH, you don't need to understand them to effectively use HAppS; of course you'll need to understand them, at least basic TH, to understand the details of what HAppS is doing.

Hope that helps,
Jeff

Steve Lihn

4:23 p.m.

On 1/2/08, Jeff Polakow wrote:

...

Hello, I use HDBC for ODBC database access, and HAppS as a web server. I am fairly happy with both. Here are some further thoughts...

I have started documenting the Database Wikibook, in particular, about HDBC. It is still very rough at this time, but something is better than nothing :-) If you want to add more content, certainly welcome! http://en.wikibooks.org/wiki/Haskell/Database

Justin Bailey

11:22 p.m.

I can speak to haskelldb a little, see below:

On Jan 2, 2008 3:50 AM, Peter Verswyvelen wrote:

...

· regarding Haskell and databases, the page http://haskell.org/haskellwiki/Libraries_and_tools/Database_interfaces describes a few, but which are the ones that are stable and practical? Any user experiences?

I started looking at Haskell database libraries to generate SQL for me. Haskelldb does this well - it uses a higher-level representation of queries based on "relational algebra" (also the basis of SQL) which is pretty easy to understand if you know SQL. It takes care of a lot of the details of generating SQL strings, and does it in a mostly type-safe way.

It is a bit complicated to install the library and all its dependencies, because it can work with 3+ (mysql, postgres, odbc) databases using two different backends (hdbc and hsql). I chose to go with HDBC because it compiled on Windows and postgres because that's what we have at my workplace. Once I got it built and installed it's worked well for me.

Until the most recent versions though, it added a "distinct" operator to all select statements. I submitted a patch which was accepted and now that behavior is no longer the default. It is semi-actively maintained by the original authors and Bjorn, at least, has been very responsive to my queries on the haskelldb-users mailing list. He also has made minor updates to keep it compiling with the latest GHC and Cabal.

Hope that helps!

Justin
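The idea of building queries from relational-algebra operators and rendering them to SQL can be sketched with a toy data type. This is a hypothetical miniature, not HaskellDB's actual API: the `Query` type, `renderSql`, and the table and column names are all invented here, and conditions are plain strings, so it has none of HaskellDB's type safety.

```haskell
import Data.List (intercalate)

-- Hypothetical miniature query algebra (NOT HaskellDB's API).
data Query
  = Table String             -- a base relation
  | Restrict String Query    -- keep rows matching a condition (raw SQL text)
  | Project [String] Query   -- keep only the named columns

-- Render a query by wrapping each operator around a subquery.
renderSql :: Query -> String
renderSql (Table t) = "SELECT * FROM " ++ t
renderSql (Restrict cond q) =
  "SELECT * FROM (" ++ renderSql q ++ ") AS r WHERE " ++ cond
renderSql (Project cols q) =
  "SELECT " ++ intercalate ", " cols ++ " FROM (" ++ renderSql q ++ ") AS p"

example :: Query
example = Project ["name"] (Restrict "size_kb > 100" (Table "asset"))

main :: IO ()
main = putStrLn (renderSql example)
```

Even this toy shows the trade-off: queries compose as ordinary Haskell values, but the generated SQL is nested mechanically and the programmer has little direct say over its exact shape.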

Peter Verswyvelen

3 Jan

1:58 p.m.

Looks good! I liked relational algebra much much more than SQL, so I'll certainly have to look into that.

Thanks,
Peter

Yitzchak Gale

2:45 p.m.

Peter Verswyvelen wrote:

...

Looks good! I liked relational algebra much much more than SQL, so I'll certainly have to look into that.

I agree. I have not tried haskelldb yet, but I would like to.

My impression from some previous posts is that because of the high-level approach, it is difficult to control the precise SQL that is generated. In practice, you almost always have to do some tweaking that is at least DB-dependent, and often application dependent.

Is there any way to do that in haskelldb? If not, is there an obvious way to add it?

Thanks,
Yitz

Peter Verswyvelen

3:56 p.m.

Yitz wrote:

...

My impression from some previous posts is that because of the high-level approach, it is difficult to control the precise SQL that is generated. In practice, you almost always have to do some tweaking that is at least DB-dependent, and often application dependent.

Lihn, Steve

6:34 p.m.

For small queries, it does not matter much which approach you choose. But for large, complex queries, such as a 3-table join (especially Star Transformation), and/or large data sets (millions of rows involved in large data warehouses), the performance will differ by orders of magnitude, depending on how things are optimized.

Steve

-----Original Message-----
From: haskell-cafe-bounces@haskell.org On Behalf Of Peter Verswyvelen
Subject: RE: [Haskell-cafe] Consensus about databases / serialization

Yitz wrote:

...

My impression from some previous posts is that because of the high-level approach, it is difficult to control the precise SQL that is generated. In practice, you almost always have to do some tweaking that is at least DB-dependent, and often application dependent.

Can't the same be said regarding SQL itself? It sometimes needs tweaking. That's the problem with any high level abstraction no? Just like in Haskell you sometimes have to use strictness tweaks. Of course having an extra layer on top of SQL will make the tweaking more difficult :)

Peter

Yitzchak Gale

9:15 p.m.

Lihn, Steve wrote:

...

For small queries, it does not matter much which approach you choose. But for large, complex queries, such as a 3-table join (especially Star Transformation), and/or large data sets (millions of rows involved in large data warehouses), the performance will differ by orders of magnitude, depending on how things are optimized.

Ah, yes. And that brings up another issue - how do the various backends scale for:

- large SQL passed in
- results with many records
- records with many fields
- records/fields with many bytes
- several cursors

What laziness options are available?

-Yitz

Yitzchak Gale

9:09 p.m.

I wrote:

...

...
... to control the precise SQL that is generated. In practice, you almost always have to do some tweaking that is at least DB-dependent, and often application dependent.

Peter Verswyvelen wrote:

...

Can't the same be said regarding SQL itself? It sometimes needs tweaking. That's the problem with any high level abstraction no?

Certainly. In an ideal world, you could just write your queries in straightforward SQL and the DB would figure out what to do. But in real life, that is not how it works. So that complexity then gets passed up to the Haskell interface layers.

Again, in an ideal world you would like to imagine that a high-level interface like haskelldb would be smart enough to compile any relational algebraic expression into SQL that will do the Right Thing for the given backend. But that would be very difficult. For example - there may be things you need to tweak that are both application-dependent and DB dependent.

So to be usable in a serious DB project, there would have to be some kind of hooks that would allow you to tweak the SQL. After doing that - what have we gained by taking the high-level approach to begin with? I'm not sure. I would like to hear about people's thoughts and experiences on this.

-Yitz

Peter Verswyvelen

9:32 p.m.

I see. But ouch, exactly the same could be said for Haskell no? :) Naaah...

Yitzchak Gale

10:01 p.m.

Peter Verswyvelen wrote:

...

I see. But ouch, exactly the same could be said for Haskell no? :) Naaah...

Actually, that is one of the things that is so impressive about Haskell. It starts at such a high level, with such beautiful and powerful abstractions. But if needed, you can optimize down through many layers. All the way down to what they do on the Shootout, where they compete against C.

It took a huge amount of effort over many years to achieve that. DB support still has a long way to go, but it is great to see that people are working on it at varying levels of abstraction.

-Yitz
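The "strictness tweaks" mentioned in this exchange usually look something like the following sketch. The function names are made up for illustration, and how much difference the strict versions make depends on the compiler version and optimization flags, so treat this as a demonstration of the technique rather than a benchmark.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Lazy left fold: without optimization this can build a long chain of
-- unevaluated additions before anything is computed.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- Hand-written loop: the bang patterns force both accumulators on each call.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n []       = if n == 0 then 0 else s / fromIntegral n
    go !s !n (x : xs) = go (s + x) (n + 1) xs

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])
  print (mean [1, 2, 3, 4])
```

The high-level definition stays the same in each case; only the evaluation order is adjusted, which is the kind of layered optimization described above.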

Brandon S. Allbery KF8NH

4 Jan

1:32 a.m.

On Jan 3, 2008, at 16:32 , Peter Verswyvelen wrote:

...

I see. But ouch, exactly the same could be said for Haskell no? :)

Optimization by quasirandom insertion of bangs / seq? Already there :)

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
