Re: [Haskell-cafe] Amazon AWS storage best to use with Haskell?

On Mon, Oct 31, 2011 at 6:50 PM, John Lenz wrote:
CouchDB works great, although I decided to go with SimpleDB since then it is Amazon's problem to scale and allocate disk and so forth, which I like better. For CouchDB, you can use my package couchdb-enumerator on Hackage.
Regarding CouchDB. So far I have my records keyed by Id and stored in a Data.Map, which I serialize to a text file. Using Data.Map functions I do many operations with these records, including mapping functions over keys and values, accumulation, lookup, intersection, union, etc. When I move this data to CouchDB and start using couchdb-enumerator to work with it, how natural will it be to implement all the functions that I use from Data.Map? Or does it make more sense to store my serialized Data.Map as a blob in CouchDB and not use views or similar CouchDB / SimpleDB interfaces at all? That is, just retrieve the necessary blob, deserialize it to a Data.Map, update it, and then store the modified blob in CouchDB again?
It would be great if somebody had time to implement Data.List, Data.Map, etc. on top of a generic NoSQL DB interface with specific instances for CouchDB, SimpleDB, etc.
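The blob approach described above (retrieve, deserialize, update, store again) can be sketched as follows. This is only a minimal illustration, assuming the records are plain strings keyed by Int and that the Show/Read instances of Data.Map are used as the text format; the names RecordId, Records, encodeBlob, decodeBlob and updateBlob are made up for the example, and the actual transfer of the blob to and from CouchDB is left out.

```haskell
import qualified Data.Map as Map
import Text.Read (readMaybe)

-- Hypothetical record types standing in for the application's data.
type RecordId = Int
type Records  = Map.Map RecordId String

-- Serialize the whole map to a text blob; the Show instance of Data.Map
-- renders it as "fromList [...]".
encodeBlob :: Records -> String
encodeBlob = show

-- Parse a blob back into a map; Nothing signals a blob that does not parse.
decodeBlob :: String -> Maybe Records
decodeBlob = readMaybe

-- Read-modify-write on a blob: decode, apply any pure Data.Map function,
-- then encode the result again.
updateBlob :: (Records -> Records) -> String -> Maybe String
updateBlob f blob = encodeBlob . f <$> decodeBlob blob
```

With this shape every Data.Map function stays usable unchanged, at the price of rewriting the whole blob on each update and having no protection against two clients overwriting each other's changes.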

For distributed execution you can look at the recent work on "CloudHaskell":
https://github.com/jepst/CloudHaskell
http://groups.google.com/group/cloudhaskell
As for a programming model -- Philip Trinder et al. have a version of
monad-par that works in a distributed way over CloudHaskell, likewise
CloudHaskell itself provides a simple "Task" layer.
For a NoSQL layer -- I'm looking for the answer to that same question
myself! We've been experimenting with Cassandra (used via the hscassandra
package based in turn on cassandra-thrift). Already it's clear that there
are many areas that need work. The Haskell code generated by Thrift itself
has a lot of room for improvement (for the intrepid hacker: cycles there
would be well-spent).
We haven't tried CouchDB yet. Please keep us posted on what you find.
I don't know if anyone has a clean way of hooking a simple Haskell-ish
interface (e.g. Data.Map) up to a persistence layer. But it seems like
there have been a bunch of papers on "database supported haskell" and the
like. One of them must have solved this!
http://hackage.haskell.org/package/DSH
Cheers,
-Ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Tue, Nov 1, 2011 at 12:07 AM, Ryan Newton wrote:
... For a NoSQL layer -- I'm looking for the answer to that same question myself! We've been experimenting with Cassandra (used via the hscassandra package based in turn on cassandra-thrift). Already it's clear that there are many areas that need work. The Haskell code generated by Thrift itself has a lot of room for improvement (for the intrepid hacker: cycles there would be well-spent).
Any example code using the hscassandra package would really help!

Any example code using the hscassandra package would really help!
I'll ask my student. We may have some simple examples.
Also, I have no idea as to their quality, but I was pleasantly surprised to find three different Amazon-related packages on Hackage (simply by searching for the word "Amazon" in the package list).
http://hackage.haskell.org/package/hS3
http://hackage.haskell.org/package/hSimpleDB
http://hackage.haskell.org/package/aws
It would be great to know if these work.
-Ryan

On Tue, Nov 1, 2011 at 5:03 AM, Ryan Newton wrote:
Any example code using the hscassandra package would really help!
I'll ask my student. We may have some simple examples.
Also, I have no idea as to their quality, but I was pleasantly surprised to find three different Amazon-related packages on Hackage (simply by searching for the word "Amazon" in the package list).
http://hackage.haskell.org/package/hS3
http://hackage.haskell.org/package/hSimpleDB
http://hackage.haskell.org/package/aws
It would be great to know if these work.
Thinking about how to implement Data.Map on top of hscassandra or any other key-value storage ... For example, creating a new map with "fromList" will require storing *all* (key, value) list elements in external storage at once. How should laziness be dealt with in this case?
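One possible answer to the "fromList" question, offered only as a sketch: consume the list in batches, so that the list is demanded one batch at a time and each batch becomes one write to the store. Everything here is hypothetical (chunksOf, storeFromList, putBatchInMemory, demo); the in-memory IORef merely stands in for a remote batch-put call and says nothing about what hscassandra actually offers.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map as Map

-- Split a (possibly lazy or infinite) list into batches of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: batch size must be positive"
  | null xs   = []
  | otherwise = let (batch, rest) = splitAt n xs in batch : chunksOf n rest

-- Store-backed analogue of fromList: hand the pairs to the store batch by
-- batch instead of materialising the whole list first.
storeFromList :: Monad m => Int -> ([(k, v)] -> m ()) -> [(k, v)] -> m ()
storeFromList n putBatch = mapM_ putBatch . chunksOf n

-- Hypothetical stand-in for a remote batch write. Map.union is left-biased,
-- so later batches overwrite earlier values, as fromList would.
putBatchInMemory :: Ord k => IORef (Map.Map k v) -> [(k, v)] -> IO ()
putBatchInMemory ref batch = modifyIORef' ref (Map.union (Map.fromList batch))

-- Small demonstration: three pairs written in batches of two.
demo :: IO (Map.Map Int String)
demo = do
  ref <- newIORef Map.empty
  storeFromList 2 (putBatchInMemory ref) (zip [1 ..] ["a", "b", "c"])
  readIORef ref
```

The batch size is the knob that trades laziness against the number of round trips; a size of 1 is fully incremental, a very large size degenerates to the all-at-once behaviour.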

Word of caution
Understand the semantics (and cost profile) of the AWS services first - you can't just open an HTTP connection and dribble data out over several days and hope for things to work. It is not a system that has that sort of laziness at its heart.
AWS doesn't supply traditional remote file store semantics - its queuing, simple database and object store have all been designed for large scale systems being offered as a service to a (potentially hostile) large set of users - you can see that in the way that things are designed. There are all sorts of (sensible from their point of view) performance related limits and retries.
The challenge in designing nice clean layers on top of AWS is how/when to hide the transient/load related failures.
Neil
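To make the point about transient/load related failures concrete, here is a rough sketch of one place a layer could hide them: a retry wrapper with exponential backoff. It assumes the underlying call reports failure as a Left value and, as a simplification, treats every Left as transient; real code would need to tell throttling apart from permanent errors. The names retryWithBackoff, flakyCall and demo are invented for the example and do not come from any AWS package.

```haskell
import Control.Concurrent (threadDelay)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Run an action that reports failure as Left. On failure, wait and try
-- again up to maxRetries more times, doubling the pause (in microseconds)
-- after each failed attempt.
retryWithBackoff :: Int -> Int -> IO (Either e a) -> IO (Either e a)
retryWithBackoff maxRetries delayMicros action = do
  result <- action
  case result of
    Left _ | maxRetries > 0 -> do
      threadDelay delayMicros
      retryWithBackoff (maxRetries - 1) (delayMicros * 2) action
    _ -> pure result

-- Hypothetical flaky service: fails until its counter of remaining
-- failures reaches zero, then succeeds.
flakyCall :: IORef Int -> IO (Either String String)
flakyCall remaining = atomicModifyIORef' remaining step
  where
    step n
      | n > 0     = (n - 1, Left "throttled")
      | otherwise = (0, Right "ok")

-- Two failures followed by a success, with up to three retries allowed.
demo :: IO (Either String String)
demo = do
  remaining <- newIORef 2
  retryWithBackoff 3 1000 (flakyCall remaining)
```

Whether such a wrapper belongs inside the library or should be left visible to the caller is exactly the how/when question raised above; the sketch only shows the mechanism.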

On Tue, Nov 1, 2011 at 10:53 AM, Neil Davies wrote:
Word of caution
Understand the semantics (and cost profile) of the AWS services first - you can't just open an HTTP connection and dribble data out over several days and hope for things to work. It is not a system that has that sort of laziness at its heart.
AWS doesn't supply traditional remote file store semantics - its queuing, simple database and object store have all been designed for large scale systems being offered as a service to a (potentially hostile) large set of users - you can see that in the way that things are designed. There are all sorts of (sensible from their point of view) performance related limits and retries.
The challenge in designing nice clean layers on top of AWS is how/when to hide the transient/load related failures.
As a straw-man approach I would first go for an NData.Map backed by Data.Map, with the addition of a "flush" function to write the Data.Map to an external key-value store / NoSQL DB. Another requirement for NData.Map is concurrent consistency, so that different clients can modify its state while preserving the "happens-before" relationship. For this I would add to NData.Map a "refresh" function that updates the local copy from the external key-value store.
As for the hSimpleDB package, it looks like it doesn't build on ghc7: http://hackage.haskell.org/package/hSimpleDB
The hSimpleDB package
Interface to Amazon's SimpleDB service.
Properties
Versions: 0.1, 0.2, *0.3*
Dependencies: base (≥3 & ≤4), bytestring, Crypto, dataenc, HTTP, hxt, network, old-locale, old-time, utf8-string
License: BSD3
Author: David Himmelstrup 2009, Greg Heartsfield 2007
Maintainer: David Himmelstrup
Category: Database, Web, Network
Upload date: Thu Sep 17 17:09:26 UTC 2009
Uploaded by: DavidHimmelstrup
Built on: ghc-6.10, ghc-6.12
Build failure: ghc-7.0 (log http://hackage.haskell.org/packages/archive/hSimpleDB/0.3/logs/failure/ghc-7... )
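The straw-man NData.Map with "flush" and "refresh" might look roughly like the sketch below. All names (Backend, NMap, open, insertLocal, lookupLocal, memoryBackend) are hypothetical, and the backend is an in-memory stand-in rather than a real key-value client. Note that this simple version is last-writer-wins: flush overwrites the remote state and refresh discards unflushed local changes, so it does not yet provide the happens-before guarantee asked for.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map as Map

-- Hypothetical backend: any store that can load and save a whole map.
data Backend k v = Backend
  { loadAll :: IO (Map.Map k v)
  , saveAll :: Map.Map k v -> IO ()
  }

-- A local Data.Map together with the backend it is synchronised with.
data NMap k v = NMap
  { localCopy :: IORef (Map.Map k v)
  , backend   :: Backend k v
  }

-- Open a map by reading the current remote state into the local copy.
open :: Backend k v -> IO (NMap k v)
open b = do
  ref <- loadAll b >>= newIORef
  pure (NMap ref b)

-- Local update; nothing reaches the store until flush is called.
insertLocal :: Ord k => k -> v -> NMap k v -> IO ()
insertLocal k v nm = modifyIORef' (localCopy nm) (Map.insert k v)

-- Local lookup against the cached copy.
lookupLocal :: Ord k => k -> NMap k v -> IO (Maybe v)
lookupLocal k nm = Map.lookup k <$> readIORef (localCopy nm)

-- "flush": write the local copy to the external store (overwrites it).
flush :: NMap k v -> IO ()
flush nm = readIORef (localCopy nm) >>= saveAll (backend nm)

-- "refresh": replace the local copy with the store's current contents.
refresh :: NMap k v -> IO ()
refresh nm = loadAll (backend nm) >>= writeIORef (localCopy nm)

-- In-memory backend for experiments; stands in for a real NoSQL client.
memoryBackend :: IO (Backend k v)
memoryBackend = do
  ref <- newIORef Map.empty
  pure (Backend (readIORef ref) (writeIORef ref))
```

Two clients opened on the same backend see each other's changes only after one calls flush and the other calls refresh, which is the behaviour the straw-man describes.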

We use AWS extensively. We use the aws package and have contributed to it,
specifically SQS functionality. I will give you the rundown of what we do.
We moved off of SimpleDB and now use MongoDB. The reason is that SimpleDB
seemed to have problems with write pressure and there are no good tools
for profiling your queries. My main application is extremely write heavy
with a single instance needing to do 100s or 1000s of writes a second.
MongoDB has worked well for us. I am scared of things like Cassandra, having
looked at the code; however, some people have made it work.
We store data such as crawled web pages in S3. The files are LZMA
compressed and the data format is built on protocol buffers. We picked LZMA
both for the storage costs of cold data and because the pipe between S3
and EC2 is somewhat limited and we want to make the most effective use of
it.
In my experience AWS simulators are more trouble than they are worth since
they don't accurately model the way AWS will respond to you under load. The
free tier at AWS should allow you to experiment with building an app. The
first couple of months of development cost us less than $1.
Steve

Steve, thanks for sharing your experience with AWS!
At the moment I have evaluated several NoSQL storage solutions including
SimpleDB, Riak, MongoDB and Cassandra. Lessons learned:
1) The storage that SimpleDB provides is too low-level and not very convenient
for storing dictionaries and other b-tree data structures that my app works
with.
2) The "simpledb/dev" simulator is out of date and does not support the
complete feature set of SimpleDB today. Thus, without a major rewrite, the
"simpledb/dev" emulator cannot be used for development.
3) SimpleDB storage is 100% specific to the Amazon framework. It follows
that developing directly against the SimpleDB interface will make the app
non-portable across different cloud platforms.
4) The Cassandra row/column abstraction is awkward for the Data.Map structures
that my app needs.
5) Riak provides a convenient bucket/key/value abstraction and works in a
node framework that is robust to failure. The REST/JSON protocol is simple to
use, yet it is inefficient for the data exchanges used by my app. I couldn't
find simple libraries for the binary exchange that Riak also supports.
6) MongoDB answers my requirements best of all - it is powerful on the server
side (JavaScript filters, etc.) and works with an efficient communication
protocol based on BSON data exchange.
I also plan to use RabbitMQ for communication between several Haskell
processes and the Java Web front-end that my app incorporates.
It would be great to know what tools people use in the cloud (AWS, etc.) to
connect the Web front-end with the rest of the (Haskell) system.
What Haskell tools are there for building a Web front-end?
Thanks!
Dmitri

We use all three (in various ways as they have arrived on the scene over time) in production systems.
On 1 Nov 2011, at 02:03, Ryan Newton wrote:
Any example code using the hscassandra package would really help!
I'll ask my student. We may have some simple examples.
Also, I have no idea as to their quality, but I was pleasantly surprised to find three different Amazon-related packages on Hackage (simply by searching for the word "Amazon" in the package list).
http://hackage.haskell.org/package/hS3
http://hackage.haskell.org/package/hSimpleDB
http://hackage.haskell.org/package/aws
It would be great to know if these work.
-Ryan
participants (4): dokondr, Neil Davies, Ryan Newton, Steve Severance