We use AWS extensively. We use the aws package and have contributed to it, specifically its SQS functionality. Here is a rundown of what we do.

We moved off SimpleDB and now use MongoDB. The reason is that SimpleDB seemed to have problems under write pressure, and there are no good tools for profiling your queries. My main application is extremely write heavy, with a single instance needing to do hundreds or thousands of writes a second. MongoDB has worked well for us. I am scared of things like Cassandra, having looked at the code; however, some people have made it work.

We store data such as crawled web pages in S3. The files are LZMA-compressed and the data format is built on protocol buffers. We picked LZMA both to reduce storage costs for cold data and because the pipe between S3 and EC2 is somewhat limited, so we want to make the most effective use of it.

In my experience AWS simulators are more trouble than they are worth since they don't accurately model the way AWS will respond to you under load. The free tier at AWS should allow you to experiment with building an app. The first couple of months of development cost us less than $1.

Steve

On Tue, Nov 1, 2011 at 1:27 AM, dokondr <dokondr@gmail.com> wrote:


On Tue, Nov 1, 2011 at 10:53 AM, Neil Davies <semanticphilosopher@gmail.com> wrote:
Word of caution

Understand the semantics (and cost profile) of the AWS services first - you can't just open an HTTP connection and dribble data out over several days and hope for things to work. It is not a system that has that sort of laziness at its heart.

AWS doesn't supply traditional remote file store semantics - its queuing, simple database and object store have all been designed for large-scale systems offered as a service to a large (potentially hostile) set of users - you can see that in the way things are designed. There are all sorts of performance-related limits and retries (sensible from their point of view).

The challenge in designing nice clean layers on top of AWS is how/when to hide the transient/load related failures.
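One common way to hide transient failures in a layer is to retry with exponential backoff. The following is only a minimal sketch of that idea, not anything taken from the aws package: `retryWithBackoff` and `flakyCall` are hypothetical names, and catching `SomeException` is a simplification (a real layer would retry only on errors it knows to be transient, such as throttling responses).

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- | Run an action up to 'maxAttempts' times, doubling the delay
-- (in microseconds) after each failure. Returns the last exception
-- if every attempt fails.
-- Assumption: every exception is treated as transient, which is
-- too broad for production use.
retryWithBackoff :: Int -> Int -> IO a -> IO (Either SomeException a)
retryWithBackoff maxAttempts initialDelayMicros action = go 1 initialDelayMicros
  where
    go attempt delay = do
      result <- try action
      case result of
        Right value -> return (Right value)
        Left err
          | attempt >= maxAttempts -> return (Left err)
          | otherwise -> do
              threadDelay delay
              go (attempt + 1) (delay * 2)

-- | Hypothetical stand-in for a flaky remote call: it fails until
-- it has been called 'succeedOn' times, then returns the call count.
flakyCall :: IORef Int -> Int -> IO Int
flakyCall counter succeedOn = do
  n <- atomicModifyIORef' counter (\c -> (c + 1, c + 1))
  if n < succeedOn
    then ioError (userError "simulated throttling")
    else return n
```

The design question raised above is still open with this in place: the caller has to decide how many attempts are acceptable and whether a final `Left` should surface to the user or be handled further down.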



As a straw-man approach, I would first go to an NData.Map backed by Data.Map, with the addition of a "flush" function to write the Data.Map to an external key-value store / NoSQL DB.
Another requirement for NData.Map is concurrent consistency, so that different clients can modify its state while preserving the "happens-before" relationship. For this I would add to NData.Map a "refresh" function that updates the local copy from the external key-value store.
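To make the straw man concrete, here is a rough sketch of what I have in mind. All names (`NMap`, `Store`, `insertN`, `lookupN`, `inMemoryStore`) are made up for illustration, and the external store is abstracted as a pair of IO actions so that an in-memory stand-in can replace a real key-value service. Note that this version writes and reads the whole map, so it is last-writer-wins and does not yet give the "happens-before" guarantee.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map as Map

-- | Hypothetical interface to an external key-value store.
data Store k v = Store
  { storeWrite :: Map.Map k v -> IO ()   -- ^ overwrite the remote copy
  , storeRead  :: IO (Map.Map k v)       -- ^ fetch the remote copy
  }

-- | A local Data.Map held in a mutable reference.
newtype NMap k v = NMap (IORef (Map.Map k v))

newNMap :: IO (NMap k v)
newNMap = NMap <$> newIORef Map.empty

insertN :: Ord k => k -> v -> NMap k v -> IO ()
insertN k v (NMap ref) = modifyIORef' ref (Map.insert k v)

lookupN :: Ord k => k -> NMap k v -> IO (Maybe v)
lookupN k (NMap ref) = Map.lookup k <$> readIORef ref

-- | Write the local map to the external store.
flush :: Store k v -> NMap k v -> IO ()
flush store (NMap ref) = readIORef ref >>= storeWrite store

-- | Replace the local map with the external store's contents.
refresh :: Store k v -> NMap k v -> IO ()
refresh store (NMap ref) = storeRead store >>= writeIORef ref

-- | In-memory stand-in for the external store, for experiments only.
inMemoryStore :: IO (Store k v)
inMemoryStore = do
  backing <- newIORef Map.empty
  return Store { storeWrite = writeIORef backing
               , storeRead  = readIORef backing }
```

With two `NMap` values sharing one `Store`, an insert on the first becomes visible to the second only after a `flush` on the first followed by a `refresh` on the second.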

As for the hSimpleDB package, it looks like it doesn't build on GHC 7:
http://hackage.haskell.org/package/hSimpleDB
 

The hSimpleDB package

Interface to Amazon's SimpleDB service.

Properties

Versions: 0.1, 0.2, 0.3
Dependencies: base (≥3 & ≤4), bytestring, Crypto, dataenc, HTTP, hxt, network, old-locale, old-time, utf8-string
License: BSD3
Author: David Himmelstrup 2009, Greg Heartsfield 2007
Maintainer: David Himmelstrup <lemmih@gmail.com>
Category: Database, Web, Network
Upload date: Thu Sep 17 17:09:26 UTC 2009
Uploaded by: DavidHimmelstrup
Built on: ghc-6.10, ghc-6.12
Build failure: ghc-7.0
 

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe