It's hard to answer without knowing what kinds of queries he's running,
but in the past I've used csv-conduit to parse the raw data, converted
it to a Haskell ADT, and then used standard conduit processing to
perform the analyses in a streaming manner.
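For concreteness, here's a minimal sketch of that pipeline, written
against a recent conduit API together with csv-conduit's intoCSV. The
Entry type, the three-column layout (channel, seconds listened, user
agent), the assumption that the file has no header row, and the example
query are all made up for illustration:

    import           Conduit
    import           Data.CSV.Conduit (defCSVSettings, intoCSV)
    import           Data.Text        (Text)
    import qualified Data.Text        as T
    import           Text.Read        (readMaybe)

    -- Hypothetical ADT for one usage-log entry.
    data Entry = Entry
      { channel   :: !Text
      , seconds   :: !Int
      , userAgent :: !Text
      }

    -- Convert one raw CSV row to an Entry, dropping malformed rows;
    -- real code would want proper validation and error reporting.
    toEntry :: [Text] -> Maybe Entry
    toEntry [ch, secs, ua] =
      Entry ch <$> readMaybe (T.unpack secs) <*> pure ua
    toEntry _ = Nothing

    -- Example query: total seconds listened on a given channel.
    -- The file is streamed, so this runs in constant memory.
    totalFor :: Text -> FilePath -> IO Int
    totalFor ch path = runConduitRes $
         sourceFile path
      .| decodeUtf8C
      .| intoCSV defCSVSettings    -- yields each record as a Row Text
      .| mapMaybeC toEntry
      .| filterC ((== ch) . channel)
      .| foldlC (\acc e -> acc + seconds e) 0

Something like totalFor (T.pack "jazz") "usage.csv" should then chew
through the 12GB file while holding only a few rows in memory at a
time, and swapping in a different query is just a matter of replacing
the last couple of stages of the pipeline.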
Hi,
just the other day I talked to a friend of mine who works for an online
radio service. He told me he is currently looking into how best to work
with assorted usage data: currently 250 million entries in a 12GB CSV,
comprising information such as which channel was tuned in for how long,
with which user agent, and so on.
He happened upon the K and Q programming languages [1][2], which
apparently work nicely for this, as unfamiliar as they might seem.
This certainly is not my area of expertise at all, so I was just
wondering how some of you would suggest approaching this with Haskell.
How would you most efficiently parse such data and evaluate custom
queries over it?
Thanks for your time,
Tobi
[1] http://en.wikipedia.org/wiki/K_(programming_language)
[2] http://en.wikipedia.org/wiki/Q_(programming_language_from_Kx_Systems)