Hi Tobias,
What he could do is encode the column values as appropriately sized Words to shrink the data enough to fit in RAM: listening times as seconds, browsers as categorical variables (in statistics terms), and so on. If some of the columns are arbitrary-length strings, it seems possible to cut the 12 GB down by more than half.
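To illustrate, here is a minimal sketch of what I mean; the field names, widths, and column meanings are my own guesses, not the actual schema:

import Data.Word (Word8, Word16, Word32)

-- One usage-log entry, packed into fixed-width machine words
-- instead of arbitrary-length strings.
data Entry = Entry
  { channel  :: {-# UNPACK #-} !Word16  -- up to 65535 distinct channels
  , browser  :: {-# UNPACK #-} !Word8   -- categorical code: index into a
                                        -- small table of user-agent strings
  , duration :: {-# UNPACK #-} !Word32  -- listening time in seconds
  } deriving Show

At roughly 7-8 bytes of payload per entry, 250 million entries come to about 2 GB (plus the shared user-agent table), comfortably under the original 12 GB of CSV text.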
If he doesn't know Haskell, then I'd suggest using another language. (Years ago I tried to do a bigger uni project in Haskell -- being a noob -- and failed miserably.)
Hi,
Just the other day I talked to a friend of mine who works for an online radio service; he told me he was looking into how best to work with assorted usage data: currently 250 million entries in a 12 GB CSV, comprising information such as which channel was tuned in, for how long, with which user agent, and so on.
He happened upon the K and Q programming languages [1][2], which apparently work nicely for this, as unfamiliar as they might seem.
This is certainly not my area of expertise at all. I was just wondering how some of you would suggest approaching this with Haskell. How would you most efficiently parse such data and evaluate custom queries over it?
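To make the question more concrete, here is a rough sketch of the kind of one-off query I mean. I am assuming the cassava package and making up the column layout; the real file will look different:

{-# LANGUAGE BangPatterns, OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Csv.Streaming as Csv
import Data.Csv (HasHeader (NoHeader))
import qualified Data.Foldable as F

-- Assumed column layout: channel, user agent, seconds listened.
type Row = (B.ByteString, B.ByteString, Int)

-- Total listening time for one channel in a single streaming pass;
-- the lazy ByteString keeps memory use bounded rather than loading
-- all 12 GB at once.  Rows that fail to parse are skipped by the
-- Foldable instance of Records.
totalFor :: B.ByteString -> BL.ByteString -> Int
totalFor chan csv = F.foldl' step 0 (Csv.decode NoHeader csv)
  where
    step :: Int -> Row -> Int
    step !acc (c, _ua, secs)
      | c == chan = acc + secs
      | otherwise = acc

main :: IO ()
main = do
  csv <- BL.readFile "usage.csv"
  print (totalFor "channel42" csv)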
Thanks for your time,
Tobi
[1] http://en.wikipedia.org/wiki/K_(programming_language)
[2] http://en.wikipedia.org/wiki/Q_(programming_language_from_Kx_Systems)