
Kenneth Hoste wrote:
Hello,
I'm having a go at the Netflix Prize using Haskell. Yes, I'm brave.
I kind of have an algorithm in mind that I want to implement using Haskell, but up until now, the main issue has been to find a way to efficiently represent the data...
For people who are not familiar with the Netflix data, in short, it consists of roughly 100M (1e8) user ratings (1-5, integer) for 17,770 different movies, coming from 480,109 different users.
Hi Kenneth.

I have written a simple program that parses the Netflix training data set, using this data structure:

    type MovieRatings = IntMap (UArr Word32, UArr Word8)

The ratings are grouped by movies. The parsing is done in:

    real 8m32.476s
    user 3m5.276s
    sys  0m8.681s

On a DELL Inspiron 6400 notebook, Intel Core2 T7200 @ 2.00GHz, and 2 GB memory.

However the memory used is about 1.4 GB. How did you manage to get 700 MB memory usage?

Note that the minimum space required is about 480 MB (assuming 4 byte integer for the ID, and 1 byte integer for rating). Using a 4 byte integer for both ID and rating, the space required is about 765 MB. 1.5 GB is the space required if one uses a total of 16 bytes to store both the ID and the rating.

Maybe it is the garbage collector that does not release memory to the operating system?

Thanks

Manlio Perillo
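
The memory estimates in the reply above can be reproduced with a little arithmetic. The following is only a minimal sketch: it assumes the rounded figure of 1e8 ratings quoted in the thread (not the exact size of the data set), ignores the per-movie overhead of the IntMap and the array headers, and uses illustrative names (`ratingCount`, `totalBytes`, `toMiB`, `layouts`) that do not come from the original program.

```haskell
module Main where

import Text.Printf (printf)

-- Approximate number of ratings: the rounded "roughly 100M (1e8)"
-- figure from the thread, not the exact count (assumption).
ratingCount :: Integer
ratingCount = 100000000

-- Total payload bytes when every rating occupies a fixed number of bytes.
totalBytes :: Integer -> Integer
totalBytes bytesPerRating = ratingCount * bytesPerRating

-- Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
toMiB :: Integer -> Double
toMiB b = fromIntegral b / (1024 * 1024)

-- The three layouts discussed in the reply, as (label, bytes per rating).
layouts :: [(String, Integer)]
layouts =
  [ ("Word32 ID + Word8 rating",  5)
  , ("Word32 ID + Word32 rating", 8)
  , ("16 bytes per rating",       16)
  ]

main :: IO ()
main = mapM_ report layouts
  where
    report :: (String, Integer) -> IO ()
    report (label, n) =
      printf "%-28s %8.1f MiB\n" label (toMiB (totalBytes n))
```

With the rounded count the results come out slightly below the figures quoted in the reply, which is consistent with the real data set holding a little more than 1e8 ratings. The 16-byte row corresponds to the observed usage of about 1.4 GB.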