
Hello, I'm having a go at the Netflix Prize using Haskell. Yes, I'm brave. I kind of have an algorithm in mind that I want to implement using Haskell, but up until now, the main issue has been to find a way to efficiently represent the data... For people who are not familiar with the Netflix data, in short, it consist of roughly 100M (1e8) user ratings (1-5, integer) for 17,770 different movies, coming from 480,109 different users. The idea I have in mind requires fast access to all the ratings for a particular user, so I was thinking about an IntMap in terms of user ids, which maps to movie id/rating pairs somehow. To see if I could efficiently represent the data set in this way, I wrote a small Haskell program (attached) which uses the following data type: type data = IntMap (UArray Int Word8) -- map of user IDs to ratings (lacks movie IDs) For convenience, and because I've been discussing this with various people in #haskell @ IRC, the code is also available here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=1671#a1671 . I'm aware that the way in which the UArray's are constructed (i.e. by adding a single element each time), is far from efficient performance-wise, but that's not the point here... By reformatting the raw data, I can easily read in the data more efficiently. The issue I ran into is that changing the data type to the following, doesn't yield any significant different in memory usage. type data = IntMap (UArray Int Int) -- use Int instead of Word8 for a user rating Someone (dolio) @ #haskell suggested that maybe UArray is not byte packed for Word8, which would cause little difference with a UArray containing Int's, but someone else (dons @ #ghc) was able to tell me it _is_ byte packed. Does anyone know why the Word8 version is not significantly better in terms of memory usage? greetings, Kenneth PS: My adventures on trying to tackle the Netflix Prize problem with Haskell can be followed at http://boegel.kejo.be. -- Kenneth Hoste ELIS - Ghent University email: kenneth.hoste@elis.ugent.be blog: http://www.elis.ugent.be/~kehoste/blog website: http://www.elis.ugent.be/~kehoste