
Kenneth Hoste wrote:
[...] The 26m/700MB I mentioned on my blog was on my ancient PowerBook G4 (1.5GHz PowerPC G4, 1.25G).
I redid the same experiment on our iMac (Core2 Duo, 2.0 GHz, 3.0G), i.e.:

- read in all the data
- count the number of keys in the IntMap (which should be 17,770, i.e. the number of movies)
- compute the mean overall movie rating (which should be 3.6033)
That task was done in 3m42s, using just 632M of memory, using the following command:
./netflix ./training_set/ +RTS -A128M -s
The -A option enlarges the GC allocation area so collections don't run too frequently, while -s just reports some statistics.
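For reference, here is a minimal sketch of what those two checks could look like, assuming an IntMap of unboxed arrays like the one described below (the file parsing itself is omitted):

import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import Data.Array.Unboxed (UArray, elems)
import Data.Word (Word8)

-- One entry per movie: (user ids, ratings).
type Ratings = IntMap (UArray Int Int, UArray Int Word8)

-- Number of movies = number of keys in the map (should be 17,770).
movieCount :: Ratings -> Int
movieCount = IntMap.size

-- Mean over all individual ratings (should come out around 3.6033).
meanRating :: Ratings -> Double
meanRating m = fromIntegral total / fromIntegral count
  where
    ratings = concatMap (elems . snd) (IntMap.elems m)
    total   = sum (map (fromIntegral :: Word8 -> Integer) ratings)
    count   = length ratings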
Ok, thanks.
The way in which I'm reading in the data is somewhat different from yours. I construct the IntMap from the ground up, i.e. starting with an empty IntMap and using foldM, together with a function readMovie with the following type:
readMovie :: IntMap (UArray Int Int, UArray Int Word8) -> FilePath -> IO (IntMap (UArray Int Int, UArray Int Word8))
In readMovie, I'm using the 'insert' function provided by the IntMap module, which just inserts a new key-value pair into the existing IntMap.
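Concretely, the reading loop is just a monadic left fold over the list of movie files, starting from the empty map. A minimal sketch, with a hypothetical parseMovie standing in for the actual file parsing:

import Control.Monad (foldM)
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import Data.Array.Unboxed (UArray)
import Data.Word (Word8)

type Ratings = IntMap (UArray Int Int, UArray Int Word8)

-- Hypothetical parser: yields the movie id and its (user ids, ratings).
parseMovie :: FilePath -> IO (Int, (UArray Int Int, UArray Int Word8))
parseMovie = undefined  -- details omitted

-- Insert each movie into the map built so far.
readMovie :: Ratings -> FilePath -> IO Ratings
readMovie m path = do
  (movieId, ratings) <- parseMovie path
  return (IntMap.insert movieId ratings m)

-- Fold over all movie files, starting from the empty map.
readAll :: [FilePath] -> IO Ratings
readAll = foldM readMovie IntMap.empty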
This is the same thing I was doing in a previous version. Some tests showed me that the performance (and memory usage) was practically the same, so I switched to the new implementation.
Your approach is very different: you create 17,770 IntMaps with a single key/value pair in them, and then use union to combine them all.
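In code, that style looks roughly like this (reusing the hypothetical parseMovie from the sketch above):

-- Build one single-entry IntMap per movie file, then merge them all.
readAllUnions :: [FilePath] -> IO Ratings
readAllUnions paths = do
  pairs <- mapM parseMovie paths
  return (IntMap.unions [IntMap.singleton k v | (k, v) <- pairs])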
I profiled your approach on the same iMac (using the same +RTS options), and it needed 4m33s to run, using 948M of memory.
This is very strange, because in my tests I got very similar execution time and memory usage. Did you run my program, or did you just modify your code?
I think my approach is turning out better because I'm:
- building up the IntMap using 'empty' and 'insert', instead of combining 17,770 'singleton' IntMaps (which probably results in better GC behavior)
- using UArray instead of UArr (although I don't know if that actually makes a difference here)
I hope this helps you figure out what the bottleneck is on your side. It took me several days to come up with this approach, with help from various Haskellers on IRC, so I'm surely no expert...
I've thrown my current code online at http://boegel.kejo.be/files/Netflix_read-and-parse_24-02-2009.hs , let me know if it's helpful in any way...
Thanks for sharing! I have executed your program:

real 6m6.394s
user 1m6.444s
sys 0m7.180s
640 MB usage

So memory usage doesn't change with newer versions of GHC (did you try the new parallel garbage collector?). As for running time, it is my hard disk that is slower (the CPU is the same, but I have 2G of RAM).

After reading your code I have a suspicion: I use ByteString.Lazy to read the file contents, while you use a strict ByteString. So I have updated my code to use a strict ByteString, as sketched below (maybe an even better solution is to use bytestring-mmap? [1]). I have executed the program, using the same RTS flags as yours:

real 6m13.523s
user 0m53.931s
sys 0m7.812s
815 MB usage

This is a huge improvement over my previous lazy version! Now I have to check whether using insert will further improve memory usage. It's unfortunate that the GC is not friendly with the singleton + union solution.
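The change from lazy to strict reading is small; a minimal sketch of the strict variant (only the read is shown, the parsing is unchanged):

import qualified Data.ByteString.Char8 as S
-- previously: import qualified Data.ByteString.Lazy.Char8 as L

-- A strict readFile pulls the whole file into one contiguous buffer,
-- instead of a list of lazy chunks that parsing may retain for longer.
readMovieBytes :: FilePath -> IO [S.ByteString]
readMovieBytes path = do
  contents <- S.readFile path
  return (S.lines contents)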
Also, I was indeed using GHC 6.10.1, although I'm unsure to what extent that matters.
greetings,
Kenneth
[1] Unfortunately, with bytestring-mmap I'm not sure I have control over resource usage.

Regards
Manlio