
Daniel Peebles wrote:
I have added UAProd-based Binary instances to my copy of the uvector repo at http://patch-tag.com/publicrepos/pumpkin-uvector .
I can confirm that it works for me. However, I now have a memory problem with data decoding.

I need to serialize the Netflix Prize training dataset. When I parse the data from the original dataset, memory usage is about 640 MB [1]. But when I load the serialized and compressed data (as [UArr (Word32 :*: Word8)]), memory usage is about 840 MB... The culprit is probably the decoding of the list (17770 elements).

[1] I have written a Python script that parses the data, and it takes only 491 MB (using a list of tuples holding two compact numpy arrays). So GHC has memory problems here.

Thanks
Manlio Perillo
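For reference, one thing worth trying is a stricter list decoder. binary's stock list instance writes a length (as Int) followed by the elements, and the default decoder builds the list lazily, which can leave thunks behind. Below is a minimal sketch of the idea using plain Data.Binary, with no uvector dependency; getListStrict is a hypothetical name, and for UArr elements the `seq` may need to be replaced by deeper forcing depending on how the instances in the pumpkin-uvector repo are written:

```haskell
-- Sketch: decode a Binary list while forcing each element as it is read,
-- instead of relying on the default lazy list decoder. binary's stock list
-- format is a length (Int) followed by the elements, so this reads back
-- exactly what `encode` produced for a list.
import Data.Binary (Binary, Get, encode, get)
import Data.Binary.Get (runGet)
import Data.Word (Word32)
import Control.Monad (replicateM)

getListStrict :: Binary a => Get [a]
getListStrict = do
  n <- get :: Get Int
  replicateM n $ do
    x <- get
    x `seq` return x   -- force the element (to WHNF) before consing it

main :: IO ()
main = do
  let bs = encode ([1 .. 10] :: [Word32])
      xs = runGet getListStrict bs :: [Word32]
  print (sum xs)
```

Whether this actually reduces the 840 MB figure depends on where the retention happens (the list spine, the element thunks, or inside the UArr decoding itself); heap profiling with +RTS -hy would tell.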