binary package: memory problem decoding an IntMap

Hi.

I'm having memory problems decoding a big IntMap.

The data structure is:

    IntMap (UArr (Word16 :*: Word8))

There are 480189 keys and a total of 100480507 elements (Netflix Prize). The size of the encoded (and compressed) data is 184 MB.

When I load the data from the Netflix Prize data set, total memory usage is 1030 MB. However, when I try to decode the data, memory usage grows too much (even using the -F1.1 option in the RTS).

The problem seems to be with the `fromAscList` function, defined as:

    fromList :: [(Key,a)] -> IntMap a
    fromList xs = foldlStrict ins empty xs
      where
        ins t (k,x) = insert k x t

(By the way, why doesn't the IntMap module use Data.List.foldl'?)

The `ins` function is not strict.

This seems a hard problem to solve. First of all, IntMap should provide strict variants of the functions it implements, and the binary package should choose whether to use the strict or the lazy version.

For me, the simplest solution is to serialize the association list obtained from the `toAscList` function instead of serializing the IntMap directly.

The question is: can I "reuse" the data already serialized? Is the binary format of `IntMap a` and `[(Int, a)]` compatible?

Thanks

Manlio Perillo
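The workaround described above (serialize the association list, then rebuild the map with a strict insert) could be sketched roughly as follows. This is only an illustration under assumptions: it uses `Data.IntMap.Strict` from a current containers release, which did not exist when this message was written, and the names `encodeAsList`, `decodeStrict` and `fromListStrict` are hypothetical helpers, not part of binary or containers. Whether it actually lowers peak memory on a data set of this size has not been measured here.

```haskell
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- Hypothetical helper: write the map as its ascending association list.
encodeAsList :: Binary a => IM.IntMap a -> BL.ByteString
encodeAsList = encode . IM.toAscList

-- Hypothetical helper: read the association list back and rebuild the map.
decodeStrict :: Binary a => BL.ByteString -> IM.IntMap a
decodeStrict = fromListStrict . decode

-- foldl' forces the accumulated map at every step, and the insert from
-- Data.IntMap.Strict evaluates each value to weak head normal form
-- before storing it, so no chain of pending inserts builds up.
fromListStrict :: [(Int, a)] -> IM.IntMap a
fromListStrict = foldl' ins IM.empty
  where
    ins m (k, v) = IM.insert k v m
```

Data written with `encodeAsList` is always readable by `decodeStrict`; whether data already written through the `IntMap` instance is readable the same way depends on the answer to the compatibility question asked above.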

Manlio Perillo wrote:
> Hi.
> I'm having memory problems decoding a big IntMap.
> The data structure is:
> IntMap (UArr (Word16 :*: Word8))
> There are 480189 keys, and a total of 100480507 elements (Netflix Prize). The size of the encoded (and compressed) data is 184 MB.
> When I load data from the Netflix Prize data set, total memory usage is 1030 MB.

It seems there is a problem with tuples, too.

I have a:

    [(Word16, UArr (Word32 :*: Word8))]

This eats more memory than it should, since tuples are decoded lazily.

Manlio

Excerpts from Manlio Perillo's message of Sun Apr 05 22:41:57 +0200 2009:
> Manlio Perillo wrote:
> > Hi.
> > I'm having memory problems decoding a big IntMap.
> > The data structure is:
> > IntMap (UArr (Word16 :*: Word8))
> > There are 480189 keys, and a total of 100480507 elements (Netflix Prize). The size of the encoded (and compressed) data is 184 MB.
> > When I load data from the Netflix Prize data set, total memory usage is 1030 MB.
> It seems there is a problem with tuples, too.
> I have a: [(Word16, UArr (Word32 :*: Word8))]
> This eats more memory than it should, since tuples are decoded lazily.

Why not switch to [(Word16 :*: UArr (Word32 :*: Word8))] then?

-- 
Nicolas Pouillard

Nicolas Pouillard wrote:
> Excerpts from Manlio Perillo's message of Sun Apr 05 22:41:57 +0200 2009:
> > Manlio Perillo wrote:
> > > Hi.
> > [...]
> > I have a: [(Word16, UArr (Word32 :*: Word8))]
> > This eats more memory than it should, since tuples are decoded lazily.
> Why not switch to [(Word16 :*: UArr (Word32 :*: Word8))] then?

I finally ran some tests today, and I can confirm that using :*: reduces memory usage.

Thanks

Manlio
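For readers without uvector at hand, the difference between `(,)` and `:*:` can be illustrated with a locally defined strict pair. This is a minimal sketch, not uvector's actual definition: `StrictPair` and its `Binary` instance are assumptions made for illustration. The strictness annotations on both fields mean that evaluating the pair also evaluates its components, so a decoded element cannot hold on to unevaluated thunks for its fields.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Word (Word16, Word8)

-- Hypothetical stand-in for uvector's (:*:): both fields are strict.
data StrictPair a b = !a :*: !b
  deriving (Eq, Show)

-- Fields are written and read in order, first component first.
instance (Binary a, Binary b) => Binary (StrictPair a b) where
  put (a :*: b) = put a >> put b
  get = do
    a <- get
    b <- get
    -- Constructing the pair forces a and b to weak head normal form
    -- as soon as the pair itself is evaluated.
    return (a :*: b)

-- Round-trip a single rating-like value through the binary encoding.
roundTrip :: StrictPair Word16 Word8 -> StrictPair Word16 Word8
roundTrip = decode . encode
```

For example, `roundTrip (1 :*: 5)` should give back `1 :*: 5`.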

Manlio Perillo wrote:
> [...]
> It seems there is a problem with tuples, too.
> I have a: [(Word16, UArr (Word32 :*: Word8))]
> This eats more memory than it should, since tuples are decoded lazily.

My bad, sorry. I solved it simply by using a strict consumer (foldl' instead of foldl).

Manlio
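The foldl versus foldl' difference mentioned above can be shown with a small consumer. This is a sketch with made-up names (`totalLazy`, `totalStrict`) and a plain list standing in for the `UArr` payload, so it only illustrates the general technique, not the code used in the original program.

```haskell
import Data.List (foldl')
import Data.Word (Word16, Word8)

-- Lazy consumer: foldl builds a chain of unevaluated (+) thunks that is
-- only collapsed when the final result is demanded.
totalLazy :: [(Word16, [Word8])] -> Int
totalLazy = foldl step 0
  where
    step acc (_, rs) = acc + length rs

-- Strict consumer: foldl' evaluates the accumulator at every step, so
-- each element can be discarded as soon as it has been counted.
totalStrict :: [(Word16, [Word8])] -> Int
totalStrict = foldl' step 0
  where
    step acc (_, rs) = acc + length rs
```

Both functions return the same number; they differ only in how much unevaluated work is kept alive while the list is being consumed.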
Participants (2): Manlio Perillo, Nicolas Pouillard