
Bulat Ziganshin wrote:
> Hello Arne,
> Wednesday, November 19, 2008, 11:57:01 AM, you wrote:
>> finding that it uses about twice as much memory as I had anticipated.
Hello, and thank you for your reply.
> it may be 1) a GC problem (due to GC, Haskell programs occupy 2-3x more memory than is actually used)
I wasn't aware of that - but it should be possible to trigger a GC explicitly after loading all the data, shouldn't it?
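What I had in mind is something along these lines - System.Mem.performGC from base forces a major collection, and I gather the +RTS -c option (compacting collection for the oldest generation) can also reduce that 2-3x overhead:

  import System.Mem (performGC)

  main :: IO ()
  main = do
    -- ... load the whole data set here ...
    performGC   -- force a major collection once loading is done,
                -- so the transient loading garbage is reclaimed
    -- ... then run the algorithm on the (hopefully) compacted heap ...
    return ()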
> 2) additional data (you didn't say how long each small array is; you should expect 10-30 additional bytes of overhead for every array)
The arrays represent the Netflix data set: 100 000 000 ratings given for 17770 films. For each of the films I want to hold (on average, roughly) 2000 ratings, stored as one person id (32-bit) and one rating (8-bit) in the respective arrays.

In addition, I want to be able to load the inversion of this data: for every person, I want to hold their ratings in a similar way, as a 16-bit film id and an 8-bit rating. There are 480 000 persons, so this should be on average about 200 entries per person.

I have coded a few approaches to inverting this, but I can't allocate the arrays before traversing the data, because I don't know the sizes in advance. How can one go about inverting this data in memory? It seems that any kind of laziness will fill the whole memory before I have traversed the whole set - and if I use several accumArrays, it seems that the whole uncompacted data set is held in memory between the accumArrays.

Ideally I want to hold all the ratings plus statistics for all the films, and the same for all the persons - and then have room to spare for running an algorithm...

Best regards,
Arne D Halvorsen
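P.S. In case it makes the question clearer, here is a rough sketch of one two-pass approach I have been considering (count first, then fill exactly-sized arrays). The names and types are only illustrative, and it assumes the ratings can be streamed from disk twice, once to count and once to fill, so the triples never have to live in memory as a list:

  import Data.Array.IO
  import Data.Word
  import Control.Monad (forM_)

  type PersonId = Int
  type FilmId   = Int

  -- Per-person inverted data: film ids and ratings at matching indices.
  data PersonRatings = PersonRatings
    { prFilms   :: IOUArray Int Word16
    , prRatings :: IOUArray Int Word8
    }

  -- 'forEachRating' runs an action over every (film, person, rating)
  -- triple, e.g. by re-reading the files, so it can be run twice
  -- without retaining anything between the passes.
  invert :: Int
         -> (((FilmId, PersonId, Word8) -> IO ()) -> IO ())
         -> IO (IOArray PersonId PersonRatings)
  invert nPersons forEachRating = do
    -- Pass 1: count the ratings per person.
    counts <- newArray (0, nPersons - 1) 0 :: IO (IOUArray PersonId Int)
    forEachRating $ \(_, p, _) ->
      readArray counts p >>= writeArray counts p . (+ 1)

    -- Allocate exactly-sized unboxed arrays for each person.
    perPerson <- newArray_ (0, nPersons - 1) :: IO (IOArray PersonId PersonRatings)
    forM_ [0 .. nPersons - 1] $ \p -> do
      n  <- readArray counts p
      fs <- newArray (0, n - 1) 0 :: IO (IOUArray Int Word16)
      rs <- newArray (0, n - 1) 0 :: IO (IOUArray Int Word8)
      writeArray perPerson p (PersonRatings fs rs)

    -- Pass 2: fill, keeping a cursor with the next free slot per person.
    cursor <- newArray (0, nPersons - 1) 0 :: IO (IOUArray PersonId Int)
    forEachRating $ \(f, p, r) -> do
      i  <- readArray cursor p
      pr <- readArray perPerson p
      writeArray (prFilms pr) i (fromIntegral f)
      writeArray (prRatings pr) i r
      writeArray cursor p (i + 1)

    return perPerson

The counting pass is what gives me the per-person sizes up front, which is the part I was missing; the same scheme with 32-bit person ids would presumably work for the per-film arrays as well.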