Re: [Haskell-cafe] memory-efficient data type for Netflix data - UArray Int Int vs UArray Int Word8

26 Feb 2009

Kenneth Hoste wrote:
...
[...]
However, as I posted yesterday, I've been able to circumvent the issue 
by rethinking my data type, i.e. using
the ~18K movie IDs as key instead of the 480K user IDs, which radically 
limits the overhead...
Well, but what if you really need the original data structure, for 
better data processing?
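
For concreteness, here is a minimal sketch of what a movie-keyed layout could look like. It is not Kenneth's actual code: the names `Ratings`, `fromPairs`, `ratingsFor` and `bytesPerRating` are hypothetical, and it assumes each rating fits in a Word8 (Netflix ratings are 1 to 5 stars). It uses only the `containers` and `array` packages.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Array.Unboxed (UArray, listArray, elems)
import Data.Word (Word8)
import Foreign.Storable (sizeOf)

-- Hypothetical types: one unboxed array of ratings per movie ID.
type MovieId = Int
type Ratings = IM.IntMap (UArray Int Word8)

-- Group (movie, rating) pairs by movie and pack each group into a UArray.
-- Groups are accumulated in reverse and flipped once when packing.
fromPairs :: [(MovieId, Word8)] -> Ratings
fromPairs ps = IM.map pack grouped
  where
    grouped = IM.fromListWith (++) [(m, [r]) | (m, r) <- ps]
    pack rs = listArray (0, length rs - 1) (reverse rs)

-- All ratings stored for one movie, in input order (empty if unknown).
ratingsFor :: MovieId -> Ratings -> [Word8]
ratingsFor m = maybe [] elems . IM.lookup m

-- Size in bytes of one Int and of one Word8 on the current platform.
bytesPerRating :: (Int, Int)
bytesPerRating = (sizeOf (0 :: Int), sizeOf (0 :: Word8))
```

With ~18K keys instead of ~480K, the per-key cost of the map is paid far fewer times, and `bytesPerRating` shows how much smaller a Word8 element is than an Int element on a given machine.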
...
That way, I'm able to fit the data set in <700M of memory, without 
having to reorganize the raw data.
...
The uvector package implements a vector of unboxed types, and has a 
snocU operation, to append an element to the array.
I don't know how efficient it is, however.
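
As a point of comparison only (this says nothing about how uvector implements snocU), here is a sketch with the standard `array` package. Immutable unboxed arrays there have no append operation, so a helper such as the hypothetical `snocArr` below has to allocate a new array and copy every element, while `fromListArr` (also hypothetical) fills the array in one pass.

```haskell
import Data.Array.Unboxed (UArray, listArray, bounds, elems)
import Data.Word (Word8)

-- Hypothetical helper: append one element to an immutable unboxed array.
-- Every call builds a fresh array, so the cost is linear in the array size.
snocArr :: UArray Int Word8 -> Word8 -> UArray Int Word8
snocArr arr x = listArray (lo, hi + 1) (elems arr ++ [x])
  where
    (lo, hi) = bounds arr

-- Hypothetical helper: build a zero-based array from a list in one pass.
-- Note that `length` forces the whole list before the array is filled.
fromListArr :: [Word8] -> UArray Int Word8
fromListArr xs = listArray (0, length xs - 1) xs
```

Appending n elements one at a time with `snocArr` copies on the order of n^2 elements in total, which is why building the whole array at once is usually the safer choice when the input can be traversed up front.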
...
By the way, about uvector: it has a Stream data type, and you can 
build a vector from a stream.
Thanks for letting me know, I'll keep this in mind.
Let me know if there are performance improvements.

Arrays are one of the few things I dislike in Haskell, and all the 
available array/vector packages cause me some confusion.

Regards   Manlio

Manlio Perillo