
Kenneth Hoste wrote:
[...] However, as I posted yesterday, I've been able to circumvent the issue by rethinking my data type, i.e. using the ~18K movie IDs as key instead of the 480K user IDs, which radically limits the overhead...
Well, but what if you really need the original data structure, for better data processing?
That way, I'm able to fit the data set in <700M of memory, without having to reorganize the raw data.
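The re-keying described above might be sketched as follows. This is only an illustration under assumptions: the thread does not show the actual data type, so the `(movie, user, rating)` triple layout, the `byMovie` name, and the choice of `IntMap` plus `UArray` (from the `containers` and `array` packages) are all hypothetical.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Array.Unboxed (UArray, elems, listArray)
import Data.Word (Word8)

type MovieId = Int
type UserId  = Int

-- Hypothetical layout: one map entry per movie (~18K keys) instead of one
-- per user (~480K keys); each entry holds two unboxed arrays, one with the
-- user IDs and one with the matching ratings.
type Ratings = IM.IntMap (UArray Int UserId, UArray Int Word8)

-- Group (movie, user, rating) triples by movie ID, then pack each group
-- into unboxed arrays so that no per-element boxes are retained.
byMovie :: [(MovieId, UserId, Word8)] -> Ratings
byMovie triples = IM.map pack grouped
  where
    grouped  = IM.fromListWith (++) [ (m, [(u, r)]) | (m, u, r) <- triples ]
    pack urs =
      let n = length urs
      in ( listArray (0, n - 1) (map fst urs)
         , listArray (0, n - 1) (map snd urs) )
```

With fewer keys, the fixed per-key cost of the map is paid far fewer times, which is the kind of overhead reduction the message refers to; the intermediate lists here are for brevity only and would themselves cost memory on the full data set.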
The uvector package implements a vector of unboxed types, and has a snocU operation to append an element to the array.
I don't know how efficient it is, however.
By the way, about uvector: it has a Stream data type, and you can build a vector from a stream.
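To make the efficiency question concrete, here is a minimal sketch of what appending to an immutable unboxed array involves. It is written against `Data.Array.Unboxed` from the `array` package, not against uvector; `snoc` and `fromListOnce` are hypothetical helpers, and whether uvector's `snocU` behaves the same way is not something established here.

```haskell
import Data.Array.Unboxed (UArray, bounds, elems, listArray)

-- Hypothetical helper, NOT uvector's snocU: appending to an immutable
-- array allocates a new array one slot larger and copies every element.
snoc :: UArray Int Int -> Int -> UArray Int Int
snoc arr x = listArray (lo, hi + 1) (elems arr ++ [x])
  where
    (lo, hi) = bounds arr

-- Building the whole array in one pass from a list avoids repeated copying.
fromListOnce :: [Int] -> UArray Int Int
fromListOnce xs = listArray (0, length xs - 1) xs
```

If an append has this copy-everything shape, building an n-element array by repeated appends costs O(n^2) element copies in total, whereas a single-pass build (for example from a list or a stream) costs O(n).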
Thanks for letting me know, I'll keep this in mind.
Let me know if there are performance improvements. Arrays are one of the few things I dislike in Haskell, and all the available array/vector packages cause me some confusion.

Regards
Manlio