uvector package appendU: memory leak?

Hi.
As with a previous post, I think I have found a possible memory problem with the uvector package.
I have this data structure (for, again, my Netflix Prize project):

IntMap (UArr (Word16 :*: Word8))

I was adding elements to the map using something like:

v = singletonU (a :*: b)
insertWith appendU k v m

However doing this eats a *lot* of memory.
Today I have rewritten my program to use `alter` and `snocU`:

append Nothing = Just $ singletonU (a :*: b)
append (Just u) = Just $ snocU u (a :*: b)

alter append k m

This, finally, works.
Unfortunately I'm still not able to load the entire Netflix Prize training data set, grouping ratings by customers, because my PC has only 2 GB of RAM. The required memory is about 2 GB, but when the system starts to swap, I have to kill the program.
So the question is: why does appending an array of only one element to an existing array cause memory problems? This should be practically the same as adding an element.
Thanks
Manlio
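For concreteness, here is a minimal, self-contained sketch of the two insertion strategies described above. It assumes the uvector API exported by Data.Array.Vector (UArr, (:*:), singletonU, snocU, appendU, lengthU); the names Ratings, insertRating and insertRating' are made up for this example.

{-# LANGUAGE TypeOperators #-}
import Data.Array.Vector
import qualified Data.IntMap as IM
import Data.Word (Word16, Word8)

type Ratings = IM.IntMap (UArr (Word16 :*: Word8))

-- Strategy 1: insertWith + appendU (build a one-element array and append it).
insertRating :: Int -> Word16 -> Word8 -> Ratings -> Ratings
insertRating k a b m = IM.insertWith appendU k (singletonU (a :*: b)) m

-- Strategy 2: alter + snocU (append a single element directly).
insertRating' :: Int -> Word16 -> Word8 -> Ratings -> Ratings
insertRating' k a b m = IM.alter append k m
  where
    append Nothing  = Just $ singletonU (a :*: b)
    append (Just u) = Just $ snocU u (a :*: b)

main :: IO ()
main = do
  let m = foldl (\acc (k, a, b) -> insertRating' k a b acc) IM.empty
                [(1, 10, 3), (1, 20, 5), (2, 30, 4)]
  print (IM.map lengthU m)   -- number of ratings per customer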

manlio_perillo:
Hi.
As with a previous post, I think I have found a possible memory problem with the uvector package.
I have this data structure (for, again, my Netflix Prize project):
IntMap (UArr (Word16 :*: Word8))
I was adding elements to the map using something like:
v = singletonU (a :*: b)
insertWith appendU k v m
However doing this eats a *lot* of memory.
Today I have rewritten my program to use `alter` and `snocU`:
append Nothing = Just $ singletonU (a :*: b)
append (Just u) = Just $ snocU u (a :*: b)
alter append k m
This, finally, works.
Unfortunately I'm still not able to load the entire Netflix Prize training data set, grouping ratings by customers, because my PC has only 2 GB of RAM. The required memory is about 2 GB, but when the system starts to swap, I have to kill the program.
So the question is: why does appending an array of only one element to an existing array cause memory problems?
It must copy the entire array. -- Don

Don Stewart wrote:
[...]
So the question is: why does appending an array of only one element to an existing array cause memory problems?
It must copy the entire array.
Isn't it the same with snocU?
And, since the final result is the same, what happens to the temporary memory used for array copying?
I have executed the program with: +RTS -A128M -s -c -F1.1 -RTS
The memory seems to leak.
Thanks Manlio

manlio_perillo:
Don Stewart wrote:
[...]
So the question is: why does appending an array of only one element to an existing array cause memory problems?
It must copy the entire array.
Isn't it the same with snocU?
And, since the final result is the same, what happens to the temporary memory used for array copying?
I have executed the program with: +RTS -A128M -s -c -F1.1 -RTS
The memory seems to leak.
Send me a test case. -- Don

Don Stewart wrote:
manlio_perillo:
Don Stewart wrote:
[...]
So the question is: why does appending an array of only one element to an existing array cause memory problems?
It must copy the entire array.
Isn't it the same with snocU?
And, since the final result is the same, what happens to the temporary memory used for array copying?
I have executed the program with: +RTS -A128M -s -c -F1.1 -RTS
The memory seems to leak.
Send me a test case.
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3071
But Claus was right, appendU is lazy; this seems to be the cause of the problem.
However now I don't really understand why the two implementations differ in laziness.
Or, to ask a different question, how can I make the version using insertWith strict?
Thanks
Manlio

But Claus was right, appendU is lazy; this seems to be the cause of the problem.
appendU is strict, insertWith just doesn't force it (follow the source link in the haddocks to see why).
However now I don't really understand why the two implementations differ in laziness.
Or, to ask a different question, how can I make the version using insertWith strict?
deja vu:-( http://www.haskell.org/pipermail/haskell-cafe/2009-March/057032.html
As you've noticed, alter also allows to enforce strictness.
But piling up appendUs is still not a good idea. For a moment, I thought that the stream representation's (+++) was handling runtime fusion gracefully, but a simple test case suggests otherwise at least for the simpler case of consU (the attached appendU.hs doesn't do any appendUs, as the consU case already demonstrates the issue; be careful with large numbers here, it'll quickly eat your ram):
The test is a silly loop, using consU a lot (which isn't uvector's main target), to copy an array. The difference between the plain loop and the unfolded loop shows that compile-time fusion can improve this, suggesting that no runtime fusion is taking place. The simulated runtime fusion version demonstrates the potential gain by staying within lists till the end and only then switching back to arrays, avoiding all the intermediate partial array copies.
The old bytestring cons thread I linked to was about integrating real runtime fusion into things like bytestring/uvector.
Claus
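A small sketch of the copying cost being discussed: growing an array one snocU at a time versus staying in a list and converting once at the end. It assumes Data.Array.Vector's emptyU, snocU, toU and lengthU; buildBySnoc and buildByList are made-up names.

import Data.Array.Vector

-- Each snocU copies the whole array built so far, so this loop does
-- quadratic work in the length of the result.
buildBySnoc :: Int -> UArr Int
buildBySnoc n = go emptyU [1 .. n]
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = snocU acc x   -- copies all of acc
                    in acc' `seq` go acc' xs

-- Stay in a list and convert to an array once, at the end.
buildByList :: Int -> UArr Int
buildByList n = toU [1 .. n]

main :: IO ()
main = do
  print (lengthU (buildBySnoc 20000))
  print (lengthU (buildByList 20000))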

Claus Reinke wrote:
But Claus was right, appendU is lazy; this seems to be the cause of the problem.
appendU is strict, insertWith just doesn't force it (follow the source link in the haddocks to see why).
Ok, I see. But, IMHO, this should be clearly documented.
I have updated my test code: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3103
The interesting thing is that using appendU is *faster*.
Using appendU: real 0m38.736s user 0m38.638s sys 0m0.048s
Using snocU: real 0m41.279s user 0m40.939s sys 0m0.040s
Memory usage is the same.
However now I don't really understand why the two implementations differ in laziness.
Or, to ask a different question, how can I make the version using insertWith strict?
deja vu:-( http://www.haskell.org/pipermail/haskell-cafe/2009-March/057032.html
You are right, sorry. The problem is that at that time I was not able to fully understand the code!
However, reading the code now, I prefer my version using alter.
By the way, about insertWith/alter; from IntMap documentation:
insertWithKey: O(min(n,W))
alter: O(log n)
So, alter is more efficient than insertWithKey? And what is that `W` ?
As you've noticed, alter also allows to enforce strictness.
But piling up appendUs is still not a good idea. For a moment, I thought that the stream representation's (+++) was handling runtime fusion gracefully, but a simple test case suggests otherwise at least for the simpler case of consU (the attached appendU.hs doesn't do any appendUs, as the consU case already demonstrates the issue; be careful with large numbers here, it'll quickly eat your ram):
I'm not sure to fully understand the code. But, again, IMHO it does not apply to my original problem.
[...]
Thanks Manlio

Can I close this ticket as not being to do with uvector? -- Don
manlio_perillo:
Claus Reinke wrote:
But Claus was right, appendU is lazy; this seems to be the cause of the problem.
appendU is strict, insertWith just doesn't force it (follow the source link in the haddocks to see why).
Ok, I see. But, IMHO, this should be clearly documented.
I have updated my test code: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3103
The interesting thing is that using appendU is *faster*.
Using appendU: real 0m38.736s user 0m38.638s sys 0m0.048s
Using snocU: real 0m41.279s user 0m40.939s sys 0m0.040s
Memory usage is the same.
However now I don't really understand why the two implementations differ in laziness.
Or, to ask a different question, how can I make the version using insertWith strict?
deja vu:-( http://www.haskell.org/pipermail/haskell-cafe/2009-March/057032.html
You are right, sorry. The problem is that at that time I was not able to fully understand the code!
However, reading the code now, I prefer my version using alter.
By the way, about insertWith/alter; from IntMap documentation:
insertWithKey: O(min(n,W))
alter: O(log n)
So, alter is more efficient than insertWithKey? And what is that `W` ?
As you've noticed, alter also allows to enforce strictness.
But piling up appendUs is still not a good idea. For a moment, I thought that the stream representation's (+++) was handling runtime fusion gracefully, but a simple test case suggests otherwise at least for the simpler case of consU (the attached appendU.hs doesn't do any appendUs, as the consU case already demonstrates the issue; be careful with large numbers here, it'll quickly eat your ram):
I'm not sure to fully understand the code. But, again, IMHO it does not apply to my original problem.
[...]
Thanks Manlio

Don Stewart wrote:
Can I close this ticket as not being to do with uvector?
Yes, thanks; and sorry for the noise.
But it may be interesting to study the example I have pasted: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3103
I find it a bit surprising that using appendU is actually faster than using snocU.
The interesting thing is that using appendU is *faster*.
Using appendU: real 0m38.736s user 0m38.638s sys 0m0.048s
Using snocU: real 0m41.279s user 0m40.939s sys 0m0.040s
Manlio

Can I close this ticket as not being to do with uvector? -- Don
You did notice the suggestion that performance of uvector and bytestring could be improved drastically if compile-time fusion were augmented with runtime fusion?
http://www.haskell.org/pipermail/haskell-cafe/2009-March/058911.html
http://www.haskell.org/pipermail/haskell-cafe/2009-March/058918.html
Claus

Claus Reinke wrote:
Can I close this ticket as not being to do with uvector? -- Don
You did notice the suggestion that performance of uvector and bytestring could be improved drastically if compile-time fusion were augmented with runtime fusion?
The storablevector package implements Data.StorableVector.Lazy. Just as with Data.ByteString.Lazy, it contains a linked list of chunks.
I think that this can improve the performance of my implementation, since it is much more efficient to append elements at the end of the vector (it avoids a lot of copying).
In my draft implementation of the Netflix Prize in the D language, I used a similar approach, based on: http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h
Unfortunately, the D garbage collector is really bad when there are a lot of allocations, so I gave up.
Manlio Perillo

appendU is strict, insertWith just doesn't force it (follow the source link in the haddocks to see why).
Ok, I see. But, IMHO, this should be clearly documented.
There seems to be some agreement that strict variant operations should also be provided, but it needs some more thinking, and whenever I look at the code, I have the feeling that there are just too many operations defined in it, so I look away again:-(
However, reading the code now, I prefer my version using alter.
Yes, it is a more direct way to enforce strict evaluation. Btw, if your update function 'f' is strict in 'old' and 'new', you might just want to use 'Just $! f old new' and save all those 'seq's.
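A tiny illustration of that hint, using ($!) inside the alter update function so that no explicit seqs are needed. It assumes Data.Array.Vector's singletonU, snocU and fromU; insertStrict and upd are made-up names.

import Data.Array.Vector
import qualified Data.IntMap as IM

-- Force the new array with ($!) before storing it, so no thunks pile up.
insertStrict :: Int -> Int -> IM.IntMap (UArr Int) -> IM.IntMap (UArr Int)
insertStrict k x = IM.alter upd k
  where
    upd Nothing  = Just $! singletonU x
    upd (Just u) = Just $! snocU u x

main :: IO ()
main = print . IM.map fromU $ insertStrict 1 42 (insertStrict 1 7 IM.empty)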
By the way, about insertWith/alter; from IntMap documentation:
insertWithKey: O(min(n,W))
alter: O(log n)
So, alter is more efficient than insertWithKey? And what is that `W` ?
'W' is a maximum bound (see the comment at the top of that haddock page). 'alter' has some shortcuts if the update function doesn't actually do anything, but for reimplementing 'insertWithKey' the complexity should be the same, as the code would be the same.
But piling up appendUs is still not a good idea. For a moment, I thought that the stream representation's (+++) was handling runtime fusion gracefully, but a simple test case suggests otherwise at least for the simpler case of consU (the attached appendU.hs doesn't do any appendUs, as the consU case already demonstrates the issue; be careful with large numbers here, it'll quickly eat your ram):
I'm not sure to fully understand the code.
Writing ':<' for 'snocU', the code effectively builds 'singletonU a1 :< a2 :< a3 :< ...', where each ':<' copies the whole array built so far.
But, again, IMHO it does not apply to my original problem.
It is a question of resource constraints, probably. Instead of maintaining an 'IntMap (UArr Int)' all the way through buildup, you could use an 'IntMap [Int]' while parsing, and then compress into 'IntMap (UArr Int)' when parsing is finished. If you can afford the larger memory profile during parsing, this should give better overall performance, with the same memory need for the final state, but larger memory needs up to that state. I'd be interested in the differences, if you try it.
Claus
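A short sketch of that list-then-compress suggestion, assuming Data.Array.Vector's toU and fromU; addRating and compress are made-up names, and Int stands in for the real element type.

import Data.Array.Vector
import qualified Data.IntMap as IM
import Data.List (foldl')

type Building = IM.IntMap [Int]
type Finished = IM.IntMap (UArr Int)

-- While parsing: cheap, cons-like accumulation into plain lists.
addRating :: Int -> Int -> Building -> Building
addRating k x = IM.insertWith (++) k [x]

-- After parsing: one array conversion per customer.
compress :: Building -> Finished
compress = IM.map (toU . reverse)   -- reverse restores insertion order

main :: IO ()
main = print . IM.map fromU . compress $
         foldl' (\m (k, x) -> addRating k x m) IM.empty
                [(1, 10), (1, 20), (2, 30)]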

Claus Reinke wrote:
appendU is strict, insertWith just doesn't force it (follow the source link in the haddocks to see why).
Ok, I see. But, IMHO, this should be clearly documented.
There seems to be some agreement that strict variant operations should also be provided, but it needs some more thinking, and whenever I look at the code, I have the feeling that there are just too many operations defined in it, so I look away again:-(
Well, it does not matter much whether a strict variant of an operation is provided or not. But at least it should be documented whether an operation is strict or lazy.
However, reading the code now, I prefer my version using alter.
Yes, it is a more direct way to enforce strict evaluation. Btw, if your update function 'f' is strict in 'old' and 'new', you might just want to use 'Just $! f old new' and save all those 'seq's.
Right, thanks. And this seems to be a bit more memory-friendly, too.
By the way, about insertWith/alter; from IntMap documentation:
insertWithKey: O(min(n,W))
alter: O(log n)
So, alter is more efficient than insertWithKey? And what is that `W` ?
'W' is a maximum bound (see comment at top of that haddock page).
Where? At the top of that page there is only a link to a paper where the algorithm is defined.
'alter' has some shortcuts if the update function doesn't actually do anything, but for reimplementing 'insertWithKey' the complexity should be the same, as the code would be the same.
Ok.
But piling up appendUs is still not a good idea. For a moment, I thought that the stream representation's (+++) was handling runtime fusion gracefully, but a simple test case suggests otherwise at least for the simpler case of consU (the attached appendU.hs doesn't do any appendUs, as the consU case already demonstrates the issue; be careful with large numbers here, it'll quickly eat your ram):
I'm not sure to fully understand the code.
[ Ok ]
But, again, IMHO it does not apply to my original problem.
It is a question of resource constraints, probably. Instead of maintaining an 'IntMap (UArr Int)' all the way through buildup, you could use an 'IntMap [Int]' while parsing, and then compress into 'IntMap (UArr Int)' when parsing is finished.
No, as I have written, this is not possible for my problem, because it would eat all available memory after a few minutes.
The data I'm parsing from the Netflix Prize data set contains:
17770 movies
480189 customers (with customer ids ranging from 1 to 2649429)
100480507 total ratings
Ratings are grouped by movie, but I want to group them by customer. This means that each customer has few ratings (200, on average), so there are a lot of small arrays.
Since ratings for each customer are parsed "at the same time", using a plain list would consume a lot of memory, since stream fusion can only be executed at the end of the parsing.
On the other hand, when I want to group ratings by movie, stream fusion seems to work fine.
This is IMHO, of course.
If you can afford the larger memory profile during parsing, this should give better overall performance, with the same memory need for the final state, but larger memory needs up to that state. I'd be interested in the differences, if you try it.
Unfortunately, I cannot afford this memory profile! My PC has only 2 GB of RAM, and I doubt anyone else could afford the required memory.
Using UArr, the total required memory is about 900 MB.
Thanks
Manlio
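As a rough sanity check of these figures, here is a back-of-envelope estimate. The per-element sizes are assumptions about a 32-bit heap, not measurements of the programs in this thread.

-- An unboxed (Word16 :*: Word8) element takes a few bytes inside a UArr
-- (rounded up to 4 here), while a lazy list needs a cons cell plus a boxed
-- element per rating (taken as 20 bytes here).
ratings :: Integer
ratings = 100480507

bytesUArr, bytesList :: Integer
bytesUArr = ratings * 4    -- roughly 0.4 GB of raw payload across all arrays
bytesList = ratings * 20   -- roughly 2 GB just for the list structure

main :: IO ()
main = mapM_ print [bytesUArr, bytesList]

This is roughly consistent with the ~900 MB reported for the UArr version (which also pays for the IntMap itself), and with the list-based profile not fitting in 2 GB.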

Manlio Perillo wrote:
By the way, about insertWith/alter; from IntMap documentation:
insertWithKey: O(min(n,W))
alter: O(log n)
So, alter is more efficient than insertWithKey? And what is that `W` ?
As Claus says, it's the maximum (value of Int; number of keys). It's in an easily overlooked comment at the top of the IntMap page, last I checked.
As for which is faster, it depends. The O(min(n,W)) stuff is effectively linear because n can't exceed W (what would you use for the key?), but it's a smart linear that rivals O(log n). Because the values of n are so small, what really matters here are the constant factors, not the asymptotic complexity. Also, it depends a lot on what operations you really need.
If you're interested in the details, I highly recommend Okasaki & Gill's paper[1] where they introduce IntMap. It's eminently readable and gives comparative numbers to other data structures for all the basic operations.
[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452
--
Live well,
~wren

IntMap (UArr (Word16 :*: Word8))
I was adding elements to the map using something like:
v = singletonU (a :*: b)
insertWith appendU k v m
However doing this eats a *lot* of memory.
Since 'insertWith' doesn't actually do the 'appendU', the appends will also be compressed in time, at point of use, rather than spread out in time, over construction. Which might be inconvenient if large volumes of data are involved.
So the question is: why does appending an array of only one element to an existing array cause memory problems?
It must copy the entire array.
And doing this repeatedly, with arrays of increasing length, would not be a good idea. For single-threaded use, one might use fusion to avoid the construction/recopying of the intermediate arrays. As usual, if the single-threading happens in the body of a recursive loop, compile-time fusion won't get a chance to work unless one unfolds the recursion a few steps.
But one could do similar fusion dynamically, by partially reifying the construction functions (if we are appending an array that is itself created via append, then a single combined append will do). Wasn't there once a similar issue with strict bytestrings and cons? Ah, found it - different mailing list:
http://www.haskell.org/pipermail/glasgow-haskell-users/2006-November/011603....
Was this actually resolved? From a quick glance, it seems I suggested runtime fusion, Don answered with compile-time fusion, Duncan suggested a different hack to achieve runtime fusion. But was the runtime fusion of nested cons in strict bytestring, or nested appends in uvector, ever implemented? Does the stream-fusion framework handle that implicitly?
Claus
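A hand-rolled sketch of that runtime-fusion idea, restricted to the snocU-style case: pending elements are buffered in a plain list and committed to the array in one combined append. Buffered, snocB and flushB are invented names, not part of uvector; only emptyU, appendU, toU and fromU from Data.Array.Vector are assumed.

import Data.Array.Vector

-- Committed array plus pending elements (kept in reverse order).
data Buffered e = Buffered !(UArr e) [e]

emptyB :: UA e => Buffered e
emptyB = Buffered emptyU []

-- O(1): no array copy yet, just remember the element.
snocB :: Buffered e -> e -> Buffered e
snocB (Buffered arr pend) x = Buffered arr (x : pend)

-- One combined copy when the array is actually demanded.
flushB :: UA e => Buffered e -> UArr e
flushB (Buffered arr pend) = arr `appendU` toU (reverse pend)

main :: IO ()
main = print . fromU . flushB $ foldl snocB emptyB [1 .. 10 :: Int]

Whether batching like this pays off depends on how often the array is demanded between appends.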

Claus Reinke wrote:
IntMap (UArr (Word16 :*: Word8))
I was adding elements to the map using something like:
v = singletonU (a :*: b)
insertWith appendU k v m
However doing this eats a *lot* of memory.
Since 'insertWith' doesn't actually do the 'appendU', the appends will also be compressed in time, at point of use, rather than spread out in time, over construction. Which might be inconvenient if large volumes of data are involved.
Ah, ok; that may explain the problem, but I'm still not sure.
With this program: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063
memory usage seems ok: 955 MB, against 622 MB when ratings are grouped by movies (using [(MovieID, UArr (Word32 :*: Word8))]).
The version using `insertWith appendU` is here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063#a3065
I still cannot explain such a big difference in memory usage.
So the question is: why does appending an array of only one element to an existing array cause memory problems?
It must copy the entire array.
And doing this repeatedly, with arrays of increasing length, would not be a good idea.
I can't really see other solutions.
For single-threaded use, one might use fusion to avoid the construction/recopying of the intermediate arrays.
Fusion is not possible in my case, IMHO.
I'm parsing movie ratings, where we have customer ratings for each movie in separate files (17770 total movies).
Each file has the format:
<movie1>:
<customer1>,<rating1>,<date1>
<customer2>,<rating2>,<date2>
...
<movie2>:
<customer3>,<rating3>,<date3>
...
I want to group ratings by customer, instead of grouping them by movie. Stream fusion is only possible in the latter case (but in that case I don't even need an IntMap).
Am I missing something?
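A rough, hypothetical sketch (not the author's parser) of how the movie-grouped lines above could be folded into a per-customer map. It assumes the Data.Array.Vector API used earlier in the thread, ignores the date field, and does no error handling.

{-# LANGUAGE TypeOperators #-}
import Data.Array.Vector
import qualified Data.IntMap as IM
import Data.List (foldl')
import Data.Word (Word16, Word8)

type ByCustomer = IM.IntMap (UArr (Word16 :*: Word8))

parseRatings :: String -> ByCustomer
parseRatings = snd . foldl' step (0, IM.empty) . filter (not . null) . lines
  where
    step :: (Int, ByCustomer) -> String -> (Int, ByCustomer)
    step (_, m) line
      | last line == ':' = (read (init line), m)            -- "<movie>:" header
    step (movie, m) line =                                   -- "<customer>,<rating>,<date>"
      let (cust, rest) = break (== ',') line
          rating       = takeWhile (/= ',') (drop 1 rest)
          add Nothing  = Just $! singletonU (fromIntegral movie :*: read rating)
          add (Just u) = Just $! snocU u (fromIntegral movie :*: read rating)
      in (movie, IM.alter add (read cust) m)

main :: IO ()
main = print . IM.map lengthU . parseRatings $
         unlines ["1:", "6,3,2005-09-06", "7,5,2005-09-29", "2:", "6,4,2005-10-01"]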
[...]
Manlio

Manlio Perillo wrote:
Hi.
As with a previous post, I think I have found a possible memory problem with the uvector package.
I have this data structure (for, again, my Netflix Prize project):
IntMap (UArr (Word16 :*: Word8))
[...]
Today I have rewritten my program to use `alter` and `snocU`:
append Nothing = Just $ singletonU (a :*: b)
append (Just u) = Just $ snocU u (a :*: b)
alter append k m
[...]
Unfortunately I'm still not able to load the entire Netflix Prize training data set, grouping ratings by customers, because my PC has only 2 GB of RAM. The required memory is about 2 GB, but when the system starts to swap, I have to kill the program.
Just another small strictness hint:
append (Just u) = u `seq` Just $ snocU u (a :*: b)
and now memory usage is 955 MB.
Regards
Manlio
participants (4):
- Claus Reinke
- Don Stewart
- Manlio Perillo
- wren ng thornton