Increasing memory use in stream computation

(Sorry for the long email.)

Summary: why does the attached program have non-constant memory use?

==== Introduction ====

I've written a program to do a big computation. Unfortunately, the computation takes a very long time (expectedly), and the memory use increases slowly (unexpectedly), until it fills up the entire memory and swap space of the computer (many gigabytes).

The rough structure of the program is:

• create a long (up to 20 million) list of objects;
• compute a number for each of those objects;
• compute the sum of the resulting list.

I switched the intermediate data structure from a list to a Stream (from the stream-fusion package), hoping to fix the memory issue. It decreased both the memory use and the rate of its increase, but after a long time, the program still uses up all available memory.

==== A simple program ====

After many hours of cutting down my program, I now have a small program (attached) that shows the same behaviour. It uses only said stream-fusion package, and vector. (I haven't yet tried to cut out the use of vector. I hope it is irrelevant, because all vectors are of fixed small size.)

I compile the program with ghc-7.6.1 using
ghc --make -threaded -rtsopts -with-rtsopts="-M1G -K128M" -O2 -main-is Test.main Test
The RTS options may not be strictly necessary: I added them at some point to allow the use of multiple cores, and to prevent the program from crashing the machine by using all available memory.

When running the program, the resident memory quickly grows to about 3.5 MB (which I am fine with); this stays constant for a long time, but after about 7 minutes, it starts to grow further. The growth is slow, but I really would hope this program to run in constant memory.

==== The code ====

Note that I added an instance for Monad Stream, using concatMap. This is implicitly used in the definition of the big stream.

The source of Data.Stream contains many alternative implementations of concat and concatMap, and alludes to the difficulty of making it fuse properly. Could it be that the fusion did not succeed in this case?

Thanks for any help!

Regards,

Arie
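The attachment itself is not part of the archived thread. As a rough sketch only — the Monad instance via concatMap is as described above, but the module layout, the bounds, and the body of the inner stream are invented for illustration — the test program could look something like:

    module Test (main) where

    import qualified Data.Stream as S  -- from the stream-fusion package

    -- Orphan Monad instance, as described in the message above.
    -- (On GHC >= 7.10, Functor and Applicative instances are also needed.)
    instance Monad S.Stream where
        return x = S.stream [x]       -- singleton stream
        s >>= f  = S.concatMap f s    -- bind = concatMap, as in the list monad

    main :: IO ()
    main = print $ S.foldl' (+) (0 :: Int) $ do
        i <- S.stream [1 .. 10000000]
        j <- S.stream [-1, 0, 0, 1]
        return (i + j)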

Hi Arie,

On 10/10/13 14:02, Arie Peterson wrote:
(Sorry for the long email.)
Summary: why does the attached program have non-constant memory use?
Looking at the heap profile graph (generated with +RTS -h, no need to compile with profiling) I see the increasing memory use is split about evenly between STACK and BLACKHOLE. I don't know what that means or why it occurs, but replacing `small` solved that problem for me:

    small = V.fromList <$> S.stream (replicateM 7 [-1,0,0,1])

I get the same output 3999744 from your version and my changed version.

Claude
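Here replicateM runs in the list monad: replicateM n xs enumerates all length-n sequences of elements drawn from xs, so the full 4^7-element list of lists is produced by ordinary list machinery and handed to S.stream once. A shorter illustration (not from the thread):

    import Control.Monad (replicateM)

    -- All length-2 sequences over a 4-element list: 4^2 = 16 lists.
    pairs :: [[Int]]
    pairs = replicateM 2 [-1, 0, 0, 1]
    -- [[-1,-1],[-1,0],[-1,0],[-1,1],[0,-1],[0,0],...]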

Hi Claude,
Looking at the heap profile graph (generated with +RTS -h, no need to compile with profiling) I see the increasing memory use is split about evenly between STACK and BLACKHOLE. I don't know what that means or why it occurs, but replacing `small` solved that problem for me:
small = V.fromList <$> S.stream (replicateM 7 [-1,0,0,1])
Interesting! Unfortunately, my real code is more complicated, and I can't simplify its "small" function in this way. (The list [-1,0,0,1] that is being streamed in the do block depends, in the full program, on a parameter that changes on each iteration.)

Although, maybe I can do all the logic of the "small" function in the list monad, and stream the resulting list, as you do in the above.
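In that variant, "small" might take a shape like the following sketch; the parameter and the choices function are invented stand-ins for the real dependence, not taken from the actual program:

    import Control.Monad (replicateM)
    import qualified Data.Stream as S
    import qualified Data.Vector as V

    -- Hypothetical: run the combinatorial logic in the list monad, with a
    -- parameter-dependent alphabet, then stream the finished list once.
    small :: Int -> S.Stream (V.Vector Int)
    small p = S.map V.fromList (S.stream (replicateM 7 (choices p)))
      where
        -- invented placeholder for the real parameter dependence
        choices k = if even k then [-1, 0, 0, 1] else [-1, 0, 1]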

Hi Claude,

On Thursday 10 October 2013 20:05:37, I wrote:
Although, maybe I can do all the logic of the "small" function in the list monad, and stream the resulting list, as you do in the above.
I tried a corresponding variant of my full program, but the memory use is quite a lot higher at the start, and increases by large amounts (compared to the version that streams at all levels). So, I'm still at a loss.

Regards,

Arie

Arie Peterson wrote:
(Sorry for the long email.)
Summary: why does the attached program have non-constant memory use?
Unfortunately, I don't know. I'll intersperse some remarks and propose an alternative to stream fusion at the end, which allows your test program to run in constant space.
==== A simple program ====
When running the program, the resident memory quickly grows to about 3.5 MB (which I am fine with); this stays constant for a long time, but after about 7 minutes, it starts to grow further. The growth is slow, but I really would hope this program to run in constant memory.
A quicker way to spot the increased memory usage is to look at GC statistics. I used
./Test +RTS -Sstderr 2>&1 | grep 'Gen: 1'

    569904    8192    65488  0.00  0.00    0.01    0.01    0    0  (Gen:  1)
    516520    9768    67080  0.00  0.00    4.23    4.23    0    0  (Gen:  1)
    513824   14136    71448  0.00  0.00    8.43    8.44    0    0  (Gen:  1)
    515856   16728    74040  0.00  0.00   12.70   12.75    0    0  (Gen:  1)
    515416   19080    76392  0.00  0.00   17.01   17.11    0    0  (Gen:  1)
    515856   22248    79560  0.00  0.00   21.33   21.48    0    0  (Gen:  1)
    514936   25080    82392  0.00  0.00   25.65   25.84    0    0  (Gen:  1)
    514936   28632    85944  0.00  0.00   29.94   30.16    0    0  (Gen:  1)
    513512   32328    89640  0.00  0.00   34.24   34.48    0    0  (Gen:  1)
    515224   37032   127112  0.00  0.00   38.35   38.62    0    0  (Gen:  1)
Note the increasing values in the third column; that's the live bytes after each major GC.
==== The code ====
Note that I added an instance for Monad Stream, using concatMap. This is implicitly used in the definition of the big stream.
The source of Data.Stream contains many alternative implementations of concat and concatMap, and alludes to the difficulty of making it fuse properly. Could it be that the fusion did not succeed in this case?
I had a glimpse at the core code generated by ghc, but the amount of code is overwhelming. From reading the source code, and as far as my intuition goes, the code *should* run in constant space.

As an experiment, I rewrote the code using difference lists, and the result ran in constant memory. I then tried to abstract this idea into a nice data type. (I lost the low-level difference list code on the way, but the code was quite hard to read anyway.) I ended up with an odd mixture of a difference list and a continuation that should be applied to each element:

    data Stream a where
        Stream :: (forall r. (a -> r) -> [r] -> [r]) -> Stream a

with Stream s representing the list s id []. The motivation for the (a -> r) argument is that it makes fmap trivial:

    fmap f (Stream s) = Stream (\g -> s (g . f))

I'll attach the full code below (it's a separate module, "Stream.hs" that can be imported instead of Data.Stream for your small example.) With that replacement, the code runs in constant space and becomes about 3x faster. Using the 'singleton' function for 'return' results in an additional, but very modest (about 10%) speedup.

I wonder whether that approach scales up to your real code.

Enjoy,

Bertram
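Bertram's attached module is not reproduced in the archive. As a reconstruction of what such a module could look like — a sketch derived from the representation above, not the actual attachment — the handful of operations the test program needs might be:

    {-# LANGUAGE GADTs, RankNTypes #-}
    module Stream where

    import Control.Monad (ap)
    import qualified Data.List as L

    data Stream a where
        Stream :: (forall r. (a -> r) -> [r] -> [r]) -> Stream a

    -- Stream s represents the list  s id [].
    toList :: Stream a -> [a]
    toList (Stream s) = s id []

    stream :: [a] -> Stream a
    stream xs = Stream (\g rs -> foldr (\x t -> g x : t) rs xs)

    singleton :: a -> Stream a
    singleton x = Stream (\g rs -> g x : rs)

    instance Functor Stream where
        fmap f (Stream s) = Stream (\g -> s (g . f))

    instance Applicative Stream where
        pure  = singleton
        (<*>) = ap

    instance Monad Stream where
        -- bind = concatMap: each element expands to a difference list,
        -- and the resulting functions are composed onto the tail.
        Stream s >>= f =
            Stream (\g rs -> foldr ($) rs (s (\a -> case f a of Stream t -> t g) []))

    foldl' :: (b -> a -> b) -> b -> Stream a -> b
    foldl' f z = L.foldl' f z . toList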

Hi Bertram,
Unfortunately, I don't know. I'll intersperse some remarks and propose an alternative to stream fusion at the end, which allows your test program to run in constant space.
A quicker way to spot the increased memory usage is to look at GC statistics. I used
./Test +RTS -Sstderr 2>&1 | grep 'Gen: 1'
[…]
Thanks for the suggestion.
I had a glimpse at the core code generated by ghc, but the amount of code is overwhelming. From reading the source code, and as far as my intuition goes, the code *should* run in constant space.
Yes, my intuition says the same. For me though, this is only based on an informal imperative interpretation of the code, not on any understanding of the Stream internals.
As an experiment, I rewrote the code using difference lists, and the result ran in constant memory. I then tried to abstract this idea into a nice data type. (I lost the low-level difference list code on the way, but the code was quite hard to read anyway.)

[…]

I'll attach the full code below (it's a separate module, "Stream.hs" that can be imported instead of Data.Stream for your small example.) With that replacement, the code runs in constant space and becomes about 3x faster. Using the 'singleton' function for 'return' results in an additional, but very modest (about 10%) speedup.
I wonder whether that approach scales up to your real code.
Awesome, thanks for all this work!

Unfortunately, the modified full program does not use constant memory. I replaced Data.Stream by your Stream, and added one more function "append" to it (which was very natural, given the difference list nature). This version increases its resident size by about 1MB after roughly 10 minutes, and then again after another 10 minutes. This makes it a significant improvement over Data.Stream, memory-wise, but I'm not sure if it will be enough to perform the entire computation without running out of memory.
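For the difference-list representation above, such an "append" is plausibly a one-liner (a guess at its shape, not the code from the real program):

    -- Compose the two difference lists, sharing the element function g.
    append :: Stream a -> Stream a -> Stream a
    append (Stream s) (Stream t) = Stream (\g -> s g . t g)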
participants (3):

- Arie Peterson
- Bertram Felgenhauer
- Claude Heiland-Allen