
I have a small program that fills a file with random numbers. If I compile it without optimisation, it runs in constant space. And yet, if I supply -O2 (or even just -O1), for large output files the program gobbles large amounts of RAM. Is this a known bug? (GHC 6.10.x)

On Saturday 12 February 2011 11:30:26, Andrew Coppin wrote:
I have a small program that fills a file with random numbers. If I compile it without optimisation, it runs in constant space. And yet, if I supply -O2 (or even just -O1), for large output files the program gobbles large amounts of RAM.
Is this a known bug? (GHC 6.10.x)
It's known to happen when optimising shares what shouldn't be shared. Try compiling with -O2 -fno-cse (if that doesn't help, it doesn't necessarily mean it's not unwanted sharing, though). And, please, let us see some code to identify the problem.

Is this a known bug? (GHC 6.10.x)
It's known to happen when optimising shares what shouldn't be shared. Try compiling with -O2 -fno-cse (if that doesn't help, it doesn't necessarily mean it's not unwanted sharing, though). And, please, let us see some code to identify the problem.
I tried -O2 -fno-cse. No difference. I also tried -O2 -fno-full-laziness. BIG DIFFERENCE. The program now runs in constant space (like with -O0), but it also runs about 2x faster than -O0. I have no idea what these switches do, but clearly one of these optimisations is actually pessimal for this particular program. I still want to try compiling with a newer version of GHC to see what difference that makes. (And yes, if I had the code on this PC, I could post it. It's kinda long though... In essence, it just calls randomRIO a bazillion times and writes the results into a file using hPutChar.)

On 14 February 2011 21:00, Andrew Coppin
Is this a known bug? (GHC 6.10.x)
It's known to happen when optimising shares what shouldn't be shared. Try compiling with -O2 -fno-cse (if that doesn't help, it doesn't necessarily mean it's not unwanted sharing, though). And, please, let us see some code to identify the problem.
I tried -O2 -fno-cse. No difference.
I also tried -O2 -fno-full-laziness. BIG DIFFERENCE.
See also the very old GHC ticket at http://hackage.haskell.org/trac/ghc/ticket/917 Max

I tried -O2 -fno-cse. No difference.
I also tried -O2 -fno-full-laziness. BIG DIFFERENCE.
See also the very old GHC ticket at http://hackage.haskell.org/trac/ghc/ticket/917
I don't know if that's the problem or not, but it might plausibly be. Here's the smallest version of the program that I could come up with [which still misbehaves]: module Main (main) where import System.IO import System.Random main = do file_batch "01-Uniform" random_byte_uniform random_byte_uniform :: IO Int random_byte_uniform = randomRIO (0x00, 0xFF) random_file :: String -> Int -> IO Int -> IO () random_file f n rnd = do putStrLn $ "Save: " ++ f ++ " [" ++ show n ++ " bytes]" h <- openFile f WriteMode hSetBinaryMode h True mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n] hClose h file_batch :: String -> IO Int -> IO () file_batch f rnd = mapM_ (\ k -> mapM_ (\ n -> random_file (f ++ "-" ++ show k ++ "x-" ++ [n]) (10 * 1024 * 1024 * k) rnd ) "ABCD" ) [1..4] If main calls random_file directly, the program seems to work OK, so the problem seems to be file_batch. Maybe. I don't really know. I had a go at playing with -ddump-simpl, but that just generated a 8 KB file which is utterly incomprehensible. (Well, the -O0 variant is just about comprehensible. The -O2 variant isn't. But it appears that *everything* gets inlined into main...) If anybody can figure out what's happening here, I'd be interested to know.

On Tuesday 15 February 2011 20:15:54, Andrew Coppin wrote:
I tried -O2 -fno-cse. No difference.
I also tried -O2 -fno-full-laziness. BIG DIFFERENCE.
See also the very old GHC ticket at http://hackage.haskell.org/trac/ghc/ticket/917
I don't know if that's the problem or not, but it might plausibly be.
Here's the smallest version of the program that I could come up with [which still misbehaves]:
module Main (main) where
import System.IO import System.Random
main = do file_batch "01-Uniform" random_byte_uniform
random_byte_uniform :: IO Int random_byte_uniform = randomRIO (0x00, 0xFF)
random_file :: String -> Int -> IO Int -> IO () random_file f n rnd = do putStrLn $ "Save: " ++ f ++ " [" ++ show n ++ " bytes]" h <- openFile f WriteMode hSetBinaryMode h True mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n] hClose h
file_batch :: String -> IO Int -> IO () file_batch f rnd = mapM_ (\ k -> mapM_ (\ n -> random_file (f ++ "-" ++ show k ++ "x-" ++ [n]) (10 * 1024 * 1024 * k) rnd ) "ABCD" ) [1..4]
If main calls random_file directly, the program seems to work OK, so the problem seems to be file_batch. Maybe.
Or, one could say, the problem is the export list :) If you remove the export list, so that random_file is exported, the leak disappears (at least with 7.0.1, didn't test 6.12). If nothing but main is exported, GHC can be much more aggressive with inlining, and it is. The result is that the list [1 .. 10*1024*1024*k] from the penultimate line of random_file is shared between the four iterations of the inner loop in file_batch (for k = 1 .. 4). Oops. If random_file is exported, it is too big to inline it (at least if you don't specifically ask for it), so you get no sharing (even better, GHC rewrites mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n] to a nice loop, the list isn't constructed at all).
I don't really know. I had a go at playing with -ddump-simpl, but that just generated a 8 KB file which is utterly incomprehensible. (Well, the -O0 variant is just about comprehensible. The -O2 variant isn't.
You have to look for interesting stuff (in this case the list [1 .. n]) and note its identifier (yes, coping with the identifiers in core is hard, especially when they are entirely compiler-generated and don't start with a source-code name), then see how it is used.
But it appears that *everything* gets inlined into main...)
That's kind of the point of module Main (main) where Sometimes that's good, other times not.
If anybody can figure out what's happening here, I'd be interested to know.
Due to the extensive inlining, GHC sees that some values are reused, so it decides to share those values instead of recomputing them. Unfortunately, those values are long lists. Making GHC look at smaller chunks of the code prevents that, as does turning off full-laziness (in both cases the let-binding of the list doesn't get floated out of random_file, that floating [more precisely, the resulting sharing] is what causes the leak). Which makes me wonder: unwanted sharing of lists [1 .. n] or similar is a frequent cause of space leaks, so would it be possible to teach GHC to not share such lists (unless they're bound to a name to indicate sharing is wanted)? In particular for enumerations [a .. b] of type [Int], [Integer] or similar, I'm pretty sure that the cost of recomputation is far outweighed by the memory consumption of sharing in almost all cases.

On 15/02/11 20:35, Daniel Fischer wrote:
Which makes me wonder: unwanted sharing of lists [1 .. n] or similar is a frequent cause of space leaks, so would it be possible to teach GHC to not share such lists (unless they're bound to a name to indicate sharing is wanted)?
In particular for enumerations [a .. b] of type [Int], [Integer] or similar, I'm pretty sure that the cost of recomputation is far outweighed by the memory consumption of sharing in almost all cases.
Compare with the heap profile graph output from this short program which uses a horrible data-dependency hack to force recomputation: main = do print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [(1 :: Int) .. 10000]] print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [x+1-x .. 10000]] The heap profile graph looks a little like this: ######## ######## ######## ######## ########_______ (Tested with ghc 6.12.3 -O2 on linux x86_64) Claude -- http://claudiusmaximus.goto10.org

On Tuesday 15 February 2011 22:20:06, Claude Heiland-Allen wrote:
Compare with the heap profile graph output from this short program which uses a horrible data-dependency hack to force recomputation:
main = do print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [(1 :: Int) .. 10000]] print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [x+1-x .. 10000]]
The heap profile graph looks a little like this:
######## ######## ######## ######## ########_______
(Tested with ghc 6.12.3 -O2 on linux x86_64)
Yup, confirmed with 6.12.3 and 7.0.1 on x86 linux (again behaves differently with -fno-full-laziness). Not only does the second use less memory, it is also faster (something around 10%). Thanks for the nice example.

On 15/02/2011 08:35 PM, Daniel Fischer wrote:
The result is that the list
[1 .. 10*1024*1024*k]
from the penultimate line of random_file is shared between the four iterations of the inner loop in file_batch (for k = 1 .. 4). Oops.
Ouch! That's gotta sting in the morning... o_O I suppose what we could really do with is a combinator that runs a monadic action N times, without actually constructing a list N elements long in order to do so. Then it becomes blatently obvious that there's nothing to share, and the problem goes away. They say that in Haskell, "a list *is* a loop". Apparently, not always. ;-)

On Tuesday 15 February 2011 23:29:39, Andrew Coppin wrote:
On 15/02/2011 08:35 PM, Daniel Fischer wrote:
The result is that the list
[1 .. 10*1024*1024*k]
from the penultimate line of random_file is shared between the four iterations of the inner loop in file_batch (for k = 1 .. 4). Oops.
Ouch! That's gotta sting in the morning... o_O
I think it has already stung, or we wouldn't have this thread, would we?
I suppose what we could really do with is a combinator that runs a monadic action N times, without actually constructing a list N elements long in order to do so.
True enough. But I guess nobody¹ bothered yet because there are so many possible designs and most of them are trivial to implement in a line or two (so the pain of writing them repeatedly isn't bad enough). ¹ Not quite true, there's the monad-loops package on hackage which provides a handful of loops. But not the trivial nTimesDo.
Then it becomes blatently obvious that there's nothing to share, and the problem goes away.
They say that in Haskell, "a list *is* a loop". Apparently, not always.
No, not always. Some lists are just, you know, lists.

On Feb 15, 2011, at 6:05 PM, Daniel Fischer
On Tuesday 15 February 2011 23:29:39, Andrew Coppin wrote:
Ouch!
I suppose what we could really do with is a combinator that runs a monadic action N times, without actually constructing a list N elements long in order to do so.
True enough. But I guess nobody¹ bothered yet because there are so many possible designs and most of them are trivial to implement in a line or two (so the pain of writing them repeatedly isn't bad enough).
¹ Not quite true, there's the monad-loops package on hackage which provides a handful of loops. But not the trivial nTimesDo.
Doesn't Control.Monad.replicateM_ do exactly that? -- James

On 16/02/2011 06:31 PM, James Andrew Cook wrote:
Doesn't Control.Monad.replicateM_ do exactly that?
10 points to Gryffindore. (Now, if only there was a version that feeds an integer to the monadic action as well... Still, it's not hard to implement.)

On Wednesday 16 February 2011 23:13:10, Max Bolingbroke wrote:
On 16 February 2011 21:51, Andrew Coppin
wrote: (Now, if only there was a version that feeds an integer to the monadic action as well... Still, it's not hard to implement.)
As simple as: forM [1..x] mk_my_action
The problem with that is that under certain circumstances the list is shared in nested loops, which was what caused the thread (it was mapM_ and not forM_, but I'd be very surprised if they behaved differently with -O2). What Andrew wants is a listless forM[_].

On 16 February 2011 22:48, Daniel Fischer
The problem with that is that under certain circumstances the list is shared in nested loops, which was what caused the thread (it was mapM_ and not forM_, but I'd be very surprised if they behaved differently with -O2).
Yep - d'oh! Thinking about it some more, this example is actually quite interesting because if you *prevent* the list from being floated the forM gets foldr/build fused into a totally listless optimal loop. It really does seem like a shame to disable that optimisation because of the floating... if only the fusion hit before float-out was run. Max

On 16/02/2011 11:09 PM, Max Bolingbroke wrote:
Thinking about it some more, this example is actually quite interesting because if you *prevent* the list from being floated the forM gets foldr/build fused into a totally listless optimal loop. It really does seem like a shame to disable that optimisation because of the floating... if only the fusion hit before float-out was run.
It seems to me that in this case, I'm using a list when what I actually "mean" is a stream. (Here "stream" refers to the construct used in the stream-fusion package.) I can't actually *want* a physical data structure to be constructed - so why am I asking for one? It sems to me that lots of Haskell code uses lists where it actually means streams... The stream-fusion package uses streams to fuse together list operations, but I rather suspect it would be cleaner and more helpful if people wrote code explicitly in terms of streams in the first place, except for the small minority of places where you really do want an actual list. But that's just my opinion...

I was battling a similar (the same?) issue recently. The problem might
indeed be caused by excessive sharing. There's a good example in GHC's
trac [1]. Try compiling your code with -O2 and -fno-full-laziness.
There is also an issue with full-laziness and recursive overloaded
functions [2]. Again, compiling with -fno-full-laziness should help.
Alternatively, if you're using ghc-7.0.1, try switching to HEAD.
-- Maciej
[1] http://hackage.haskell.org/trac/ghc/ticket/917
[2] http://www.haskell.org/pipermail/glasgow-haskell-users/2011-February/019997....
On Sat, Feb 12, 2011 at 7:30 PM, Andrew Coppin
I have a small program that fills a file with random numbers. If I compile it without optimisation, it runs in constant space. And yet, if I supply -O2 (or even just -O1), for large output files the program gobbles large amounts of RAM.
Is this a known bug? (GHC 6.10.x)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (6)
-
Andrew Coppin
-
Claude Heiland-Allen
-
Daniel Fischer
-
James Andrew Cook
-
Maciej Wos
-
Max Bolingbroke