Sub-optimal

older
Unable to get parallelism using...

Andrew Coppin

12 Feb 2011 12 Feb '11

10:30 a.m.

I have a small program that fills a file with random numbers. If I compile it without optimisation, it runs in constant space. And yet, if I supply -O2 (or even just -O1), for large output files the program gobbles large amounts of RAM. Is this a known bug? (GHC 6.10.x)

Show replies by date

Daniel Fischer

12 Feb 12 Feb

11:48 a.m.

On Saturday 12 February 2011 11:30:26, Andrew Coppin wrote:

...

I have a small program that fills a file with random numbers. If I compile it without optimisation, it runs in constant space. And yet, if I supply -O2 (or even just -O1), for large output files the program gobbles large amounts of RAM.

Is this a known bug? (GHC 6.10.x)

It's known to happen when optimising shares what shouldn't be shared. Try compiling with -O2 -fno-cse (if that doesn't help, it doesn't necessarily mean it's not unwanted sharing, though). And, please, let us see some code to identify the problem.

Andrew Coppin

14 Feb 14 Feb

9 p.m.

...

...
Is this a known bug? (GHC 6.10.x)

It's known to happen when optimising shares what shouldn't be shared. Try compiling with -O2 -fno-cse (if that doesn't help, it doesn't necessarily mean it's not unwanted sharing, though). And, please, let us see some code to identify the problem.

I tried -O2 -fno-cse. No difference. I also tried -O2 -fno-full-laziness. BIG DIFFERENCE. The program now runs in constant space (like with -O0), but it also runs about 2x faster than -O0. I have no idea what these switches do, but clearly one of these optimisations is actually pessimal for this particular program. I still want to try compiling with a newer version of GHC to see what difference that makes. (And yes, if I had the code on this PC, I could post it. It's kinda long though... In essence, it just calls randomRIO a bazillion times and writes the results into a file using hPutChar.)

Max Bolingbroke

11:49 p.m.

On 14 February 2011 21:00, Andrew Coppin wrote:

...

...
...
Is this a known bug? (GHC 6.10.x)

It's known to happen when optimising shares what shouldn't be shared. Try compiling with -O2 -fno-cse (if that doesn't help, it doesn't necessarily mean it's not unwanted sharing, though). And, please, let us see some code to identify the problem.

I tried -O2 -fno-cse. No difference.

I also tried -O2 -fno-full-laziness. BIG DIFFERENCE.

See also the very old GHC ticket at http://hackage.haskell.org/trac/ghc/ticket/917 Max

Andrew Coppin

15 Feb 15 Feb

7:15 p.m.

New subject: Sub-optimal [code]

...

...
I tried -O2 -fno-cse. No difference.

I also tried -O2 -fno-full-laziness. BIG DIFFERENCE.

See also the very old GHC ticket at http://hackage.haskell.org/trac/ghc/ticket/917

I don't know if that's the problem or not, but it might plausibly be. Here's the smallest version of the program that I could come up with [which still misbehaves]: module Main (main) where import System.IO import System.Random main = do file_batch "01-Uniform" random_byte_uniform random_byte_uniform :: IO Int random_byte_uniform = randomRIO (0x00, 0xFF) random_file :: String -> Int -> IO Int -> IO () random_file f n rnd = do putStrLn $ "Save: " ++ f ++ " [" ++ show n ++ " bytes]" h <- openFile f WriteMode hSetBinaryMode h True mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n] hClose h file_batch :: String -> IO Int -> IO () file_batch f rnd = mapM_ (\ k -> mapM_ (\ n -> random_file (f ++ "-" ++ show k ++ "x-" ++ [n]) (10 * 1024 * 1024 * k) rnd ) "ABCD" ) [1..4] If main calls random_file directly, the program seems to work OK, so the problem seems to be file_batch. Maybe. I don't really know. I had a go at playing with -ddump-simpl, but that just generated a 8 KB file which is utterly incomprehensible. (Well, the -O0 variant is just about comprehensible. The -O2 variant isn't. But it appears that *everything* gets inlined into main...) If anybody can figure out what's happening here, I'd be interested to know.

Daniel Fischer

8:35 p.m.

New subject: Sub-optimal [code]

On Tuesday 15 February 2011 20:15:54, Andrew Coppin wrote:

...

...
...
I tried -O2 -fno-cse. No difference.

I also tried -O2 -fno-full-laziness. BIG DIFFERENCE.

See also the very old GHC ticket at http://hackage.haskell.org/trac/ghc/ticket/917

I don't know if that's the problem or not, but it might plausibly be.

Here's the smallest version of the program that I could come up with [which still misbehaves]:

module Main (main) where

import System.IO import System.Random

main = do file_batch "01-Uniform" random_byte_uniform

random_byte_uniform :: IO Int random_byte_uniform = randomRIO (0x00, 0xFF)

random_file :: String -> Int -> IO Int -> IO () random_file f n rnd = do putStrLn $ "Save: " ++ f ++ " [" ++ show n ++ " bytes]" h <- openFile f WriteMode hSetBinaryMode h True mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n] hClose h

file_batch :: String -> IO Int -> IO () file_batch f rnd = mapM_ (\ k -> mapM_ (\ n -> random_file (f ++ "-" ++ show k ++ "x-" ++ [n]) (10 * 1024 * 1024 * k) rnd ) "ABCD" ) [1..4]

If main calls random_file directly, the program seems to work OK, so the problem seems to be file_batch. Maybe.

Or, one could say, the problem is the export list :) If you remove the export list, so that random_file is exported, the leak disappears (at least with 7.0.1, didn't test 6.12). If nothing but main is exported, GHC can be much more aggressive with inlining, and it is. The result is that the list [1 .. 10*1024*1024*k] from the penultimate line of random_file is shared between the four iterations of the inner loop in file_batch (for k = 1 .. 4). Oops. If random_file is exported, it is too big to inline it (at least if you don't specifically ask for it), so you get no sharing (even better, GHC rewrites mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n] to a nice loop, the list isn't constructed at all).

...

I don't really know. I had a go at playing with -ddump-simpl, but that just generated a 8 KB file which is utterly incomprehensible. (Well, the -O0 variant is just about comprehensible. The -O2 variant isn't.

You have to look for interesting stuff (in this case the list [1 .. n]) and note its identifier (yes, coping with the identifiers in core is hard, especially when they are entirely compiler-generated and don't start with a source-code name), then see how it is used.

...

But it appears that *everything* gets inlined into main...)

That's kind of the point of module Main (main) where Sometimes that's good, other times not.

...

If anybody can figure out what's happening here, I'd be interested to know.

Due to the extensive inlining, GHC sees that some values are reused, so it decides to share those values instead of recomputing them. Unfortunately, those values are long lists. Making GHC look at smaller chunks of the code prevents that, as does turning off full-laziness (in both cases the let-binding of the list doesn't get floated out of random_file, that floating [more precisely, the resulting sharing] is what causes the leak). Which makes me wonder: unwanted sharing of lists [1 .. n] or similar is a frequent cause of space leaks, so would it be possible to teach GHC to not share such lists (unless they're bound to a name to indicate sharing is wanted)? In particular for enumerations [a .. b] of type [Int], [Integer] or similar, I'm pretty sure that the cost of recomputation is far outweighed by the memory consumption of sharing in almost all cases.

Claude Heiland-Allen

9:20 p.m.

New subject: Sub-optimal [code]

On 15/02/11 20:35, Daniel Fischer wrote:

...

Which makes me wonder: unwanted sharing of lists [1 .. n] or similar is a frequent cause of space leaks, so would it be possible to teach GHC to not share such lists (unless they're bound to a name to indicate sharing is wanted)?

...

In particular for enumerations [a .. b] of type [Int], [Integer] or similar, I'm pretty sure that the cost of recomputation is far outweighed by the memory consumption of sharing in almost all cases.

Compare with the heap profile graph output from this short program which uses a horrible data-dependency hack to force recomputation: main = do print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [(1 :: Int) .. 10000]] print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [x+1-x .. 10000]] The heap profile graph looks a little like this: ######## ######## ######## ######## ########_______ (Tested with ghc 6.12.3 -O2 on linux x86_64) Claude -- http://claudiusmaximus.goto10.org

Daniel Fischer

10:04 p.m.

New subject: Sub-optimal [code]

On Tuesday 15 February 2011 22:20:06, Claude Heiland-Allen wrote:

...

Compare with the heap profile graph output from this short program which uses a horrible data-dependency hack to force recomputation:

main = do print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [(1 :: Int) .. 10000]] print $ length [(x,y) | x <- [(1 :: Int) .. 10000], y <- [x+1-x .. 10000]]

The heap profile graph looks a little like this:

######## ######## ######## ######## ########_______

(Tested with ghc 6.12.3 -O2 on linux x86_64)

Yup, confirmed with 6.12.3 and 7.0.1 on x86 linux (again behaves differently with -fno-full-laziness). Not only does the second use less memory, it is also faster (something around 10%). Thanks for the nice example.

Andrew Coppin

10:29 p.m.

New subject: Sub-optimal [code]

On 15/02/2011 08:35 PM, Daniel Fischer wrote:

...

The result is that the list

[1 .. 10*1024*1024*k]

from the penultimate line of random_file is shared between the four iterations of the inner loop in file_batch (for k = 1 .. 4). Oops.

Ouch! That's gotta sting in the morning... o_O I suppose what we could really do with is a combinator that runs a monadic action N times, without actually constructing a list N elements long in order to do so. Then it becomes blatently obvious that there's nothing to share, and the problem goes away. They say that in Haskell, "a list *is* a loop". Apparently, not always. ;-)

Daniel Fischer

11:05 p.m.

New subject: Sub-optimal [code]

On Tuesday 15 February 2011 23:29:39, Andrew Coppin wrote:

...

On 15/02/2011 08:35 PM, Daniel Fischer wrote:

...
The result is that the list

[1 .. 10*1024*1024*k]

from the penultimate line of random_file is shared between the four iterations of the inner loop in file_batch (for k = 1 .. 4). Oops.

Ouch! That's gotta sting in the morning... o_O

I think it has already stung, or we wouldn't have this thread, would we?

...

I suppose what we could really do with is a combinator that runs a monadic action N times, without actually constructing a list N elements long in order to do so.

True enough. But I guess nobody¹ bothered yet because there are so many possible designs and most of them are trivial to implement in a line or two (so the pain of writing them repeatedly isn't bad enough). ¹ Not quite true, there's the monad-loops package on hackage which provides a handful of loops. But not the trivial nTimesDo.

...

Then it becomes blatently obvious that there's nothing to share, and the problem goes away.

They say that in Haskell, "a list *is* a loop". Apparently, not always.

No, not always. Some lists are just, you know, lists.

James Andrew Cook

16 Feb 16 Feb

6:31 p.m.

New subject: Sub-optimal [code]

On Feb 15, 2011, at 6:05 PM, Daniel Fischer wrote:

...

On Tuesday 15 February 2011 23:29:39, Andrew Coppin wrote:

...
Ouch!

I suppose what we could really do with is a combinator that runs a monadic action N times, without actually constructing a list N elements long in order to do so.

True enough. But I guess nobody¹ bothered yet because there are so many possible designs and most of them are trivial to implement in a line or two (so the pain of writing them repeatedly isn't bad enough).

¹ Not quite true, there's the monad-loops package on hackage which provides a handful of loops. But not the trivial nTimesDo.

Doesn't Control.Monad.replicateM_ do exactly that? -- James

Daniel Fischer

7:08 p.m.

New subject: Sub-optimal [code]

On Wednesday 16 February 2011 19:31:05, James Andrew Cook wrote:

...

Doesn't Control.Monad.replicateM_ do exactly that?

Yes, right, forgot about that. That would've worked fine in Andrew's case.

Andrew Coppin

9:51 p.m.

New subject: Sub-optimal [code]

On 16/02/2011 06:31 PM, James Andrew Cook wrote:

...

Doesn't Control.Monad.replicateM_ do exactly that?

10 points to Gryffindore. (Now, if only there was a version that feeds an integer to the monadic action as well... Still, it's not hard to implement.)

Max Bolingbroke

10:13 p.m.

New subject: Sub-optimal [code]

On 16 February 2011 21:51, Andrew Coppin wrote:

...

(Now, if only there was a version that feeds an integer to the monadic action as well... Still, it's not hard to implement.)

As simple as: forM [1..x] mk_my_action

Daniel Fischer

10:48 p.m.

New subject: Sub-optimal [code]

On Wednesday 16 February 2011 23:13:10, Max Bolingbroke wrote:

...

On 16 February 2011 21:51, Andrew Coppin wrote:

...
(Now, if only there was a version that feeds an integer to the monadic action as well... Still, it's not hard to implement.)

As simple as: forM [1..x] mk_my_action

The problem with that is that under certain circumstances the list is shared in nested loops, which was what caused the thread (it was mapM_ and not forM_, but I'd be very surprised if they behaved differently with -O2). What Andrew wants is a listless forM[_].

Max Bolingbroke

11:09 p.m.

New subject: Sub-optimal [code]

On 16 February 2011 22:48, Daniel Fischer wrote:

...

The problem with that is that under certain circumstances the list is shared in nested loops, which was what caused the thread (it was mapM_ and not forM_, but I'd be very surprised if they behaved differently with -O2).

Yep - d'oh! Thinking about it some more, this example is actually quite interesting because if you *prevent* the list from being floated the forM gets foldr/build fused into a totally listless optimal loop. It really does seem like a shame to disable that optimisation because of the floating... if only the fusion hit before float-out was run. Max

Andrew Coppin

20 Feb 20 Feb

3:10 p.m.

New subject: Sub-optimal [code]

On 16/02/2011 11:09 PM, Max Bolingbroke wrote:

...

Thinking about it some more, this example is actually quite interesting because if you *prevent* the list from being floated the forM gets foldr/build fused into a totally listless optimal loop. It really does seem like a shame to disable that optimisation because of the floating... if only the fusion hit before float-out was run.

It seems to me that in this case, I'm using a list when what I actually "mean" is a stream. (Here "stream" refers to the construct used in the stream-fusion package.) I can't actually *want* a physical data structure to be constructed - so why am I asking for one? It sems to me that lots of Haskell code uses lists where it actually means streams... The stream-fusion package uses streams to fuse together list operations, but I rather suspect it would be cleaner and more helpful if people wrote code explicitly in terms of streams in the first place, except for the small minority of places where you really do want an actual list. But that's just my opinion...

Maciej Wos

13 Feb 13 Feb

4:59 p.m.

I was battling a similar (the same?) issue recently. The problem might indeed be caused by excessive sharing. There's a good example in GHC's trac [1]. Try compiling your code with -O2 and -fno-full-laziness. There is also an issue with full-laziness and recursive overloaded functions [2]. Again, compiling with -fno-full-laziness should help. Alternatively, if you're using ghc-7.0.1, try switching to HEAD. -- Maciej [1] http://hackage.haskell.org/trac/ghc/ticket/917 [2] http://www.haskell.org/pipermail/glasgow-haskell-users/2011-February/019997.... On Sat, Feb 12, 2011 at 7:30 PM, Andrew Coppin wrote:

...

I have a small program that fills a file with random numbers. If I compile it without optimisation, it runs in constant space. And yet, if I supply -O2 (or even just -O1), for large output files the program gobbles large amounts of RAM.

Is this a known bug? (GHC 6.10.x)

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

5249

Age (days ago)

5257

Last active (days ago)

List overview

Download

17 comments

6 participants

participants (6)

Andrew Coppin
Claude Heiland-Allen
Daniel Fischer
James Andrew Cook
Maciej Wos
Max Bolingbroke

Sub-optimal

Andrew Coppin

Andrew Coppin

Andrew Coppin

Claude Heiland-Allen

Andrew Coppin

James Andrew Cook

Andrew Coppin

Andrew Coppin

Maciej Wos

tags

participants (6)