
To avoid retaining a large lazy data structure in memory it is useful to hide it behind a function call. Below, "many" is used twice. It is hidden behind a function call so it can be garbage collected between uses. That's good. When compiling with "-O" it seems that GHC 7.4.1 decides to keep it in memory anyway. That's bad. (I can't read core so I don't know exactly what's going on). Replacing one of the "many" in "twice" with "different_many" makes everything fine again. Is this considered a bug in GHC? Is it a known bug? It is incredibly concerning that GHC would perform this kind of pessimisation. Tom % cat thunkfail.hs {-# OPTIONS_GHC -fno-warn-unused-binds #-} import Data.List million :: Int million = 10 ^ (6 :: Int) many :: () -> [Int] many () = [1..million] different_many :: () -> [Int] different_many () = [1..million] twice :: (Int, Int) twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ())) main :: IO () main = print twice % ghc -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... (1784293664,1784293664) % ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... Heap exhausted; Current maximum heap size is 5242880 bytes (5 MB); use `+RTS -M<size>' to increase it.

Hi, I believe, without checking, that Am Sonntag, den 24.02.2013, 17:49 +0000 schrieb Tom Ellis:
many :: () -> [Int] many () = [1..million]
is indeed called many times. The problem is that [1..million] is not dependent on the parameter, so GHC will float it out to a top level declaration, and hence share it between the multiple calls to many. You should try:
million :: () -> Int million _ = 10 ^ (6 :: Int)
many :: () -> [Int] many x = [1..million x]
Greetings, Joachim (Maybe I should continue to work on http://arxiv.org/abs/1106.3478 and related ideas, like annotating thunks in the code as unshareable... or does this mostly occur in small example programs like this?) -- Joachim "nomeata" Breitner Debian Developer nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata

On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote:
You should try:
million :: () -> Int million _ = 10 ^ (6 :: Int)
many :: () -> [Int] many x = [1..million x]
Thanks Joachim, but that doesn't work either. Tom % cat thunkfail.hs {-# OPTIONS_GHC -fno-warn-unused-binds #-} import Data.List million :: () -> Int million _ = 10 ^ (6 :: Int) many :: () -> [Int] many x = [1..million x] different_many :: () -> [Int] different_many x = [1..million x] twice :: (Int, Int) twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ())) main :: IO () main = print twice % ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail % +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... Heap exhausted; Current maximum heap size is 5242880 bytes (5 MB); use `+RTS -M<size>' to increase it.

Hi, Am Sonntag, den 24.02.2013, 18:41 +0000 schrieb Tom Ellis:
On Sun, Feb 24, 2013 at 07:12:24PM +0100, Joachim Breitner wrote:
You should try:
million :: () -> Int million _ = 10 ^ (6 :: Int)
many :: () -> [Int] many x = [1..million x]
Thanks Joachim, but that doesn't work either.
oh, strange, I though I had added Maybe you will need {-# NOINLINE million #-} as well. to the mailing. Must have skipped or accidentally removed it... anyways, with that pragma it works. Greetings, Joachim -- Joachim "nomeata" Breitner Debian Developer nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata

In toy examples like this it will be generally hard to convince GHC not to just collapse your program down to a constant, when you're turning up the optimization level. In particular, you are implying -ffull-laziness with -O (or -O2), which can increase sharing.
GHC doesn't implement complete full-laziness. When optimisation in on, and -fno-full-laziness is not given, *some transformations that increase sharing are performed, such as extracting repeated computations from a loop*. These are the same transformations that a fully lazy implementation would do, the difference is that GHC doesn't consistently apply full-laziness, so don't rely on it.
http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.htm... If you explicitly rely on this not happening, turn it off: $ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp [1 of 1] Compiling Main ( A.hs, A.o ) Linking A ... $ time ./A +RTS -M750k (500000500000,500000500000) ./A +RTS -M750k 0.06s user 0.00s system 97% cpu 0.069 total A 750k heap should be enough for anyone :) -- Don On Sun, Feb 24, 2013 at 5:49 PM, Tom Ellis < tom-lists-haskell-cafe-2013@jaguarpaw.co.uk> wrote:
To avoid retaining a large lazy data structure in memory it is useful to hide it behind a function call. Below, "many" is used twice. It is hidden behind a function call so it can be garbage collected between uses. That's good. When compiling with "-O" it seems that GHC 7.4.1 decides to keep it in memory anyway. That's bad. (I can't read core so I don't know exactly what's going on). Replacing one of the "many" in "twice" with "different_many" makes everything fine again.
Is this considered a bug in GHC? Is it a known bug? It is incredibly concerning that GHC would perform this kind of pessimisation.
Tom
% cat thunkfail.hs {-# OPTIONS_GHC -fno-warn-unused-binds #-}
import Data.List
million :: Int million = 10 ^ (6 :: Int)
many :: () -> [Int] many () = [1..million]
different_many :: () -> [Int] different_many () = [1..million]
twice :: (Int, Int) twice = (foldl' (+) 0 (many ()), foldl' (+) 0 (many ()))
main :: IO () main = print twice
% ghc -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... (1784293664,1784293664)
% ghc -O -fforce-recomp -Wall -Werror -rtsopts thunkfail.hs && ./thunkfail +RTS -M5M [1 of 1] Compiling Main ( thunkfail.hs, thunkfail.o ) Linking thunkfail ... Heap exhausted; Current maximum heap size is 5242880 bytes (5 MB); use `+RTS -M<size>' to increase it.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Sun, Feb 24, 2013 at 06:25:56PM +0000, Don Stewart wrote:
If you explicitly rely on this not happening, turn it off:
$ ghc -O2 -fno-full-laziness --make A.hs -rtsopts -fforce-recomp [1 of 1] Compiling Main ( A.hs, A.o ) Linking A ...
$ time ./A +RTS -M750k (500000500000,500000500000) ./A +RTS -M750k 0.06s user 0.00s system 97% cpu 0.069 total
Thanks Don, that's good to know. I am not currently bitten by this issue in any production code, but I am concerned it will crop up when I least expect it. Tom
participants (3)
-
Don Stewart
-
Joachim Breitner
-
Tom Ellis