
Hi, I see a stack overflow with this code, but I don't understand why. Test.hs -------------- main :: IO () main = do let list = ([1..3*1000*1000] :: [Int]) let !x = foldl'2 (flip seq) () list return x -- foldl'2 is just the __GLASGOW_HASKELL__ version of Data.List.foldl'. -- http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/ -- src/Data-List.html -- The test case behaves the same if I call foldl' instead of foldl'2. foldl'2 :: (a -> b -> a) -> a -> [b] -> a foldl'2 f z0 xs0 = lgo z0 xs0 where lgo z [] = z lgo z (x:xs) = let z' = f z x in z' `seq` (lgo z' xs) $ ghc Test.hs -o Test $ time ./Test real 0m0.087s user 0m0.084s sys 0m0.000s $ ghc Test.hs -o Test -O $ time ./Test Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it. real 0m1.780s user 0m1.764s sys 0m0.004s $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.12.3 I looked at the Core output with ghc-core, with or without optimizations, and I see a $wlgo recursive function that doesn't appear to end in a tail call. I don't see any let expressions in the folding code, so I assume no thunks are being created. I can make a tail call appear by doing either of two things: 1. Replace "lgo z []" with "lgo !z []". This suggestion came from an email on haskell-beginners that I can't find right now. It was a few months ago. 2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use the other version: foldl' f a [] = a foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs My test case is contrived. Originally, I had a program that read lines from a file as Data.ByteString values, and I was trying to debug a stack overflow. I added the foldl' call to force the evaluation of the ByteString lines, but the foldl' call itself overflowed the stack. I might have fixed my original stack overflow problem. I was applying sum to a large list of integers, and sum is lazy. I don't think I have any real code anymore that overflows the stack, but I'm uncomfortable because I don't know why my test case doesn't work. Is the foldl' call in my test case allowed to have linear stack usage? Thanks, -Ryan

On Friday 10 September 2010 11:29:56, Ryan Prichard wrote:
Hi,
I see a stack overflow with this code, but I don't understand why.
Neither do I, really, but ghc is being too clever for its own good - or not clever enough.
I looked at the Core output with ghc-core, with or without optimizations, and I see a $wlgo recursive function that doesn't appear to end in a tail call.
Without optimisations, I see a nice tail-recursive lgo inside foldl'2, pretty much what one would like to see. With optimisations, you get a specialised worker $wlgo: Rec { Main.$wlgo :: [GHC.Types.Int] -> (##) GblId [Arity 1 NoCafRefs Str: DmdType S] Main.$wlgo = \ (w_sms :: [GHC.Types.Int]) -> case case w_sms of _ { [] -> GHC.Unit.(); : x_adq xs_adr -> case x_adq of _ { GHC.Types.I# _ -> case Main.$wlgo xs_adr of _ { (# #) -> GHC.Unit.() } } } of _ { () -> GHC.Prim.(##) } end Rec } and it's here that ghc shows the wrong amount of cleverness. What have we? At the types () and [Int], with f = flip seq, the step in lgo unfolds to lgo z (x:xs) ~> let z' = f z x in lgo z' xs ~> case f z x of z'@() -> lgo z' xs ~> case (case x of { I# _ -> z }) of z'@() -> lgo z' xs Now flip seq returns its first argument unless its second argument is _|_ and () has only one non-bottom value, so the ()-argument of lgo can be eliminated (here, the initial ()-argument is known to be (), in general the wrapper checks for _|_ before entering the worker), giving wlgo :: [Int] -> () wlgo [] = () wlgo (x:xs) = case (case x of { I# _ -> () }) of () -> wlgo xs It would be nice if the compiler rewrote the last equation to wlgo (x:xs) -> case x of { I# _ -> wlgo xs } , but apparently it can't. Still, that's pretty good code. Now comes the misplaced cleverness. Working with unboxed types is better (faster) than working with boxed types, so let's use unboxed types, giving $wlgo the type [Int] -> (##) (unboxed 0-tuple). But it wasn't clever enough to directly return (# #) in the case of an empty list - that would've allowed the core to be \ (w_sms :: [GHC.Types.Int]) -> case w_sms of _ { [] -> GHC.Prim.(##) : x_adq xs_adr -> case x_adq of _ { GHC.Types.I# _ -> Main.$wlgo xs_adr } } and all would've been fine and dandy. So it chose [] -> GHC.Unit.() and that forced the second branch to also have that type, hence you can't have a tail call there but have to box the result of the recursive call (only to unbox it again). So you get superfluous boxing, unboxing, reboxing in addition to a stack- eating recursion. But you've hit a rare case here, if you use foldl'2 (flip seq) at a type with more than one non-bottom value, the first argument of lgo is not eliminated and you don't get the boxing-unboxing dance, instead you get a nice tail-recursion, even if foldl'2 is used only once and not exported. Why GHC does that for (), beats me.
I don't see any let expressions in the folding code, so I assume no thunks are being created. I can make a tail call appear by doing either of two things:
1. Replace "lgo z []" with "lgo !z []". This suggestion came from an email on haskell-beginners that I can't find right now. It was a few months ago.
Perhaps http://www.haskell.org/pipermail/beginners/2010-August/005016.html ?
2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use the other version:
foldl' f a [] = a foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs
But that needs to pass also the function, which is generally slower than having it as a static argument. 3. {-# NOINLINE foldl'2 #-} But that's not so good for performance in general. Option 1 gives the best Core, but it changes behaviour, with that, foldl' (\_ _ -> 1) undefined [0] = 1 foldl'2 (\_ _ -> 1) undefined [0] = _|_ To retain the behaviour of foldl', foldl'2 f z0 xs0 = case xs0 of [] -> z0 (x:xs) -> lgo (f z0 x) xs where lgo !z [] = z lgo z (y:ys) = lgo (f z y) ys
My test case is contrived. Originally, I had a program that read lines from a file as Data.ByteString values, and I was trying to debug a stack overflow. I added the foldl' call to force the evaluation of the ByteString lines, but the foldl' call itself overflowed the stack.
That's probably a different matter, foldl' evaluates the accumulation parameter only to weak head normal form, if it's not of simple type, it can still contain thunks that will overflow the stack when demanded.
I might have fixed my original stack overflow problem. I was applying sum to a large list of integers, and sum is lazy.
For [Int] and [Integer], sum is specialised, so when compiled with optimisations, ghc should use a strict version for those types.
I don't think I have any real code anymore that overflows the stack, but I'm uncomfortable because I don't know why my test case doesn't work.
Is the foldl' call in my test case allowed to have linear stack usage?
I don't think the language definition treats that, so technically it's allowed then. But it shouldn't happen.
Thanks, -Ryan

Shouldn't this be reported as a bug?
On 11/09/2010 12:22 AM, "Daniel Fischer"
Hi,
I see a stack overflow with thi... Neither do I, really, but ghc is being too clever for its own good - or not clever enough.
I looked at the Core output with ghc-core, with or without optimizations, and I see a $wlgo r...
Without optimisations, I see a nice tail-recursive lgo inside foldl'2, pretty much what one would like to see. With optimisations, you get a specialised worker $wlgo: Rec { Main.$wlgo :: [GHC.Types.Int] -> (##) GblId [Arity 1 NoCafRefs Str: DmdType S] Main.$wlgo = \ (w_sms :: [GHC.Types.Int]) -> case case w_sms of _ { [] -> GHC.Unit.(); : x_adq xs_adr -> case x_adq of _ { GHC.Types.I# _ -> case Main.$wlgo xs_adr of _ { (# #) -> GHC.Unit.() } } } of _ { () -> GHC.Prim.(##) } end Rec } and it's here that ghc shows the wrong amount of cleverness. What have we? At the types () and [Int], with f = flip seq, the step in lgo unfolds to lgo z (x:xs) ~> let z' = f z x in lgo z' xs ~> case f z x of z'@() -> lgo z' xs ~> case (case x of { I# _ -> z }) of z'@() -> lgo z' xs Now flip seq returns its first argument unless its second argument is _|_ and () has only one non-bottom value, so the ()-argument of lgo can be eliminated (here, the initial ()-argument is known to be (), in general the wrapper checks for _|_ before entering the worker), giving wlgo :: [Int] -> () wlgo [] = () wlgo (x:xs) = case (case x of { I# _ -> () }) of () -> wlgo xs It would be nice if the compiler rewrote the last equation to wlgo (x:xs) -> case x of { I# _ -> wlgo xs } , but apparently it can't. Still, that's pretty good code. Now comes the misplaced cleverness. Working with unboxed types is better (faster) than working with boxed types, so let's use unboxed types, giving $wlgo the type [Int] -> (##) (unboxed 0-tuple). But it wasn't clever enough to directly return (# #) in the case of an empty list - that would've allowed the core to be \ (w_sms :: [GHC.Types.Int]) -> case w_sms of _ { [] -> GHC.Prim.(##) : x_adq xs_adr -> case x_adq of _ { GHC.Types.I# _ -> Main.$wlgo xs_adr } } and all would've been fine and dandy. So it chose [] -> GHC.Unit.() and that forced the second branch to also have that type, hence you can't have a tail call there but have to box the result of the recursive call (only to unbox it again). So you get superfluous boxing, unboxing, reboxing in addition to a stack- eating recursion. But you've hit a rare case here, if you use foldl'2 (flip seq) at a type with more than one non-bottom value, the first argument of lgo is not eliminated and you don't get the boxing-unboxing dance, instead you get a nice tail-recursion, even if foldl'2 is used only once and not exported. Why GHC does that for (), beats me.
I don't see any let expressions in the folding code, so I assume no thunks are being created. ... Perhaps http://www.haskell.org/pipermail/beginners/2010-August/005016.html
?
2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use the other version:
f...
But that needs to pass also the function, which is generally slower than having it as a static argument. 3. {-# NOINLINE foldl'2 #-} But that's not so good for performance in general. Option 1 gives the best Core, but it changes behaviour, with that, foldl' (\_ _ -> 1) undefined [0] = 1 foldl'2 (\_ _ -> 1) undefined [0] = _|_ To retain the behaviour of foldl', foldl'2 f z0 xs0 = case xs0 of [] -> z0 (x:xs) -> lgo (f z0 x) xs where lgo !z [] = z lgo z (y:ys) = lgo (f z y) ys
My test case is contrived. Originally, I had a program that read lines from a file as Data.B...
That's probably a different matter, foldl' evaluates the accumulation parameter only to weak head normal form, if it's not of simple type, it can still contain thunks that will overflow the stack when demanded.
I might have fixed my original stack overflow problem. I was applying sum to a large list of...
For [Int] and [Integer], sum is specialised, so when compiled with optimisations, ghc should use a strict version for those types.
I don't think I have any real code anymore that overflows the stack, but I'm uncomfortable be... I don't think the language definition treats that, so technically it's allowed then. But it shouldn't happen.
Thanks, -Ryan
_______________________________________________ Beginners mailing list Beginne...

On Friday 10 September 2010 17:52:59, Mathew de Detrich wrote:
Shouldn't this be reported as a bug?
I'm not sure, but: http://hackage.haskell.org/trac/ghc/ticket/4301

Daniel,
Thanks for your help.
On Fri, 10 Sep 2010 16:20:40 +0200
Daniel Fischer
On Friday 10 September 2010 11:29:56, Ryan Prichard wrote:
Hi,
I see a stack overflow with this code, but I don't understand why.
Neither do I, really, but ghc is being too clever for its own good - or not clever enough.
I looked at the Core output with ghc-core, with or without optimizations, and I see a $wlgo recursive function that doesn't appear to end in a tail call.
Without optimisations, I see a nice tail-recursive lgo inside foldl'2, pretty much what one would like to see.
When I ran ghc-core, I used the command-line "ghc-core -- Test.hs". I thought it would compile Test.hs without optimizations, but ghc-core actually has a default string of ghc arguments that includes -O2. If I run "ghc-core -- -O0 Test.hs", I also see a nice tail-recursive lgo.
With optimisations, you get a specialised worker $wlgo:
Rec { Main.$wlgo :: [GHC.Types.Int] -> (##) GblId [Arity 1 NoCafRefs Str: DmdType S] Main.$wlgo = \ (w_sms :: [GHC.Types.Int]) -> case case w_sms of _ { [] -> GHC.Unit.(); : x_adq xs_adr -> case x_adq of _ { GHC.Types.I# _ -> case Main.$wlgo xs_adr of _ { (# #) -> GHC.Unit.() } } } of _ { () -> GHC.Prim.(##) } end Rec }
and it's here that ghc shows the wrong amount of cleverness.
What have we? At the types () and [Int], with f = flip seq, the step in lgo unfolds to
lgo z (x:xs) ~> let z' = f z x in lgo z' xs ~> case f z x of z'@() -> lgo z' xs ~> case (case x of { I# _ -> z }) of z'@() -> lgo z' xs
Now flip seq returns its first argument unless its second argument is _|_ and () has only one non-bottom value, so the ()-argument of lgo can be eliminated (here, the initial ()-argument is known to be (), in general the wrapper checks for _|_ before entering the worker), giving
wlgo :: [Int] -> () wlgo [] = () wlgo (x:xs) = case (case x of { I# _ -> () }) of () -> wlgo xs
It would be nice if the compiler rewrote the last equation to
wlgo (x:xs) -> case x of { I# _ -> wlgo xs }
, but apparently it can't. Still, that's pretty good code. Now comes the misplaced cleverness. Working with unboxed types is better (faster) than working with boxed types, so let's use unboxed types, giving $wlgo the type
[Int] -> (##)
(unboxed 0-tuple). But it wasn't clever enough to directly return (# #) in the case of an empty list - that would've allowed the core to be
\ (w_sms :: [GHC.Types.Int]) -> case w_sms of _ { [] -> GHC.Prim.(##) : x_adq xs_adr -> case x_adq of _ { GHC.Types.I# _ -> Main.$wlgo xs_adr } }
and all would've been fine and dandy. So it chose [] -> GHC.Unit.() and that forced the second branch to also have that type, hence you can't have a tail call there but have to box the result of the recursive call (only to unbox it again). So you get superfluous boxing, unboxing, reboxing in addition to a stack- eating recursion.
But you've hit a rare case here, if you use foldl'2 (flip seq) at a type with more than one non-bottom value, the first argument of lgo is not eliminated and you don't get the boxing-unboxing dance, instead you get a nice tail-recursion, even if foldl'2 is used only once and not exported.
Why GHC does that for (), beats me.
I don't see any let expressions in the folding code, so I assume no thunks are being created. I can make a tail call appear by doing either of two things:
1. Replace "lgo z []" with "lgo !z []". This suggestion came from an email on haskell-beginners that I can't find right now. It was a few months ago.
Perhaps http://www.haskell.org/pipermail/beginners/2010-August/005016.html ?
Yes, that was the email. It was more recent than I thought.
2. Instead of using the __GLASGOW_HASKELL__ version of foldl', use the other version:
foldl' f a [] = a foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs
But that needs to pass also the function, which is generally slower than having it as a static argument.
3. {-# NOINLINE foldl'2 #-}
But that's not so good for performance in general.
Option 1 gives the best Core, but it changes behaviour, with that,
foldl' (\_ _ -> 1) undefined [0] = 1 foldl'2 (\_ _ -> 1) undefined [0] = _|_
To retain the behaviour of foldl',
foldl'2 f z0 xs0 = case xs0 of [] -> z0 (x:xs) -> lgo (f z0 x) xs where lgo !z [] = z lgo z (y:ys) = lgo (f z y) ys
I think I see why foldl'2 and foldl' have the same behavior.
My test case is contrived. Originally, I had a program that read lines from a file as Data.ByteString values, and I was trying to debug a stack overflow. I added the foldl' call to force the evaluation of the ByteString lines, but the foldl' call itself overflowed the stack.
That's probably a different matter, foldl' evaluates the accumulation parameter only to weak head normal form, if it's not of simple type, it can still contain thunks that will overflow the stack when demanded.
I'm using Data.ByteString.ByteString: Prelude> :info Data.ByteString.ByteString data Data.ByteString.Internal.ByteString = Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr GHC.Word.Word8) !Int !Int If I evaluate a value of this type into WHNF, I think the fields are also evaluated into WHNF due to the strictness annotations. That should remove thunks from the Int arguments. I wonder if the ForeignPtr field could still have a thunk. I suppose I really wanted an NFData instance for ByteString so I could use rnf. I looked at the NFData instance for a list. It's just: instance NFData a => NFData [a] where rnf [] = () rnf (x:xs) = rnf x `seq` rnf xs If I just want to evaluate each list element to WHNF, I think I could write: forceList [] = () forceList (x:xs) = x `seq` forceList xs This function is simpler than foldl'2, and it turns into a tail-recursive function in Core. Maybe I could also use "forceList = foldr seq ()".
I might have fixed my original stack overflow problem. I was applying sum to a large list of integers, and sum is lazy.
For [Int] and [Integer], sum is specialised, so when compiled with optimisations, ghc should use a strict version for those types.
I retested my original program that used sum. It overflows the stack with -O0, but it doesn't overflow with -O or "-O -fno-strictness". With -v4 -ddump-simpl-stats -O, I see 1 RuleFired 1 SPEC Data.List.sum I don't see this output with -O0 instead of -O. -Ryan

On Saturday 11 September 2010 23:30:50, Ryan Prichard wrote:
That's probably a different matter, foldl' evaluates the accumulation parameter only to weak head normal form, if it's not of simple type, it can still contain thunks that will overflow the stack when demanded.
I'm using Data.ByteString.ByteString:
Prelude> :info Data.ByteString.ByteString data Data.ByteString.Internal.ByteString = Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr GHC.Word.Word8) !Int !Int
If I evaluate a value of this type into WHNF, I think the fields are also evaluated into WHNF due to the strictness annotations. That should remove thunks from the Int arguments. I wonder if the ForeignPtr field could still have a thunk.
No, the pointer is just a memory address, no thunk there, a (strict) ByteString in WHNF is fully evaluated. The question is what you do with your ByteStrings in the fold.
I suppose I really wanted an NFData instance for ByteString so I could use rnf.
In general, you don't need an NFData instance for ByteStrings, since WHNF = NF for (strict) ByteStrings and reducing lazy ByteStrings to normal form would sort of defeat their purpose. If you need an NFData instance because you want to rnf e.g. a Map ByteString stuff, the default implementation of rnf is enough.
I looked at the NFData instance for a list. It's just:
instance NFData a => NFData [a] where rnf [] = () rnf (x:xs) = rnf x `seq` rnf xs
If I just want to evaluate each list element to WHNF, I think I could write:
forceList [] = () forceList (x:xs) = x `seq` forceList xs
Yes.
This function is simpler than foldl'2, and it turns into a tail-recursive function in Core. Maybe I could also use "forceList = foldr seq ()".
Exactly the same. But of course you need to evaluate forceList list to WHNF to have it do any work, case forceList list of () -> foo let !() = forceList list in bar or something.
I might have fixed my original stack overflow problem. I was applying sum to a large list of integers, and sum is lazy.
For [Int] and [Integer], sum is specialised, so when compiled with optimisations, ghc should use a strict version for those types.
I retested my original program that used sum. It overflows the stack with -O0, but it doesn't overflow with -O or "-O -fno-strictness". With -v4 -ddump-simpl-stats -O, I see
1 RuleFired 1 SPEC Data.List.sum
I don't see this output with -O0 instead of -O.
Specialisations are only used when compiled with optimisations since the specialised versions have different names and are inserted via rewrite rules. And rewrite rules are ignored with -O0. So without optimisations, you get the vanilla sum with lazy accumulator. To avoid stack overflows without optimisations, use foldl' (+) 0 instead of sum.
-Ryan
participants (3)
-
Daniel Fischer
-
Mathew de Detrich
-
Ryan Prichard