Why does this blow the stack?

Given this function:

dropTest n = head . drop n $ [1..]

I get a stack overflow when n is greater than ~550,000. Is that inevitable behavior for large n? Is there a better way to do it?

Justin

On Fri, 2007-12-21 at 09:13 -0800, Justin Bailey wrote:
Given this function:
dropTest n = head . drop n $ [1..]
I get a stack overflow when n is greater than ~550,000. Is that inevitable behavior for large n? Is there a better way to do it?
A similar example is discussed on http://www.haskell.org/haskellwiki/Stack_overflow at the bottom.

On Fri, 21 Dec 2007 12:13:04 -0500, Justin Bailey
Given this function:
dropTest n = head . drop n $ [1..]
I get a stack overflow when n is greater than ~550,000. Is that inevitable behavior for large n? Is there a better way to do it?
Justin
I'm curious as well. My first thought was to try the (!!) operator. Typing

Prelude> [1..] !! 550000

overflows the stack on my computer, as does dropTest 550000. The code for (!!) is apparently as follows:

xs !! n | n < 0 = error "Prelude.!!: negative index"
[] !! _ = error "Prelude.!!: index too large"
(x:_) !! 0 = x
(_:xs) !! n = xs !! (n-1)

Isn't this tail recursive? What is eating up the stack?

Brad Larsen
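As later replies in the thread explain, (!!) itself is tail recursive; what eats the stack is forcing the element, which [1..] has left as a deeply nested (...+1)+1 thunk. A sketch of this idea (the name nthStrict is ours, not the Prelude's) forces each element as it walks, so each pending thunk stays one (+1) deep:

```haskell
-- Hypothetical strict indexing: forcing every element before recursing
-- keeps the (+1) thunks built by [1..] shallow, so the stack stays flat.
nthStrict :: [a] -> Int -> a
nthStrict []     _ = error "nthStrict: index too large"
nthStrict (x:xs) n
  | n <= 0    = x
  | otherwise = x `seq` nthStrict xs (n - 1)

main :: IO ()
main = print (nthStrict [(1 :: Integer) ..] 1000000)  -- prints 1000001
```

Without the seq this is the same loop as (!!) and overflows just as the messages above describe.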

On Dec 21, 2007 9:48 AM, Brad Larsen
I'm curious as well. My first thought was to try the (!!) operator. Typing
Prelude> [1..] !! 550000
overflows the stack on my computer, as does dropTest 550000.
I think it's [1..] which is building up the unevaluated thunk. Using this definition of dropTest does not blow the stack:

dropTest n = head . drop n $ repeat 1

Justin

On Dec 21, 2007 9:51 AM, Justin Bailey
I think it's [1..] which is building up the unevaluated thunk. Using this definition of dropTest does not blow the stack:
It also works if you do [(1::Int) ..] !! n, but not with [(1::Integer) ..] !! n. Sounds like GHC is being smart about strictness for Ints, but doesn't know that Integer is equally strict. If that's right, it's a bug in GHC.

On Fri, 2007-12-21 at 09:56 -0800, David Benbennick wrote:
On Dec 21, 2007 9:51 AM, Justin Bailey
wrote: I think it's [1..] which is building up the unevaluated thunk. Using this definition of dropTest does not blow the stack:
It also works if you do [(1::Int) ..] !! n, but not with [(1::Integer) ..] !! n
Sounds like GHC is being smart about strictness for Ints, but doesn't know that Integer is equally strict. If that's right, it's a bug in GHC.
It is a bug in GHC. From http://darcs.haskell.org/packages/base/GHC/Enum.lhs

enumFrom (I# x) = eftInt x maxInt#
    where I# maxInt# = maxInt
    -- Blarg: technically I guess enumFrom isn't strict!
...
eftInt x y | x ># y    = []
           | otherwise = go x
    where go x = I# x : if x ==# y then [] else go (x +# 1#)

derek.a.elkins:
On Fri, 2007-12-21 at 09:56 -0800, David Benbennick wrote:
On Dec 21, 2007 9:51 AM, Justin Bailey
wrote: I think it's [1..] which is building up the unevaluated thunk. Using this definition of dropTest does not blow the stack:
It also works if you do [(1::Int) ..] !! n, but not with [(1::Integer) ..] !! n
Sounds like GHC is being smart about strictness for Ints, but doesn't know that Integer is equally strict. If that's right, it's a bug in GHC.
It is a bug in GHC. From http://darcs.haskell.org/packages/base/GHC/Enum.lhs
enumFrom (I# x) = eftInt x maxInt#
    where I# maxInt# = maxInt
    -- Blarg: technically I guess enumFrom isn't strict!
...
eftInt x y | x ># y    = []
           | otherwise = go x
    where go x = I# x : if x ==# y then [] else go (x +# 1#)
As usual, this is an issue with the irregular numeric-operation strictness applied to Integer. Consider this Integer enumFrom:

main = print x
    where x = head (drop 1000000 (enumFrom' 1))

enumFrom' :: Integer -> [Integer]
enumFrom' x = enumDeltaInteger x 1

enumDeltaInteger :: Integer -> Integer -> [Integer]
enumDeltaInteger x d = x `seq` x : enumDeltaInteger (x+d) d

This is fine. The Int instance is already strict like this. I'll file a patch.

I hate these slightly-too-lazy issues with Integer that aren't present with Int. The atomic strictness of Integer is only irregularly applied through the base libraries. For example, in Data.List, this was considered acceptable:

maximum :: (Ord a) => [a] -> a
maximum [] = errorEmptyList "maximum"
maximum xs = foldl1 max xs

{-# RULES
  "maximumInt"     maximum = (strictMaximum :: [Int] -> Int);
  "maximumInteger" maximum = (strictMaximum :: [Integer] -> Integer)
 #-}

Really, if we let Int be strict on (+) and (*)-style operations, and Integer sometimes strict on those, we should just declare that these atomic numeric types (and Word, Word8, ...) should be strict on (+) and so on.

-- Don
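The message quotes the RULES pragma but not strictMaximum itself. A plausible reconstruction (ours, not the actual base source) is just a max fold that forces its accumulator at every step:

```haskell
-- Reconstruction of strictMaximum (assumed, not quoted from base):
-- a left fold over max whose accumulator is forced before each step,
-- so no chain of deferred max thunks can build up on long lists.
strictMaximum :: Ord a => [a] -> a
strictMaximum []     = error "strictMaximum: empty list"
strictMaximum (x:xs) = go x xs
  where
    go m []     = m
    go m (y:ys) = m `seq` go (max m y) ys
```

The RULES above then rewrite maximum to this strict version at the monomorphic Int and Integer types only, which is exactly the "irregularly applied" strictness Don is complaining about.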

Am Freitag, 21. Dezember 2007 schrieb Justin Bailey:
Given this function:
dropTest n = head . drop n $ [1..]
I get a stack overflow when n is greater than ~550,000. Is that inevitable behavior for large n? Is there a better way to do it?
Justin
[1..] equals [1, (1)+1, (1+1)+1, (1+1+1)+1, (1+1+1+1)+1, ...] where the brackets are a reference to the previous entry in the list. So, if you want to reach the nth element of [1..], the garbage collector automagically collects the unused list entries... but there are none unused.

[1..]!!10 = ((((((((((1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1

Now, that one long formula needs to be completely built on the stack prior to evaluation. To prevent this, each value has to be evaluated before stepping deeper into the list, i.e. with this bang pattern:

let ((!x):xs)!!!i = case i of {0->x;_->xs!!!pred i} in [1..]!!!10

or simply with this trick: [1..maxBound]!!10. This causes every single entry to be checked against the list-end condition just before the calculation of the next list entry.

- marc

coeus:
Am Freitag, 21. Dezember 2007 schrieb Justin Bailey:
Given this function:
dropTest n = head . drop n $ [1..]
I get a stack overflow when n is greater than ~550,000. Is that inevitable behavior for large n? Is there a better way to do it?
Justin
[1..] equals [1, (1)+1, (1+1)+1, (1+1+1)+1, (1+1+1+1)+1, ...] where the brackets are a reference to the previous entry in the list. So, if you want to reach the nth element of [1..], the garbage collector automagically collects the unused list entries... but there are none unused.

[1..]!!10 = ((((((((((1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1

Now, that one long formula needs to be completely built on the stack prior to evaluation. To prevent this, each value has to be evaluated before stepping deeper into the list, i.e. with this bang pattern:

let ((!x):xs)!!!i = case i of {0->x;_->xs!!!pred i} in [1..]!!!10

or simply with this trick: [1..maxBound]!!10. This causes every single entry to be checked against the list-end condition just before the calculation of the next list entry.
There's no good reason for the accumulator for Integer to be lazy, especially when you see that adding an upper bound (enumFromTo) or using Int uses a strict accumulator. I've filed a bug report and fix for this:

http://hackage.haskell.org/trac/ghc/ticket/1997

There's an ad hoc sprinkling of strict and lazy Num ops for Integer through the base library, while Int is fairly consistently strict.

-- Don

On Dec 21, 2007 12:02 PM, Don Stewart
There's no good reason for the accumulator for Integer to be lazy, especially when you see that adding an upper bound (enumFromTo) or using Int uses a strict accumulator.
I've filed a bug report and fix for this.
http://hackage.haskell.org/trac/ghc/ticket/1997
There's an ad hoc sprinkling of strict and lazy Num ops for Integer through the base library, while Int is fairly consistently strict.
Thanks for fixing this. But doesn't GHC have strictness analysis? Even if there were consistent strictness for Integer in the base library, that wouldn't help for code not in the base library. In other words, I want to be able to define

mysum :: (Num a) => [a] -> a
mysum = foldl (+) 0

in my own programs, and have GHC automatically make it strict if "a" happens to be Int or Integer. Is there any chance of GHC getting that ability?
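A user-land workaround available without any compiler help (not part of the proposal above) is Data.List's foldl', which forces the accumulator at every step regardless of the element type:

```haskell
import Data.List (foldl')

-- Like mysum above, but strict in the accumulator for any Num instance,
-- so no chain of (+) thunks builds up even for Integer.
mysum' :: Num a => [a] -> a
mysum' = foldl' (+) 0

main :: IO ()
main = print (mysum' [1 .. 10000000 :: Integer])  -- prints 50000005000000
```

This sidesteps the question of whether strictness analysis fires, at the cost of writing the strictness by hand.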

dbenbenn:
On Dec 21, 2007 12:02 PM, Don Stewart
wrote: There's no good reason for the accumulator for Integer to be lazy, especially when you see that adding an upper bound (enumFromTo) or using Int uses a strict accumulator.
I've filed a bug report and fix for this.
http://hackage.haskell.org/trac/ghc/ticket/1997
there's an ad hoc sprinkling of strict and lazy Num ops for Integer through the base library, while Int is fairly consistently strict.
Thanks for fixing this. But doesn't GHC have strictness analysis?
Sure does!
Even if there was consistent strictness for Integer in the base library, that wouldn't help for code not in the base library. In other words, I want to be able to define
mysum :: (Num a) => [a] -> a
mysum = foldl (+) 0
in my own programs, and have GHC automatically make it strict if "a" happens to be Int or Integer. Is there any chance of GHC getting that ability?
Thankfully, GHC does that already :)

mysum :: (Num a) => [a] -> a
mysum = foldl (+) 0

main = print (mysum [1..10000000])

In GHC 6.6:

$ time ./A +RTS -M20M
Heap exhausted;
Current maximum heap size is 19996672 bytes (19 Mb);
use `+RTS -M<size>' to increase it.

And in GHC 6.8, GHC can see through to the strictness of (+):

$ time ./A +RTS -M20M
50000005000000
./A +RTS -M20M  0.95s user 0.00s system 99% cpu 0.959 total

The problem here was an explicit recursive loop, though, with just not enough for the strictness analyser to get going.

-- Don

On Dec 21, 2007 2:30 PM, Don Stewart
dbenbenn:
Thanks for fixing this. But doesn't GHC have strictness analysis?
Sure does!
The problem here was an explicit recursive loop, though, with just not enough for the strictness analyser to get going.
The explicit loop you're talking about is:

enumDeltaInteger :: Integer -> Integer -> [Integer]
enumDeltaInteger x d = x : enumDeltaInteger (x+d) d

That code isn't very complicated, and I would hope to be able to write code like that in my own programs without having to worry about strictness. Given that the compiler even has an explicit signature, why can't it transform that code to

enumDeltaInteger x d = let s = x + d in x : (seq s $ enumDeltaInteger s d)

since it knows that (Integer+Integer) is strict? Of course, improving the strictness analysis is harder, but it pays off more, too.

On Fri, Dec 21, 2007 at 03:16:17PM -0800, David Benbennick wrote:
On Dec 21, 2007 2:30 PM, Don Stewart
wrote: dbenbenn:
Thanks for fixing this. But doesn't GHC have strictness analysis?
Sure does!
The problem here was an explicit recursive loop, though, with just not enough for the strictness analyser to get going.
The explicit loop you're talking about is:

enumDeltaInteger :: Integer -> Integer -> [Integer]
enumDeltaInteger x d = x : enumDeltaInteger (x+d) d

That code isn't very complicated, and I would hope to be able to write code like that in my own programs without having to worry about strictness. Given that the compiler even has an explicit signature, why can't it transform that code to

enumDeltaInteger x d = let s = x + d in x : (seq s $ enumDeltaInteger s d)

since it knows that (Integer+Integer) is strict? Of course, improving the strictness analysis is harder, but it pays off more, too.
Because they simply aren't the same. Try applying your functions to undefined undefined.

Stefan

On Dec 22, 2007 12:06 AM, Stefan O'Rear
enumDeltaInteger :: Integer -> Integer -> [Integer]
enumDeltaInteger x d = x : enumDeltaInteger (x+d) d

That code isn't very complicated, and I would hope to be able to write code like that in my own programs without having to worry about strictness. Given that the compiler even has an explicit signature, why can't it transform that code to

enumDeltaInteger x d = let s = x + d in x : (seq s $ enumDeltaInteger s d)

since it knows that (Integer+Integer) is strict? Of course, improving the strictness analysis is harder, but it pays off more, too.
Because they simply aren't the same.
Try applying your functions to undefined undefined.
This took a little work for me to see. Here it is for the interested:

Prelude> let edi :: Integer -> Integer -> [Integer]; edi x d = x : edi (x+d) d
Prelude> let edi' :: Integer -> Integer -> [Integer]; edi' x d = let s = x + d in x : (seq s $ edi s d)
Prelude> _:_:_ <- return $ edi undefined undefined
Prelude> _:_:_ <- return $ edi' undefined undefined
*** Exception: Prelude.undefined

Luke

On 12/21/07, Stefan O'Rear
Because they simply aren't the same.
Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program:

Prelude> length $ take 10 ([undefined ..] :: [Integer])
10

With his patch, that will throw. Some other types have the same issue (lazy Enum when it could be strict, leading to stack overflow):

Prelude> length $ take 10 ([undefined ..] :: [Double])
10
Prelude> length $ take 10 ([undefined ..] :: [Float])
10
Prelude Foreign.C.Types> length $ take 10 ([undefined ..] :: [CFloat])
10
Prelude Foreign.C.Types> length $ take 10 ([undefined ..] :: [CDouble])
10
Prelude Foreign.C.Types> length $ take 10 ([undefined ..] :: [CLDouble])
10
Prelude Data.Ratio> length $ take 10 ([undefined ..] :: [Ratio Int])
10

On 12/22/07, David Benbennick
On 12/21/07, Stefan O'Rear
wrote: Because they simply aren't the same.
Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program:
In fact, it's possible to get strictness to avoid stack overflow, and still have laziness to allow undefined:

myEnumFrom :: Integer -> [Integer]
myEnumFrom a = map (a+) $ enumDeltaIntegerStrict 0 1
  where enumDeltaIntegerStrict x d = x `seq` x : enumDeltaIntegerStrict (x+d) d

then

*Main> (myEnumFrom 42) !! (10^6)
1000042
*Main> length $ take 10 $ myEnumFrom undefined
10

dbenbenn:
On 12/21/07, Stefan O'Rear
wrote: Because they simply aren't the same.
Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program:
Prelude> length $ take 10 ([undefined ..] :: [Integer])
10
That's right. It makes Integer behave like the Int instance. -- Don

dons:
dbenbenn:
On 12/21/07, Stefan O'Rear
wrote: Because they simply aren't the same.
Good point; thanks. That means that Don's patch could theoretically break some existing Haskell program:
Prelude> length $ take 10 ([undefined ..] :: [Integer])
10
That's right. It makes Integer behave like the Int instance.
And we see again that strictness properties are very ill-defined. Here, the Enum types in base:

Prelude> length $ take 10 ([undefined ..] :: [Int])
*** Exception: Prelude.undefined
Prelude> length $ take 10 ([undefined ..] :: [()])
*** Exception: Prelude.undefined
Prelude> length $ take 10 ([undefined ..] :: [Ordering])
*** Exception: Prelude.undefined
Prelude> length $ take 10 ([undefined ..] :: [Bool])
*** Exception: Prelude.undefined

But,

Prelude> length $ take 10 ([undefined ..] :: [Float])
10
Prelude> length $ take 10 ([undefined ..] :: [Double])
10

And, before the patch,

Prelude> length $ take 10 ([undefined ..] :: [Integer])
10

Now,

Prelude> length $ take 10 ([undefined ..] :: [Integer])
*** Exception: Prelude.undefined

So we see that Float and Double also have this problem:

Prelude> head (drop 10000000 [1 .. ]) :: Float
*** Exception: stack overflow
Prelude> head (drop 10000000 [1 .. ]) :: Double
*** Exception: stack overflow

People shouldn't be writing code that depends on this!

-- Don

Since it's possible to support laziness for Integer (while still
avoiding any stack overflow), I think it makes sense to do so. What
if you have some big complicated program like the following:
x = some big slow computation
y = [x..]
lots of code
z = length $ take 10 y
Why bother computing x if you don't have to? And as an added benefit,
if you keep laziness, you don't have to worry about possibly breaking
any programs that depend on the current lazy behavior.
Laziness would NOT make sense for Int, since Int is bounded. You
can't tell how long [(x::Int) ..] is without evaluating x.
On 12/22/07, Don Stewart
People shouldn't be writing code that depends on this!
One of the main features of Haskell is that it's lazy. I don't think it's wrong to write code that depends on laziness.
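David's point above about Int being bounded can be seen directly: enumFrom for Int must compare against maxBound, so the starting value has to be forced before even the list's length is known. A small illustration:

```haskell
-- [x ..] :: [Int] stops at maxBound, so its length depends on x;
-- the enumeration cannot be lazy in its starting value.
main :: IO ()
main = print (length ([maxBound - 9 ..] :: [Int]))  -- prints 10
```

For Integer there is no upper bound, which is why a lazy-in-the-head enumeration (as in Albert's myEnumFrom above) is even possible.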

dbenbenn:
Since it's possible to support laziness for Integer (while still avoiding any stack overflow), I think it makes sense to do so. What if you have some big complicated program like the following:
x = some big slow computation y = [x..]
lots of code
z = length $ take 10 y
Why bother computing x if you don't have to? And as an added benefit, if you keep laziness, you don't have to worry about possibly breaking any programs that depend on the current lazy behavior.
Laziness would NOT make sense for Int, since Int is bounded. You can't tell how long [(x::Int) ..] is without evaluating x.
On 12/22/07, Don Stewart
wrote: People shouldn't be writing code that depends on this!
One of the main features of Haskell is that it's lazy. I don't think it's wrong to write code that depends on laziness.
Depending on laziness is fine, but depending on undefined strictness semantics for particular types is more fragile. Whether Int or Bool or whatever type has a strict or lazy accumulator in enumFromTo is entirely unspecified -- you can't look it up in the report to check.

-- Don

On 12/26/07, Don Stewart
Depending on laziness is fine, but depending on undefined strictness semantics for particular types is more fragile. Whether Int or Bool or whatever type has a strict or lazy accumulator in enumFromTo is entirely unspecified -- you can't look it up in the report to check.
Right, I understand that. But is there any reason not to keep the Integer version of [a..] lazy, given that we can fix the stack overflow problem without introducing strictness?

The (extremely enlightening) discussion so far has focused on the
inconsistent (arguably buggy) behavior of [a,b..c] enumeration sugar.
I think it's worth pointing out that the code could also be made to
run by making the drop function strict. I got to thinking, in a
"strictness" debugging scenario like this, it seems like you can get
the behavior you want by making things strict at various levels. You
can make the inner level (enumeration sugar) stricter, or you can
leave the enumeration stuff lazy and make the drop stricter. Maybe I
didn't express that very well, so here's code:
(I would argue that this discussion could be the basis of a
"strictness kata" exercise, following up from a cafe post I made a
while ago.)
{-# LANGUAGE BangPatterns #-}
import Data.List
import Testing
import Test.QuickCheck
import Test.QuickCheck.Batch
tDrop = ( head . drop n ) ( edi1 1 2 )
tStrictDrop = ( head . strictDrop n ) ( edi1 1 2 )
n = 10^6
--edi: enum delta integer
-- stack overflow on 10^6 el list if use drop
-- ok with strictDrop
edi1 start next = start : edi1 next (next + delta)
where delta = next - start
-- ok (no overflow) for drop, of course also okay for strictDrop
edi2 !start next = start : edi2 next (next + delta)
where delta = next - start
ediWithMax start next bound = takeWhile p $ edi start next
where p i | delta > 0 = i <= bound
| delta < 0 = i >= bound
| otherwise = i <= bound
delta = next - start
edi = edi1
strictDrop _ [] = []
strictDrop n l | n <= 0 = l
strictDrop n (!x:xs) | n>0 = strictDrop (n-1) xs
pStrictDropSameAsDrop n xs = drop n xs == strictDrop n xs
where types = (n::Int,xs::[Integer])
pEdi1 start next max =
abs(next-start) > 0 ==> -- otherwise hits bottom because of eg [0,0..0]
ediWithMax start next max == [start,next..max]
where types = (start :: Integer, next :: Integer, max :: Integer)
pEdi2 start next max = ( take 1000 $ ediWithMax start next max ) == (
take 1000 $ [start,next..max] )
where types = (start :: Integer, next :: Integer, max :: Integer)
t2 = runTests "edi" testOptions
[run pEdi1,
run pEdi2,
run pStrictDropSameAsDrop]
where testOptions = TestOptions
{ no_of_tests = 100 -- number of tests to run
, length_of_tests = 1 -- 1 second max per check
-- where a check == n tests
, debug_tests = False -- True => debugging info
}
2007/12/21, Justin Bailey
Given this function:
dropTest n = head . drop n $ [1..]
I get a stack overflow when n is greater than ~550,000. Is that inevitable behavior for large n? Is there a better way to do it?
Justin
participants (10)
- Albert Y. C. Lai
- Brad Larsen
- David Benbennick
- Derek Elkins
- Don Stewart
- Justin Bailey
- Luke Palmer
- Marc A. Ziegert
- Stefan O'Rear
- Thomas Hartman