Re: [Haskell] simple function: stack overflow in hugs vs none in ghc

(redirecting to haskell-cafe) Tom Pledger wrote:
| > sqnc p ts = let ( r, ts' ) = p ts in case r of | > Nothing -> ([],ts') | > Just x -> let (r',ts'') = (sqnc p ts') in (x:r', ts'' ) :
I don't know how ghc is avoiding the stack overflow, but it looks like it's caused by strict pattern matching. The first call to sqnc wants reassurance that the second call is returning a pair (as opposed to _|_) before deciding that it's OK to return a pair.
Try putting a tilde in the recursive call, to make the pattern lazy.
let ~(r',ts'') = ...
"let" already is lazy -- putting a tilde there has no effect. In fact that tuple is produced from (x:r', ts'' ) even if (sqnc p ts') is _|_. (The "let" at the beginning of the function can be discovered to be strict by strictness analysis (which GHC does), because "r" is analyzed by the "case", and the pair needs to be deconstructed to find the value of "r".) Isaac

john lask wrote:
test1 = readFile "big.dat" >>= (\x->print $ parse x) test2 = readFile "big.dat" >>= (\x->print $ fst $ parse x)
test1 (on a large file) will succeed in ghc but fail in hugs test2 on same file will succeed in both ghc and hugs
big.dat is just some large data file say 1MB. (not particularly large by todays standards!)
The question: is there any changes that can be made to the code to make test1 work in hugs without changing the essence of the function?
parse x = sqnc item x where
item =( \ ts -> case ts of [] -> ( Nothing, []) ts -> ( Just (head ts), tail ts) )
sqnc p ts = let ( r, ts' ) = p ts in case r of Nothing -> ([],ts') Just x -> let (r',ts'') = (sqnc p ts') in ( x:r', ts'' )
Strange, this shouldn't happen :) You may want to try item [] = (Nothing, []) item (t:ts) = (Just t , ts) but that shouldn't help ;) Let's try to find out what's going on by doing graph reduction with our bare hands. The preliminary material on http://en.wikibooks.org/wiki/Haskell/Graph_reduction should help a bit. Ideally, there would be tool support (hat? other debugger?) but when things become too complicated, tools can only keep you a few minutes longer above the water before drowning in complexity, too. The main point is that (print $ parse x) and (print $ fst $ parse x) differ in that the latter only computes the answer but not the remaining tokens. So, the stack overflow is triggered when evaluating the remaining tokens, but I don't see why. What happens for (print $ snd $ parse x) ? Let's rewrite your code to figure out what's going on item [] = (Nothing, []) item ts = (Just (head ts), tail ts) For sqnc , we need to translate stuff like let (a,b) = e in . Let-bound patterns aren't explained in the wikibook and in fact they're tricky. When done wrong, there may be space leaks, see also J. Sparud. Fixing Some Space Leaks without a Garbage Collector. http://citeseer.ist.psu.edu/sparud93fixing.html I don't know whether its implemented in Hugs (probably not?) and GHC (probably, but maybe with bugs?). We'll use the not so good translatation let (a,b) = e in e' <=> let x = e; a = fst x; b = snd x; in e' I'd like to call sqnc differently, namely many . We get many p ts = let z = p ts r = fst z ts' = snd z in case r of Nothing -> ([], ts') Just x -> let z' = many p ts' r' = fst z' ts''= snd z' in (x:r', ts'') Intimidating, no? :) Now, let's evaluate an example expression, like many item (1:2:3:...) (the list is intended to be finite, but we'll decide later about its length). To preserve space and stay sane, we'll only focus on the things that get evaluated and write ... for the rest. Let's start: many item (1:2:3:...) => let ts = 1:2:3:... in let ... z = item ts; r = fst z; ... in case r of ... => let ... z = (Just (head ts), tail ts); r = fst z ... => let ... z = (r, tail ts); r = Just (head ts) ... in case r of The above step is not clear from the description in the wikibook, but it's a handy notation of saying that the first component and r point to the same thing. Expanding the case expression yields (in full form) => let ts = 1:2:3: ... in let z = (r, tail ts) r = Just x x = head ts ts'= snd z in let z' = many item ts' r' = fst z' ts''= snd z' in (x:r', ts'') This is the weak head normal form of our expression. Of course, we wanted print (many item ts) = putStrLn (show ...) which means evaluating the first component and then the second component in the pair to full normal form. So, the next redex to be reduced is x followed by r' which forces z' which at least forces ts' => ... => let ts = x:ts' x = 1 ts' = 2:3:... in let z = (r, ts') r = Just x in let z' = let ... in (..,..) r' = fst z' ts''= snd z' in (x:r',ts'') To stay sane, we garbage collect z and r and rename variables before expanding the expression for z' which is obtained in the same way we obtained it before let ts0 = x0 : ts1 x0 = 1 ts1 = 2:3:... z0 = let z = (r, tail ts1) r = Just x x = head ts1 ts'= snd z in let z' = many item ts' r' = fst z' ts''= snd z' in (x:r', ts'') r0 = fst z0 us0 = snd z0 in (x0:r0, us0) Collecting lets and renaming yields let ts0 = x0 : ts1 x0 = 1 ts1 = 2:3:... z = (r, tail ts1) r = Just x1 x1 = head ts1 ts' = snd z z1 = many item ts' r1 = fst z1 us1 = snd z1 z0 = (x1:r1, us1) r0 = fst z0 us0 = snd z0 in (x0:r0, us0) The insight is that the original naming was bad, r and z are quite different from r0 and z0. Reducing r0 and x1 yields => let ts0 = x0 : ts1 x0 = 1 ts1 = x1 : ts2 x1 = 2 ts2 = 3:... z = (r, tail ts1) r = Just x1 ts' = snd z z1 = many item ts' r1 = fst z1 us1 = snd z1 z0 = (r0, us1) r0 = x1:r1 us0 = snd z0 in (x0:r0, us0) The general scheme should be clear now: z,r and ts' are temporary variables and further reduction of r1, r2 and so on leads to a chain let x0 = 1; ts0 = x0 : ts1 x1 = 2; ts1 = x1 : ts2 x2 = 3; ts2 = x2 : ts3 ... x8 = .. z = (r, tail ts8) r = Just x8 ts' = snd z z8 = many item ts' r8 = fst z8 us8 = snd z8 z7 = (r7, us8) r7 = x8:r8 us7 = snd z7 ... z0 = (r0, us1) r0 = x1:r1 us0 = snd z0 in (x0:r0, us0) So, after forcing the first component of the overall result to normal form, the result looks like (1:2:3:..., snd (_,snd (_,snd (_,...))) ) and it seems that Hugs fails to evaluate the tail recursive chain of snd ?? In the end, here's our decisive result: either Hugs or my analysis has a bug :D Regards, apfelmus

I agree with your analysis. if the following is tried in hugs then ghc you will obtain two different results.. return (repeat 'a') >>= \ x -> print $ span (const True) x with hugs you will get a stack error, in ghc it executes in constant space, i.e. indefinitely. In essenece the above example does exactly the same as my ealier code. _________________________________________________________________ Advertisement: Need a Hand? Use Yellow.com.au http://a.ninemsn.com.au/b.aspx?URL=http%3A%2F%2Fadsfac%2Enet%2Flink%2Easp%3Fcc%3DPAS075%2E5683%2E0%26clk%3D1%26creativeID%3D73753&_t=765559690&_r=Hotmail_email_tagline_sept07&_m=EXT

return (repeat 'a') >>= \ x -> print $ span (const True) x
with hugs you will get a stack error, in ghc it executes in constant space, i.e. indefinitely. In essenece the above example does exactly the same as my ealier code.
this thread might be relevant: http://www.haskell.org/pipermail/hugs-bugs/2007-June/001815.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001816.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001817.html claus

afraid not the given example is too strict, the requirement is to generate the matched portion lazilly, and return the tail (unconsumed portion). In principle the function should be capable of being written to run in constant space which the given example dose not.
From: "Claus Reinke"
To: "john lask" , , Subject: Re: [Haskell-cafe] RE: simple function: stack overflow in hugs vs nonein ghc Date: Mon, 24 Sep 2007 13:05:24 +0100 return (repeat 'a') >>= \ x -> print $ span (const True) x
with hugs you will get a stack error, in ghc it executes in constant space, i.e. indefinitely. In essenece the above example does exactly the same as my ealier code.
this thread might be relevant:
http://www.haskell.org/pipermail/hugs-bugs/2007-June/001815.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001816.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001817.html
claus
_________________________________________________________________ Advertisement: Love Footy? Get AFL, NRL, Rugby live scores and video now! http://ninemsn.com.au/share/redir/adTrack.asp?mode=click&clientID=819&referral=hotmailtagline&URL=http://sports.ninemsn.com.au

afraid not
the given example is too strict, the requirement is to generate the matched portion lazilly, and return the tail (unconsumed portion).
ah yes, without optimisations, Prelude.span builds up stack, while the continuation-based alternative i mentioned is too strict for some uses.
In principle the function should be capable of being written to run in constant space which the given example dose not.
return (repeat 'a') >>= \ x -> print $ span (const True) x
how about the old spec, then? span p l = (takeWhile p l,dropWhile p l) since takeWhile takes forever, here, it isn't even inefficient!-) claus
with hugs you will get a stack error, in ghc it executes in constant space, i.e. indefinitely. In essenece the above example does exactly the same as my ealier code.
this thread might be relevant:
http://www.haskell.org/pipermail/hugs-bugs/2007-June/001815.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001816.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001817.html

Claus Reinke wrote:
the given example is too strict, the requirement is to generate the matched portion lazilly, and return the tail (unconsumed portion).
ah yes, without optimisations, Prelude.span builds up stack,
I don't quite understand why it does so at all. span p [] = ([],[]) span p xs@(x:xs') | p x = let (ys,zs) = span p xs' in (x:ys,zs) | otherwise = ([],xs) I mean, the third line can return a tuple and the first element of the first list immediately. Where does the stack space come from? True enough, the second component will be a tower snd (_, snd (_, snd (_, ... but snd is tail recursive.
In principle the function should be capable of being written to run in constant space which the given example dose not.
Btw, even with the space leak prevention from the Sparud paper, it will be a linear chain of indirections. So, span doesn't run in constant space at all! The problem is that when lazily deconstructing the first component, the second component has to be "deconstructed", too (reduced partially) or it will leak space. I mean, fetching the x from (x:ys, id zs) should reduce the id , too. Of course, that should be up to the programmer's choice (partial reduction may be expensive), so is there a way to specify that in code? Regards, apfelmus

nope. it is true that in the case of an infinite list it exibits the "desired" behaviour but ... return (replicate 1000000 'a') >>= \x->print $ spant (const True) x ERROR - Garbage collection fails to reclaim sufficient space i.e. as the function unfold, the thunk representing the second term builds up on the heap. (not sure why it works for an infinite list, hugs must drop the reference to the tail ?) to obtain a function that will properly operate in constant space, for every unfolding of the first term we need to enforce evaluation of the second term.
From: "Claus Reinke"
To: "john lask" CC: Subject: Re: [Haskell-cafe] RE: simple function: stack overflow in hugs vsnonein ghc Date: Mon, 24 Sep 2007 16:20:42 +0100 afraid not
the given example is too strict, the requirement is to generate the matched portion lazilly, and return the tail (unconsumed portion).
ah yes, without optimisations, Prelude.span builds up stack, while the continuation-based alternative i mentioned is too strict for some uses.
In principle the function should be capable of being written to run in constant space which the given example dose not.
return (repeat 'a') >>= \ x -> print $ span (const True) x
how about the old spec, then?
span p l = (takeWhile p l,dropWhile p l)
since takeWhile takes forever, here, it isn't even inefficient!-)
claus
with hugs you will get a stack error, in ghc it executes in constant space, i.e. indefinitely. In essenece the above example does exactly the same as my ealier code.
this thread might be relevant:
http://www.haskell.org/pipermail/hugs-bugs/2007-June/001815.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001816.html http://www.haskell.org/pipermail/hugs-bugs/2007-June/001817.html
_________________________________________________________________ Get more out of your e-mail. Update to Windows Live Hotmail today! http://ninemsn.com.au/share/redir/adTrack.asp?mode=click&clientID=778&referral=HotmailTagline&URL=http://g.msn.com/8HMBEN/14811??PS=

return (replicate 1000000 'a') >>= \x->print $ spant (const True) x
ERROR - Garbage collection fails to reclaim sufficient space
i.e. as the function unfold, the thunk representing the second term builds up on the heap.
true. i've often wanted a copy pseudo-function that would avoid updating shared references, thus avoiding such leaks (at the cost of re-evaluation). if you have control of the list producer, you can make sure that it produces unshared copies of such lists: return producer >>= \x->print $ spant (const True) x where producer _ = replicate 1000000 'a' spant x = (takeWhile p $ x (),dropWhile p $ x () )
(not sure why it works for an infinite list, hugs must drop the reference to the tail ?)
curioser and curioser.. just as for apfelmus' question, |> ah yes, without optimisations, Prelude.span builds up stack, | I don't quite understand why it does so at all. i don't have an answer at hand, but would like one!-)
to obtain a function that will properly operate in constant space, for every unfolding of the first term we need to enforce evaluation of the second term.
if you don't mind ugly, unsafe, unrecommended code, here's a version implementing your description, purely to serve as a bad example;-) -- ugly, unsafe, unrecommended, unsymmetric -- (leak is avoided only left to right) span4 p l = unsafePerformIO $ do mv <- newMVar l return (take p l mv,drop mv) where take p xs@(x:xs') mv | p x = unsafePerformIO $ swapMVar mv xs' >> return (x:take p xs' mv) take p xs mv = unsafePerformIO $ swapMVar mv xs >> return [] drop mv = unsafePerformIO $ readMVar mv >>= return . dropWhile p -- for right to left evaluation (drop before take), we -- run into the problem that drop can't make a copy -- of l, so will force unfolding of the l shared with take return (replicate 1000000 'a') >>= \x-> print $ swap $ span (const True) x where swap (a,b) = (b,a) there have been threads in the past on this topic, ie, how to force two otherwise independent thunks with shared references to evolve in synch in order to avoid space leaks. claus

return (replicate 1000000 'a') >>= \x->print $ spant (const True) x
ERROR - Garbage collection fails to reclaim sufficient space
i.e. as the function unfold, the thunk representing the second term builds up on the heap. (not sure why it works for an infinite list, hugs must drop the reference to the tail ?)
actually, that works because the infinite list has a finite representation, being cyclic, so there's nothing to unfold:
return (repeat 'a') >>= \ x -> print $ span (const True) x
claus
participants (4)
-
apfelmus
-
Claus Reinke
-
Isaac Dupree
-
john lask