[Haskell-cafe] Re: [Haskell] simple function: stack overflow in hugs vs none in ghc

24 Sep 2007

      john lask wrote:
...
...
test1 = readFile "big.dat" >>= (\x->print $ parse x)
test2 = readFile "big.dat" >>= (\x->print $ fst $ parse x)
test1 (on a large file) will succeed in ghc but fail in hugs
test2 on same file will succeed in both ghc and hugs
big.dat is just some large data file say 1MB.
(not particularly large by todays standards!)
The question: is there any changes that can be made to the code to
make test1 work in hugs without changing the essence of the function?
...
parse x = sqnc item x
  where
...
item =( \ ts -> case ts of
                   [] -> ( Nothing, [])
                   ts -> ( Just (head ts), tail ts) )
sqnc p ts =
       let ( r, ts' ) = p ts in case r of
            Nothing -> ([],ts')
            Just x -> let (r',ts'') = (sqnc p ts')  in ( x:r', ts'' )
Strange, this shouldn't happen :) You may want to try

   item []     = (Nothing, [])
   item (t:ts) = (Just t , ts)

but that shouldn't help ;)

Let's try to find out what's going on by doing graph reduction with our 
bare hands. The preliminary material on
   http://en.wikibooks.org/wiki/Haskell/Graph_reduction
should help a bit. Ideally, there would be tool support (hat? other 
debugger?) but when things become too complicated, tools can only keep 
you a few minutes longer above the water before drowning in complexity, too.

The main point is that (print $ parse x) and (print $ fst $ parse x) 
differ in that the latter only computes the answer but not the remaining 
tokens. So, the stack overflow is triggered when evaluating the 
remaining tokens, but I don't see why. What happens for (print $ snd $ 
parse x) ?

Let's rewrite your code to figure out what's going on

   item [] = (Nothing, [])
   item ts = (Just (head ts), tail ts)

For  sqnc , we need to translate stuff like  let (a,b) = e in  . 
Let-bound patterns aren't explained in the wikibook and in fact they're 
tricky. When done wrong, there may be space leaks, see also

  J. Sparud. Fixing Some Space Leaks without a Garbage Collector.
  http://citeseer.ist.psu.edu/sparud93fixing.html

I don't know whether its implemented in Hugs (probably not?) and GHC 
(probably, but maybe with bugs?). We'll use the not so good translatation

   let (a,b) = e  in e'
<=>
   let x = e; a = fst x; b = snd x; in e'

I'd like to call  sqnc  differently, namely  many . We get

   many p ts =
     let z   = p ts
         r   = fst z
         ts' = snd z
     in case r of
       Nothing -> ([], ts')
       Just x  ->
         let z'  = many p ts'
             r'  = fst z'
             ts''= snd z'
         in (x:r', ts'')

Intimidating, no? :) Now, let's evaluate an example expression, like

  many item (1:2:3:...)

(the list is intended to be finite, but we'll decide later about its 
length). To preserve space and stay sane, we'll only focus on the things 
that get evaluated and write ... for the rest. Let's start:

  many item (1:2:3:...)
  => let ts = 1:2:3:... in
       let ... z = item ts; r = fst z; ... in  case r of ...
  => let ... z = (Just (head ts), tail ts); r = fst z ...
  => let ... z = (r, tail ts); r = Just (head ts) ... in  case r of

The above step is not clear from the description in the wikibook, but 
it's a handy notation of saying that the first component and r point to 
the same thing. Expanding the case expression yields (in full form)

  => let ts = 1:2:3: ... in
     let z  = (r, tail ts)
         r  = Just x
         x  = head ts
         ts'= snd z
     in
       let z'  = many item ts'
           r'  = fst z'
           ts''= snd z'
       in (x:r', ts'')

This is the weak head normal form of our expression. Of course, we 
wanted  print (many item ts) = putStrLn (show ...)  which means 
evaluating the first component and then the second component in the pair 
to full normal form. So, the next redex to be reduced is  x  followed by 
r' which forces  z'  which at least forces  ts'

  => ...
  => let ts  = x:ts'
         x   = 1
         ts' = 2:3:...
     in
       let z   = (r, ts')
           r   = Just x
       in
         let z'  = let ... in (..,..)
             r'  = fst z'
             ts''= snd z'
         in (x:r',ts'')

To stay sane, we garbage collect  z  and  r  and rename variables before 
expanding the expression for  z'  which is obtained in the same way we 
obtained it before

  let ts0 = x0 : ts1
      x0  = 1
      ts1 = 2:3:...

      z0 = let z  = (r, tail ts1)
               r  = Just x
               x  = head ts1
               ts'= snd z
           in
             let z'  = many item ts'
                 r'  = fst z'
                 ts''= snd z'
             in (x:r', ts'')

      r0  = fst z0
      us0 = snd z0

  in (x0:r0, us0)

Collecting  lets  and renaming yields

  let ts0 = x0 : ts1
      x0  = 1
      ts1 = 2:3:...

      z   = (r, tail ts1)
      r   = Just x1
      x1  = head ts1
      ts' = snd z

      z1  = many item ts'
      r1  = fst z1
      us1 = snd z1

      z0  = (x1:r1, us1)
      r0  = fst z0
      us0 = snd z0

  in (x0:r0, us0)

The insight is that the original naming was bad, r and z are quite 
different from r0 and z0. Reducing r0 and x1 yields

  =>
  let ts0 = x0 : ts1
      x0  = 1
      ts1 = x1 : ts2
      x1  = 2
      ts2 = 3:...

      z   = (r, tail ts1)
      r   = Just x1
      ts' = snd z

      z1  = many item ts'
      r1  = fst z1
      us1 = snd z1

      z0  = (r0, us1)
      r0  = x1:r1
      us0 = snd z0

  in (x0:r0, us0)

The general scheme should be clear now: z,r and ts' are temporary 
variables and further reduction of r1, r2 and so on leads to a chain

  let x0 = 1; ts0 = x0 : ts1
      x1 = 2; ts1 = x1 : ts2
      x2 = 3; ts2 = x2 : ts3
      ...
      x8 = ..

      z   = (r, tail ts8)
      r   = Just x8
      ts' = snd z

      z8  = many item ts'
      r8  = fst z8
      us8 = snd z8

      z7  = (r7, us8)
      r7  = x8:r8
      us7 = snd z7
      ...
      z0  = (r0, us1)
      r0  = x1:r1
      us0 = snd z0

  in (x0:r0, us0)

So, after forcing the first component of the overall result to normal 
form, the result looks like

  (1:2:3:..., snd (_,snd (_,snd (_,...))) )

and it seems that Hugs fails to evaluate the tail recursive chain of 
snd  ??

In the end, here's our decisive result: either Hugs or my analysis has a 
bug :D

Regards,
apfelmus