Re: [GHC] #10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm

31 Jul 2015

      ...
Experimentally bgamari's test program does ~n2 allocations and takes ~n3
 total time in the Applicative version, while the Monad version runs in
#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in
ByteCodeAsm
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                   Owner:  bgamari
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |               Test Case:  ghcirun004
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by nomeata):

 linear allocations and time.

 That and the fact that in comment:4 I need the associativity law, sounds
 like there is a quadratic behavior due to wrongly associated binds.

 Let’s see if I can evaluate my way by hand through it.

 {{{#!hs
 -- These fragments from bgamari’s test case
 let t n = Thing n cr2
 let cr2 = const $ return 2

    run (t 1 >> (t 2 >> t 3))
 == run (Thing 1 (cr2 >=> (\_ -> (t 2 >> t 3))))
 == run ((cr2 1 >=> (\_ -> (t 2 >> t 3))) 1)
 == run (cr2 1 >>=  (\_ -> (t 2 >> t 3)))
 == run (return 2 >>= (\_ -> (t 2 >> t 3)))
 == run ((\_ -> (t 2 >> t 3)) 2)
 == run (t 2 >> t3)
 == run (Thing 2 (cr2 >=> (\_ -> t 3)))
 == run ((cr2 2 >=> (\_ -> t 3)) 1)
 == run (cr2 2 >>=  (\_ -> t 3))
 == run (return 2 >>= (\_ -> t 3))
 == run ((\_ -> t 3) 2)
 == run (t 3)
 == run (Thing 3 cr2)
 == run (cr2 3)
 == run (return 2)
 == 2
 }}}

 For the applicative version, based on the empirical implementation, I
 assume that some parts of the code are kept alive for too long, and
 possibly be traversed multiple times. So here we go:

 {{{#!hs
 let cri = \_ -> return id)
 let ri  = (\x -> return (id x))

    run (t 1 *> (t 2 *> t 3))
 == run ((t 1 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2
 x3))))
 == run ((Thing 1 (cr2 >=> cri)) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 ->
 return (x2 x3))))
 == run (Thing 1 ((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return
 (x2 x3)))))
 == run (((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2
 x3)))) 1)
 == run (((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2
 x3)))) 1)
 == run (((cr2 >=> cri) 1 >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2
 x3)))))
 == run (((cr2 1 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2
 x3)))))
 == run (((return 2 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return
 (x2 x3)))))
 == run (((cri 2) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))))
 == run (((return id) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2
 x3)))))
 == run ((\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))) id)
 == run ((t 2 *> t 3) >>= ri) -- *¹
 == run (((t 2 >>= cri) >>= (\x2 -> t3) >>= (\x3 -> return (x2 x3)))) >>=
 ri)
 == ... -- as before
 == run ((t 3 >>= ri) >>= ri)
 == run (Thing 3 (cr2 >=> ri) >>= ri)
 == run (Thing 3 ((cr2 >=> ri) >=> ri)) -- *²
 == run (((cr2 >=> ri) >=> ri) 3)
 == run ((cr2 >=> ri) 3 >>= ri)
 == run ((cr2 3 >>= ri) >>= ri)
 == run ((return 2 >>= ri) >>= ri)
 == run (ri 2 >>= ri)
 == run (return 2 >>= ri)
 == run (ri 2)
 == run (return 2)
 == 2
 }}}

 `*¹`: I think this is interesting. Whereas above, `run (a >> b)` will
 eventually reduce to `run b`, here we get `run (b >>= complex_return)`,
 with one `complex_return` for every element in the list. This explains at
 least some of the blow up: We build this chain, and then we have to
 eventually tear it down again.

 `*²` And again we traverse this whole chain of pointless `ri`’s.

 Hmm, not sure if and how this exposition explains the quadratic blow up in
 allocations, though. Do we traverse the stack of `>=> ri` once per element
 somehow, similar to a wrongly associated `++`? But I don’t see it right
 now.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler