
Experimentally bgamari's test program does ~n2 allocations and takes ~n3 total time in the Applicative version, while the Monad version runs in
#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): linear allocations and time. That and the fact that in comment:4 I need the associativity law, sounds like there is a quadratic behavior due to wrongly associated binds. Let’s see if I can evaluate my way by hand through it. {{{#!hs -- These fragments from bgamari’s test case let t n = Thing n cr2 let cr2 = const $ return 2 run (t 1 >> (t 2 >> t 3)) == run (Thing 1 (cr2 >=> (\_ -> (t 2 >> t 3)))) == run ((cr2 1 >=> (\_ -> (t 2 >> t 3))) 1) == run (cr2 1 >>= (\_ -> (t 2 >> t 3))) == run (return 2 >>= (\_ -> (t 2 >> t 3))) == run ((\_ -> (t 2 >> t 3)) 2) == run (t 2 >> t3) == run (Thing 2 (cr2 >=> (\_ -> t 3))) == run ((cr2 2 >=> (\_ -> t 3)) 1) == run (cr2 2 >>= (\_ -> t 3)) == run (return 2 >>= (\_ -> t 3)) == run ((\_ -> t 3) 2) == run (t 3) == run (Thing 3 cr2) == run (cr2 3) == run (return 2) == 2 }}} For the applicative version, based on the empirical implementation, I assume that some parts of the code are kept alive for too long, and possibly be traversed multiple times. So here we go: {{{#!hs let cri = \_ -> return id) let ri = (\x -> return (id x)) run (t 1 *> (t 2 *> t 3)) == run ((t 1 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) == run ((Thing 1 (cr2 >=> cri)) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) == run (Thing 1 ((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) 1) == run (((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) 1) == run (((cr2 >=> cri) 1 >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((cr2 1 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((return 2 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((cri 2) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((return id) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run ((\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))) id) == run ((t 2 *> t 3) >>= ri) -- *¹ == run (((t 2 >>= cri) >>= (\x2 -> t3) >>= (\x3 -> return (x2 x3)))) >>= ri) == ... -- as before == run ((t 3 >>= ri) >>= ri) == run (Thing 3 (cr2 >=> ri) >>= ri) == run (Thing 3 ((cr2 >=> ri) >=> ri)) -- *² == run (((cr2 >=> ri) >=> ri) 3) == run ((cr2 >=> ri) 3 >>= ri) == run ((cr2 3 >>= ri) >>= ri) == run ((return 2 >>= ri) >>= ri) == run (ri 2 >>= ri) == run (return 2 >>= ri) == run (ri 2) == run (return 2) == 2 }}} `*¹`: I think this is interesting. Whereas above, `run (a >> b)` will eventually reduce to `run b`, here we get `run (b >>= complex_return)`, with one `complex_return` for every element in the list. This explains at least some of the blow up: We build this chain, and then we have to eventually tear it down again. `*²` And again we traverse this whole chain of pointless `ri`’s. Hmm, not sure if and how this exposition explains the quadratic blow up in allocations, though. Do we traverse the stack of `>=> ri` once per element somehow, similar to a wrongly associated `++`? But I don’t see it right now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler