[GHC] #10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Compile-time Unknown/Multiple | performance bug Test Case: ghcirun004 | Blocked By: Blocking: | Related Tickets: Differential Revisions: | -------------------------------------+------------------------------------- Phab:D924 has proposed that we redefine `mapM_`, currently, {{{ mapM_ = foldr ((>>) . f) (return ()) }}} as, {{{ mapM_ = foldr ((*>) . f) (return ()) = traverse_ }}} as part of the AMP proposal. However, this appears to have severe effects on the performance characteristics of the `Assembler` monad defined in `ByteCodeAsm`. In particular, the `mapM_` use in `ByteCodeAsm.assembleBCO` blows up severely, increasing the runtime of the `ghcirun004` testcase from 4 seconds to over 5 minutes. Intriguingly, defining `(*>) = (>>)` in `Assembler`'s `Applicative` instance (as done in Phab:D1097) restores reasonable runtime. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by bgamari): [https://gist.github.com/bgamari/9623997162a3399859a9 Here] is a minimal testcase demonstrating the difference. The efficient definition of `mapM_` (using `(>>)`) produces this Core, {{{#!hs mapA_3 :: Assembler () mapA_3 = Main.Pure () doTestM_go :: [Assembler Integer] -> Assembler () doTestM_go = \ ds_a1XT -> case ds_a1XT of _ { [] -> mapA_3; : y_a1XY ys_a1XZ -> let { k_a1VW k_a1VW = doTestM_go ys_a1XZ } in $c>>= y_a1XY (\ _ -> k_a1VW) } doTestM :: Assembler () doTestM = doTestM_go test }}} Whereas the slower `Applicative`-based definition produces, {{{#!hs mapA_3 :: Assembler () mapA_3 = Main.Pure () mapA_2 :: Assembler (() -> ()) mapA_2 = Main.Pure id lvl4_r3JE :: Integer -> Assembler (() -> ()) lvl4_r3JE = \ _ -> Main.mapA_2 doTestA_go :: [Assembler Integer] -> Assembler () doTestA_go = \ ds_a1XT -> case ds_a1XT of _ { [] -> mapA_3; : y_a1XY ys_a1XZ -> let { m2_a1W7 m2_a1W7 = doTestA_go ys_a1XZ } in $c>>= ($c>>= y_a1XY lvl4_r3JE) (\ x1_a1W8 -> $c>>= m2_a1W7 (\ x2_a1W9 -> Pure (x1_a1W8 x2_a1W9))) } doTestA :: Assembler () doTestA = doTestA_go test }}} Note the three `(>>=)` uses in the applicative version, compared to the single invocation in the monadic version. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): Indeed, one can go from the slow code to the efficient, by just applying all three of the monad laws in the right order :-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): BTW, here are the relevant definitions that add up to three `>>=` for one `*>`: {{{#!hs a1 *> a2 = (id <$ a1) <*> a2 (<$) = fmap . const fmap = liftM liftM f m1 = do { x1 <- m1; return (f x1) } -- here is one (<*>) = ap ap m1 m2 = do { x1 <- m1; x2 <- m2; return (x1 x2) } -- here are two }}} It seems that performance-worrying code should simply ''not'' implement `Functor` and `Applicative` via the `Monad`-derived methods. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): Of course, one might expect the compiler to derive the good implementation from the bad. It would be valid, assuming the monad laws hold (which we generally do not do): {{{#!hs a1 *> a2 == (id <$ a1) <*> a2 -- inline *> == fmap (const id) a1 <*> a2 -- inline <$ == liftM (const id) a1 <*> a2 -- inline fmap == (a1 >>= (\x1 -> return (const id x1))) <*> a2 -- liftM == (a1 >>= (\_ -> return id)) <*> a2 -- inline const == (a1 >>= (\_ -> return id)) >>= (\x2 -> a2 >>= (\x3 -> return (x2 x3))) -- inline <*> and ap == a1 >>= (\_ -> return id >>= (\x2 -> a2 >>= (\x3 -> return (x2 x3)))) -- assoc monad law == a1 >>= (\_ -> a2 >>= (\x3 -> return (id x3))) -- first monad law == a1 >>= (\_ -> a2 >>= return) -- inline id, eta-contract == a1 >>= (\_ -> a2) -- second monad law }}} Maybe a bit out of reach for our current simplification infrastructure, where for example the methods `>>=` and `return` will quickly be replaced by their concrete implementations -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by rwbarton): I have to say that I still don't understand exactly why {{{ a1 *> a2 == (a1 >>= (\_ -> return id)) >>= (\x2 -> a2 >>= (\x3 -> return (x2 x3))) }}} is more than a constant (say 10 times) slower than `a1 >> a2` for this `Assembler` monad. Experimentally bgamari's test program does ~n^2^ allocations and takes ~n^3^ total time in the Applicative version, while the Monad version runs in linear allocations and time. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Replying to [comment:5 rwbarton]:
I have to say that I still don't understand exactly why {{{ a1 *> a2 == (a1 >>= (\_ -> return id)) >>= (\x2 -> a2 >>= (\x3 -> return (x2 x3))) }}} is more than a constant (say 10 times) slower than `a1 >> a2` for this `Assembler` monad.
Me neither! An articulate explanation from someone would be v helpful -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Experimentally bgamari's test program does ~n2 allocations and takes ~n3 total time in the Applicative version, while the Monad version runs in
#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): linear allocations and time. That and the fact that in comment:4 I need the associativity law, sounds like there is a quadratic behavior due to wrongly associated binds. Let’s see if I can evaluate my way by hand through it. {{{#!hs -- These fragments from bgamari’s test case let t n = Thing n cr2 let cr2 = const $ return 2 run (t 1 >> (t 2 >> t 3)) == run (Thing 1 (cr2 >=> (\_ -> (t 2 >> t 3)))) == run ((cr2 1 >=> (\_ -> (t 2 >> t 3))) 1) == run (cr2 1 >>= (\_ -> (t 2 >> t 3))) == run (return 2 >>= (\_ -> (t 2 >> t 3))) == run ((\_ -> (t 2 >> t 3)) 2) == run (t 2 >> t3) == run (Thing 2 (cr2 >=> (\_ -> t 3))) == run ((cr2 2 >=> (\_ -> t 3)) 1) == run (cr2 2 >>= (\_ -> t 3)) == run (return 2 >>= (\_ -> t 3)) == run ((\_ -> t 3) 2) == run (t 3) == run (Thing 3 cr2) == run (cr2 3) == run (return 2) == 2 }}} For the applicative version, based on the empirical implementation, I assume that some parts of the code are kept alive for too long, and possibly be traversed multiple times. So here we go: {{{#!hs let cri = \_ -> return id) let ri = (\x -> return (id x)) run (t 1 *> (t 2 *> t 3)) == run ((t 1 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) == run ((Thing 1 (cr2 >=> cri)) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) == run (Thing 1 ((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) 1) == run (((cr2 >=> cri) >=> (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3)))) 1) == run (((cr2 >=> cri) 1 >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((cr2 1 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((return 2 >>= cri) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((cri 2) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run (((return id) >>= (\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))))) == run ((\x2 -> (t 2 *> t 3) >>= (\x3 -> return (x2 x3))) id) == run ((t 2 *> t 3) >>= ri) -- *¹ == run (((t 2 >>= cri) >>= (\x2 -> t3) >>= (\x3 -> return (x2 x3)))) >>= ri) == ... -- as before == run ((t 3 >>= ri) >>= ri) == run (Thing 3 (cr2 >=> ri) >>= ri) == run (Thing 3 ((cr2 >=> ri) >=> ri)) -- *² == run (((cr2 >=> ri) >=> ri) 3) == run ((cr2 >=> ri) 3 >>= ri) == run ((cr2 3 >>= ri) >>= ri) == run ((return 2 >>= ri) >>= ri) == run (ri 2 >>= ri) == run (return 2 >>= ri) == run (ri 2) == run (return 2) == 2 }}} `*¹`: I think this is interesting. Whereas above, `run (a >> b)` will eventually reduce to `run b`, here we get `run (b >>= complex_return)`, with one `complex_return` for every element in the list. This explains at least some of the blow up: We build this chain, and then we have to eventually tear it down again. `*²` And again we traverse this whole chain of pointless `ri`’s. Hmm, not sure if and how this exposition explains the quadratic blow up in allocations, though. Do we traverse the stack of `>=> ri` once per element somehow, similar to a wrongly associated `++`? But I don’t see it right now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by nomeata): Another factoid: * It does not help to implement `<$` (used by the default implementation of `*>`) manually. * It does not help to implement `<*>` manually * So `a1 *> a2 = (id <$ a1) <*> a2` indeed seems to be the root of the problem. Do we need to educate people to write `(*>) == (>>)` if they write `(<*>) = ap`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by bgamari): Indeed I went through a similar [https://gist.github.com/bgamari/484842415e5faf9c02fb exercise] last night and came to a similiar conclusion: smells fishy but I don't necessarily see anything clearly quadratic. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by akio): I think the quadratic behavior comes from the fact that the `>>=` linearly traverses its LHS. Let's define the size of an `Assembler` value with: {{{#!hs size :: Assembler a -> Integer size (Pure _) = 1 size (Thing i k) = 1 + size (k i) }}} `m >>= f` performs `O(size(m))` operations because `>>=` is recursive on its LHS. Most of the cost is hidden inside a lambda, but you will have to pay it eventually, namely when `run` is finally applied. Now we can analyze the cost of other operations: * `m >> n` needs `O(size(m))` operations, because it's `m >>= \_ -> n`. * `f <$> m` needs `O(size(m))` operations, because it's `m >>= return . f`. * `m <*> n` needs `O(size(m) + size(n))` operations, because it's `m >>= \v -> n >>= \w -> return v w`. * `m *> n` needs `O(size(m) + size(n))` operations, because it's `const <$> m <*> n`. If you use `mapM_` with a `N`-element list of 2-sized `Assembler`s, each application of `>>` costs `O(1)`, so the total cost is `O(N)`. If you use `mapA_` instead, the LHS of an application of `*>` is `O(1)` large but the RHS is `O(N)` large on average. This means it needs `O(N)` operations on average, so the total cost is `O(N*N)`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by bgamari): * cc: core-libraries-committee@… (added) * status: new => closed * resolution: => fixed Comment: Thanks akio for the analysis! It seems pretty clear that rewriting `mapM_` in terms of `traverse_` does not preserve the performance characteristics of user code in general. I'm going to close this and push the matter off to the Core Libraries Committee to decide what implications this might have on their future plans (likely few given they already [https://mail.haskell.org/pipermail/libraries/2015-May/025719.html suspected] there was the potential for performance regressions with this change). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by bgamari): * owner: bgamari => * status: closed => new * resolution: fixed => -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10711: Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: Resolution: invalid | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: ghcirun004 Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => invalid -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10711#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC