
#8763: forM_ [1..N] does not get fused (10 times slower than go function)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  7.6.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #7206             |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 It seems that for `IO`, GHC decides that it's OK to inline `c` from the
 [https://hackage.haskell.org/package/base-4.11.0.0/docs/src/GHC.Enum.html#efd... fusion helper of enumFromThenTo],
 but not so for `ST s`.

 In our case, `c` is the `<huge>` computation (see the worker `$wc` in
 comment:44) performed for each outer list element, and inlining would
 duplicate it: it is mentioned three times in the definition of
 `efdtIntUpFB`. Consequently, `c` almost always has `Guidance=NEVER`,
 except in the `IO` case, where it miraculously gets
 `Guidance=IF_ARGS [20 420 0] 674 0` just when it is inlined. I'm not sure
 what this decision is based on.

 The inlining decision for `eftIntFB` is much easier: `c`
 [https://hackage.haskell.org/package/base-4.11.0.0/docs/src/GHC.Enum.html#eft... only occurs once there].

 I'm not sure if `IO` gets special treatment by the inliner, but I see a
 few ways out:

 * Do the same hacks for `ST`, if there are any which apply (ugly)
 * Reduce the number of calls to `c` in the implementation of
   `efdtIntUpFB`, probably at the cost of worse branch prediction
 * Figure out why the floated-out expression `\x -> (nop x *>)`, occurring
   in `forM_ = flip mapM_`, where
   `mapM_ nop = foldr ((>>) . nop) (return ())`, doesn't get eta-expanded
   in the `ST` case, whereas the relevant `IO` code does. I hope that by
   fixing this, the `c` expression inlines again.

 Here's how it inlines for `IO`:

 {{{
 (>>) . nop
   = \x -> (nop x >>)
   = \x -> (nop x *>)  -- notice how it's no different than ST up until here
   = \x -> (thenIO (nop x))
 }}}

 The inliner probably stops here, but because of eta-expansion modulo
 coercions to `\x k s -> thenIO (nop x) k s`, we can inline
 [https://hackage.haskell.org/package/base-4.11.0.0/docs/src/GHC.Base.html#the... thenIO]:

 {{{
 \x k s -> thenIO (nop x) k s
   = \x k s -> case nop x s of (# new_s, _ #) -> k new_s
 }}}

 which is much better and probably more keenly inlined than
 `\x -> (nop x *>)` in the `ST` case.

 What makes GHC eta-expand one, but not the other?

 This is just a wild guess and the only real difference I could make out
 in diffs. Maybe someone with actual insights into the simplifier can
 comment on this claim (that the inliner gives up on `c` due to the missed
 eta-expansion and inlining)?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8763#comment:45
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
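
To make the eta-expansion argument above a bit more concrete, here is a toy sketch. It assumes a boxed state-passing newtype `M` as a stand-in for `IO` and `ST s` (the real ones use `State#` and unboxed tuples), and every name in it (`M`, `thenM`, `tick`, `stepPointFree`, `stepEtaExpanded`, `loopWith`) is made up for illustration. It only shows that the point-free step and the hand-eta-expanded step have the same meaning; it makes no claim about what the simplifier actually does with either.

```haskell
-- Toy state-passing monad; a hypothetical stand-in for IO / ST s.
newtype M s a = M { runM :: s -> (s, a) }

-- Analogue of thenIO: run the first action, drop its result, run the second.
thenM :: M s a -> M s b -> M s b
thenM (M m) (M k) = M (\s -> case m s of (s', _) -> k s')

-- Plays the role of `nop x`: a small action depending on the list element.
tick :: Int -> M Int ()
tick x = M (\s -> (s + x, ()))

-- Analogue of `return ()`.
done :: M s ()
done = M (\s -> (s, ()))

-- Point-free step, shaped like `(>>) . nop`: a partial application of thenM.
stepPointFree :: Int -> M Int () -> M Int ()
stepPointFree = thenM . tick

-- Eta-expanded step: the continuation k and the state s are in scope,
-- so the body of thenM has been unfolded by hand.
stepEtaExpanded :: Int -> M Int () -> M Int ()
stepEtaExpanded x k = M (\s -> case runM (tick x) s of (s', _) -> runM k s')

-- Shaped like `mapM_ nop = foldr ((>>) . nop) (return ())`, run from state 0.
loopWith :: (Int -> M Int () -> M Int ()) -> [Int] -> Int
loopWith step xs = fst (runM (foldr step done xs) 0)

main :: IO ()
main = do
  -- [1, 3 .. 99] goes through enumFromThenTo, like the slow case in the ticket.
  print (loopWith stepPointFree [1, 3 .. 99])
  print (loopWith stepEtaExpanded [1, 3 .. 99])
```

Both calls compute the same sum. To see whether the two steps are optimised differently, one could compile with `-O2 -ddump-simpl` and compare the resulting Core.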