A performance predicament

Hi devs,

I'm working on some compiler performance bugs. I've implemented caching for coercion kinds at Phab:D1992. But tests show a net slowdown, which I'm currently investigating. I could abandon this work for 8.0, but it really should show an improvement, and so I'm looking deeper.

A little profiling has shown that the (>>=) operator of the FlatM monad is doing 13% of all allocations on a test program (T3064). The sad thing is that, if I understand correctly, this bind shouldn't do any allocation at all! Here are the relevant definitions:
newtype FlatM a = FlatM { runFlatM :: FlattenEnv -> TcS a }
newtype TcS a = TcS { unTcS :: TcSEnv -> TcM a }
type TcM = TcRn
type TcRn = TcRnIf TcGblEnv TcLclEnv
type TcRnIf a b = IOEnv (Env a b)
newtype IOEnv env a = IOEnv (env -> IO a)
As we can see here, FlatM a is equivalent to (Foo -> Bar -> Baz -> IO a). So working in this monad should just pass around the three parameters without doing any allocation, unless IO's bind operation does allocation. (I assume we use magic to prevent that last piece.) I've tried adding INLINE to the various pieces to no avail.

Am I misunderstanding something fundamental here? I feel like I must be.

Thanks,
Richard

PS: The allocation done by FlatM was around before my caching optimization. So I'm working slightly orthogonally to where I started. But if IOEnv is really doing something even slightly slowly, a small tweak here could net massive improvements in overall speed.
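To make the "three parameters" reading concrete, here is a minimal, self-contained sketch of a stack shaped like the one above. It is not GHC's code: FlattenEnv, TcSEnv and Env are empty hypothetical stand-ins, TcM is collapsed to a single IOEnv layer, and runAll and mkFlatM are helper names invented for this illustration. It only shows that, once the newtypes are unwrapped, bind is a function of three environments that runs one IO action and then another.

```haskell
module Main where

-- Hypothetical stand-ins for the real environments (assumptions for this sketch).
data FlattenEnv = FlattenEnv
data TcSEnv     = TcSEnv
data Env        = Env

-- Same shape as the definitions quoted above, with TcM collapsed to IOEnv Env.
newtype IOEnv env a = IOEnv { unIOEnv :: env -> IO a }
newtype TcS a       = TcS   { unTcS :: TcSEnv -> IOEnv Env a }
newtype FlatM a     = FlatM { runFlatM :: FlattenEnv -> TcS a }

-- Unwrap every layer: FlatM a really is FlattenEnv -> TcSEnv -> Env -> IO a.
runAll :: FlatM a -> FlattenEnv -> TcSEnv -> Env -> IO a
runAll m fe se e = unIOEnv (unTcS (runFlatM m fe) se) e

-- Rewrap a three-argument function as a FlatM computation.
mkFlatM :: (FlattenEnv -> TcSEnv -> Env -> IO a) -> FlatM a
mkFlatM f = FlatM (\fe -> TcS (\se -> IOEnv (\e -> f fe se e)))

instance Functor FlatM where
  fmap f m = mkFlatM $ \fe se e -> fmap f (runAll m fe se e)

instance Applicative FlatM where
  pure x    = mkFlatM $ \_ _ _ -> pure x
  mf <*> mx = mkFlatM $ \fe se e -> runAll mf fe se e <*> runAll mx fe se e

instance Monad FlatM where
  -- Bind threads the same three environments through both computations.
  m >>= k = mkFlatM $ \fe se e -> do
    x <- runAll m fe se e
    runAll (k x) fe se e
  {-# INLINE (>>=) #-}

main :: IO ()
main = do
  r <- runAll (pure 20 >>= \x -> pure (x + 22)) FlattenEnv TcSEnv Env
  print (r :: Int)
```

Whether the optimised Core for such a bind actually avoids allocating closures depends on inlining and arity analysis at each use site, which is the open question in this message.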

You don't say how you got those numbers, but if it's by -prof-auto-all it may be a red herring. Profiling prevents optimisation!
I built TcFlatten by touching it, running make, grabbing the command line, and then executing that command line again with -ddump-simpl.
The definition of >>= for FlatM looks scary (below); but it's never called because it's already been inlined. So I don't think it'll allocate anything.
An alternative approach is to add manual SCCs and drill in gradually.
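As an illustration of what a manual SCC looks like, here is a small standalone sketch. The function sumSquares and the cost-centre name "sumSquares_loop" are made up for this example and have nothing to do with TcFlatten; the point is only the placement of the {-# SCC #-} pragma on the expression of interest.

```haskell
module Main where

-- A hand-placed cost centre: only the annotated expression is attributed
-- to "sumSquares_loop" in the profile, so the rest of the module is not
-- blanketed with automatic cost centres.
sumSquares :: Int -> Int
sumSquares n = {-# SCC "sumSquares_loop" #-} go 0 1
  where
    go :: Int -> Int -> Int
    go acc i
      | i > n     = acc
      | otherwise = go (acc + i * i) (i + 1)

main :: IO ()
main = print (sumSquares 10)
```

Compiling this with -prof but without -fprof-auto, and running the result with +RTS -p, should produce a profile in which only the hand-written cost centre (plus the default ones) appears. Moving the pragma to progressively smaller expressions is one way to "drill in gradually".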
Simon
TcFlatten.$fMonadFlatM1 =
\ (@ a11_abat)
(@ b_abau)
(m_a7hJ :: FlatM a11_abat)
(k_a7hK :: a11_abat -> FlatM b_abau)
(env_a7hL :: FlattenEnv) ->
let {
m1_aegb [Dmd=

On Mar 11, 2016, at 12:04 PM, Simon Peyton Jones
Profiling prevents optimisation!
I didn't know that. Why does it do this? It defeats the point of profiling a bit if we can't optimize. I just assumed the optimizer looks through the SCCs, preserving them, but that otherwise SCCs don't get in the way.

Anyway, helpful to know. Is this fact documented in the manual?

Thanks!
Richard
participants (2): Richard Eisenberg, Simon Peyton Jones