
I've learned several very interesting things in this analysis.
- Inlining polymorphic methods is very important. Here are some data points to back up that claim:
* Original implementation using zipWithAndUnzipM: 8,472,613,440 bytes allocated in the heap
* Adding {-# INLINE #-} to the definition thereof: 6,639,253,488 bytes allocated in the heap
* Using `inline` at call site to force inlining: 6,281,539,792 bytes allocated in the heap
The middle step above allowed GHC to specialize zipWithAndUnzipM to my particular monad, but GHC didn't see fit to actually inline the function. Using `inline` at the call site forced it, to good effect; a sketch of this setup is below. (I did not collect data on code sizes, but it wouldn't be hard to do.)
By comparison:
* Hand-written recursion: 6,587,809,112 bytes allocated in the heap
Interestingly, this is *not* the best result!
Conclusion: We should probably add INLINE pragmas to Util and MonadUtils.
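For concreteness, here is roughly the shape of the code in question; treat it as a sketch rather than the exact definition in GHC's tree:

    import GHC.Exts (inline)

    -- Sketch of the utility, with the INLINE pragma added so that its
    -- unfolding is kept and visible to importing modules:
    zipWithAndUnzipM :: Monad m
                     => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
    {-# INLINE zipWithAndUnzipM #-}
    zipWithAndUnzipM f (x:xs) (y:ys) = do
      (r1, r2)   <- f x y
      (rs1, rs2) <- zipWithAndUnzipM f xs ys
      return (r1:rs1, r2:rs2)
    zipWithAndUnzipM _ _ _ = return ([], [])

    -- At the call site, `inline` from GHC.Exts forces the inlining that
    -- GHC otherwise declined to do on its own:
    caller :: Monad m => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
    caller f xs ys = inline zipWithAndUnzipM f xs ys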
- I then looked at rejiggering the algorithm to keep the common case fast. This had the side effect of changing the zipWithAndUnzipM call to mapAndUnzipM, from Control.Monad. To my surprise, this brought disaster!
* Using `inline` and mapAndUnzipM: 7,463,047,432 bytes allocated in the heap
* Hand-written recursion: 5,848,602,848 bytes allocated in the heap
That last number is better than the numbers above because of the algorithm streamlining. But the inadequacy of mapAndUnzipM surprised me -- it already has an INLINE pragma in Control.Monad, of course. Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting inlined, but a call to `map` remained, perhaps causing extra allocation. (A sketch of the contrast with the hand-written recursion is below.)
Conclusion: We should examine the implementation of mapAndUnzipM (and similar functions) in Control.Monad. Is it as fast as possible?
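To make the contrast concrete: the library function goes through `map`, building an intermediate list of actions, whereas the hand-written version recurses directly. The following is a sketch from memory, not the exact base source:

    -- Roughly the shape of the library version (modulo its INLINE pragma):
    -- it maps to an intermediate list of actions, runs them, then unzips.
    mapAndUnzipM' :: Monad m => (a -> m (b, c)) -> [a] -> m ([b], [c])
    mapAndUnzipM' f xs = sequence (map f xs) >>= return . unzip

    -- The hand-written recursion consumes the input directly, with no
    -- intermediate list of actions:
    mapAndUnzipRec :: Monad m => (a -> m (b, c)) -> [a] -> m ([b], [c])
    mapAndUnzipRec _ []     = return ([], [])
    mapAndUnzipRec f (x:xs) = do
      (y, z)   <- f x
      (ys, zs) <- mapAndUnzipRec f xs
      return (y:ys, z:zs)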
In the end, I was unable to bring the allocation numbers down to where they were before my work. This is because the flattener now deals in roles. Most of its behavior is the same between nominal and representational roles, so it seems silly (though very possible) to specialize the code to nominal to keep that path fast. Instead, I identified one key spot and made that go fast.
Thus, there is a 7% bump to memory usage on very-type-family-heavy code, compared to before my commit on Friday. (On more ordinary code, there is no noticeable change.)
Validating my patch locally now; will push when that's done.
Thanks,
Richard
On Dec 16, 2014, at 10:41 AM, Joachim Breitner wrote:
Hi,
On Tuesday, 16.12.2014, at 09:59 -0500, Richard Eisenberg wrote:
On Dec 16, 2014, at 4:01 AM, Joachim Breitner wrote:
another guess (without looking at the code, sorry): Are they in the same module? I.e., can GHC specialize the code to your particular Monad?
No, they're not in the same module. I could also try moving the zipWithAndUnzipM function to the same module, and even specializing it by hand to the right monad.
I did mean zipWithAndUnzipM, so maybe yes: Try that.
(I find it hard to believe that any polymorphic monadic code should perform well, with those many calls to an unknown (>>=) with a function parameter, but maybe I’m too pessimistic here.)
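(Purely to illustrate the kind of by-hand specialisation being discussed, here is a sketch; `IO` below is only a stand-in for the real concrete monad, and the SPECIALISE route additionally needs the function's unfolding to be exposed, e.g. via an INLINE/INLINABLE pragma:)

    -- Hand-specialised copy at a concrete monad (IO stands in for the
    -- actual monad the flattener runs in), assuming the zipWithAndUnzipM
    -- sketch from earlier in this mail is in scope:
    zipWithAndUnzipIO :: (a -> b -> IO (c, d)) -> [a] -> [b] -> IO ([c], [d])
    zipWithAndUnzipIO = zipWithAndUnzipM

    -- Or ask GHC to generate the specialisation; this also works from an
    -- importing module, provided zipWithAndUnzipM's unfolding is available:
    {-# SPECIALISE zipWithAndUnzipM
          :: (a -> b -> IO (c, d)) -> [a] -> [b] -> IO ([c], [d]) #-}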
Could that be preventing the fusing?
There is not going to be any fusing here, at least not list fusion; that would require your code to be written in terms of functions with fusion rules.
Greetings, Joachim
--
Joachim “nomeata” Breitner
mail@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org