While trying to come up with a minimal example I discovered one more puzzling thing: runghc is fastest, ghc is slower, and ghc with optimization is slowest. This is the complete reverse of the expected order.

ghc -O1 (-O2 is similar): 

time                 15.23 ms   (14.72 ms .. 15.73 ms)

ghc -O0:

time                 3.612 ms   (3.548 ms .. 3.728 ms)

runghc:

time                 2.250 ms   (2.156 ms .. 2.348 ms)


I am investigating further. Any pointers would be helpful. I understand that -O2 can sometimes be slower, e.g. aggressive inlining can be counterproductive. But a 4x difference is a lot, and it occurs with -O1 as well, which should generally be safer than -O2. Worst of all, runghc is significantly faster than ghc. What's going on?

-harendra


On 8 September 2017 at 18:49, Harendra Kumar <harendra.kumar@gmail.com> wrote:
I will try creating a minimal example and open a ticket for the inlining problem, the one I am sure about. 

-harendra

On 8 September 2017 at 18:35, Simon Peyton Jones <simonpj@microsoft.com> wrote:

I know that this is not an easy request, but can either of you produce a small example that demonstrates your problem? If so, please open a ticket.

I don’t like hearing about people having to use trial and error with INLINE or SPECIALISE pragmas. But I can’t even begin to solve the problem unless I can reproduce it.

 

Simon

 

From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Harendra Kumar
Sent: 08 September 2017 13:50
To: Mikolaj Konarski <mikolaj.konarski@gmail.com>
Cc: ghc-devs@haskell.org
Subject: Re: Performance degradation when factoring out common code

 

I should also point out that I saw performance improvements by manually factoring out common expressions and moving them to outer loops in performance-sensitive paths. I have now made it a habit to do this manually. I am not sure whether this has also been fixed by that ticket or some other one.
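The kind of manual hoisting described above can be sketched as follows. This is only an illustration, not code from the project under discussion: `scale`, `sumScaledNaive` and `sumScaledHoisted` are hypothetical names, and whether GHC floats the invariant out on its own depends on the optimization level and flags such as -ffull-laziness.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for an expensive, loop-invariant computation.
scale :: Int -> Int
scale n = sum [1 .. n]

-- The invariant 'scale n' appears inside the loop body, so without
-- optimization it may be recomputed on every iteration.
sumScaledNaive :: Int -> [Int] -> Int
sumScaledNaive n = foldl' (\acc x -> acc + x * scale n) 0

-- The invariant is bound once, outside the loop, and shared.
sumScaledHoisted :: Int -> [Int] -> Int
sumScaledHoisted n xs =
    let s = scale n
    in foldl' (\acc x -> acc + x * s) 0 xs

main :: IO ()
main = print (sumScaledNaive 4 [1, 2, 3], sumScaledHoisted 4 [1, 2, 3])
-- prints (60,60)
```

Both versions compute the same result; the second only makes the sharing explicit instead of relying on the optimizer.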

 

-harendra

 

On 8 September 2017 at 17:34, Harendra Kumar <harendra.kumar@gmail.com> wrote:

Thanks Mikolaj! I have seen some surprising behavior quite a few times recently and I was wondering whether GHC should do better. In one case I had to use SPECIALIZE very aggressively; in another version of the same code it worked well without that. I have been doing a lot of trial and error with the INLINE/NOINLINE pragmas to figure out what the right combination is. Sometimes it just feels like black magic, because I cannot find a rationale to explain the behavior. I am not sure whether more such problems are lurking; perhaps this is an area where some improvement is possible.
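For readers unfamiliar with the pragmas mentioned, here is a minimal sketch of how INLINABLE and SPECIALIZE are attached to an overloaded function. The function `sumSquares` is hypothetical and unrelated to the code in this thread; it only shows the syntax, not a recommended combination.

```haskell
-- Hypothetical overloaded function, for illustration only.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0
-- Keep the unfolding available so that importing modules can specialise it.
{-# INLINABLE sumSquares #-}
-- Ask GHC to generate a copy specialised to Int, avoiding the Num
-- dictionary at call sites of that type.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

main :: IO ()
main = print (sumSquares [1 .. 10 :: Int])
-- prints 385
```

NOINLINE uses the same syntax and is usually applied when inlining a particular definition is suspected of making things worse.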

 

-harendra

 

 

On 8 September 2017 at 17:10, Mikolaj Konarski <mikolaj.konarski@gmail.com> wrote:

Hello,

I've had a similar problem that's been fixed in 8.2.1:

https://ghc.haskell.org/trac/ghc/ticket/12603

You can also use some extreme global flags, such as

ghc-options: -fexpose-all-unfoldings -fspecialise-aggressively

to get most of the GHC subtlety and shyness out of the way
when experimenting.
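
If changing the flags for the whole package is too blunt, the same two flags can also be tried per module with an OPTIONS_GHC pragma. A minimal sketch, assuming a hypothetical module and function that are not part of the code discussed in this thread:

```haskell
{-# OPTIONS_GHC -fexpose-all-unfoldings -fspecialise-aggressively #-}
-- Hypothetical module, for illustration only.
module Experiment (twice) where

-- -fexpose-all-unfoldings asks GHC to keep unfoldings for the definitions
-- in this module in the interface file, including large or recursive ones
-- it would otherwise omit.
-- -fspecialise-aggressively lets GHC specialise overloaded functions
-- regardless of size, provided their unfolding is available.
twice :: Num a => a -> a
twice x = x + x
```

Since exposing unfoldings matters where a function is defined and specialisation happens where it is used, the pragma may be needed in both the defining and the importing module when experimenting this way.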

Good luck
Mikolaj




On Fri, Sep 8, 2017 at 11:21 AM, Harendra Kumar
<harendra.kumar@gmail.com> wrote:
> Hi,
>
> I have this code snippet for the bind implementation of a Monad:
>
>     AsyncT m >>= f = AsyncT $ \_ stp yld ->
>         let run x = (runAsyncT x) Nothing stp yld
>             yield a _ Nothing  = run $ f a
>             yield a _ (Just r) = run $ f a <> (r >>= f)
>         in m Nothing stp yield
>
> I want to have multiple versions of this implementation parameterized by a
> function, like this:
>
> bindWith k (AsyncT m) f = AsyncT $ \_ stp yld ->
>     let run x = (runAsyncT x) Nothing stp yld
>         yield a _ Nothing  = run $ f a
>         yield a _ (Just r) = run $ f a `k` (bindWith k r f)
>     in m Nothing stp yield
>
> And then the bind function becomes:
>
> (>>=) = bindWith (<>)
>
> But this leads to a performance degradation of more than 10%. Inlining does
> not help; I tried the INLINE pragma as well as the "inline" GHC builtin. I
> thought this should be a more or less straightforward replacement making the
> second version equivalent to the first one. But apparently there is
> something going on here that makes it perform worse.
>
> I have not looked at the Core, STG or asm yet. Hoping someone can quickly
> comment on it. Any ideas why this is so? Can this be worked around somehow?
>
> Thanks,
> Harendra
>
