Re: Integer constant folding in the presence of new primops

19 Jun 2013

I mean, it certainly *seems* reasonable that a 15% hit could come from
pipelining changes or cache behavior or something. I don't think
alignment would really be a huge issue; post-Nehalem, I believe
unaligned reads/writes are extremely cheap. Non-intuitive behavior
can totally happen too: I've seen cases where adding instructions to a
loop speeds things up (e.g. by taking the extra step, you may
mitigate a dependency stall, which massively helps pipelining across
the loop body, etc.)
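Here is a minimal, hypothetical Haskell sketch of the dependency-chain point. `sumOne` feeds every addition into the next through a single accumulator. `sumTwo` does a little extra bookkeeping (a second accumulator, an extra comparison and a final combine) so that consecutive additions no longer depend on each other. The names are made up for illustration. Whether GHC's code generator and the CPU actually overlap the two chains is an assumption to check with a profiler; the sketch does not prove it.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- One accumulator: each addition depends on the result of the previous
-- one, so the whole loop is a single dependency chain.
sumOne :: Int -> Int
sumOne n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

-- Two accumulators: terms 1, 3, 5, ... go into `a` and terms 2, 4, 6, ...
-- into `b`, so consecutive additions are independent of each other.
-- The extra work is a second comparison and the final `a + b`.
sumTwo :: Int -> Int
sumTwo n = go 0 0 1
  where
    go !a !b i
      | i > n     = a + b
      | i == n    = a + b + i   -- one term left over when n is odd
      | otherwise = go (a + i) (b + i + 1) (i + 2)

main :: IO ()
main = print (sumOne 1000000, sumTwo 1000000)
```

In GHC, `Int` addition wraps on overflow, so regrouping the terms this way gives the same result as the single-accumulator loop.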

Nicolas, can I ask what benchmark you're looking at? And what
performance tools are you using, Intel's? If you're on Linux, the
'perf' tool on a modern kernel can be used to quickly get an overview
of how many cache misses/hits your process has, how many pipeline
stalls occur, etc. You can then use it to drill down a bit into the
assembly that's problematic.

That might not give you an exact culprit (it could be many changes and
cumulative hits), but it's a start.

On Wed, Jun 19, 2013 at 10:43 AM, Nicolas Frisby wrote:
...
I'm also seeing performance regressions in the shootout benchmarks that I
can't identify in the asm. The new asm looks better but performs worse, with
a ~15% slowdown.
I fired up the performance counters in my CPU and the free Intel code for
inspecting them showed that my CPU utilization took about a 10% hit, even
while executing fewer total instructions.
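A back-of-the-envelope calculation shows how fewer instructions can still mean a slower program. Cycles are roughly instructions divided by IPC (instructions retired per cycle), so a drop in IPC can outweigh a drop in instruction count. The counter values below are invented for illustration. They are not Nicolas's measurements, and `cyclesFor`/`slowdown` are hypothetical helper names.

```haskell
module Main where

-- Approximate cycle count: instructions retired divided by the average
-- number of instructions retired per cycle (IPC).
cyclesFor :: Double -> Double -> Double
cyclesFor instructions ipc = instructions / ipc

-- Relative change in cycles between an old and a new run, as a fraction
-- (0.15 means 15% more cycles).
slowdown :: (Double, Double) -> (Double, Double) -> Double
slowdown (oldInstr, oldIpc) (newInstr, newIpc) =
  cyclesFor newInstr newIpc / cyclesFor oldInstr oldIpc - 1

main :: IO ()
main = do
  -- Hypothetical counter values, not real measurements.
  let old = (1.00e9, 2.0)   -- more instructions, higher IPC
      new = (0.95e9, 1.65)  -- fewer instructions, lower IPC
  putStrLn ("old cycles: " ++ show (uncurry cyclesFor old))
  putStrLn ("new cycles: " ++ show (uncurry cyclesFor new))
  putStrLn ("slowdown:   " ++ show (100 * slowdown old new) ++ "%")
```

With these made-up numbers, executing 5% fewer instructions at an IPC of 1.65 instead of 2.0 comes out roughly 15% slower.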
1) Jan, perhaps we're seeing the same sort of behavior — the shootout
benchmarks have extremely hot loops (hundreds of millions of iterations
IIRC). I used ticky profiling too, and saw no suspicious changes in any
counters.
2) Dear Low-level Gurus: How feasible is it that a ~15% slowdown in a
program with a very hot loop is due to incidentally inhibiting some caching
behavior (instr? data?)? Or perhaps affecting alignment? FTR my CPU is a
Core i7-2620M, Sandy Bridge.
Thanks all.
On Wed, Jun 19, 2013 at 9:27 AM, Jan Stolarek wrote:
...
...
If it's not sorted out, can you open a ticket, put in the relevant info
(so we don't need to look at the email trail), and we can tackle it when
you get here.
Currently there's a temporary workaround: I'm using the new folding rules
for all primitive types except Integer, for which I left the old folding
rules unchanged. Of course, this should be changed to make all rules
uniform, but for now it at least passes validation. I didn't file a
ticket, because the bug does not exist yet :) It only manifests itself in
my patches, which have not been applied yet. I'll add all the information
from this discussion to my GitHub fork of GHC and then move it to Trac
once the bug makes it to HEAD.
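For readers less familiar with constant-folding rules, here is a toy Haskell folder over a hypothetical `Expr` type. It is not GHC's rule code or its Core types; it only shows the shape of such a rule. It folds `quot` of two literals, but declines to fold a zero divisor, so the division-by-zero error is still raised at run time; a real folding rule would need the same guard.

```haskell
module Main where

-- A toy expression type (hypothetical, not GHC's Core).
data Expr
  = Lit Integer
  | Quot Expr Expr
  deriving (Eq, Show)

-- Bottom-up constant folding. A quot of two literals is replaced by its
-- value only when the divisor is non-zero; x `quot` 0 is left in place
-- so the division-by-zero error still happens at run time.
constFold :: Expr -> Expr
constFold (Lit n) = Lit n
constFold (Quot a b) =
  case (constFold a, constFold b) of
    (Lit x, Lit y) | y /= 0 -> Lit (x `quot` y)
    (a', b')                -> Quot a' b'

main :: IO ()
main = do
  print (constFold (Quot (Lit 100063) (Lit 156)))  -- folded
  print (constFold (Quot (Lit 1) (Lit 0)))         -- left alone
```

Nested quotients fold from the inside out, so `Quot (Quot (Lit 100) (Lit 5)) (Lit 3)` becomes `Lit 6`.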
What worries me more about my patches is the performance regression in
kahan, because I see no obvious differences in the generated assembly.
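For context, Kahan (compensated) summation looks roughly like this in Haskell. This is the textbook algorithm written as a strict left fold; it is an assumption, not the actual source of the kahan benchmark. It shows why such a loop is a tight dependency chain: each step needs both the running sum `s` and the compensation `c` from the previous step, so small scheduling changes may show up directly in the run time.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Kahan (compensated) summation. `s` is the running sum and `c` the
-- compensation: the low-order part lost when the previous term was added.
kahanSum :: [Double] -> Double
kahanSum = fst . foldl' step (0, 0)
  where
    step (!s, !c) x =
      let y  = x - c        -- correct the next term by the previous error
          t  = s + y        -- low-order bits of y may be lost here
          c' = (t - s) - y  -- recover what was lost (negated)
      in (t, c')

main :: IO ()
main = do
  let xs = 1 : replicate 10 1.0e-16 :: [Double]
  print (foldl (+) 0 xs)  -- naive left-to-right sum
  print (kahanSum xs)     -- compensated sum
```

With one `1.0` followed by ten `1.0e-16` terms, the naive left-to-right sum stays at exactly `1.0`, because each tiny term is below half an ulp of `1.0`; the compensated sum keeps the lost bits and ends up slightly above `1.0`.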
Janek
...
Simon
-----Original Message-----
From: ghc-devs-bounces@haskell.org [mailto:ghc-devs-bounces@haskell.org]
On Behalf Of Jan Stolarek
Sent: 20 May 2013 12:35
To: Ian Lynagh
Cc: ghc-devs@haskell.org
Subject: Re: Integer constant folding in the presence of new primops
...
If you remove everything but the quotInteger test from
integerConstantFolding and compile with -ddump-rule-rewrites then
you'll see that the eqInteger rule fires before quotInteger. This is
presumably comparing against 0, as the definition of quot for Integer
(in GHC.Real) is
    _ `quot` 0 = divZeroError
    n `quot` d = n `quotInteger` d
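This is why an equality test shows up at all: under the Haskell 2010 Report, matching a numeric literal pattern such as `0` against a value `v` succeeds if `v == 0`, with `==` taken from the pattern's type. So a definition shaped like the one above compiles to an Integer comparison followed by the division. A self-contained sketch with the same shape, using the hypothetical name `quotChecked` and `Control.Exception`'s `DivideByZero` in place of `divZeroError`:

```haskell
module Main where

import Control.Exception (ArithException (DivideByZero), throw)

-- Same shape as the Integer `quot` quoted above, under a hypothetical
-- name. Matching the literal pattern 0 is done with (==) on Integer,
-- so an equality test runs before any division happens.
quotChecked :: Integer -> Integer -> Integer
quotChecked _ 0 = throw DivideByZero
quotChecked n d = n `quot` d

main :: IO ()
main = print (quotChecked 100063 156)
```

Evaluating `quotChecked 1 0` throws `divide by zero`; any other divisor falls through to `quot`, which truncates toward zero.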
Yes, I noticed these two rules firing together; perhaps that explains it.
I created a small program for testing:
    main = print quotInt
    quotInt :: Integer
    quotInt = 100063 `quot` 156
I noticed that when I define the eqInteger wrapper to be NOINLINE, the
call to quot is translated to Core as:
Main.quotInt =
  GHC.Real.$fIntegralInteger_$cquot
    (__integer 100063) (__integer 156)
but when I change the wrapper to INLINE I get:
Main.quotInt =
  GHC.Real.$fNumRatio_$cquot    <-------- NumRatio instead of IntegralInteger
    (__integer 100063) (__integer 156)
All rule firing happens later (I used -ddump-simpl-iterations
-ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules
don't fire.
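As a general illustration of why inlining decisions interact with rule firing (not a diagnosis of the `$fNumRatio` surprise above): a rewrite rule can only match a call that is still present in Core. The hypothetical `double` below is marked NOINLINE so that its calls survive until the `double/double` rule gets a chance to fire. Compiling with `ghc -O -ddump-rule-firings` should report the rule firing; with an INLINE pragma instead, the body would risk being inlined before the rule could match.

```haskell
module Main where

-- Hypothetical function used as the target of a rewrite rule.
-- NOINLINE keeps calls to `double` visible in Core so the rule below
-- can match them; if `double` were inlined first, there would be no
-- `double (double x)` left to rewrite.
double :: Integer -> Integer
double x = x + x
{-# NOINLINE double #-}

-- Rules are only applied when optimising (e.g. ghc -O). The rewrite
-- does not change the result, only how it is computed.
{-# RULES
"double/double" forall x. double (double x) = x * 4
  #-}

quadrupled :: Integer
quadrupled = double (double 25)

main :: IO ()
main = print quadrupled
```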
...
Do you also still have eqInteger wired in? It sounds like you might
have given them both the same unique?
No, they didn't have the same unique. I modified the existing rules to
work on the new primops and ignore their wrappers. For now I have reverted
these changes so that I can make progress, and I'll leave this problem
for later.
Janek
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs
-- 
Regards,
Austin - PGP: 4096R/0x91384671

Austin Seipp