[GHC] #12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x)

#12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x) -------------------------------------+------------------------------------- Reporter: tommd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- See the attached somewhat reduced version of the main (manually unrolled) portion of an MD5 implementation taken from PureMD5. It used to compile in about 200ms, which is already bad enough, but now takes 1200 ms (ghc 6.12.3 to ghc 7.10.3 using -O2). I'd like to note that in my light testing the UNPACK pragma accounts for 400ms of this delta. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12028 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x) -------------------------------------+------------------------------------- Reporter: tommd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by tommd): * Attachment "regression-md5.hs" added. A slice of pureMD5 that shows performance issues in GHC -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12028 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x) -------------------------------------+------------------------------------- Reporter: tommd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * failure: None/Unknown => Compile-time performance bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12028#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x) -------------------------------------+------------------------------------- Reporter: tommd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tommd): On 8.0 rc2: {{{ $ time ghc -O2 -fforce-recomp regression-md5.hs [1 of 1] Compiling Data.Digest.Pure.MD5 ( regression-md5.hs, regression- md5.o ) real 0m1.497s user 0m1.460s sys 0m0.028s }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12028#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x) -------------------------------------+------------------------------------- Reporter: tommd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Interesting, it looks like we are spending a significant amount of time (500 ms out of a 1500 ms compilation) in CodeGen on 8.0.1 despite the fact that the simplified program is only 6000 terms, 2000 types. It looks like there was a rather significant regression from 7.6 (where compilation took around 700 ms) to 7.8 (1200 ms). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12028#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x) -------------------------------------+------------------------------------- Reporter: tommd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * cc: simonmar (added) Comment: Cc'ing Simon Marlow, who is quite familiar with the sinking pass. Indeed this example produces an extremely large amount of code (around 2700 lines of C--) in `applyMD5Rounds`. Indeed the sink pass does do a significantly amount of fiddling with this code. For instance, prior to sinking much of the code for this function is bit twiddling, {{{ _cbnY::I64 = _s9Sn::I64 ^ 4294967295; _s9WQ::I64 = _cbnY::I64; _cbo1::I64 = _s9WQ::I64 & _s9Sp::I64; _s9WR::I64 = _cbo1::I64; _cbo4::I64 = _s9Sn::I64 & _s9So::I64; _s9WP::I64 = _cbo4::I64; _cbo7::I64 = _s9WP::I64 | _s9WR::I64; _s9WS::I64 = _cbo7::I64; _cboa::I64 = _s9WS::I64 + _s9SG::I64; _s9WT::I64 = _cboa::I64; _cbod::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(_s9WT::I64)); _s9WU::I64 = _cbod::I64; _cbog::I64 = _s9WU::I64 + 3614090360; _s9WV::I64 = _cbog::I64; _cboj::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(_s9WV::I64)); _s9WW::I64 = _cboj::I64; _cbom::I64 = _s9WW::I64 + _s9Sm::I64; _s9WX::I64 = _cbom::I64; _cbop::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(_s9WX::I64)); _s9WO::I64 = _cbop::I64; _cbos::I64 = _s9WO::I64 >> 25; _s9X0::I64 = _cbos::I64; _cbov::I64 = _s9WO::I64 << 7; _s9WZ::I64 = _cbov::I64; _cboy::I64 = _s9WZ::I64 | _s9X0::I64; _s9X1::I64 = _cboy::I64; _cboB::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(_s9X1::I64)); _s9X2::I64 = _cboB::I64; _cboE::I64 = _s9X2::I64 + _s9Sn::I64; _s9X3::I64 = _cboE::I64; _cboH::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(_s9X3::I64)); _s9WY::I64 = _cboH::I64; _cboK::I64 = _s9WY::I64 ^ 4294967295; }}} Which sinking turns into, {{{ _s9WO::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(%MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(%MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(_s9Sn::I64 & _s9So::I64 | _s9Sn::I64 ^ 4294967295 & _s9Sp::I64 + _s9SG::I64)) + 3614090360)) + I64[Sp + 152])); _s9WY::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(%MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32((_s9WO::I64 << 7) | (_s9WO::I64 >> 25))) + _s9Sn::I64)); _s9X4::I64 = %MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(%MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(%MO_UU_Conv_W32_W64(%MO_UU_Conv_W64_W32(_s9WY::I64 & _s9Sn::I64 | _s9WY::I64 ^ 4294967295 & _s9So::I64 + _s9SX::I64)) + 3905402710)) + _s9Sp::I64)); }}} It would be interesting to know how the code produced by 6.12 differs from that produced by newer compilers. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12028#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12028: Large let bindings are 6x slower (since 6.12.x to 7.10.x) -------------------------------------+------------------------------------- Reporter: tommd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): There were regressions due to the new code generator, but we felt that overall the architecture benefits were worth the hit. I actually spent a lot of time getting the overheads down to an acceptable level. A common trick to apply is to spot a pattern in the code generator and do some specialised code generation. Maybe it's possible to apply that here? See also #7198. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12028#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC