
#9646: Simplifer non-determinism leading to 8 fold difference in run time performance -------------------------------------+------------------------------------- Reporter: erikd | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 Resolution: | Keywords: Operating System: Linux | Architecture: x86_64 (amd64) Type of failure: Runtime | Difficulty: Unknown performance bug | Blocked By: Test Case: | Related Tickets: Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by erikd): Amos Robinson has helped make some progress on this. The problem is with the `StrictPrim` monad (defined in the file `Common/GHC/Ineteger/StrictPrim.hs` of the ghc-perfbug-test project) which basically an ST monad with a bunch of explicit bang patterns to force strict evaluation. The big difference between the slow and the fast version is that in the slow version of the .dump-simpl output of these inner loops have an extra lambda eg on line 39 of `slow-Natural.dump-simpl`: {{{ (\ eta -> (# eta, vx #)) `cast` ...; }}} The `eta` variable in this case is the state token for the `StrictPrim` monad. Furthermore, Amos Robinson was able to re-write the inner loop from: {{{#!hs innerLoop2 !xi !yi !carryhi !carrylo !sum | yi < n2 = do x <- indexWordArrayM arr1 xi y <- indexWordArrayM arr2 yi let (# !cry0, !prod #) = timesWord2 x y (# !cry1, !sum1 #) = plusWord2 prod sum (# !tcryhi, !crylo #) = plusWord2C carrylo cry0 cry1 !cryhi = plusWord carryhi tcryhi innerLoop2 (xi - 1) (yi + 1) cryhi crylo sum1 | otherwise = do return $! (carryhi, carrylo, sum) }}} to {{{#!hs innerLoop2 !xi !yi !carryhi !carrylo !sum = StrictPrim $ \s -> if yi < n2 then let StrictPrim go = do x <- indexWordArrayM arr1 xi y <- indexWordArrayM arr2 yi let (# !cry0, !prod #) = timesWord2 x y (# !cry1, !sum1 #) = plusWord2 prod sum (# !tcryhi, !crylo #) = plusWord2C carrylo cry0 cry1 !cryhi = plusWord carryhi tcryhi innerLoop2 (xi - 1) (yi + 1) cryhi crylo sum1 in go s else (# s, (carryhi, carrylo, sum) #) }}} and this new formulation *always* compiles to run fast. After some discussion with Amos, we came to the following conclusions: 1. The simplifier should be deterministic. The same input file, with the same compiler flags should result in the same output (preferably the output that performs better). 2. It is not un-reasonable to expect the compiler to apply the transformation that Amos appied manually. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9646#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler