
#14035: Weird performance results.
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by danilo2):

 Simon, first of all, thank you very much for your time and help with this topic! I have added some notes on the points mentioned in your response:

 **(1)** I'm so happy that you've found out that something is wrong and that you've got a fix for it! In general, `-XStrict` is awesome. We need it in high-performance Haskell code, where putting bangs everywhere (and remembering to do so) can be cumbersome.

 **(2)** You're of course right. I had just opened the browser to add a comment about exactly the same finding. The definition of `(|||)` allows GHC to easily discover that, if we always use the `XFalse` value, it can shorten the mentioned code to `s@(T b' a') <- fromFailParser $ f a ; return s` (just reuse the value). There are, however, three other non-obvious questions involved:

 **(2a)** Why is GHC able to optimize the code this way if we use `XFalse` everywhere, but not when we use `XTrue` everywhere? Very similar final Core could be generated in the latter case: if `b` is `XFalse` we can just reuse the output value, and if it is `XTrue` we can be sure the output always contains `XTrue` as well.

 **(2b)** Even if GHC needs to create code like `T b' a' <- fromFailParser $ f a ; return $ T something a'`, why does it take so long? This is a strict, fully evaluated value, so why does "updating a field" take 10x longer than a `Char` comparison?

 **(2c)** Moreover, what is the reason to "allocate a fresh `T` every time round the loop"?
 The fields of the tuple `T` do not "interact" with each other; they are just two separate outputs of a function. I could of course be very wrong, but I think it should be possible to optimize `T a b` to `(# a,b #)` and cut the "fresh `T` allocation time" out completely. Am I right?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14035#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler