[GHC] #11677: Dramatic de-optimization with "-O", "-O1", "-O2" options

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Keywords: optimization | Operating System: Linux deoptimization | Architecture: x86_64 | Type of failure: Runtime (amd64) | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Look for this simple program: {{{#!hs: import Control.Monad import Data.Maybe -- import qualified Data.HashMap.Strict as M -- import qualified Data.Map.Lazy as M import qualified Data.Map.Strict as M -- import Control.DeepSeq -- import Control.Exception main :: IO () main = do putStrLn "Start" n <- read <$> getLine q <- read <$> getLine dict' <- M.fromList <$> replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine) -- dict <- evaluate $ force dict' let dict = dict' count <- length <$> catMaybes <$> replicateM q (flip M.lookup dict <$> getLine) print count }}} When compiled __without__ "-O2" it runs about 0.07 sec on my computer. But when compiled __with__ "-O2" it runs about 77 sec (1100 times slowly!). Look: compile and run without "-O2": {{{ % rm -rf ./mime_type mime_type.{o,hi} && ghc mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type [1 of 1] Compiling Main ( mime_type.hs, mime_type.o ) Linking mime_type ... Start 4738 ./mime_type 0,06s user 0,01s system 97% cpu 0,069 total }}} And with "-O2": {{{ % rm -rf ./mime_type mime_type.{o,hi} && ghc -O2 mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type [1 of 1] Compiling Main ( mime_type.hs, mime_type.o ) Linking mime_type ... Start 4738 ./mime_type 76,73s user 0,10s system 99% cpu 1:17,12 total }}} But when force `dict` variable (`dict <- evaluate $ force dict'`), it runs fast in both cases (with and without "-O2"). Also this bug is reproductable with "-O", "-O1" options. Also this bug is reproductable with GHC 7.10.2 and GHC 8.0.1-rc2 (The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204). Data file my_data.txt is attached. It has simple structure: * N, number of key-value pairs * K, number of keys for searching * N key-value pairs * K kes You can generate it with this Ruby script: {{{ #!/usr/bin/env ruyb lst = ('a'..'z').to_a N = 10000 K = 10000 File.open('my_data.txt', 'w') do |f| f.puts(N) f.puts(K) N.times do f.puts("#{lst.sample(3).join} #{lst.sample(5).join}") # f.puts("(\"#{lst.sample(3).join}\",\"#{lst.sample(5).join}\")") end K.times do f.puts("#{lst.sample(3).join}") end end }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: optimization | deoptimization Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by malphunction): * Attachment "my_data.txt" added. Test data -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: optimization | deoptimization Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by malphunction: @@ -77,1 +77,1 @@ - #!/usr/bin/env ruyb + #!/usr/bin/env ruby New description: Look for this simple program: {{{#!hs: import Control.Monad import Data.Maybe -- import qualified Data.HashMap.Strict as M -- import qualified Data.Map.Lazy as M import qualified Data.Map.Strict as M -- import Control.DeepSeq -- import Control.Exception main :: IO () main = do putStrLn "Start" n <- read <$> getLine q <- read <$> getLine dict' <- M.fromList <$> replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine) -- dict <- evaluate $ force dict' let dict = dict' count <- length <$> catMaybes <$> replicateM q (flip M.lookup dict <$> getLine) print count }}} When compiled __without__ "-O2" it runs about 0.07 sec on my computer. But when compiled __with__ "-O2" it runs about 77 sec (1100 times slowly!). Look: compile and run without "-O2": {{{ % rm -rf ./mime_type mime_type.{o,hi} && ghc mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type [1 of 1] Compiling Main ( mime_type.hs, mime_type.o ) Linking mime_type ... Start 4738 ./mime_type 0,06s user 0,01s system 97% cpu 0,069 total }}} And with "-O2": {{{ % rm -rf ./mime_type mime_type.{o,hi} && ghc -O2 mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type [1 of 1] Compiling Main ( mime_type.hs, mime_type.o ) Linking mime_type ... Start 4738 ./mime_type 76,73s user 0,10s system 99% cpu 1:17,12 total }}} But when force `dict` variable (`dict <- evaluate $ force dict'`), it runs fast in both cases (with and without "-O2"). Also this bug is reproductable with "-O", "-O1" options. Also this bug is reproductable with GHC 7.10.2 and GHC 8.0.1-rc2 (The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204). Data file my_data.txt is attached. It has simple structure: * N, number of key-value pairs * K, number of keys for searching * N key-value pairs * K kes You can generate it with this Ruby script: {{{ #!/usr/bin/env ruby lst = ('a'..'z').to_a N = 10000 K = 10000 File.open('my_data.txt', 'w') do |f| f.puts(N) f.puts(K) N.times do f.puts("#{lst.sample(3).join} #{lst.sample(5).join}") # f.puts("(\"#{lst.sample(3).join}\",\"#{lst.sample(5).join}\")") end K.times do f.puts("#{lst.sample(3).join}") end end }}} -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: optimization | deoptimization Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by malphunction: @@ -66,0 +66,3 @@ + Also this bug is reproductable with .Strict and .Lazy versions; and with + Data.HashMap, .Strict and .Lazy + New description: Look for this simple program: {{{#!hs: import Control.Monad import Data.Maybe -- import qualified Data.HashMap.Strict as M -- import qualified Data.Map.Lazy as M import qualified Data.Map.Strict as M -- import Control.DeepSeq -- import Control.Exception main :: IO () main = do putStrLn "Start" n <- read <$> getLine q <- read <$> getLine dict' <- M.fromList <$> replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine) -- dict <- evaluate $ force dict' let dict = dict' count <- length <$> catMaybes <$> replicateM q (flip M.lookup dict <$> getLine) print count }}} When compiled __without__ "-O2" it runs about 0.07 sec on my computer. But when compiled __with__ "-O2" it runs about 77 sec (1100 times slowly!). Look: compile and run without "-O2": {{{ % rm -rf ./mime_type mime_type.{o,hi} && ghc mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type [1 of 1] Compiling Main ( mime_type.hs, mime_type.o ) Linking mime_type ... Start 4738 ./mime_type 0,06s user 0,01s system 97% cpu 0,069 total }}} And with "-O2": {{{ % rm -rf ./mime_type mime_type.{o,hi} && ghc -O2 mime_type.hs -o mime_type && cat my_data.txt | time ./mime_type [1 of 1] Compiling Main ( mime_type.hs, mime_type.o ) Linking mime_type ... Start 4738 ./mime_type 76,73s user 0,10s system 99% cpu 1:17,12 total }}} But when force `dict` variable (`dict <- evaluate $ force dict'`), it runs fast in both cases (with and without "-O2"). Also this bug is reproductable with "-O", "-O1" options. Also this bug is reproductable with .Strict and .Lazy versions; and with Data.HashMap, .Strict and .Lazy Also this bug is reproductable with GHC 7.10.2 and GHC 8.0.1-rc2 (The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204). Data file my_data.txt is attached. It has simple structure: * N, number of key-value pairs * K, number of keys for searching * N key-value pairs * K kes You can generate it with this Ruby script: {{{ #!/usr/bin/env ruby lst = ('a'..'z').to_a N = 10000 K = 10000 File.open('my_data.txt', 'w') do |f| f.puts(N) f.puts(K) N.times do f.puts("#{lst.sample(3).join} #{lst.sample(5).join}") # f.puts("(\"#{lst.sample(3).join}\",\"#{lst.sample(5).join}\")") end K.times do f.puts("#{lst.sample(3).join}") end end }}} -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: optimization | deoptimization Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by malphunction): Update: if disable `-fenable-rewrite-rule` (i.e. `-O2 -fno-enable-rewrite- rule`), then compiled program runs fast! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: optimization | deoptimization Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Thanks for the excellent description and testcase! The problem here is that GHC is inlining the definition of `dict` with `-O1`. This means that your transformed program looks like, {{{#!hs dict' <- replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine) count <- length <$> catMaybes <$> replicateM q (flip M.lookup (M.fromList dict) <$> getLine) }}} Meaning that the `Map` is being reconstructed with every line that is read. You can easily discourage GHC from performing this inlining by placing a bang on the `dict'` binding, {{{#!hs !dict' <- M.fromList <$> replicateM n ((\(k:v:_) -> (k,v)) <$> words <$> getLine) }}} You were accomplishing this same end with your `evaluate $ deepseq`, but more "forcefully". Indeed it is a bit unfortunate that GHC decides that this inlining is beneficial, but I'm not entirely sure how it could know otherwise. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: optimization | deoptimization Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): For the record, I concluded this by comparing the output of `-ddump-simpl -dsuppress-all` from the testcases compiled with `-O1` with and without the `evaluate $ deepseq`. The delta in the Core produced in these two cases is extremely small, {{{#!diff @@ -297,6 +302,16 @@ case $wa @ (String, String) ww1 (lvl3 `cast` ...) ipv4 of _ { (# ipv6, ipv7 #) -> case $sfromList @ [Char] ipv7 of dict' { __DEFAULT -> + case seq# + @ (Map String String) + @ RealWorld + (case $fNFDataMap_$crnf + @ [Char] @ [Char] ($sforce3 `cast` ...) ($sforce3 `cast` ...) dict' + of _ { () -> + dict' + }) + ipv6 + of _ { (# ipv8, ipv9 #) -> case readEither6 @ Int (run @ Int lvl2 ipv5) of _ { [] -> case error @@ -306,8 +321,8 @@ readEither4 of wild3 { }; - : x1 ds5 -> - case ds5 of _ { + : x1 ds6 -> + case ds6 of _ { [] -> case x1 of _ { I# ww3 -> case $wa @@ -316,25 +331,25 @@ ((\ (eta :: State# RealWorld) -> case wantReadableHandle_1 @ String hGetLine4 stdin (hGetLine2 `cast` ...) eta - of _ { (# ipv8, ipv9 #) -> - (# ipv8, $slookup1 @ [Char] ipv9 dict' #) + of _ { (# ipv10, ipv11 #) -> + (# ipv10, $slookup1 @ [Char] ipv11 ipv9 #) }) `cast` ...) - ipv6 - of _ { (# ipv8, ipv9 #) -> + ipv8 + of _ { (# ipv10, ipv11 #) -> hPutStr2 stdout - (case $wlenAcc @ [Char] (catMaybes1 @ String ipv9) 0# + (case $wlenAcc @ [Char] (catMaybes1 @ String ipv11) 0# of ww4 { __DEFAULT -> case $wshowSignedInt 0# ww4 ([] @ Char) of _ { (# ww6, ww7 #) -> : @ Char ww6 ww7 } }) True - ipv8 + ipv10 } }; - : ipv8 ipv9 -> + : ipv10 ipv11 -> case error @ 'PtrRepLifted @ Int }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11677: Dramatic de-optimization with "-O", "-O1", "-O2" options -------------------------------------+------------------------------------- Reporter: malphunction | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: optimization | deoptimization Operating System: Linux | Architecture: x86_64 Type of failure: Runtime | (amd64) performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): This is another in the long chain of tickets involving the "state hack" and `replicateM`. See #1168 for a list and #9388 for ideas. Sadly `-fno-state-hack` doesn't make any difference. Reason: the `fmap` for `IO` is begin called, and looks like {{{ fmapIO = \f a s. case a s of (r, s') -> (f r, s') }}} So if we have `replicateM (fmapIO (g (expensive x)) getLine)`, we'll inline `fmapIO` to {{{ replicateM (let f = g (expensive x) in \s. case getLine s of (r, s') -> (f r, s')) }}} Now if that `\s` which come from `fmapIO` is treated as one-shot, we'll inline `g (expensive x)` inside; disaster. The `-fno-state-hack` doesn't make any difference because Joachim arranged to persist one-shot-ness in interface files, so what matters is the setting in `GHC.Base` where `fmapIO` was defined. Anyway that's the reason, and we have many examples of it. The right solution is sketched in #9388 but it needs someone to pick up the cudgels. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11677#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC