
#9203: Perf regression in 7.8.2 relative to 7.6.3, possibly related to Typeable --------------------------------------------+------------------------------ Reporter: simonmar | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.3 Component: Compiler | Version: 7.8.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by simonmar): * cc: tibbe (added) Comment: I've narrowed this down further. It seems to be something to do with `HashMap`. With the following source file: {{{ {-# LANGUAGE RankNTypes, GADTs, BangPatterns, DeriveDataTypeable, StandaloneDeriving #-} {-# OPTIONS_GHC -fno-warn-unused-do-bind -fno-warn-type-defaults #-} module Bench where import Prelude hiding (mapM) import Control.Concurrent import Data.Hashable import Data.Time.Clock import Data.Traversable import Data.Typeable import System.Environment import Text.Printf import qualified Data.HashMap.Strict as HashMap import Data.HashMap.Strict (HashMap) import Unsafe.Coerce data TestReq a where ReqInt :: {-# UNPACK #-} !Int -> TestReq Int deriving Typeable deriving instance Eq (TestReq a) deriving instance Show (TestReq a) instance Hashable (TestReq a) where hashWithSalt salt (ReqInt i) = hashWithSalt salt (0::Int, i) main = do [n] <- fmap (fmap read) getArgs t0 <- getCurrentTime let f 0 !cache = cache f !n !cache = f (n-1) (dcinsert (ReqInt n) 0 cache) -- let !cache = f n dcempty let m = dclookup (ReqInt (n `div` 2)) cache print m t1 <- getCurrentTime printf "insert: %.2fs\n" (realToFrac (t1 `diffUTCTime` t0) :: Double) t0 <- getCurrentTime let f 0 !m = m f !n !m = case dclookup (ReqInt n) cache of Nothing -> f (n-1) m Just _ -> f (n-1) (m+1) print (f n 0) t1 <- getCurrentTime printf "lookup: %.2fs\n" (realToFrac (t1 `diffUTCTime` t0) :: Double) newtype DataCache = DataCache (HashMap TypeRep SubCache) -- | The implementation is a two-level map: the outer level maps the -- types of requests to 'SubCache', which maps actual requests to their -- results. So each 'SubCache' contains requests of the same type. -- This works well because we only have to store the dictionaries for -- 'Hashable' and 'Eq' once per request type. data SubCache = forall req a . (Hashable (req a), Eq (req a)) => SubCache ! (HashMap (req a) a) -- NB. the inner HashMap is strict, to avoid building up -- a chain of thunks during repeated insertions. -- | A new, empty 'DataCache'. dcempty :: DataCache dcempty = DataCache HashMap.empty -- | Inserts a request-result pair into the 'DataCache'. dcinsert :: (Hashable (r a), Typeable (r a), Eq (r a)) => r a -- ^ Request -> a -- ^ Result -> DataCache -> DataCache dcinsert req result (DataCache m) = DataCache $ HashMap.insertWith fn (typeOf req) (SubCache (HashMap.singleton req result)) m where fn (SubCache new) (SubCache old) = SubCache (unsafeCoerce new `HashMap.union` old) -- | Looks up the cached result of a request. dclookup :: Typeable (r a) => r a -- ^ Request -> DataCache -> Maybe a dclookup req (DataCache m) = case HashMap.lookup (typeOf req) m of Nothing -> Nothing Just (SubCache sc) -> unsafeCoerce (HashMap.lookup (unsafeCoerce req) sc) }}} GHC 7.6.3: {{{ Just 0 insert: 0.73s 500000 lookup: 0.23s }}} GHC 7.8.2: {{{ Just 0 insert: 1.01s 500000 lookup: 0.53s }}} `insert` is a bit slower, but `lookup` is more than twice as slow with 7.8.2. Looking at the Core, at lookup in particular, the code in 7.8.2 looks reasonable. But in both cases we end up calling `Data.HashMap.Base.lookup` for the inner lookup, and I'm guessing that is where the inefficiency lies. @tibbe, want to take a look? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9203#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler