
#8793: Improve GHC.Event.IntTable performance ------------------------------------+------------------------------------- Reporter: cdk | Owner: Type: task | Status: new Priority: normal | Milestone: 7.8.1 Component: libraries/base | Version: 7.6.3 Keywords: | Operating System: Unknown/Multiple Architecture: Unknown/Multiple | Type of failure: None/Unknown Difficulty: Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | ------------------------------------+------------------------------------- The performance of `GHC.Event.IntTable` can be improved. I've managed to get some nice increases across the board. Benchmarking using `criterion` shows: function, % faster than current impl. `insert`: 4% `lookup`: 26% `update`: 11% `delete`: 5% There is one strange thing I noted. In `updateWith`, there is an inner loop that looks like this: {{{ data Bucket a = Empty | Bucket Int a (Bucket a) go _ Empty = (False, Nothing, Empty) go cont (Bucket key val next) | key == k = case f val of Nothing -> (True, Just val, cont next) Just v -> (False, Just val, cont (Bucket key v next)) | otherwise = go (\x -> cont (Bucket key val x)) next }}} which returns a tuple that is immediately consumed like so: {{{ (delete_occurred, old_val, new_bkt) <- go id ... when (isJust old_val) $ do <updateIntTable> when delete_occurred <decIntTableSize> return old_val }}} I expected that inlining the `<updateIntTable>` and `<decIntTableSize>` code blocks directly into `go` would result in better code than creating a tuple and then pattern matching on it afterwards. ie. {{{ go _ Empty = return Nothing go cont (Bucket key val next) | key == k = do case f val of Nothing -> <updateIntTable> (cont next) >> <decIntTableSize> Just v -> <updateIntTable> (cont (Bucket key v next) return (Just val) | otherwise = go (\x -> cont (Bucket key val x)) next }}} which has the exact same semantics. To my suprise, this code is almost 2x slower! The core generated in both cases is exactly what I'd expect; if anything, the second version seems tighter. I'm not sure why the first version is faster, but perhaps the original author, Bryan O'Sullivan, can shed some light as he used the tupled method in the first version. I'll attach my patch, `criterion`'s html output for the benchmarks as well as the benchmarking code, and the core for the oddity I discussed above. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8793 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler