[GHC] #13331: Worker/wrapper can lead to sharing failure

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- `nofib` turned up a serious performance bug in the implementation of `insert` in `containers-0.5.10.1`. The function was defined thus: {{{#!hs origInsert :: Ord k => k -> a -> Map k a -> Map k a origInsert = go where go :: Ord k => k -> a -> Map k a -> Map k a go !kx x Tip = singleton kx x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go kx x r EQ | kx `ptrEq` ky && x `ptrEq` y -> t | otherwise -> Bin sz kx x l r {-# INLINABLE origInsert #-} }}} When this specializes to `Int` keys (or any other "unboxable" ones, including tuples), worker/wrapper botches the job: {{{ Rec { -- RHS size: {terms: 102, types: 65, coercions: 0} $w$sgo :: forall a_a7M6. Int# -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6 $w$sgo = \ (@ a_a7M6) (ww_s8oI :: Int#) (w_s8oE :: a_a7M6) (w1_s8oF :: Map Int a_a7M6) -> let { kx_X7KQ :: Int kx_X7KQ = I# ww_s8oI } in case w1_s8oF of wild_Xg { [...] origInsertInt_$sgo :: forall a_a7M6. Int -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6 origInsertInt_$sgo = \ (@ a_a7M6) (w_s8oD :: Int) (w1_s8oE :: a_a7M6) (w2_s8oF :: Map Int a_a7M6) -> case w_s8oD of _ { I# ww1_s8oI -> $w$sgo ww1_s8oI w1_s8oE w2_s8oF } }}} The wrapper opens the box, throws it away, and passes the contents to the worker, which immediately builds a ''new'' box with exactly the same contents. This prevents the pointer equality tests from succeeding for these types, and it also turns out to cause quite a lot of extra allocation for some types (leading to the severe nofib regression). One could reasonably argue that the code above is a bit complicated, and that GHC could be forgiven for failing to realize that the box should be saved. Unfortunately, a straightforward change that would seem to make this clear does not in fact convince GHC: {{{#!hs myInsert :: Ord k => k -> a -> Map k a -> Map k a myInsert kx0 = go kx0 where go !kx x Tip = singleton kx0 x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go kx x r EQ | kx0 `ptrEq` ky && x `ptrEq` y -> t | otherwise -> Bin sz kx0 x l r {-# INLINABLE myInsert #-} }}} does exactly the same thing. The only ''simple'' way I found to avoid that is to remove the bang patterns, which really ''shouldn't'' work, but does. This, however, is prohibited by the required semantics—we must be strict in the key even if comparison is not. The only fix I've found thus far is truly disgusting, and seems to work at least partly by mistake: {{{#!hs insert :: Ord k => k -> a -> Map k a -> Map k a insert kx0 = go kx0 kx0 where go :: Ord k => k -> k -> a -> Map k a -> Map k a go orig !kx x Tip = singleton (lazy orig) x go orig !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go orig kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go orig kx x r EQ | x `ptrEq` y && (lazy orig `seq` (orig `ptrEq` ky)) -> t | otherwise -> Bin sz (lazy orig) x l r {-# INLINABLE insert #-} }}} We would also like to be able to experiment with an implementation that uses CPS (recursive join points today!) rather than pointer equality tests for the internal nodes, leaving pointer equality to the leaves. But I have not found any way whatsoever to avoid this W/W problem in that version. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by dfeuer: Old description:
`nofib` turned up a serious performance bug in the implementation of `insert` in `containers-0.5.10.1`. The function was defined thus:
{{{#!hs origInsert :: Ord k => k -> a -> Map k a -> Map k a origInsert = go where go :: Ord k => k -> a -> Map k a -> Map k a go !kx x Tip = singleton kx x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go kx x r EQ | kx `ptrEq` ky && x `ptrEq` y -> t | otherwise -> Bin sz kx x l r
{-# INLINABLE origInsert #-} }}}
When this specializes to `Int` keys (or any other "unboxable" ones, including tuples), worker/wrapper botches the job:
{{{ Rec { -- RHS size: {terms: 102, types: 65, coercions: 0} $w$sgo :: forall a_a7M6. Int# -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6 $w$sgo = \ (@ a_a7M6) (ww_s8oI :: Int#) (w_s8oE :: a_a7M6) (w1_s8oF :: Map Int a_a7M6) -> let { kx_X7KQ :: Int kx_X7KQ = I# ww_s8oI } in case w1_s8oF of wild_Xg {
[...]
origInsertInt_$sgo :: forall a_a7M6. Int -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6 origInsertInt_$sgo = \ (@ a_a7M6) (w_s8oD :: Int) (w1_s8oE :: a_a7M6) (w2_s8oF :: Map Int a_a7M6) -> case w_s8oD of _ { I# ww1_s8oI -> $w$sgo ww1_s8oI w1_s8oE w2_s8oF } }}}
The wrapper opens the box, throws it away, and passes the contents to the worker, which immediately builds a ''new'' box with exactly the same contents. This prevents the pointer equality tests from succeeding for these types, and it also turns out to cause quite a lot of extra allocation for some types (leading to the severe nofib regression).
One could reasonably argue that the code above is a bit complicated, and that GHC could be forgiven for failing to realize that the box should be saved. Unfortunately, a straightforward change that would seem to make this clear does not in fact convince GHC:
{{{#!hs myInsert :: Ord k => k -> a -> Map k a -> Map k a myInsert kx0 = go kx0 where go !kx x Tip = singleton kx0 x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go kx x r EQ | kx0 `ptrEq` ky && x `ptrEq` y -> t | otherwise -> Bin sz kx0 x l r
{-# INLINABLE myInsert #-} }}}
does exactly the same thing. The only ''simple'' way I found to avoid that is to remove the bang patterns, which really ''shouldn't'' work, but does. This, however, is prohibited by the required semantics—we must be strict in the key even if comparison is not. The only fix I've found thus far is truly disgusting, and seems to work at least partly by mistake:
{{{#!hs insert :: Ord k => k -> a -> Map k a -> Map k a insert kx0 = go kx0 kx0 where go :: Ord k => k -> k -> a -> Map k a -> Map k a go orig !kx x Tip = singleton (lazy orig) x go orig !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go orig kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go orig kx x r EQ | x `ptrEq` y && (lazy orig `seq` (orig `ptrEq` ky)) -> t | otherwise -> Bin sz (lazy orig) x l r
{-# INLINABLE insert #-} }}}
We would also like to be able to experiment with an implementation that uses CPS (recursive join points today!) rather than pointer equality tests for the internal nodes, leaving pointer equality to the leaves. But I have not found any way whatsoever to avoid this W/W problem in that version.
New description: `nofib` turned up a serious performance bug in the implementation of `insert` in `containers-0.5.10.1`. The function was defined thus: {{{#!hs origInsert :: Ord k => k -> a -> Map k a -> Map k a origInsert = go where go :: Ord k => k -> a -> Map k a -> Map k a go !kx x Tip = singleton kx x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go kx x r EQ | kx `ptrEq` ky && x `ptrEq` y -> t | otherwise -> Bin sz kx x l r {-# INLINABLE origInsert #-} }}} When this specializes to `Int` keys (or any other "unboxable" ones, including tuples), worker/wrapper botches the job: {{{ Rec { -- RHS size: {terms: 102, types: 65, coercions: 0} $w$sgo :: forall a_a7M6. Int# -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6 $w$sgo = \ (@ a_a7M6) (ww_s8oI :: Int#) (w_s8oE :: a_a7M6) (w1_s8oF :: Map Int a_a7M6) -> let { kx_X7KQ :: Int kx_X7KQ = I# ww_s8oI } in case w1_s8oF of wild_Xg { [...] origInsertInt_$sgo :: forall a_a7M6. Int -> a_a7M6 -> Map Int a_a7M6 -> Map Int a_a7M6 origInsertInt_$sgo = \ (@ a_a7M6) (w_s8oD :: Int) (w1_s8oE :: a_a7M6) (w2_s8oF :: Map Int a_a7M6) -> case w_s8oD of _ { I# ww1_s8oI -> $w$sgo ww1_s8oI w1_s8oE w2_s8oF } }}} The wrapper opens the box, throws it away, and passes the contents to the worker, which immediately builds a ''new'' box with exactly the same contents. This prevents the pointer equality tests from succeeding for these types, and it also turns out to cause quite a lot of extra allocation for some types (leading to the severe nofib regression). One could reasonably argue that the code above is a bit complicated, and that GHC could be forgiven for failing to realize that the box should be saved. Unfortunately, a straightforward change that would seem to make this clear does not in fact convince GHC: {{{#!hs myInsert :: Ord k => k -> a -> Map k a -> Map k a myInsert kx0 = go kx0 where go !kx x Tip = singleton kx0 x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go kx x r EQ | kx0 `ptrEq` ky && x `ptrEq` y -> t | otherwise -> Bin sz kx0 x l r {-# INLINABLE myInsert #-} }}} does exactly the same thing. The only ''simple'' way I found to avoid that is to remove the bang patterns, which really ''shouldn't'' work, but does. This, however, is prohibited by the desired semantics—I believe we want to be strict in the key even if comparison is not. In any case, that really shouldn't be causing trouble and it is. The only fix I've found thus far is truly disgusting, and seems to work at least partly by mistake: {{{#!hs insert :: Ord k => k -> a -> Map k a -> Map k a insert kx0 = go kx0 kx0 where go :: Ord k => k -> k -> a -> Map k a -> Map k a go orig !kx x Tip = singleton (lazy orig) x go orig !kx x t@(Bin sz ky y l r) = case compare kx ky of LT | l' `ptrEq` l -> t | otherwise -> balanceL ky y l' r where !l' = go orig kx x l GT | r' `ptrEq` r -> t | otherwise -> balanceR ky y l r' where !r' = go orig kx x r EQ | x `ptrEq` y && (lazy orig `seq` (orig `ptrEq` ky)) -> t | otherwise -> Bin sz (lazy orig) x l r {-# INLINABLE insert #-} }}} We would also like to be able to experiment with an implementation that uses CPS (recursive join points today!) rather than pointer equality tests for the internal nodes, leaving pointer equality to the leaves. But I have not found any way whatsoever to avoid this W/W problem in that version. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Can you make a repro case? What is this `ptrEq` thing? It looks very smelly to me. Surely at least it should be called `unsafePtrEq`. What does it do? Reboxing in worker/wrapper is very difficult to avoid. It was described in our original strictness analysis paper, and also in the paper about constructor specialisation. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): `ptrEq` is indeed a thin wrapper around `reallyUnsafePtrEquality#`: {{{#!hs ptrEq :: a -> a -> Bool ptrEq x y = tagToEnum# (reallyUnsafePtrEquality# x y) }}} If it's really hard to work this out automatically, I wonder if it could be made a little easier with a pragma. I'd be satisfied if I could write something like {{{#!hs go :: Ord k => {-# NOUNPACK #-} k -> k -> a -> Map k a -> Map k a }}} to indicate that GHC should not unbox the first argument. Would that make it easier to avoid the problem? I'd hate to leave something as fragile- looking as my workaround hack in `containers` forever. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I'd still like to check a repro case. Without it I'm guessing wildly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "Repro13331.hs" added. Repro case -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Actually, I just discovered that we don't even need pointer equality to reproduce! {{{#!hs naiveInsert1 :: Ord k => k -> a -> Map k a -> Map k a naiveInsert1 kx0 = go kx0 where --go :: Ord k => k -> a -> Map k a -> Map k a go !kx x Tip = singleton kx0 x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT -> balanceL ky y l' r where !l' = go kx x l GT -> balanceR ky y l r' where !r' = go kx x r EQ -> Bin sz kx0 x l r {-# INLINABLE naiveInsert1 #-} }}} and {{{#!hs naiveInsert2 :: Ord k => k -> a -> Map k a -> Map k a naiveInsert2 = go where go :: Ord k => k -> a -> Map k a -> Map k a go !kx x Tip = singleton kx x go !kx x t@(Bin sz ky y l r) = case compare kx ky of LT -> balanceL ky y l' r where !l' = go kx x l GT -> balanceR ky y l r' where !r' = go kx x r EQ -> Bin sz kx x l r {-# INLINABLE naiveInsert2 #-} }}} both demonstrate the same problem. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "Minimal13331.hs" added. A smaller reproduction -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): I suspect the solution will be to expand the "box-demand analysis" described in the demand analysis paper ever so slightly. Instead of choosing between just passing the box/tuple and passing (some of) its contents, I think we want to add a third option: pass the box ''and'' (some of) its contents. This is a good option to have when we only need the box sometimes, and indeed is even important in cases where we ''always'' (eventually) need the box: {{{#!hs f :: Int -> [Int] -> [Int] f !n [] = [n] f n (x : xs) | x > 1000 = n : f x xs | otherwise = x : f n xs }}} This simplifies to {{{ Rec { -- RHS size: {terms: 41, types: 48, coercions: 0, joins: 0/0} $wf :: Int# -> [Int] -> (# Int, [Int] #) $wf = \ (ww_s2b5 :: Int#) (w_s2b2 :: [Int]) -> case w_s2b2 of { [] -> (# I# ww_s2b5, [] #); : ipv_s29V ipv1_s29W -> case ipv_s29V of wild1_a2a8 { I# x_a2aa -> case tagToEnum# (># x_a2aa 1000#) of { False -> (# wild1_a2a8, case $wf ww_s2b5 ipv1_s29W of { (# ww2_s2bb, ww3_s2bc #) -> : ww2_s2bb ww3_s2bc } #); True -> (# I# ww_s2b5, case $wf x_a2aa ipv1_s29W of { (# ww2_s2bb, ww3_s2bc #) -> : ww2_s2bb ww3_s2bc } #) } } } end Rec } -- RHS size: {terms: 13, types: 15, coercions: 0, joins: 0/0} f :: Int -> [Int] -> [Int] f = \ (w_s2b1 :: Int) (w1_s2b2 :: [Int]) -> case w_s2b1 of { I# ww1_s2b5 -> case $wf ww1_s2b5 w1_s2b2 of { (# ww3_s2bb, ww4_s2bc #) -> : ww3_s2bb ww4_s2bc } } }}} Whoops! Even this keeps losing boxes it ends up needing again. Indeed, this example demonstrates that we can lose a bunch of boxes when we will ''always'' end up needing to reconstruct them. A better transformation would give `$wf` a type like `Int# -> Int -> [Int] -> (# Int, [Int] #)`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13331: Worker/wrapper can lead to sharing failure -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Ah, I see better now. My tiny list example can probably be fixed relatively easily using a "just pass both" approach, but my real-life example involves some knowledge the compiler doesn't have: that we tend to have the box already and/or we end up needing the box, and that the cost of losing sharing here is higher than it may look. That application dependence seems to argue in favor of a pragma-based solution. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13331#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC