
#16040: Unboxing-Related Performance Issue with Polymorphic Functions -------------------------------------+------------------------------------- Reporter: _recursion | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Interesting. For `test1` (the fast one) we get {{{ Rec { $wgo_r2xK :: GHC.Prim.Int# -> GHC.Prim.Int# $wgo_r2xK = \ (ww_s2w4 :: GHC.Prim.Int#) -> case GHC.Prim.># ww_s2w4 0# of { __DEFAULT -> ww_s2w4; 1# -> $wgo_r2xK (GHC.Prim.-# ww_s2w4 1#) } end Rec } T16040.$wtest1 :: GHC.Prim.Int# -> GHC.Prim.Int# T16040.$wtest1 = \ (ww_s2wd :: GHC.Prim.Int#) -> $wgo_r2xK ww_s2wd test1 :: Int -> Int test1 = \ (w_s2wa :: Int) -> case w_s2wa of { GHC.Types.I# ww1_s2wd -> case T16040.$wtest1 ww1_s2wd of ww2_s2wh { __DEFAULT -> GHC.Types.I# ww2_s2wh } } }}} Notice that loop `$wgo` does not allocation at all. For `test2` we get {{{ Rec { $wgo1_r2xL :: GHC.Prim.Int# -> (# Int #) $wgo1_r2xL = \ (ww_s2wm :: GHC.Prim.Int#) -> case GHC.Prim.># ww_s2wm 0# of { __DEFAULT -> (# GHC.Types.I# ww_s2wm #); 1# -> $wgo1_r2xL (GHC.Prim.-# ww_s2wm 1#) } end Rec } T16040.$wtest2 :: GHC.Prim.Int# -> Int T16040.$wtest2 = \ (ww_s2wv :: GHC.Prim.Int#) -> case $wgo1_r2xL ww_s2wv of { (# ww2_s2wy #) -> ww2_s2wy } test2 :: Int -> Int test2 = \ (w_s2ws :: Int) -> case w_s2ws of { GHC.Types.I# ww1_s2wv -> T16040.$wtest2 ww1_s2wv } }}} Here the `$wgo1` loop does no allocation on its hot path, but does allocate an `I# ww` box as it returns. I think that `$wgo1` is doing a heap-check on ''every'' iteration, at the start of the function. It would be better to do the check only on the ''last'' iteration, in the `DEFAULT` branch. I bet these redundant heap checks are what is taking the time. We have long-standing tickets about this; see the summary in comment:2 of Trac #14791. I would love someone to work on this. To catch cases like this should not be very hard. By "like this" I mean * A primitive case * Which has no allocation "upstream" (i.e. before it) * And at least one alternative does no allocation. ---------- Matthew is right that Nested CPR would also fix this. The trouble is that the recursive `go` from `test2` returns an `X Int`, whose representation has two levels of box; eg `X (I# 3#)`. The current CPR transform can't optimise that. Nested-CPR can, and would make ''neither'' branch of `$wgo2` allocate Tantalizingly, we have a nested-cpr patch nearly ready to go. (See Trac #1600.) But it needs someone to pay it some sustained attention. ---------- I don't see an obvious band-aid. This is a long-standing issue, not a regression. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16040#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler