[GHC] #14865: GHC Defeats Manual Worker Wrapper with Unboxed Sum

#14865: GHC Defeats Manual Worker Wrapper with Unboxed Sum -------------------------------------+------------------------------------- Reporter: andrewthad | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.5 Keywords: UnboxedSums | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Here's the code in question: {{{ {-# LANGUAGE BangPatterns #-} {-# LANGUAGE LambdaCase #-} {-# LANGUAGE MagicHash #-} {-# LANGUAGE TypeInType #-} {-# LANGUAGE UnboxedSums #-} {-# LANGUAGE UnboxedTuples #-} {-# OPTIONS_GHC -O2 #-} module Byte.Array.Window ( findByte , boxMaybeInt ) where import Data.Primitive (ByteArray) import Data.Word (Word8) import GHC.Types (RuntimeRep,TYPE) import GHC.Int (Int(I#)) import GHC.Exts (Int#) import qualified Data.Primitive as PM type Maybe# (a :: TYPE (r :: RuntimeRep)) = (# (# #) | a #) boxMaybeInt :: Maybe# Int# -> Maybe Int boxMaybeInt = \case (# | a #) -> Just (I# a) (# (# #) | #) -> Nothing unboxInt :: Int -> Int# unboxInt (I# i) = i -- | Finds the first occurrence of the given byte. -- TODO: optimize this to search through a whole -- Word64 at a time if the bytearray is pinned. findByte :: Int -> Int -> Word8 -> ByteArray -> Maybe Int findByte !off !len !w !arr = boxMaybeInt (go off) where go :: Int -> Maybe# Int# go !ix = if ix < len then if PM.indexByteArray arr ix == w then (# | unboxInt ix #) else go (ix + 1) else (# (# #) | #) }}} When compiled with GHC 8.5 with `-ddump-simpl -dsuppress-all`, here is the relevant part of the resulting Core: {{{ -- RHS size: {terms: 33, types: 13, coercions: 0, joins: 1/1} $wfindByte $wfindByte = \ ww_s38C ww1_s38G ww2_s38K ww3_s38O -> joinrec { $wgo_s38v $wgo_s38v ww4_s38t = case <# ww4_s38t ww1_s38G of { __DEFAULT -> Nothing; 1# -> case indexWord8Array# ww3_s38O ww4_s38t of wild_a36w { __DEFAULT -> case eqWord# wild_a36w ww2_s38K of { __DEFAULT -> jump $wgo_s38v (+# ww4_s38t 1#); 1# -> Just (I# ww4_s38t) } } }; } in jump $wgo_s38v ww_s38C -- RHS size: {terms: 21, types: 12, coercions: 0, joins: 0/0} findByte findByte = \ w_s38w w1_s38x w2_s38y w3_s38z -> case w_s38w of { I# ww1_s38C -> case w1_s38x of { I# ww3_s38G -> case w2_s38y of { W8# ww5_s38K -> case w3_s38z of { ByteArray ww7_s38O -> $wfindByte ww1_s38C ww3_s38G ww5_s38K ww7_s38O } } } } }}} I expected that the tail recursive `go` helpful function from my original code would still be a function that returns an unboxed sum when optimized and turned into Core. However, it isn't. The call to `boxMaybeInt` gets pushed into `go`. This means that when `findByte` is called and the result cased on, an allocation is going to happen. I think it would be preferable for `boxMaybeInt` to not get pushed into the worker, since `boxMaybeInt (go off)` could be inlined and the allocation of `Maybe` could be prevented (assuming that it was cased on right afterward). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14865 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14865: GHC Defeats Manual Worker Wrapper with Unboxed Sum -------------------------------------+------------------------------------- Reporter: andrewthad | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: UnboxedSums Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): This is a classic difficulty. Given, say {{{ f x = g (...x...) }}} perhaps we could split `f` into two parts {{{ f x = g (f' x) f' x = ...x... }}} and then perhaps we can inline the (now small) `f`, perhaps to good effect. This is what you want to do here: you want {{{ findByte !off !len !w !arr = boxMaybeInt (go off) }}} to be inlined at every call site. It's be simple to arrange, by giving `len`, `w` and `arr` as extra arguments to `go`. But GHC just isn't clever enough to do that, unless you tell it to do so by hand, with an INLINE pragma. I don't really know how to fix this. But it's a good example. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14865#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14865: GHC Defeats Manual Worker Wrapper with Unboxed Sum -------------------------------------+------------------------------------- Reporter: andrewthad | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: UnboxedSums Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by andrewthad): Your suggested change produces the desired behavior: {{{ findByte :: Int -> Int -> Word8 -> ByteArray -> Maybe Int findByte !off !len0 !w0 !arr0 = boxMaybeInt (go off (len0 + off) w0 arr0) where go :: Int -> Int -> Word8 -> ByteArray -> Maybe# Int# go !ix !end !w !arr = if ix < end then if PM.indexByteArray arr ix == w then (# | unboxInt ix #) else go (ix + 1) end w arr else (# (# #) | #) }}} Results in the following Core: {{{ Rec { -- RHS size: {terms: 34, types: 21, coercions: 0, joins: 0/0} $wgo $wgo = \ ww_s3zd ww1_s3zh ww2_s3zl ww3_s3zp -> case tagToEnum# (<# ww_s3zd ww1_s3zh) of { False -> (#_|#) (##); True -> case indexWord8Array# ww3_s3zp ww_s3zd of wild1_a3vI { __DEFAULT -> case tagToEnum# (eqWord# wild1_a3vI ww2_s3zl) of { False -> $wgo (+# ww_s3zd 1#) ww1_s3zh ww2_s3zl ww3_s3zp; True -> (#|_#) ww_s3zd } } } end Rec } -- RHS size: {terms: 18, types: 11, coercions: 0, joins: 0/0} $wfindByte $wfindByte = \ ww_s3zy ww1_s3zC ww2_s3zG ww3_s3zK -> case $wgo ww_s3zy (+# ww1_s3zC ww_s3zy) ww2_s3zG ww3_s3zK of { (#_|#) ds_d3ui -> Nothing; (#|_#) a_a1Dk -> Just (I# a_a1Dk) } -- RHS size: {terms: 21, types: 12, coercions: 0, joins: 0/0} findByte findByte = \ w_s3zs w1_s3zt w2_s3zu w3_s3zv -> case w_s3zs of { I# ww1_s3zy -> case w1_s3zt of { I# ww3_s3zC -> case w2_s3zu of { W8# ww5_s3zG -> case w3_s3zv of { ByteArray ww7_s3zK -> $wfindByte ww1_s3zy ww3_s3zC ww5_s3zG ww7_s3zK } } } } }}} I don't much mind having to do this by hand. It's just nice to know that there's a reliable way to coax GHC into doing it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14865#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14865: GHC Defeats Manual Worker Wrapper with Unboxed Sum -------------------------------------+------------------------------------- Reporter: andrewthad | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: UnboxedSums Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by andrewthad): It's weird that trying to manually apply the static argument transformation (which is probably not a performance win here anyway) gets in the way of this. For example, the following code produces the same core I originally was getting: {{{ findByte :: Int -> Int -> Word8 -> ByteArray -> Maybe Int findByte !off !len0 !w0 !arr0 = boxMaybeInt (goA off (len0 + off) w0 arr0) goA :: Int -> Int -> Word8 -> ByteArray -> Maybe# Int# goA !ix0 !end !w !arr = goB ix0 where goB :: Int -> Maybe# Int# goB !ix = if ix < end then if PM.indexByteArray arr ix == w then (# | unboxInt ix #) else goB (ix + 1) else (# (# #) | #) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14865#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14865: GHC Defeats Manual Worker Wrapper with Unboxed Sum -------------------------------------+------------------------------------- Reporter: andrewthad | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: UnboxedSums Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by maoe): * cc: maoe (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14865#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC