Puzzled about inlining and specialise inline under ghc -O2

Dear Haskell-Cafe, I'm computing a histogram of a bunch of symbols with up to 8 bits of information each, stored in a unboxed vector of Word8. The histogram is represented as an unboxed vector of Int with size 2^bits. I compute the histogram by folding an increment function. The problem: depending on what types and what annotations I give to the increment and histogram function (see below), the GC gets through a different amount of memory. I'm using GHC 7.0.3 and -O2. I'd like to better understand how and why the optimisation does/doesn't kick in. Here are the functions with the most generic types I can think of: import qualified Data.Vector.Unboxed as UV import qualified Data.Vector.Unboxed.Mutable as UMV import Control.Monad.Primitive (PrimMonad, PrimState) increment :: (PrimMonad m, UMV.Unbox a, Num a, Integral b) => UMV.MVector (PrimState m) a -> b -> m (UMV.MVector (PrimState m) a) increment v x = do n <- UMV.read v (fromIntegral x) UMV.write v (fromIntegral x) (n+1) return v histogram :: (Integral a, UMV.Unbox a) => Int -> UV.Vector a -> UV.Vector Int histogram bitsPerSym v = runST $ do a <- UMV.replicate (2^bitsPerSym) (0::Int) a' <- UV.foldM' increment a v UV.unsafeFreeze a' Running my test load, I get: total alloc = 33,206,568 bytes Looking at the core, ghc is not specialising the functions, even if I tell it to inline them. So let's brutally change the types to be as specific as I need for my application: increment :: UMV.MVector s Int -> Word8 -> ST s (UMV.MVector s Int) histogram :: Int -> UV.Vector Word8 -> UV.Vector Int result: total alloc = 19,581,152 bytes and if I put INLINE pragmas for both functions: 16,952,512 bytes I should be able to achieve the same effect with SPECIALISE INLINE pragmas, right? Let's try that: {-# SPECIALISE INLINE increment :: UMV.MVector s Int -> Word8 -> ST s (UMV.MVector s Int) #-} {-# SPECIALISE INLINE histogram :: Int -> UV.Vector Word8 -> UV.Vector Int #-} result: 33,139,856 bytes (GHC can't figure out application of the first rule, giving: Warning: RULE left-hand side too complicated to desugar) So unfortunately my most generic form won't work here, I need to specialise increment to be in ST (which sucks, because I want it to work for both IO and ST): increment :: (UMV.Unbox a, Num a, Integral b) => UMV.MVector s a -> b -> ST s (UMV.MVector s a) {-# SPECIALISE INLINE increment :: UMV.MVector s Int -> Word8 -> ST s (UMV.MVector s Int) #-} result: 17,016,192 bytes This is very close to the most specific function instantiations and INLINE, but: - I've lost being generic between ST and IO - it's still a little bigger than the specific instances + INLINE So my questions are: what is going on? Can I have genericity between ST and IO while keeping the low GC usage? How come SPECIALISE INLINE does not give the same result as specific instances + INLINE? Obviously, for this example, I don't really *need* increment to work inside IO, since I'm using runST... but I want to understand what is going on. Profiling Haskell performance and memory usage has always been difficult for me. Much thanks in advance, Rafal Kolanski.
participants (1)
-
Rafal Kolanski