Re: GHC Optimization Issue

I think what's happening here is that the rewrite rule that SPECIALIZE produces turns the call to swap16 into a call to the specialized version before any inlining happens[1], and then the specialized version can't be inlined[2].

Manually writing out a copy of swap16 at type Word16 -> Word16, marking it INLINE, and setting up a rewrite rule for the specialization gives you the more efficient code. Changing the definition of swap16 to this:

    swap16 :: Bits a => a -> a
    swap16 v = (v `shiftR` 8) + ((v .&. 0xFF) `shiftL` 8)

    {-# RULES "swap16/Word16" forall (v :: Word16) . swap16 v = swap16Word16 v #-}

    {-# INLINE swap16Word16 #-}
    swap16Word16 :: Word16 -> Word16
    swap16Word16 v = (v `shiftR` 8) + ((v .&. 0xFF) `shiftL` 8)

gives this Core for foo:

    Main.foo :: GHC.Word.Word16
    [GlobalId]
    [Str: DmdType]
    Main.foo =
      case GHC.Word.Word16 GHC.Word.$wshift3 __word 3855 (-8) of ww1 { __DEFAULT ->
      case GHC.Word.Word16 GHC.Word.$wshift3 __word 15 8 of ww { __DEFAULT ->
      GHC.Word.W16# (GHC.Prim.narrow16Word# (GHC.Prim.plusWord# ww1 ww))
      }
      }

and for swap16Word16:

    Swap.swap16Word16 :: GHC.Word.Word16 -> GHC.Word.Word16
    [GlobalId]
    [Arity 1
     NoCafRefs
     Str: DmdType U(L)m]
    Swap.swap16Word16 =
      __inline_me (\ (v :: GHC.Word.Word16) ->
                     GHC.Word.+2
                       (GHC.Word.$dmshiftR2 v (GHC.Base.I# 8))
                       (GHC.Word.shift2 (GHC.Word..&.2 v Swap.lit) (GHC.Base.I# 8)))

Removing the INLINE directive for swap16Word16 leaves this code for foo, with a call to the worker function for swap16Word16:

    Main.foo :: GHC.Word.Word16
    [GlobalId]
    [Str: DmdType]
    Main.foo =
      case GHC.Word.Word16 Swap.$wswap16Word16 __word 3855 of ww1 { __DEFAULT ->
      GHC.Word.W16# ww1
      }

and these definitions for swap16Word16:

    Swap.$wswap16Word16 :: GHC.Prim.Word# -> GHC.Prim.Word#
    [GlobalId]
    [Arity 1
     NoCafRefs
     Str: DmdType L]
    Swap.$wswap16Word16 =
      \ (ww :: GHC.Prim.Word#) ->
        case GHC.Prim.Word# GHC.Word.$wshift3 ww (-8) of ww1 { __DEFAULT ->
        case GHC.Prim.Word# GHC.Word.$wshift3 (GHC.Prim.and# ww __word 255) 8
        of ww11 { __DEFAULT ->
        GHC.Prim.narrow16Word# (GHC.Prim.plusWord# ww1 ww11)
        }
        }

    Swap.swap16Word16 :: GHC.Word.Word16 -> GHC.Word.Word16
    [GlobalId]
    [Arity 1
     Worker Swap.$wswap16Word16
     NoCafRefs
     Str: DmdType U(L)m]
    Swap.swap16Word16 =
      __inline_me (\ (w :: GHC.Word.Word16) ->
                     case GHC.Word.Word16 w of w1 { GHC.Word.W16# ww ->
                     case GHC.Word.Word16 Swap.$wswap16Word16 ww of ww1 { __DEFAULT ->
                     GHC.Word.W16# ww1
                     }
                     })

It looks like marking the function INLINE suppresses the worker/wrapper split, and when there is no INLINE and the function gets split, only the wrapper is eligible for inlining.

Brandon Moore

[1] Section 10.7.2 of the 6.4.1 User's Guide says: "In the earlier phases of compilation, GHC inlines nothing that appears on the LHS of a rule, because once you have substituted for something you can't match against it (given the simple minded matching)."

[2] Maybe because the worker isn't annotated with __inline_me in the Core? Or maybe it follows from the aside in 7.10.5 about an INLINE pragma preventing any inlining in the RHS of the thing marked INLINE?
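
For anyone who wants to try the hand-written specialization on a current GHC, here is a minimal single-file sketch. It is not the original two-module (Main/Swap) setup. The Num constraint, the NOINLINE [1] phase annotation, and the definition of foo are my assumptions: foo is a hypothetical stand-in chosen to match the constant 3855 in the Core above, and the phase annotation is only there so the rule gets a chance to match before swap16 is inlined.

```haskell
import Data.Bits (Bits, shiftL, shiftR, (.&.))
import Data.Word (Word16)

-- Generic byte swap, as in the post. Current versions of base no longer make
-- Num a superclass of Bits, so the Num constraint is spelled out here.
swap16 :: (Bits a, Num a) => a -> a
swap16 v = (v `shiftR` 8) + ((v .&. 0xFF) `shiftL` 8)
-- Assumption: hold back inlining of swap16 until phase 1 so the rule below
-- can still see a call to it.
{-# NOINLINE [1] swap16 #-}

-- Hand-written replacement for the rule that SPECIALIZE would generate.
{-# RULES "swap16/Word16" forall (v :: Word16). swap16 v = swap16Word16 v #-}

-- Monomorphic copy, marked INLINE so it can be unfolded at the call site.
{-# INLINE swap16Word16 #-}
swap16Word16 :: Word16 -> Word16
swap16Word16 v = (v `shiftR` 8) + ((v .&. 0xFF) `shiftL` 8)

-- Hypothetical stand-in for the foo in the Core dumps (3855 == 0x0F0F).
foo :: Word16
foo = swap16 0x0F0F

main :: IO ()
main = do
  print foo
  -- Sanity check that the two bytes really are exchanged.
  print (swap16 (0x1234 :: Word16) == 0x3412)
```

Compiling with -O and -ddump-simpl shows the resulting Core, and -ddump-rule-firings reports whether "swap16/Word16" fired. The Core printed by a modern GHC will look quite different from the 6.4.1 output above, so treat this only as a starting point for comparison.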