 
            On February 17, 2009 19:24:44 Max Bolingbroke wrote:
2009/2/17 Tyson Whitehead
: (compiled with ghc 6.10 with options -O2 -ddump-simpl)
That should have been -ddump-stranal instead of -ddump-simpl.
I was wondering why lvl_s1mF is not being inlined into a_s1Gv in the core at the bottom of this email as that is the only place it is ever referenced.
The relevant GHC code is SimplUtils.preInlineUnconditionally. It looks like it dosen't get inlined for two reasons: 1) It's not a manifest lambda (it's an application) so inlining inside another lambda would change the number of times the FVs of lvl_s1mF might occur
I have to confess my ignorance here as my google fu failed and so I still don't know what a manifest lambda is (other than not a application). : )
2) I'm not sure if the use-context is considered interesting by GHC because the application of the function might be hidden by the cast. Not sure about this one.
I was wondering about that, which is why I didn't remove all the cast noise.
So it looks like the problem stems from digit_s1l3 having arity 1 rather than arity 3. You could try and force it into a higher arity somehow, but I can't say exactly how you might do that without seeing the rest of the Core (and in particular the part under the first lambda in the definition of digit).
The thing is that the inner lambdas come from inlining that StateT monad transformers in a StateT q (StateT Int (ErrorT String Identity)) monad (i.e., the first one is the q state -- which works out to an Int -- and the second is the Int state). I guess I could explicitly pass them around, but that would seem to defeat the purpose of having StateT. The actual routines under this implement part of a FSM for (hopefully) efficiently extracting an Int from a ByteString (or a uvector UArr -- source of the Step data type). The relevant part of the actual code, which is a bit hacked up with ! patterns from my attempts to get better code, is as follows. type ParseInt q a = StateT q (StateT Int (ErrorT String Identity)) a next :: q -> Step q Word8 next i | i==n = Done | otherwise = Yield (bs `BS.unsafeIndex` i) (i+1) wrap :: Monad m => (Word8 -> StateT q m a) -> StateT q m a -> StateT q m a wrap yield (done::StateT q m a) = loop where loop :: StateT q m a loop = do q <- get case next q of Yield x q' -> put q' >> yield x Skip q' -> put q' >> loop Done -> done s2 :: ParseInt q Int s2 = wrap yield done where yield :: Word8 -> StateT q (StateT Int (ErrorT String Identity)) Int yield x | x==48 = digit 0 | x==49 = digit 1 | x==50 = digit 2 | x==51 = digit 3 | x==52 = digit 4 | x==53 = digit 5 | x==54 = digit 6 | x==55 = digit 7 | x==56 = digit 8 | x==57 = digit 9 | otherwise = do !y <- lift get return y where digit :: Int -> ParseInt q Int digit !x = do !y <- lift get ( if y <= (maxBound-9)`quot`10 || y <= (maxBound-x)`div`10 then let !y' = y*10+x in (lift $ put y')
s2 else throwError "integer overflow" ) done :: ParseInt q Int done = do !y <- lift get return y
I just finished adding the Parse q Int type to help with email line wrapping. As I alluded to in my original email, if I don't have the Int overflow check in digit, it is not chosen as the loop breaker, all the StateT stuff is compiled away, and you get a really nice efficient assembler loop (which is important because the final FSM has to actually chew through GBs of data). The part of the code under the first lambda in digit is as follows (I didn't keep the original dump, so the uniques have changed here). It's the second part of the Int overflow bounds check (i.e., y <= (maxBound-x)`div`10), and, indeed, something you don't want to compute unless the easy check fails. digit_s1lk = \ (x_aqR [ALWAYS Just U(L)] :: GHC.Types.Int) -> case x_aqR of x_XsQ [ALWAYS Just A] { GHC.Types.I# ipv_s1bD [ALWAYS Just L] -> let { lvl_s1my [ALWAYS Just D(T)] :: GHC.Types.Int [Str: DmdType] lvl_s1my = case GHC.Prim.-# 9223372036854775807 ipv_s1bD of wild2_a1xi [ALWAYS Just L] { __DEFAULT -> case GHC.Base.divInt# wild2_a1xi 10 of wild21_a1xj [ALWAYS Just L] { __DEFAULT -> GHC.Types.I# wild21_a1xj }; (-9223372036854775808) -> case GHC.Base.divInt# (-9223372036854775808) 10 of wild21_a1xl [ALWAYS Just L] { __DEFAULT -> GHC.Types.I# wild21_a1xl } } } in (\ (eta_X1sU [ALWAYS Just L] :: GHC.Types.Int) (eta_s1G7 [ALWAYS Just U(L)] :: GHC.Types.Int) -> . . .
In general, the -ddump-inlinings flag is useful for working out why something wasn't inlined - but it wouldn't have helped you in this case, because it only dumps information about inlining at call sites, and you actually want an unconditional inlining to occur.
I also tried that, and didn't have much luck with it. I didn't understand the output, which there was 48k lines worth of, and the uniques kept changing which made it hard to grep for names from previous -ddump-simpl runs. I'm in the process of reading the "Secrets of the GHC inliner" paper, so, hopefully, that will at least help me with the first of these problems. : ) Thanks very much for looking at this and the details reply! Cheers! -Tyson