
{- Arity: 4 Strictness: LSSL -}
Right. Unboxed args are always given the annotation L. So that function is strict in that pointer arg, but GHC is choosing not to unbox it. I'm not sure why that's the case.
I thought maybe it was because I hadn't said -funbox-strict-fields, but it didn't change when I did.
Is there some semantic advantage to bang-patterns, or is it just a syntactic convenience?
It's syntactic convenience.
I've noticed differences between strictness guards and bang-patterns with GHC. This is a problem, I think, because bang-patterns are GHC only, and I wanted to keep the code portable. This strictness guard: readUTF8Char :: Int -> Int -> Ptr Word8 -> IO Char readUTF8Char x offset p | () !x !offset !p !False = undefined | otherwise = ... where x ! y = seq x y results in this simplifier output: [Arity 3 Str: DmdType LSS] $wreadUTF8Char_r38I = \ (ww_s30B :: GHC.Prim.Int#) (w_s30D :: GHC.Base.Int) (w1_s30E :: GHC.Ptr.Ptr GHC.Word.Word8) -> However, if I change this to: {-# OPTIONS_GHC -fbang-patterns #-} ... readUTF8Char :: Int -> Int -> Ptr Word8 -> IO Char readUTF8Char !x !offset !p | otherwise = then I get this simpifier output: [Arity 3 Str: DmdType LLL] $wreadUTF8Char_r38n = \ (ww_s2Zk :: GHC.Prim.Int#) (ww1_s2Zo :: GHC.Prim.Int#) (ww2_s2Zs :: GHC.Prim.Addr#) -> Also, with bang-patterns I've noticed that fromUTF8Ptr transforms into two functions, which contain very similar code: [Arity 5] Foreign.C.UTF8.$s$wfromUTF8Ptr = \ (acc_X151 :: GHC.Base.String) (new_s_a2GQ :: GHC.Prim.State# GHC.Prim.RealWorld) (a87_a2GR :: GHC.Base.Char) (ww_s2ZH :: GHC.Prim.Addr#) (sc_s36j :: GHC.Prim.Int#) -> [Arity 4 Str: DmdType LLSL] Foreign.C.UTF8.$wfromUTF8Ptr = \ (ww_s2ZD :: GHC.Prim.Int#) (ww1_s2ZH :: GHC.Prim.Addr#) (w_s2ZJ :: GHC.Base.String) (w1_s2ZK :: GHC.Prim.State# GHC.Prim.RealWorld) -> $wfromUTF8Ptr calls itself and $s$wfromUTF8Ptr, but $s$wfromUTF8Ptr only every calls itself, so $wfromUTF8Ptr could be considered the wrapper, and $s$wfromUTF8Ptr the worker, I guess. AFAICT it's a transformation of the various cases in fromUTF8Ptr: | x <= 0x7F -> fromUTF8Ptr (bytes-1) p (chr x:acc) | x <= 0xBF && bytes == 0 -> error "fromUTF8Ptr: ..." | x <= 0xBF -> fromUTF8Ptr (bytes-1) p acc | otherwise -> do c <- readUTF8Char x bytes p fromUTF8Ptr (bytes-1) p (c:acc) The first case, x <= 0x7F, results in a call to $s$wfromUTF8Ptr. The third case, x <= 0xBF, results in a call to $wfromUTF8Ptr. The last case, otherwise, results in a call to $s$wfromUTF8Ptr. The calls to $s$wfromUTF8Ptr pass the newly constructed Char and the rest of the String separately, and they are cons'ed in $s$wfromUTF8Ptr. Not sure what benefit this gives... I don't know what transformation causes this, but it was a bit of a surprise. Alistair