
Alistair You're right, both versions should give the same code. Which version of GHC are you using? Both with the HEAD and with 6.6.1 I get the nice unboxed code with the `seq` version too. My test program is below. If you can make a reproducible test case of the unexpected behaviour please file it as a Trac bug and I will take a look. Pls include the actual code and command line you used to compile it. I have not looked at your second point; again a reproducible example would be helpful. Thanks Simon import GHC.Ptr import GHC.Word readUTF8Char :: Int -> Int -> Ptr Word8 -> Int readUTF8Char x offset p | () !x !offset !p !False = 10 | x>7 = 3 | otherwise = readUTF8Char x offset p where x ! y = seq x y -----Original Message----- From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Alistair Bayley Sent: 05 June 2007 09:37 To: Duncan Coutts Cc: haskell-cafe Subject: Re: [Haskell-cafe] Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen
{- Arity: 4 Strictness: LSSL -}
Right. Unboxed args are always given the annotation L. So that function is strict in that pointer arg, but GHC is choosing not to unbox it. I'm not sure why that's the case.
I thought maybe it was because I hadn't said -funbox-strict-fields, but it didn't change when I did.
Is there some semantic advantage to bang-patterns, or is it just a syntactic convenience?
It's syntactic convenience.
I've noticed differences between strictness guards and bang-patterns with GHC. This is a problem, I think, because bang-patterns are GHC only, and I wanted to keep the code portable. This strictness guard: readUTF8Char :: Int -> Int -> Ptr Word8 -> IO Char readUTF8Char x offset p | () !x !offset !p !False = undefined | otherwise = ... where x ! y = seq x y results in this simplifier output: [Arity 3 Str: DmdType LSS] $wreadUTF8Char_r38I = \ (ww_s30B :: GHC.Prim.Int#) (w_s30D :: GHC.Base.Int) (w1_s30E :: GHC.Ptr.Ptr GHC.Word.Word8) -> However, if I change this to: {-# OPTIONS_GHC -fbang-patterns #-} ... readUTF8Char :: Int -> Int -> Ptr Word8 -> IO Char readUTF8Char !x !offset !p | otherwise = then I get this simpifier output: [Arity 3 Str: DmdType LLL] $wreadUTF8Char_r38n = \ (ww_s2Zk :: GHC.Prim.Int#) (ww1_s2Zo :: GHC.Prim.Int#) (ww2_s2Zs :: GHC.Prim.Addr#) -> Also, with bang-patterns I've noticed that fromUTF8Ptr transforms into two functions, which contain very similar code: [Arity 5] Foreign.C.UTF8.$s$wfromUTF8Ptr = \ (acc_X151 :: GHC.Base.String) (new_s_a2GQ :: GHC.Prim.State# GHC.Prim.RealWorld) (a87_a2GR :: GHC.Base.Char) (ww_s2ZH :: GHC.Prim.Addr#) (sc_s36j :: GHC.Prim.Int#) -> [Arity 4 Str: DmdType LLSL] Foreign.C.UTF8.$wfromUTF8Ptr = \ (ww_s2ZD :: GHC.Prim.Int#) (ww1_s2ZH :: GHC.Prim.Addr#) (w_s2ZJ :: GHC.Base.String) (w1_s2ZK :: GHC.Prim.State# GHC.Prim.RealWorld) -> $wfromUTF8Ptr calls itself and $s$wfromUTF8Ptr, but $s$wfromUTF8Ptr only every calls itself, so $wfromUTF8Ptr could be considered the wrapper, and $s$wfromUTF8Ptr the worker, I guess. AFAICT it's a transformation of the various cases in fromUTF8Ptr: | x <= 0x7F -> fromUTF8Ptr (bytes-1) p (chr x:acc) | x <= 0xBF && bytes == 0 -> error "fromUTF8Ptr: ..." | x <= 0xBF -> fromUTF8Ptr (bytes-1) p acc | otherwise -> do c <- readUTF8Char x bytes p fromUTF8Ptr (bytes-1) p (c:acc) The first case, x <= 0x7F, results in a call to $s$wfromUTF8Ptr. The third case, x <= 0xBF, results in a call to $wfromUTF8Ptr. The last case, otherwise, results in a call to $s$wfromUTF8Ptr. The calls to $s$wfromUTF8Ptr pass the newly constructed Char and the rest of the String separately, and they are cons'ed in $s$wfromUTF8Ptr. Not sure what benefit this gives... I don't know what transformation causes this, but it was a bit of a surprise. Alistair _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe