RE: [Haskell-cafe] Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen

8 Jun 2007

      Alistair

You're right, both versions should give the same code.  Which version of GHC are you using?  Both with the HEAD and with 6.6.1 I get the nice unboxed code with the `seq` version too.  My test program is below.

If you can make a reproducible test case of the unexpected behaviour please file it as a Trac bug and I will take a look.  Pls include the actual code and command line you used to compile it.

I have not looked at your second point; again a reproducible example would be helpful.

Thanks

Simon

import GHC.Ptr
import GHC.Word

readUTF8Char :: Int -> Int -> Ptr Word8 -> Int
readUTF8Char x offset p
  | () !x !offset !p !False = 10
  | x>7 = 3
  | otherwise = readUTF8Char x offset p
  where
    x ! y = seq x y

-----Original Message-----
From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Alistair Bayley
Sent: 05 June 2007 09:37
To: Duncan Coutts
Cc: haskell-cafe
Subject: Re: [Haskell-cafe] Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen
...
...
{- Arity: 4 Strictness: LSSL -}
Right. Unboxed args are always given the annotation L. So that function
is strict in that pointer arg, but GHC is choosing not to unbox it. I'm
not sure why that's the case.
I thought maybe it was because I hadn't said -funbox-strict-fields,
but it didn't change when I did.
...
...
Is there some semantic advantage to bang-patterns, or is it just a
syntactic convenience?
It's syntactic convenience.
I've noticed differences between strictness guards and bang-patterns
with GHC. This is a problem, I think, because bang-patterns are GHC
only, and I wanted to keep the code portable. This strictness guard:

readUTF8Char :: Int -> Int -> Ptr Word8 -> IO Char
readUTF8Char x offset p
  | () !x !offset !p !False = undefined
  | otherwise =
  ...
  where
    x ! y = seq x y

results in this simplifier output:

[Arity 3
 Str: DmdType LSS]
$wreadUTF8Char_r38I =
  \ (ww_s30B :: GHC.Prim.Int#)
    (w_s30D :: GHC.Base.Int)
    (w1_s30E :: GHC.Ptr.Ptr GHC.Word.Word8) ->

However, if I change this to:

{-# OPTIONS_GHC -fbang-patterns #-}

...

readUTF8Char :: Int -> Int -> Ptr Word8 -> IO Char
readUTF8Char !x !offset !p
  | otherwise =

then I get this simpifier output:

[Arity 3
 Str: DmdType LLL]
$wreadUTF8Char_r38n =
  \ (ww_s2Zk :: GHC.Prim.Int#)
    (ww1_s2Zo :: GHC.Prim.Int#)
    (ww2_s2Zs :: GHC.Prim.Addr#) ->

Also, with bang-patterns I've noticed that fromUTF8Ptr transforms
into two functions, which contain very similar code:

[Arity 5]
Foreign.C.UTF8.$s$wfromUTF8Ptr =
  \ (acc_X151 :: GHC.Base.String)
    (new_s_a2GQ :: GHC.Prim.State# GHC.Prim.RealWorld)
    (a87_a2GR :: GHC.Base.Char)
    (ww_s2ZH :: GHC.Prim.Addr#)
    (sc_s36j :: GHC.Prim.Int#) ->

[Arity 4
 Str: DmdType LLSL]
Foreign.C.UTF8.$wfromUTF8Ptr =
  \ (ww_s2ZD :: GHC.Prim.Int#)
    (ww1_s2ZH :: GHC.Prim.Addr#)
    (w_s2ZJ :: GHC.Base.String)
    (w1_s2ZK :: GHC.Prim.State# GHC.Prim.RealWorld) ->

$wfromUTF8Ptr calls itself and $s$wfromUTF8Ptr, but $s$wfromUTF8Ptr
only every calls itself, so $wfromUTF8Ptr could be considered the
wrapper, and $s$wfromUTF8Ptr the worker, I guess.

AFAICT it's a transformation of the various cases in fromUTF8Ptr:

      | x <= 0x7F -> fromUTF8Ptr (bytes-1) p (chr x:acc)
      | x <= 0xBF && bytes == 0 -> error "fromUTF8Ptr: ..."
      | x <= 0xBF -> fromUTF8Ptr (bytes-1) p acc
      | otherwise -> do
          c <- readUTF8Char x bytes p
          fromUTF8Ptr (bytes-1) p (c:acc)

The first case, x <= 0x7F, results in a call to $s$wfromUTF8Ptr.
The third case, x <= 0xBF, results in a call to $wfromUTF8Ptr.
The last case, otherwise, results in a call to $s$wfromUTF8Ptr.

The calls to $s$wfromUTF8Ptr pass the newly constructed Char and the
rest of the String separately, and they are cons'ed in $s$wfromUTF8Ptr.
Not sure what benefit this gives...

I don't know what transformation causes this, but it was a bit of a surprise.

Alistair
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

RE: [Haskell-cafe] Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen

Simon Peyton-Jones