
On 04/06/07, Duncan Coutts wrote:
On Mon, 2007-06-04 at 09:43 +0100, Alistair Bayley wrote:
After some experiments with the simplifier, ... The "portable" unboxed version is within about 15% of the unboxed version in terms of time and allocation.
Well done.
Of course, that might be saying more about the performance of the unboxed version...
Yeah. In Data.ByteString.Char8 we invented these w2c & c2w functions to avoid the range test. There should probably be a standard version of this unchecked conversion.
Bulat suggested unsafeChr from GHC.Exts, but I can't see it there. I guess I could roll my own; after all, it's just (C# (chr# x)).
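A minimal sketch of the hand-rolled unchecked conversion, assuming GHC with the MagicHash extension. `unsafeChr'`, `w2c` and `c2w` here are illustrative definitions in the style of the helpers mentioned above, not the bytestring library's own code. Unlike `Data.Char.chr`, `unsafeChr'` does no range check, so the caller must guarantee the argument is a valid code point.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import Data.Word (Word8)
import GHC.Exts (Char (C#), Int (I#), chr#)

-- Hypothetical unchecked Int -> Char conversion: (C# (chr# x)) with no
-- bounds test. The caller must ensure 0 <= n <= 0x10FFFF.
unsafeChr' :: Int -> Char
unsafeChr' (I# n) = C# (chr# n)

-- Byte -> Char without the range test that Data.Char.chr performs.
-- Every Word8 is a valid code point, so skipping the check is safe here.
w2c :: Word8 -> Char
w2c = unsafeChr' . fromIntegral

-- Char -> byte; silently truncates code points above 0xFF.
c2w :: Char -> Word8
c2w = fromIntegral . fromEnum

main :: IO ()
main = do
  print (w2c 65)
  print (c2w 'z')
  print (map w2c [104, 105])
```

Since every `Word8` fits in the valid code-point range, `w2c` is the one place where dropping the check costs nothing in safety.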
BTW, what's the difference between the indexXxxxOffAddr# and readXxxxOffAddr# functions in GHC.Prim?
Right. So it'd only be safe to use the index ones on immutable arrays because there's no way to enforce sequencing with respect to array writes when using the index version.
In this case I'm reading from a CString buffer, which is (hopefully) not changing during the function invocation, and is never written to by my code. So presumably it'd be pretty safe to use the index functions.
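A sketch contrasting the two primops on a byte buffer, assuming a reasonably recent GHC; `indexByte` and `readByte` are hypothetical helper names. The index variant is a pure function of address and offset, so nothing orders it relative to writes. The read variant threads the `State#` token through `IO`, which is what sequences it. Wrapping the result in the `W8#` constructor should type-check whether the primop returns `Word#` (older GHCs) or `Word8#` (GHC 9.2 and later).

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import Foreign.Marshal.Array (withArray)
import GHC.Exts (Int (I#), indexWord8OffAddr#, readWord8OffAddr#)
import GHC.IO (IO (..))
import GHC.Ptr (Ptr (..))
import GHC.Word (Word8 (..))

-- Pure read: no State# token, so the compiler is free to float, share
-- or reorder it. Only safe if nothing writes to the buffer while the
-- result may still be demanded (e.g. an unchanging CString buffer).
indexByte :: Ptr Word8 -> Int -> Word8
indexByte (Ptr a) (I# i) = W8# (indexWord8OffAddr# a i)

-- Sequenced read: the State# token orders it with respect to other IO
-- actions, including writes to the same memory.
readByte :: Ptr Word8 -> Int -> IO Word8
readByte (Ptr a) (I# i) =
  IO (\s -> case readWord8OffAddr# a i s of
              (# s', w #) -> (# s', W8# w #))

main :: IO ()
main = withArray [0x48, 0x69, 0x21] $ \p -> do
  print (indexByte p 1)
  b <- readByte p 2
  print b
```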
- Ptrs don't get unboxed. Why is this? Some IO monad thing?
Got any more detail?
OK. readUTF8Char's transformation starts with this:

    $wreadUTF8Char_r3de =
      \ (ww_s33v :: GHC.Prim.Int#) (w_s33x :: GHC.Ptr.Ptr GHC.Word.Word8) ->

If it were unboxed, I'd expect the Ptr to become an Addr#. Later, this (w_s33x) gets unboxed just before it's used:

    case w_s33x of wild6_a2JM { GHC.Ptr.Ptr a_a2JO ->
    case GHC.Prim.readWord8OffAddr# @ GHC.Prim.RealWorld a_a2JO 1 s_a2Jf

readUTF8Char is called by fromUTF8Ptr, where there's a little Ptr arithmetic. The Ptr argument to fromUTF8Ptr is unboxed, the offset is added, and the result is reboxed so that it can be consumed by readUTF8Char. All a bit unnecessary, I think. For example:

    Foreign.C.UTF8.$wfromUTF8Ptr = ...
      let {
        p'_s38N [Just D(T)] :: GHC.Ptr.Ptr GHC.Word.Word8
        [Str: DmdType]
        p'_s38N =
          __scc {fromUTF8Ptr main:Foreign.C.UTF8 !}
          case w_s33J of wild11_a2DW { GHC.Ptr.Ptr addr_a2DY ->
          GHC.Ptr.Ptr @ GHC.Word.Word8 (GHC.Prim.plusAddr# addr_a2DY ww_s33H)
          }
      } in ...

I'd prefer the Ptr argument to fromUTF8Ptr to be unboxed as well, so that the primitive plusAddr# can be applied to it directly before it's passed to readUTF8Char. Alternatively, I could push this Ptr arithmetic down into readUTF8Char, and pass it the constant Ptr to the start of the buffer plus the offset into it, rather than a Ptr to the current position.

Alistair
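A sketch of the alternative proposed in the last paragraph, in portable Haskell: the decoder receives the fixed start-of-buffer `Ptr` and an `Int` offset, so no new `Ptr` is built per character. `readUTF8CharAt` and `decodeUTF8` are hypothetical stand-ins for the real readUTF8Char and fromUTF8Ptr, whose code isn't shown. The decoder assumes well-formed UTF-8 and does no validation. Whether GHC then passes the arguments unboxed to the worker depends on the compiler version and strictness analysis, so it's worth checking the Core with `-ddump-simpl`.

```haskell
module Main (main) where

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- Hypothetical readUTF8Char variant: base pointer + offset instead of a
-- pointer to the current position. Returns the Char and bytes consumed.
-- Assumes well-formed UTF-8 (no validation, no bounds check).
readUTF8CharAt :: Ptr Word8 -> Int -> IO (Char, Int)
readUTF8CharAt base off = do
  b0 <- byte 0
  if b0 < 0x80
    then pure (chr b0, 1)
    else if b0 < 0xE0
      then do
        b1 <- byte 1
        pure (chr (((b0 .&. 0x1F) `shiftL` 6) .|. cont b1), 2)
      else if b0 < 0xF0
        then do
          b1 <- byte 1
          b2 <- byte 2
          pure ( chr (((b0 .&. 0x0F) `shiftL` 12) .|. (cont b1 `shiftL` 6)
                        .|. cont b2)
               , 3 )
        else do
          b1 <- byte 1
          b2 <- byte 2
          b3 <- byte 3
          pure ( chr (((b0 .&. 0x07) `shiftL` 18) .|. (cont b1 `shiftL` 12)
                        .|. (cont b2 `shiftL` 6) .|. cont b3)
               , 4 )
  where
    -- Every read is relative to the unchanging base pointer.
    byte :: Int -> IO Int
    byte k = fromIntegral <$> (peekByteOff base (off + k) :: IO Word8)
    -- Payload bits of a continuation byte (10xxxxxx).
    cont :: Int -> Int
    cont b = b .&. 0x3F

-- Hypothetical fromUTF8Ptr stand-in: only the Int offset advances.
decodeUTF8 :: Ptr Word8 -> Int -> IO String
decodeUTF8 base len = go 0
  where
    go off
      | off >= len = pure []
      | otherwise = do
          (c, n) <- readUTF8CharAt base off
          (c :) <$> go (off + n)

main :: IO ()
main = withArrayLen bytes $ \len p -> do
  s <- decodeUTF8 p len
  print (map fromEnum s)
  where
    -- "h", U+00E9 and U+20AC encoded as UTF-8.
    bytes = [0x68, 0xC3, 0xA9, 0xE2, 0x82, 0xAC] :: [Word8]
```

The design choice is simply that the loop state is an `Int`, which the worker/wrapper transformation can usually pass as an `Int#`, while the base `Ptr` stays the same across iterations instead of being rebuilt each time.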