GHC.Prim.ByteArray# - confusing documentation

Folks,
I found some of the documentation in GHC.Prim confusing, so I thought I'd share.
The documentation for the ByteArray# type[1] explains that it's a raw region in memory that also remembers its size.
Consequently I expected sizeofByteArray# to return the same number that I passed in to newByteArray#. But it doesn't: it returned however much it decided to allocate, which on my platform is always a multiple of four bytes.
This is something which could be clarified in the documentation.
So it turns out I need to carry my own length around in something like (Int, ByteArray#).
(I tested all of this with the primitive[2] package, which is a thin wrapper around GHC.Prim.)
Antoine
1: http://hackage.haskell.org/packages/archive/ghc-prim/0.1.0.0/doc/html/GHC-Pr...
2: http://hackage.haskell.org/package/primitive
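The "carry my own length around" workaround can be sketched as a small wrapper type. This is a minimal illustration using the primops re-exported from GHC.Exts rather than the primitive package; the names SizedBytes, newSizedBytes, requestedLength and allocatedLength are hypothetical and do not come from the thread. On a GHC whose sizeofByteArray# reports the exact byte length, the two numbers printed will agree; on one that rounds up to whole words, the second may be larger.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import GHC.Exts
  ( ByteArray#, Int (I#), newByteArray#, sizeofByteArray#
  , unsafeFreezeByteArray# )
import GHC.ST (ST (ST), runST)

-- Hypothetical wrapper: the length the caller asked for, stored next to
-- the array so it does not depend on what the runtime reports.
data SizedBytes = SizedBytes !Int ByteArray#

-- Allocate n bytes (contents uninitialised) and remember n ourselves.
newSizedBytes :: Int -> SizedBytes
newSizedBytes n@(I# n#) = runST (ST (\s0 ->
  case newByteArray# n# s0 of
    (# s1, marr #) -> case unsafeFreezeByteArray# marr s1 of
      (# s2, arr #) -> (# s2, SizedBytes n arr #)))

-- The length that was passed to newSizedBytes.
requestedLength :: SizedBytes -> Int
requestedLength (SizedBytes n _) = n

-- Whatever the runtime reports for the underlying array.
allocatedLength :: SizedBytes -> Int
allocatedLength (SizedBytes _ arr) = I# (sizeofByteArray# arr)

main :: IO ()
main = do
  let b = newSizedBytes 5
  putStrLn ("requested: " ++ show (requestedLength b))
  putStrLn ("reported:  " ++ show (allocatedLength b))
```

The cost of this design is the extra stored Int, which is the overhead for short strings discussed later in the thread.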

On Thu, 2009-12-24 at 18:18 -0500, Antoine Latter wrote:
> Folks,
> I found some of the documentation in GHC.Prim confusing, so I thought I'd share. The documentation for the ByteArray# type[1] explains that it's a raw region in memory that also remembers its size.
> Consequently I expected sizeofByteArray# to return the same number that I passed in to newByteArray#. But it doesn't: it returned however much it decided to allocate, which on my platform is always a multiple of four bytes.
Yes, this is an artefact of the fact that ghc measures heap stuff in units of words.
> This is something which could be clarified in the documentation.
It would be jolly useful for making short strings if GHC's ByteArray# used a byte length rather than a word length. It'd mean a little more bit twiddling in the GC code that looks at ByteArray#s, however it'd save an extra 2 words in a short string type (or allow us to store '\0' characters in short strings).
It's been on my TODO list for some time to design a portable low level ByteArray module that could be implemented by hugs, nhc, ghc, etc. The aim would be to be similar to ForeignPtr + Storable but using native heap allocated memory blocks.
In turn this would be the right portable layer on which to build ByteString, Text and probably IO buffers too.
Duncan

On Sat, Dec 26, 2009 at 12:50 PM, Duncan Coutts wrote:
> It's been on my TODO list for some time to design a portable low level ByteArray module that could be implemented by hugs, nhc, ghc, etc. The aim would be to be similar to ForeignPtr + Storable but using native heap allocated memory blocks.
It looks like Data.Text.Array has a bit of a head start on this, although it sounds like you're looking for something even simpler. Currently it only appears to build on GHC and Hugs, but we could have an FFI fall-back case. I prefer the interface of Data.Primitive.ByteArray, though.
> In turn this would be the right portable layer on which to build ByteString, Text and probably IO buffers too.
It looks like ByteString is aggressive about calling C code to perform a few operations, which is something we'd only be able to do for pinned arrays.
Antoine
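The pinned/unpinned distinction can be observed directly on a recent GHC. The sketch below assumes the primops newPinnedByteArray# and isMutableByteArrayPinned# as exported from GHC.Exts in current releases (the latter did not exist when this thread was written); the function names pinnedAllocIsPinned and unpinnedAllocIsPinned are hypothetical. Only a pinned array has a stable address that is safe to hand to C code, which is why the ByteString operations mentioned above would need pinned storage.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import GHC.Exts
  ( Int (I#), isMutableByteArrayPinned#, isTrue#, newByteArray#
  , newPinnedByteArray# )
import GHC.ST (ST (ST), runST)

-- Allocate n bytes with the pinned allocator and ask the runtime
-- whether the resulting array is pinned.
pinnedAllocIsPinned :: Int -> Bool
pinnedAllocIsPinned (I# n#) = runST (ST (\s0 ->
  case newPinnedByteArray# n# s0 of
    (# s1, marr #) -> (# s1, isTrue# (isMutableByteArrayPinned# marr) #)))

-- Same query for the ordinary allocator; the answer is up to the runtime.
unpinnedAllocIsPinned :: Int -> Bool
unpinnedAllocIsPinned (I# n#) = runST (ST (\s0 ->
  case newByteArray# n# s0 of
    (# s1, marr #) -> (# s1, isTrue# (isMutableByteArrayPinned# marr) #)))

main :: IO ()
main = do
  putStrLn ("newPinnedByteArray# 16 pinned? " ++ show (pinnedAllocIsPinned 16))
  putStrLn ("newByteArray# 16 pinned?       " ++ show (unpinnedAllocIsPinned 16))
```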

On 24/12/09 23:18, Antoine Latter wrote:
> Folks,
> I found some of the documentation in GHC.Prim confusing, so I thought I'd share. The documentation for the ByteArray# type[1] explains that it's a raw region in memory that also remembers its size.
> Consequently I expected sizeofByteArray# to return the same number that I passed in to newByteArray#. But it doesn't: it returned however much it decided to allocate, which on my platform is always a multiple of four bytes.
> This is something which could be clarified in the documentation.
Thanks, I'll fix the docs.
Cheers,
Simon

Hello Simon,
Wednesday, December 30, 2009, 1:44:54 PM, you wrote:
>> Consequently I expected sizeofByteArray# to return the same number that I passed in to newByteArray#. But it doesn't: it returned however much it decided to allocate, which on my platform is always a multiple of four bytes.
>> This is something which could be clarified in the documentation.
> Thanks, I'll fix the docs.
btw, is it possible to fix the behavior? it will reduce overhead for storing small strings
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

On 30/12/09 11:09, Bulat Ziganshin wrote:
> Hello Simon,
> Wednesday, December 30, 2009, 1:44:54 PM, you wrote:
>>> Consequently I expected sizeofByteArray# to return the same number that I passed in to newByteArray#. But it doesn't: it returned however much it decided to allocate, which on my platform is always a multiple of four bytes.
>>> This is something which could be clarified in the documentation.
>> Thanks, I'll fix the docs.
> btw, is it possible to fix the behavior? it will reduce overhead for storing small strings
It would be possible, yes. When the RTS needs to know the size of the array in words it would have to do a calculation, so you'd have to find all the places in the RTS that do this (probably only a handful, as most of them go through the arr_words_sizeW() inline function).
I don't plan to do this right now. If someone else wants to tackle it then please go ahead, it'd be a fun afternoon hack.
Cheers,
Simon
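The calculation mentioned here, going from a stored byte length to a size in words, is a round-up division. The following is an illustrative sketch in Haskell, not the RTS code itself (which is C); bytesToWords is a hypothetical name, and taking the word size from sizeOf on Int is an assumption that holds for GHC, where Int is one machine word.

```haskell
module Main where

import Foreign.Storable (sizeOf)

-- Hypothetical helper: number of whole words needed to hold nBytes,
-- for a given word size in bytes (round up, never down).
bytesToWords :: Int -> Int -> Int
bytesToWords wordSize nBytes = (nBytes + wordSize - 1) `div` wordSize

main :: IO ()
main = do
  -- Assumption: Int occupies exactly one machine word on this platform.
  let w = sizeOf (0 :: Int)
  mapM_
    (\n -> putStrLn (show n ++ " bytes -> " ++ show (bytesToWords w n) ++ " words"))
    [0, 1, 4, 5, 8, 9]
```

With a 4-byte word, a 5-byte request occupies 2 words, that is 8 bytes, which matches the "multiple of four bytes" behaviour reported at the start of the thread.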

On Wed, Dec 30, 2009 at 5:47 AM, Simon Marlow wrote:
> On 30/12/09 11:09, Bulat Ziganshin wrote:
>> Hello Simon,
>> Wednesday, December 30, 2009, 1:44:54 PM, you wrote:
>> btw, is it possible to fix the behavior? it will reduce overhead for storing small strings
> It would be possible, yes. When the RTS needs to know the size of the array in words it would have to do a calculation, so you'd have to find all the places in the RTS that do this (probably only a handful, as most of them go through the arr_words_sizeW() inline function).
> I don't plan to do this right now. If someone else wants to tackle it then please go ahead, it'd be a fun afternoon hack.
I've written this up as ticket 3800: http://hackage.haskell.org/trac/ghc/ticket/3800
The patches for ghc and integer-gmp are attached to the ticket.
Antoine
participants (4)
- Antoine Latter
- Bulat Ziganshin
- Duncan Coutts
- Simon Marlow