Re: Support conversion from (UArray i Word8) to ShortByteString?

Viktor, Your proposed `arrayToByteArray` seems plausible. Your proposed `byteArrayToShort` is just the newtype-constructor `ShortByteString` which is exposed from Data.ByteString.Short since bytestring-0.12.0.0.
An alternative is to add a tailored version of the UArray and STUArray APIs to 'MutableByteArray' by extending the rather limited API of 'Data.Array.Byte':
How does this compare to the interface provided by primitive:Data.Primitive.ByteArray? Their `runByteArray` is your `runMutableByteArray`.
The 'ShortByteString' type in the "bytestring" package has seen some singnificant improvement recently, and yet its API is still noticeably limited in comparison to its pinned, I/O friendly 'ByteString' elder sibling.
One of the limitations is that that there are fewer ways to construct a 'ShortByteString' object. One often has to restort to constructing a pinned ByteString, and then copy. (There no "ST" Builders that write to resizable MutableByteArrays instead of raw memory pointers).
Meanwhile, under-the covers, both the "UArray i Word8" type and 'ShortByteString' hold an immutable 'ByteArray', and the STUArray API provides a flexible "UArray" construction interface.
Would it be reasonable to "bridge" the two APIs:
Data.Array.Unboxed: (re-export from Data.Array.Base) import Data.Array.Byte
arrayToByteArray :: UArray i Word8 -> ByteArray arrayToByteArray (UArray _ _ _ ba#) = ByteArray ba# {-# INLINE arrayToByArrray #-}
Data.ByteString.Short: (re-export from Data.ByteString.Short.Internal) byteArrayToShort :: ByteArray -> ShortByteString byteArrayToShort = coerce {-# INLINE byteArrayToShort #-}
It would then be possible to write:
short = byteArrayToShort $ arrayToByteArray $ runSTUArray m where m = do a <- newArray (0, last) 0 -- zero fill sequence_ [ writeArray a ix e | (ix, e) <- generator ]
and generate the bytes of a 'ShortByteString' from an arbitrary computation, possibly merging multiple inputs into some bytes by using the recently introduced "modifyArray" (or explicit read/modify/write).
Any thoughts about the wisdom or lack thereof of this proposal?
An alternative is to add a tailored version of the UArray and STUArray APIs to 'MutableByteArray' by extending the rather limited API of 'Data.Array.Byte':
runMutableByteArray :: (forall s. ST s (MutableByteArray s)) -> ByteArray runMutableByteArray m = runST $ m >>= freezeMutableByteArray
freezeMutableByteArray (MutableByteArray mba#) = ST $ \s -> case unsafeFreezeByteArray# mba# s of (# s', ba# #) -> (# s', ByteArray ba# #)
Since "Data.Array.Byte" is an "array" (rather than string) interface, it could have a richer set of indexed read/write/modify primitives along the lines of those found in "Data.Array.STUArray", but specialised to 'Word8' elements and implicit zero-based integer indexing.
The flexible construction I seek would then be via "Data.Array.Bytes", rather than the somewhat too general index and value types from UArray.
short = byteArrayToShort $ runMutableByteArray m where m = do a <- newByteArray size 0 -- 0 fill sequence_ [ writeByteArray a ix e | (ix, e) <- generator ]
In this scenario, the indexed-mutation of ShortByteStrings under construction, or indexed-mutation of copies for various transformations, could live in Data.Array.Byte, with ShortByteString and various applications leveraging the random-access mutation (and resizing, ...) to implement higher level operations.
-- Viktor.

On Mon, Nov 13, 2023 at 09:35:06PM -0500, Matthew Craven wrote:
Your proposed `arrayToByteArray` seems plausible.
Thanks, that could be handy, but see below.
Your proposed `byteArrayToShort` is just the newtype-constructor `ShortByteString` which is exposed from Data.ByteString.Short since bytestring-0.12.0.0.
Thanks. I did not notice these are now exposed without having to import unstable "Internal" interfaces: GHCi, version 9.8.1: ... λ> import Data.Array.Byte λ> import Data.ByteString.Short λ> :t ShortByteString ShortByteString :: Data.Array.Byte.ByteArray -> ShortByteString λ> :t SBS SBS :: GHC.Prim.ByteArray# -> ShortByteString λ> :t ByteArray ByteArray :: GHC.Prim.ByteArray# -> ByteArray So as of GHC 9.8.1 and "bytestring" 12, I have all the missing glue.
An alternative is to add a tailored version of the UArray and STUArray APIs to 'MutableByteArray' by extending the rather limited API of 'Data.Array.Byte':
How does this compare to the interface provided by primitive:Data.Primitive.ByteArray? Their `runByteArray` is your `runMutableByteArray`.
I was unaware that the "primitive" packages already provides what I was looking for. Unlike the case with "array", there's no duplication of bounds checks for performing separate read/write at the same index (because there are no bounds checks), so the "missing" will not be missed. Looks like I'm all set. Just need to use 'SBS' in place of 'ShortByteString' while working with GHC 9.[246].*. For my use case, I don't need the additional safety (bounds checks) of "array", but it is perhaps reasonable to consider adding the proposed bridge (from UArray i Word8), for users who want a bit more safety than one gets with "primitive". -- Viktor.
participants (2)
-
Matthew Craven
-
Viktor Dukhovni