
Hi,

On 2018-03-08 at 09:19:29 -0500, Andrew Martin wrote:
> Some of the bytes in the word will have garbage in them. However, this could always be masked out with a bit mask (you have to know the platform endianness for this to work right).
> Is this safe? I don't think this could ever cause a segfault, but I wanted to check.
For historical reasons, this is indeed safe: the underlying `StgArrBytes` structure must be word-aligned in size, otherwise bad things are likely to happen. Because the payload is rounded up to a whole number of words, an aligned word-sized read of the array's last, partially used word stays within the allocated heap object. I've seen some code in the wild which relies on that, and as a data point, I myself exploit that property (including the masking and endianness-aware handling you refer to) in some operations of 'text-short'[1], which is optimised for UTF-8-based strings.

<shameless-plug>Besides being a practically useful library with its own place in the text/bytearray landscape[2], text-short also serves as an incubation area for optimisation ideas and code, some of which may end up in one way or another in the text-utf8 project[3].</shameless-plug>

[1]: https://hackage.haskell.org/package/text-short
[2]: https://markkarpov.com/post/short-bs-and-text.html
[3]: https://hackage.haskell.org/text-utf8

-- 
hvr
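A minimal sketch of the masking step from the question, assuming a 64-bit word (`Word64`) and using `targetByteOrder` from `GHC.ByteOrder` (base >= 4.11) to pick the byte order. `loadWord` and `maskValidBytes` are hypothetical names, not part of text-short or any library API. Rather than actually reading past the end of a `ByteArray#`, `loadWord` simulates what a word load would yield from eight bytes given in memory order, so the example stays pure and runnable.

```haskell
import Data.Bits (complement, shiftL, (.&.), (.|.))
import Data.Word (Word64, Word8)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- Hypothetical helper: simulate what a 64-bit load would yield from
-- 8 bytes listed in memory order (lowest address first) on a machine
-- with the given byte order. Expects exactly 8 bytes.
loadWord :: ByteOrder -> [Word8] -> Word64
loadWord order bytes = foldr step 0 significance
  where
    -- list the bytes from least to most significant
    significance = case order of
      LittleEndian -> bytes         -- lowest address = least significant
      BigEndian    -> reverse bytes -- lowest address = most significant
    step b acc = (acc `shiftL` 8) .|. fromIntegral b

-- Hypothetical helper: keep only the first n bytes (in memory order) of a
-- word read across the end of the valid data; the other (8 - n) bytes are
-- treated as garbage and zeroed.
maskValidBytes :: ByteOrder -> Int -> Word64 -> Word64
maskValidBytes order n w
  | n <= 0 = 0
  | n >= 8 = w
  | otherwise = case order of
      -- little-endian: the first n bytes occupy the low 8*n bits
      LittleEndian -> w .&. lowMask (8 * n)
      -- big-endian: the first n bytes occupy the high 8*n bits
      BigEndian -> w .&. complement (lowMask (64 - 8 * n))
  where
    lowMask k = (1 `shiftL` k) - 1

main :: IO ()
main = do
  -- 3 valid bytes ("abc") followed by 5 garbage bytes, in memory order
  let mem      = [0x61, 0x62, 0x63, 0xde, 0xad, 0xbe, 0xef, 0xff]
      expected = [0x61, 0x62, 0x63, 0, 0, 0, 0, 0]
      w = loadWord targetByteOrder mem
  -- prints True on either byte order
  print (maskValidBytes targetByteOrder 3 w == loadWord targetByteOrder expected)
```

On a little-endian machine the lowest-addressed bytes land in the low-order bits, so the mask keeps the low `8*n` bits; on a big-endian machine they land in the high-order bits, so the mask keeps the high ones. In real code the word would presumably come from something like `indexWordArray#` on the array's last word, with the number of valid bytes derived from `sizeofByteArray#`.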