
On Thu, Feb 21, 2008 at 11:37 AM, Duncan Coutts wrote:
On Thu, 2008-02-21 at 10:06 +0100, Johan Tibell wrote:
Hi John!
On Wed, Feb 20, 2008 at 3:39 PM, John Goerzen wrote:
3) Would it make sense to base as much code as possible in the Haskell core around ListLike definitions? Here I think of functions such as lines and words, which make sense both on [Char] and on ByteStrings.
I don't think the examples you gave (i.e. lines and words) make much sense on ByteStrings. You would have to assume that the sequence of bytes is in some particular Unicode encoding, and thus words and lines will break if they get passed a ByteString that uses a different encoding. I don't think either of those two functions makes sense on anything but sequences of characters, like String.
That's exactly what the Data.ByteString[.Lazy].Char8 modules provide: a Char8 view of a ByteString. Those modules provide functions like words, lines, etc. that assume an ASCII-compatible 8-bit encoding.
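A minimal sketch of that Char8 view, assuming only the bytestring package; the names asciiWords and asciiLines are made up for illustration. On plain ASCII input every byte is one character, so splitting on whitespace bytes behaves as expected:

```haskell
import qualified Data.ByteString.Char8 as C8

-- ASCII input: one byte per character, so byte-level splitting is safe.
asciiWords :: [C8.ByteString]
asciiWords = C8.words (C8.pack "hello brave world")

asciiLines :: [C8.ByteString]
asciiLines = C8.lines (C8.pack "first\nsecond\n")

main :: IO ()
main = do
  print asciiWords  -- ["hello","brave","world"]
  print asciiLines  -- ["first","second"]
```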
I would be very happy if people didn't use the .Char8 versions of ByteString except for being able to write byte literals using pack. (I would be even happier if Haskell had byte literals.) If people start using ByteString in their library interfaces instead of String, I'll be really miserable: I couldn't use their libraries to write applications that need to be internationalized, because those libraries would be limited to ASCII. Data.ByteString and Data.ByteString.Char8 use the same ByteString type, so I can take some bytes in UTF-32 which I read from the network and use Data.ByteString.Char8 functions on them, which will fail. I prefer that a type that represents characters is guarded by encode and decode functions. If that's not the case, it's easy to mix data in different encodings by mistake, e.g. when writing web applications that involve data in several different encodings.
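To illustrate the failure mode, here is a rough sketch that again assumes only the bytestring package. The helpers encodeUtf32BE and decodeUtf32BE and the value sample are hypothetical, written by hand for this example and not taken from any library. The sample is a single line containing U+010A, whose UTF-32BE encoding happens to contain the byte 0x0A:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Data.Bits (shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: UTF-32BE, four big-endian bytes per code point.
encodeUtf32BE :: String -> B.ByteString
encodeUtf32BE = B.pack . concatMap bytes
  where
    bytes c = [byteAt 24, byteAt 16, byteAt 8, byteAt 0]
      where
        byteAt :: Int -> Word8
        byteAt s = fromIntegral (ord c `shiftR` s)

-- Hypothetical helper: rejects truncated input and invalid code points.
decodeUtf32BE :: B.ByteString -> Maybe String
decodeUtf32BE bs
  | B.length bs `mod` 4 /= 0 = Nothing
  | otherwise = mapM toChar (chunks (B.unpack bs))
  where
    chunks (a:b:c:d:rest) = [a, b, c, d] : chunks rest
    chunks _ = []
    toChar ws =
      let n = foldl (\acc w -> acc * 256 + toInteger w) 0 ws
      in if n <= 0x10FFFF && not (n >= 0xD800 && n <= 0xDFFF)
           then Just (chr (fromInteger n))
           else Nothing

-- One line of text; U+010A encodes as 00 00 01 0A in UTF-32BE.
sample :: String
sample = ['a', '\x010A', 'b']

main :: IO ()
main = do
  let wire = encodeUtf32BE sample
      pieces = C8.lines wire
  -- The Char8 view treats the 0x0A byte as a newline and cuts mid-character.
  print (length pieces)               -- 2
  print (map decodeUtf32BE pieces)    -- first piece is 7 bytes, so Nothing
  -- Decoding first and splitting the characters keeps the line intact.
  print (fmap (length . lines) (decodeUtf32BE wire))  -- Just 1
```

With a decode step in front, the splitting happens on characters and not on bytes, so the encoding of the wire data no longer matters to lines and words.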
One day we'll have a separate type that does Unicode with a similar fast packed representation.
That will be a good day. :)

-- Johan