Haskell (Byte)Strings - wrong to separate content from encoding?


On Fri, 2010-03-19 at 18:45 +0100, Mads Lindstrøm wrote:
> Hi
>
> More and more libraries use ByteStrings these days. And it is great that we can get fast string handling in Haskell, but is ByteString the right level of abstraction for most uses?
>
> It seems to me that libraries, like the Happstack server, should use a string type which contains both the content (what ByteString contains today) and the encoding. After all, the data in a ByteString has no meaning if we do not know its encoding.
>
> An example will illustrate my point. If your web app, implemented with Happstack, receives a request, it looks like http://happstack.com/docs/0.4/happstack-server/Happstack-Server-HTTP-Types.h... :
>
>     data Request = Request
>         { ...
>         , rqHeaders :: Headers
>         , ...
>         , rqBody    :: RqBody
>         , ...
>         }
>
>     newtype RqBody = Body ByteString
>
> To actually read the body, you need to find the content-type header and use some encoding-conversion package to know what the ByteString actually means. Furthermore, some other library may need to consume the ByteString. Now you need to know which encoding the consumer expects...
>
> But all this seems avoidable if Happstack returned a string type which included both content and encoding.
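Concretely, the kind of string type Mads is describing might look something like this. This is a minimal sketch, not an actual Happstack proposal: EncodedString, Encoding, and toText are made-up names, and the decoding functions come from the text package.

    import qualified Data.ByteString as B
    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE

    -- Hypothetical: the encodings the server actually distinguishes.
    data Encoding = UTF8 | UTF16LE | Latin1

    -- Hypothetical: content paired with the encoding it arrived in.
    data EncodedString = EncodedString Encoding B.ByteString

    -- One total decoding function, instead of header inspection and
    -- guesswork at every call site.
    toText :: EncodedString -> T.Text
    toText (EncodedString UTF8    bs) = TE.decodeUtf8    bs
    toText (EncodedString UTF16LE bs) = TE.decodeUtf16LE bs
    toText (EncodedString Latin1  bs) = TE.decodeLatin1  bs

With something along these lines in the Request type, the Content-Type lookup would happen once, inside the server, rather than in every consumer.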
I guess the problem is that the body does not necessarily have to be text. It can just as well be a GIF, an MP3, etc. So you would need to have something like:

    data RqBody = Text MIME String | Binary MIME ByteString
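A consumer of such a body type could then branch on the constructor instead of re-reading the Content-Type header. A sketch, with MIME left as a plain String stand-in since Maciej leaves it abstract:

    import qualified Data.ByteString as B

    -- Stand-in for whatever media-type representation is chosen.
    type MIME = String

    data RqBody = Text MIME String | Binary MIME B.ByteString

    -- The text/binary decision is made once, by the constructor, not by
    -- every function that touches the body.
    summarize :: RqBody -> String
    summarize (Text   mime s)  = mime ++ ": " ++ show (length s)    ++ " characters"
    summarize (Binary mime bs) = mime ++ ": " ++ show (B.length bs) ++ " bytes"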
> I could make a similar story about reading text files.
>
> If some data structure contains a lot of small strings, having both encoding and content for each string is wasteful. Thus, I am not suggesting that ByteString should be scrapped, just that ordinarily programmers should not have to think about string encodings.
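One way to get that without paying for a tag on every string is to decode once at the program's boundary into a single fixed internal representation, which is roughly the trade-off the text package makes. A minimal sketch, assuming UTF-8 on the wire:

    import qualified Data.ByteString as B
    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE

    -- Decode at the edge; afterwards every T.Text shares one internal
    -- representation, so a structure full of small strings carries no
    -- per-string encoding field. (decodeUtf8 errors on invalid input;
    -- decodeUtf8' returns an Either instead.)
    fromWire :: B.ByteString -> T.Text
    fromWire = TE.decodeUtf8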
In network programming you have to think about encoding: there are (or were) too many sites encoded in IBM codepages (not much of a problem for English-speaking users). Worse, I have read some HTML tutorials which suggested that adding a meta content-type tag automatically changes the page to an ISO encoding ;)
> An alternative to having a String type which contains both content and encoding would be standardizing on some encoding like UTF-8. I realize that we have the utf8-string package on Hackage, but people (at least Happstack and Network.HTTP) seem to prefer ByteString. I wonder why.
>
> Greetings,
>
> Mads Lindstrøm
Hopefully most of these problems are gone as the world moves to UTF-8. But still:

- Other Unicode encodings are used (for example, ones with a fixed length per character)
- Other data types are used (such as binary)

In many cases you cannot depend on the MIME type always being correct. In some cases you don't need character recoding anyway (you store the data directly in a database, or you want to compress it). Additionally, you may want to compute a checksum of a string. However, recoding UTF-16 -> ... -> UTF-16 may change the contents (the byte-order mark at the beginning) and therefore the checksum.

Regards
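Maciej's last point can be made concrete with a small sketch. roundTripUtf16 is a made-up helper standing in for any BOM-aware recoding step; the point is only that decode-then-reencode is not the identity on bytes, so byte-level checksums change:

    import qualified Data.ByteString as B
    import qualified Data.Text.Encoding as TE

    -- Strip a little-endian byte-order mark if present, decode, and
    -- re-encode in canonical UTF-16LE (without a BOM).
    roundTripUtf16 :: B.ByteString -> B.ByteString
    roundTripUtf16 bs =
      let body | B.take 2 bs == B.pack [0xFF, 0xFE] = B.drop 2 bs
               | otherwise                          = bs
      in  TE.encodeUtf16LE (TE.decodeUtf16LE body)

    main :: IO ()
    main = do
      let original = B.pack [0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00]  -- BOM ++ "hi"
      -- Prints False: the BOM is gone, and a checksum over the bytes
      -- would change with it.
      print (roundTripUtf16 original == original)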