
Hi,

tl;dr: I'd like to remove the String instances from the HTTP package.

The HTTP library is overloaded on the type for request and response bodies; there are instances for String and for both strict and lazy ByteStrings.

Unfortunately, the String instance is rather broken. A String ought to represent Unicode data, but the HTTP wire format is bytes, and HTTP makes no attempt to handle encoding. In particular, uploaded data (e.g. in POSTs) gets silently truncated, and downloaded data is improperly embedded as one byte per character no matter what encoding the server advertises in the Content-Type header. (https://github.com/haskell/HTTP/issues/28)

I've spent a while investigating the option of making HTTP encode and decode Strings appropriately, but my tentative conclusion is that it's too hard:

- on upload we'd have to pick an encoding by default (probably UTF-8) and also add it to the Content-Type header, which may involve messing with any header supplied by the user. If the user supplied a different encoding in Content-Type then we would probably need to notice and respect that.

- on upload Content-Length may also need to be managed somehow.

- on download we'd need to be able to handle at least the common encodings that the server might send, but on Windows even common encodings like iso-8859-* don't exist and there aren't always appropriate substitutes.

- on download we'd also really want to parse HTML/XML documents looking for in-document specifications of the encoding in META tags and XML declarations (see http://www.w3.org/QA/2008/03/html-charset.html)

- we'd also need to parse Content-Type to detect when the data is supposed to be binary, and then check that it is actually 8-bit clean on upload. If the user doesn't supply Content-Type at all, then what?
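
To make the failure mode concrete, here is a minimal sketch using only base. It is not the HTTP package's actual code: truncateChar, utf8Char and utf8ContentLength are hypothetical names made up for illustration. It contrasts naive one-byte-per-character conversion with a hand-rolled UTF-8 encoding, and shows why Content-Length would have to be computed from the encoded bytes rather than from the String.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical illustration of one-byte-per-character conversion:
-- fromIntegral wraps modulo 256, so any code point above U+00FF
-- silently loses its high bits.
truncateChar :: Char -> Word8
truncateChar = fromIntegral . ord

-- Hypothetical hand-rolled UTF-8 encoder for a single Char.
-- It does not reject surrogate code points, which a real encoder should.
utf8Char :: Char -> [Word8]
utf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, for the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx with six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Content-Length must count encoded bytes, not characters.
utf8ContentLength :: String -> Int
utf8ContentLength = length . concatMap utf8Char
```

For example, the euro sign U+20AC truncates to the single byte 0xAC, whereas its UTF-8 encoding is the three bytes 0xE2 0x82 0xAC. Likewise "h\xE9llo" is 5 characters but 6 bytes in UTF-8, so a Content-Length taken from the String length would be wrong.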

I think the right way to do this would be to have proper high-level and low-level APIs, where only the high-level API supports Strings but also does a lot more active management of standard HTTP headers like Content-Type and Content-Length. But HTTP as it stands is a long way from doing that, and a short-term fix is needed.

So I'm reluctantly drawn to the conclusion that the only reasonable thing to do is to remove the String instances from HTTP completely for now. I imagine this could be quite disruptive, but on the other hand people using the String instance are getting silently broken behaviour, and a couple of people have been bitten by this recently.

Any thoughts?

Cheers,

Ganesh