
Hi,
tl;dr: I'd like to remove the String instances from the HTTP package.
The HTTP library is overloaded on the type for request and response bodies; there are instances for String and both strict and lazy Bytestrings.
Unfortunately, the String instance is rather broken. A String ought to represent Unicode data, but the HTTP wire format is bytes, and HTTP makes no attempt to handle encoding.
In particular uploaded data (e.g. in POSTs) gets silently truncated and downloaded data is improperly embedded as one byte per character no matter what encoding the server advertises in the Content-Type header. (https://github.com/haskell/HTTP/issues/28)
I've spent a while investigating the option of making HTTP encode and decode Strings appropriately, but my tentative conclusion is that it's too hard:
- on upload we'd have to pick an encoding by default - probably UTF-8 - and also add it to the Content-Type header which may involve messing with any header supplied by the user. If the user supplied a different encoding in Content-Type then we probably would need to notice and respect that.
- on upload Content-Length may also need to be managed somehow.
- on download we'd need to be able to handle at least common encodings that the server might send, but on Windows even common encodings like iso-8859-* don't exist and there aren't always appropriate substitutes.
- on download we'd also really want to parse HTML/XML documents looking for in-document specifications of the encoding in META tags and XML declarations (see http://www.w3.org/QA/2008/03/html-charset.html)
- we'd need to also parse Content-Type to detect when the data is supposed to be binary, and then check that it is actually 8-bit clean on upload. If the user doesn't supply Content-Type at all, then what?
I think the right way to do this would be to have proper high-level and low-level APIs where only the high-level API supports strings but also does a lot more active management of standard HTTP headers like content-type/content-length. But HTTP as it stands is a long way from doing that and a short-term fix is needed.
So I'm reluctantly drawn to the conclusion that the only reasonable thing to do is to remove the String instances from HTTP completely for now.
I imagine this could be quite disruptive, but on the other hand people using the String instance are getting silently broken behaviour and a couple of people have been bitten by this recently.
Any thoughts?
Cheers,
Ganesh
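To make the upload-side bullets concrete, here is a minimal sketch of what a high-level layer would have to do for a String body. It is not code from the HTTP package: the helper name `utf8Body`, the header representation as a list of pairs, and the `text/plain` media type are all assumptions made for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper (not part of the HTTP package): encode a String
-- body as UTF-8 and compute the two headers that have to agree with it.
utf8Body :: String -> ([(String, String)], B.ByteString)
utf8Body s = (headers, bytes)
  where
    bytes = TE.encodeUtf8 (T.pack s)
    headers =
      [ -- The media type here is an assumption; a real API would have to
        -- merge the charset into whatever Content-Type the user supplied.
        ("Content-Type", "text/plain; charset=utf-8")
        -- Content-Length counts bytes on the wire, not characters.
      , ("Content-Length", show (B.length bytes))
      ]
```

The point of the sketch is that the body bytes, the advertised charset and the length all depend on the same encoding decision, which is why the encoding cannot simply be bolted onto the existing instance.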

On Mon, Sep 10, 2012 at 3:22 PM, Ganesh Sittampalam
I imagine this could be quite disruptive, but on the other hand people using the String instance are getting silently broken behaviour and a couple of people have been bitten by this recently.
I'm in favour of broken code breaking explicitly, rather than silently doing the wrong thing, so +1 to nuking the String instances in spite of the up-front pain.

On Tue, Sep 11, 2012 at 12:44 AM, Bryan O'Sullivan
On Mon, Sep 10, 2012 at 3:22 PM, Ganesh Sittampalam wrote:
I imagine this could be quite disruptive, but on the other hand people using the String instance are getting silently broken behaviour and a couple of people have been bitten by this recently.
I'm in favour of broken code breaking explicitly, rather than silently doing the wrong thing, so +1 to nuking the String instances in spite of the up-front pain.
+1. I've been bitten by the broken String instances, and now only use the ByteString instances. Erik

On Mon, 10 Sep 2012, Ganesh Sittampalam wrote:
tl;dr: I'd like to remove the String instances from the HTTP package.
The HTTP library is overloaded on the type for request and response bodies; there are instances for String and both strict and lazy Bytestrings.
This instance was also kind of broken, because it used TypeSynonymInstances without need.
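For readers unfamiliar with the extension issue: an instance written directly for String (a synonym for [Char]) needs language extensions, whereas the same effect can be had in plain Haskell 98 by dispatching on the element type, in the style of `showList`. The sketch below uses hypothetical class names (`BodyElem`, `Body`) and is not the class used by the HTTP package.

```haskell
import Data.Word (Word8)

-- Hypothetical classes for illustration only.
-- The per-element class decides how a whole list is converted.
class BodyElem a where
  listToBytes :: [a] -> [Word8]

instance BodyElem Char where
  -- Same lossy behaviour the thread complains about: only the low
  -- 8 bits of each code point survive the conversion.
  listToBytes = map (fromIntegral . fromEnum)

class Body b where
  toBytes :: b -> [Word8]

-- An instance for lists in general, so no instance on a type synonym
-- is needed and String is covered through the Char instance.
instance BodyElem a => Body [a] where
  toBytes = listToBytes
```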

Am 11.09.2012 00:22, schrieb Ganesh Sittampalam:
Hi,
tl;dr: I'd like to remove the String instances from the HTTP package.
The HTTP library is overloaded on the type for request and response bodies; there are instances for String and both strict and lazy Bytestrings.
Unfortunately, the String instance is rather broken. A String ought to represent Unicode data, but the HTTP wire format is bytes, and HTTP makes no attempt to handle encoding.
if you remove the String instance I would need to encode my strings manually (and maybe worse than it is done now).
Which instance does the package cabal-install use?
Which alternative (better maintained) packages could I use if I have to change my code anyway?
The header of Network.HTTP contains a "Portability" saying "non-portable (not tested)", but the package contains a test-suite. Are tests (or their lack) a portability issue? (I've seen packages claiming portability with plenty of ghc extensions that probably only work for certain ghc versions on few architectures.)
Cheers
Christian

On Tue, Sep 11, 2012 at 9:30 AM, Christian Maeder
if you remove the String instance I would need to encode my strings manually (and maybe worse than it is done now).
This isn't actually that hard, and particularly it would be easy to do a better job than the current one if you used a real encoding package like text or utf8-string.
Which instance does the package cabal-install use?
Looks like it uses both String and ByteString in various pieces of the code. But it would probably be a sensible idea to switch to ByteString anyway.
Which alternative (better maintained) packages could I use if I have to change my code anyway?
The header of Network.HTTP contains a "Portability" saying "non-portable (not tested)", but the package contains a test-suite. Are tests (or their lack) a portability issue?
There is no standardised meaning of the Portability field, as far as I know, so it's probably best to ignore this. Yours, Ben
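As a rough illustration of the "encode manually with a real encoding package" route, the following sketch uses the text package. The function names `encodeBody` and `decodeBody` are made up for this example, and it assumes the payload is meant to be UTF-8, which in practice depends on what the server or client declares.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Encode a String as UTF-8 bytes before handing it to a ByteString API.
encodeBody :: String -> B.ByteString
encodeBody = TE.encodeUtf8 . T.pack

-- Decode a response body, assuming it is UTF-8. Invalid input is
-- reported as a Left instead of being silently misread.
decodeBody :: B.ByteString -> Either String String
decodeBody bytes =
  case TE.decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right text -> Right (T.unpack text)
```

Unlike the current String instance, a failed decode is visible to the caller, which matches the "break explicitly" preference voiced earlier in the thread.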

On 11/09/2012 09:30, Christian Maeder wrote:
Am 11.09.2012 00:22, schrieb Ganesh Sittampalam:
Hi,
tl;dr: I'd like to remove the String instances from the HTTP package.
The HTTP library is overloaded on the type for request and response bodies; there are instances for String and both strict and lazy Bytestrings.
Unfortunately, the String instance is rather broken. A String ought to represent Unicode data, but the HTTP wire format is bytes, and HTTP makes no attempt to handle encoding.
if you remove the String instance I would need to encode my strings manually (and maybe worse than it is done now).
The obvious way to encode them is to use ByteString.Char8.pack which is exactly what HTTP does now. I can't really think of anything worse that someone might do by accident.
Which alternative (better maintained) packages could I use if I have to change my code anyway?
There's http-conduit, which also doesn't support String, but does support https and has a much cleaner interface. If conduit ever made it into the Platform then it would be an obvious choice to replace HTTP; but I still have some faith in lazy IO which is one of the reasons why I put effort into the HTTP package. Cheers, Ganesh
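The two failure modes described at the start of the thread can be reproduced with the bytestring package alone. This is a small demonstration, not code from HTTP; the names `truncated` and `mojibake` are just labels for the two results.

```haskell
import Data.Word (Word8)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8

-- Upload side: Char8.pack keeps only the low 8 bits of each Char.
-- '\955' is the Greek letter lambda (U+03BB), so it becomes the
-- single byte 0xBB.
truncated :: [Word8]
truncated = B.unpack (B8.pack "\955")

-- Download side: Char8.unpack maps every byte to one Char in the
-- range U+0000..U+00FF. The two UTF-8 bytes of e-acute (0xC3 0xA9)
-- therefore come back as two unrelated characters.
mojibake :: String
mojibake = B8.unpack (B.pack [0xC3, 0xA9])
```

Here `truncated` is `[0xBB]` and `mojibake` is the two-character string `"\195\169"`, regardless of any charset the server may have advertised.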

On Tue, Sep 11, 2012 at 6:38 PM, Ganesh Sittampalam
There's http-conduit, which also doesn't support String, but does support https and has a much cleaner interface. If conduit ever made it into the Platform then it would be an obvious choice to replace HTTP; but I still have some faith in lazy IO which is one of the reasons why I put effort into the HTTP package.
As an aside, the major reason I support HTTP over something like http-conduit is the latter's titanic dependency list. I think especially as a dependency of cabal-install that's something of a dealbreaker:
$ cabal install http-conduit | grep 'new package' | wc -l [...] 47

On 09/11/2012 07:27 PM, Ben Millwood wrote:
As an aside, the major reason I support HTTP over something like http-conduit is the latter's titanic dependency list. I think especially as a dependency of cabal-install that's something of a dealbreaker:
$ cabal install http-conduit | grep 'new package' | wc -l [...] 47
I'm not sure what do you want to demonstrate here, number of packages couldn't be a more irrelevant metric. Would you prefer a package that includes everything in one giant codebase?
For example, the whole haskell TLS stack is responsible for at least 10~15 packages in http-conduit's list. I could easily put everything in one giant package, openssl style. However I think it makes more sense to build bricks (asn1, crypto hashes, ..) that can be reused in different libraries/programs (and indeed they are).
Now cabal-install is a bit of a special case, and keeping HTTP working is probably a good idea. But at the Platform level, while I agree the amount of work required is huge and not without controversies, keeping HTTP instead of http-conduit just make it likely the platform will be (is) irrelevant for many people.
-- Vincent

On Tue, 11 Sep 2012, Vincent Hanquez wrote:
On 09/11/2012 07:27 PM, Ben Millwood wrote:
As an aside, the major reason I support HTTP over something like http-conduit is the latter's titanic dependency list. I think especially as a dependency of cabal-install that's something of a dealbreaker:
$ cabal install http-conduit | grep 'new package' | wc -l [...] 47
I'm not sure what do you want to demonstrate here, number of packages couldn't be a more irrelevant metric. Would you prefer a package that includes everything in one giant codebase ?
I also hesitate to depend on packages with very many dependencies - although I write such packages myself. Chances are high that one of the imported packages fails to compile on a certain system or compiler version.
For example, the whole haskell TLS stack is responsible for at least 10~15 packages in http-conduit's list.
I haven't checked whether it is possible, but maybe there are ways to let the user plug in the TLS functionality if he needs it. Then http-conduit would not need to depend on it.

On 09/11/2012 09:45 PM, Henning Thielemann wrote:
I also hesitate to depend on packages with very many dependencies - although I write such packages myself. Chances are high that one of the imported packages fails to compile on a certain system or compiler version.
I suppose if more stuff were in the platform, that would make it less likely though.
For example, the whole haskell TLS stack is responsible for at least 10~15 packages in http-conduit's list.
I haven't checked whether it is possible, but maybe there are ways to let the user plug in the TLS functionality if he needs it. Then http-conduit would not need to depend on it.
I think it depends on how much control you need on the stack. For simple use, "open a TLS socket and give me a raw bytestream", it's possible. I believe that's how people use HTTP with HsOpenSSL. I think a great deal of possibilities is lost with this approach. -- Vincent

On Tue, Sep 11, 2012 at 09:36:37PM +0100, Vincent Hanquez wrote:
keeping HTTP instead of http-conduit just make it likely the platform will be (is) irrelevant for many people.
Are you saying that http-conduit is better, more popular, or both, than HTTP?
According to
http://packdeps.haskellers.com/reverse/http-conduit
http://packdeps.haskellers.com/reverse/HTTP
http-conduit has 33 reverse-deps to HTTP's 139, although these measurements are somewhat flawed as they don't take into account whether some of HTTP's rev-deps are old packages that have been abandoned, and all things being equal you'd expect HTTP to have more users as it's part of the HP.
Thanks
Ian

On 09/11/2012 10:00 PM, Ian Lynagh wrote:
On Tue, Sep 11, 2012 at 09:36:37PM +0100, Vincent Hanquez wrote:
keeping HTTP instead of http-conduit just make it likely the platform will be (is) irrelevant for many people.
Are you saying that http-conduit is better, more popular, or both, than HTTP?
It's hard to get any solid and comparable numbers here, HTTP is a much older package and it's part of the HP. I do however think it's currently more popular (yesod, enumerator/conduit, etc.) and more featureful than HTTP.
HTTP is probably doing a fine job for lots of users, as long as they are happy with lazy io and no https without jumping through hoops, however http-conduit is providing a superset of this, with bonus of stream io, https, socks, and more.
-- Vincent

On Tue, Sep 11, 2012 at 9:36 PM, Vincent Hanquez
On 09/11/2012 07:27 PM, Ben Millwood wrote:
As an aside, the major reason I support HTTP over something like http-conduit is the latter's titanic dependency list. I think especially as a dependency of cabal-install that's something of a dealbreaker:
$ cabal install http-conduit | grep 'new package' | wc -l [...] 47
I'm not sure what do you want to demonstrate here, number of packages couldn't be a more irrelevant metric. Would you prefer a package that includes everything in one giant codebase ?
Sorry, I realise in retrospect that my original message was misleading, I should have been more clear: http-conduit is a great package and I would recommend it for /most/ HTTP applications. But there is virtue to having an /alternative/ that is much less capable but much less heavyweight in terms of the things it needs.
Especially since some people will want to install cabal-install without already /having/ cabal-install, using the bootstrap script, which needs to manually download and install the entire transitive dependency list. Imagine if that was 47 packages!
The way that http-conduit is designed and built is definitely correct. It should be in many small packages so that it can be reused. But HTTP with a lightweight dependency list also has its place.

On 11/09/2012 19:27, Ben Millwood wrote:
On Tue, Sep 11, 2012 at 6:38 PM, Ganesh Sittampalam wrote:
There's http-conduit, which also doesn't support String, but does support https and has a much cleaner interface. If conduit ever made it into the Platform then it would be an obvious choice to replace HTTP; but I still have some faith in lazy IO which is one of the reasons why I put effort into the HTTP package.
As an aside, the major reason I support HTTP over something like http-conduit is the latter's titanic dependency list. I think especially as a dependency of cabal-install that's something of a dealbreaker:
For what it's worth this isn't my view; I would happily add dependencies to HTTP if they would improve it, although as I'm constrained by what's in the Platform I won't be going wild any time soon. The cabal-install bootstrap process should be improved if the dependencies become prohibitive, e.g. by having hackage generate a complete download bundle.
Though honestly, I wonder if HTTP should be in the Platform at all. It certainly wouldn't get in if it were proposed today. On the other hand an http client is an important battery.
Ganesh

On Tue, Sep 11, 2012 at 6:08 PM, Ganesh Sittampalam
Though honestly, I wonder if HTTP should be in the Platform at all. It certainly wouldn't get in if it were proposed today. On the other hand an http client is an important battery.
I think it's mostly there by dint of being a dependency of cabal-install?
-- brandon s allbery

Am 11.09.2012 19:38, schrieb Ganesh Sittampalam:
On 11/09/2012 09:30, Christian Maeder wrote:
Am 11.09.2012 00:22, schrieb Ganesh Sittampalam:
Hi,
tl;dr: I'd like to remove the String instances from the HTTP package.
The HTTP library is overloaded on the type for request and response bodies; there are instances for String and both strict and lazy Bytestrings.
Unfortunately, the String instance is rather broken. A String ought to represent Unicode data, but the HTTP wire format is bytes, and HTTP makes no attempt to handle encoding.
if you remove the String instance I would need to encode my strings manually (and maybe worse than it is done now).
The obvious way to encode them is to use ByteString.Char8.pack which is exactly what HTTP does now. I can't really think of anything worse that someone might do by accident.
My main use-case is simpleHTTP that is bound to the String instance, currently. There are no such short-cuts for byte-strings, are there?
I'd suggest to make a proper byte-string interface first and then deprecate the String stuff. (before calling Char8.pack, strings could be checked or filtered for "isAscii")
Cheers
Christian
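The "check for isAscii before Char8.pack" idea could look roughly like the sketch below. The function name `packAscii` is hypothetical; it only addresses the upload side and simply refuses input that Char8.pack would otherwise truncate or send in an unspecified encoding.

```haskell
import Data.Char (isAscii)
import qualified Data.ByteString.Char8 as B8

-- Hypothetical guard: only pack strings for which the one-byte-per-Char
-- representation is unambiguous, and report everything else to the caller.
packAscii :: String -> Maybe B8.ByteString
packAscii s
  | all isAscii s = Just (B8.pack s)
  | otherwise     = Nothing
```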

On 12/09/2012 11:09, Christian Maeder wrote:
My main use-case is simpleHTTP that is bound to the String instance, currently. There are no such short-cuts for byte-strings, are there?
That's a good point. I guess I would make simpleHTTP overloaded while I was making breaking changes anyway.
I'd suggest to make a proper byte-string interface first
What do you mean by "proper"? Unfortunately I don't really have time to do any substantial refactoring in the near future. Given lots of time now, I'd immediately make high-level and low-level interfaces with encoding only handled in the high-level one.
and then deprecate the String stuff.
Is it possible to deprecate an instance? I could perhaps instead provide an escape hatch with a newtype like UnsafeChar8String or something, either temporarily or permanently.
(before calling Char8.pack, strings could be checked or filtered for "isAscii")
The problem is more on the download side; if it's a wide encoding like UTF-16, even 7-bit cleanliness isn't enough to make Char8.unpack safe. On the upload side, automatically using UTF-8 would probably be good enough. Cheers, Ganesh
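The UTF-16 remark can be checked with a small experiment using the text package's UTF-16 encoder to stand in for a server response. The names `utf16Bytes`, `misread` and `sevenBitClean` are just labels for this demonstration.

```haskell
import Data.Char (isAscii)
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Stand-in for a response body that the server sent as UTF-16
-- (little endian): the bytes are 0x68 0x00 0x69 0x00.
utf16Bytes :: B8.ByteString
utf16Bytes = TE.encodeUtf16LE (T.pack "hi")

-- What a byte-per-character decoder makes of it: four characters,
-- two of them NUL.
misread :: String
misread = B8.unpack utf16Bytes

-- Every byte is below 128, so a 7-bit cleanliness check passes even
-- though the decoded text is wrong.
sevenBitClean :: Bool
sevenBitClean = all isAscii misread
```

Here `misread` is `"h\NULi\NUL"` and `sevenBitClean` is `True`, so an ASCII filter on the download side would not detect the problem.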

Am 12.09.2012 23:57, schrieb Ganesh Sittampalam:
On 12/09/2012 11:09, Christian Maeder wrote:
My main use-case is simpleHTTP that is bound to the String instance, currently. There are no such short-cuts for byte-strings, are there?
That's a good point. I guess I would make simpleHTTP overloaded while I was making breaking changes anyway.
Ah, I thought about something like "simpleByteStringHTTP".
I'd suggest to make a proper byte-string interface first
What do you mean by "proper"? Unfortunately I don't really have time to do any substantial refactoring in the near future.
Given lots of time now, I'd immediately make high-level and low-level interfaces with encoding only handled in the high-level one.
and then deprecate the String stuff.
Is it possible to deprecate an instance?
I believe, no. So forget deprecation (just document it) but consider to remain backward compatible.
I could perhaps instead provide an escape hatch with a newtype like UnsafeChar8String or something, either temporarily or permanently.
(before calling Char8.pack, strings could be checked or filtered for "isAscii")
The problem is more on the download side; if it's a wide encoding like UTF-16, even 7-bit cleanliness isn't enough to make Char8.unpack safe.
Just to make the string instance work, it is enough to ignore encoding and return only ascii bytes as chars or change bytes 128--255 to a replacement ascii char (e.g. '?'). For proper encodings other functions or (text) instances must be used.
On the upload side, automatically using UTF-8 would probably be good enough.
Cheers,
Ganesh

On 13/09/2012 09:21, Christian Maeder wrote:
Just to make the string instance work, it is enough to ignore encoding and return only ascii bytes as chars or change bytes 128--255 to a replacement ascii char (e.g. '?').
I don't think it's really any better than using Char8.unpack. Depending on the actual encoding you'll get variously broken results either way. Cheers, Ganesh
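For comparison, the replacement-character proposal might be sketched as follows. The function name `asciiOrQuestion` is hypothetical and the code is not from the HTTP package; it only shows what the proposal would return for non-ASCII input.

```haskell
import Data.Char (chr)
import qualified Data.ByteString as B

-- Hypothetical decoder: keep ASCII bytes, replace bytes 128..255 by '?'.
asciiOrQuestion :: B.ByteString -> String
asciiOrQuestion = map toChar . B.unpack
  where
    toChar w
      | w < 128   = chr (fromIntegral w)
      | otherwise = '?'
```

Applied to the UTF-8 bytes of "café" (0x63 0x61 0x66 0xC3 0xA9) this yields `"caf??"`, while the Latin-1 bytes (0x63 0x61 0x66 0xE9) yield `"caf?"`. The result is lossy in a different way than Char8.unpack, but still depends on an encoding the function never looks at.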

On 11 September 2012 00:22, Ganesh Sittampalam
So I'm reluctantly drawn to the conclusion that the only reasonable thing to do is to remove the String instances from HTTP completely for now.
I imagine this could be quite disruptive, but on the other hand people using the String instance are getting silently broken behaviour and a couple of people have been bitten by this recently.
Any thoughts?
Yes. And I'd be in favour of removing the class entirely. Just use a single ByteString type. I don't think the overloading buys us anything.
As for the effect on cabal-install, I've no problem with making the appropriate fixes.
As for pipes, conduits, etc., my hope is that this will stabilise at some point with a clear winner, and we can then adopt one of them, add it to the platform etc. (Personally I hope the "doing it right" approach of pipes works out in practice.)
Duncan

On 12/09/2012 22:34, Duncan Coutts wrote:
Yes. And I'd be in favour of removing the class entirely. Just use a single ByteString type. I don't think the overloading buys us anything.
Which one should it use, lazy bytestring?
I'm not particularly keen on removing the overloading as I don't think keeping it costs much for now and I kind of like the idea. We could even replace String with [Word8] though that seems rather pointless in practice.
On the other hand if there are strong feelings in favour of removing it, now is a good opportunity since there'll be a breaking change anyway.
Ganesh

On Thu, Sep 13, 2012 at 10:32 AM, Ganesh Sittampalam
Yes. And I'd be in favour of removing the class entirely. Just use a single ByteString type. I don't think the overloading buys us anything.
Which one should it use, lazy bytestring?
Probably yes, assuming we want to retain the ability to lazily stream responses. Which is very nearly the only raison d'etre of the HTTP package at this point.
I'm not particularly keen on removing the overloading as I don't think keeping it costs much for now and I kind of like the idea.
It doesn't cost much, but it also seems to no longer have any benefit, which suggests that it could usefully be dropped.

On 13/09/2012 19:24, Bryan O'Sullivan wrote:
Probably yes, assuming we want to retain the ability to lazily stream responses. Which is very nearly the only raison d'etre of the HTTP package at this point.
Also that it's in the Platform and is kind of needed there for cabal-install.
I'm not particularly keen on removing the overloading as I don't think keeping it costs much for now and I kind of like the idea.
It doesn't cost much, but it also seems to no longer have any benefit, which suggests that it could usefully be dropped.
I view easy switching between lazy and strict bytestrings as a benefit. Ganesh
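The trade-off being debated can be illustrated with a deliberately tiny class. This is a sketch only: `Body`, `fromChunks` and `bodyLength` are hypothetical names, and the class actually used by the HTTP package is different and larger.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Hypothetical body class with just two operations.
class Body b where
  fromChunks :: [BS.ByteString] -> b
  bodyLength :: b -> Int

-- Strict bodies: all chunks are copied into one contiguous buffer.
instance Body BS.ByteString where
  fromChunks = BS.concat
  bodyLength = BS.length

-- Lazy bodies: chunks are kept as they are, so they can be streamed.
instance Body BL.ByteString where
  fromChunks = BL.fromChunks
  bodyLength = fromIntegral . BL.length
```

With such a class the caller picks strict or lazy bodies purely through a type annotation. Without it, the library would fix one type (probably lazy, as suggested above) and callers who want the other would convert with `BL.toStrict` or `BL.fromStrict`.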

Ganesh Sittampalam
So I'm reluctantly drawn to the conclusion that the only reasonable thing to do is to remove the String instances from HTTP completely for now.
+1 ...and in case this is open for debate: +1 for getting rid of the typeclass abstraction altogether (like e.g. Duncan suggested)
participants (11)
- Ben Millwood
- Brandon Allbery
- Bryan O'Sullivan
- Christian Maeder
- Duncan Coutts
- Erik Hesselink
- Ganesh Sittampalam
- Henning Thielemann
- Herbert Valerio Riedel
- Ian Lynagh
- Vincent Hanquez