[patch] #562: cabal-install update fails going through a HTTP proxy

Dear Cabal maintainers,

A couple of days ago I was unable to cabal-install a library from Hackage: 00-index.tar.gz was not downloaded completely. Even when I fetched that file with wget, other .tar.gz files (packages) still could not be fetched in full. This problem has already been reported (see http://hackage.haskell.org/trac/hackage/ticket/562).

06/11/09 09:49:25 changed by duncan:
Why has this started cropping up all of a sudden? I'd never seen it before, and then 3 reports in as many days. Do we suspect HTTP-4000.0.6 -> 7 perhaps?
06/11/09 13:02:47 changed by michaeldever:
So it's definitely a problem with the HTTP package, in my opinion. I'm not sure whether it is a problem with the package's proxy handling, as it does download some of the package, but not all of it.
Seeing as both the zlib library and tar yield an end-of-stream error, I reckon it's something that happens during transport.
06/13/09 09:05:26 changed by michaeldever:
The bug is not in the HTTP API, but in the way cabal-install uses it. The type of the HTTP response body is polymorphic within the `HTTP' library (rspBody :: Response a -> a), but it is specialized to Lazy.ByteString by cabal-install's `getHTTP' function (Distribution/Client/HttpUtils.hs). Once the type of the response body is changed to _strict_ ByteString, files are downloaded through the proxy completely.

The attached module [proxy-POC.hs] makes this quite apparent (you need to be behind a proxy; HTTP >= 4000.0.8):

    vvv@takeshi:~/src$ time runhaskell proxy-POC.hs
    Content-Length: 1200593
    bytes downloaded: 2408
    proxy-POC.hs: user error (sizes differ)

    real    0m1.210s
    user    0m0.556s
    sys     0m0.052s
    vvv@takeshi:~/src$ time runhaskell -DSTRICT proxy-POC.hs
    Content-Length: 1200593
    bytes downloaded: 1200593

    real    0m17.956s
    user    0m0.620s
    sys     0m0.028s
    vvv@takeshi:~/src$ runhaskell proxy-POC.hs # repeatable
    Content-Length: 1200593
    bytes downloaded: 2408
    proxy-POC.hs: user error (sizes differ)

Only 4 lines need to be changed (2 in HttpUtils.hs and 2 in Fetch.hs); see the accompanying patch.

...And could anyone explain to me why lazy ByteString doesn't cause truncated downloads in a proxy-free environment?

Thank you.
-- vvv
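The size comparison that proxy-POC.hs reports can be sketched without a network connection. The attached module is not reproduced in this thread, so this is only a hypothetical reconstruction of the idea: `Response` is a simplified stand-in for the HTTP package's type (headers here are plain string pairs), and `BodyLength`, `contentLength`, `verifySize` and `checkComplete` are illustrative names, not HTTP or cabal-install API. Because the check is polymorphic in the body type, the same code measures a strict or a lazy ByteString; the caller's choice of type picks the instance, much as `getHTTP` fixes the body to Lazy.ByteString.

```haskell
-- A minimal sketch under stated assumptions: 'Response', 'BodyLength',
-- 'contentLength', 'verifySize' and 'checkComplete' are hypothetical names,
-- not the HTTP package's API. They mirror the check proxy-POC.hs reports.
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy.Char8 as BL
import Text.Read (readMaybe)

-- | A response whose body type is a parameter, in the spirit of the
-- HTTP package's @rspBody :: Response a -> a@.
data Response a = Response
  { rspHeaders :: [(String, String)]
  , rspBody    :: a
  }

-- | Body types whose size in bytes can be measured.
class BodyLength a where
  bodyLength :: a -> Integer

instance BodyLength BS.ByteString where
  bodyLength = fromIntegral . BS.length

instance BodyLength BL.ByteString where
  -- Measuring a lazy body forces every chunk of it.
  bodyLength = fromIntegral . BL.length

-- | The announced Content-Length, if the header is present and numeric.
contentLength :: Response a -> Maybe Integer
contentLength rsp = lookup "Content-Length" (rspHeaders rsp) >>= readMaybe

-- | Compare the announced size with the number of bytes actually received.
-- Returns the size on success, or a message mentioning both numbers.
verifySize :: BodyLength a => Response a -> Either String Integer
verifySize rsp = case contentLength rsp of
  Nothing -> Left "no usable Content-Length header"
  Just expected
    | got == expected -> Right got
    | otherwise ->
        Left ("sizes differ: Content-Length " ++ show expected
              ++ ", bytes downloaded " ++ show got)
  where
    got = bodyLength (rspBody rsp)

-- | IO wrapper that raises a user error on mismatch, like the proof of concept.
checkComplete :: BodyLength a => Response a -> IO Integer
checkComplete rsp = either (ioError . userError) return (verifySize rsp)
```

In GHCi, for instance, `verifySize (Response [("Content-Length", "5")] (BL.pack "he"))` evaluates to `Left "sizes differ: Content-Length 5, bytes downloaded 2"`, while the same call with a five-byte strict or lazy body gives `Right 5`.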

On Sat, 2009-09-19 at 00:50 +0300, Valery V. Vorotyntsev wrote:
> Dear Cabal maintainers,
Thanks very much for investigating this, Valery. It's great that you have shed some light on this previously mysterious bug.
> The bug is not in the HTTP API, but in the way cabal-install uses it.
> The type of the HTTP response body is polymorphic within the `HTTP' library (rspBody :: Response a -> a), but it is specialized to Lazy.ByteString by cabal-install's `getHTTP' function (Distribution/Client/HttpUtils.hs).
> Once the type of the response body is changed to _strict_ ByteString, files are downloaded through the proxy completely.
But that is exactly what makes me think it's a bug in the HTTP library. The HTTP library provides instances for String, strict ByteString and lazy ByteString. With one instance provided by the HTTP library your test program fails, and with the other it works.

It's not completely implausible that it could be the fault of the way we use the HTTP library. For example, if we were holding onto the lazy ByteString for a long period without demanding all of it, then perhaps that could upset the network flow by causing timeouts or something. However, I don't think anything like that is going on here: the code pretty swiftly takes the response and writes the content out to disk.

As I mentioned in the Cabal ticket, I'd be very interested in hearing Sigbjorn's diagnosis before considering whether we want to work around the problem by switching to strict ByteString. If at all possible, I would prefer to stick to lazy ByteString.

Duncan
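The consumption pattern Duncan describes (take the lazy response and promptly write it to disk) can be illustrated with a small sketch. It assumes the body is already available as a lazy ByteString; `saveBody` and `saveVerified` are hypothetical helpers, not the code in cabal-install's HttpUtils.hs or Fetch.hs. Each chunk is demanded only when it is about to be written, and the byte count is kept so it can be compared with Content-Length. It shows the usage pattern only; it neither reproduces nor fixes the proxy truncation.

```haskell
-- A minimal sketch, assuming the response body is already available as a
-- lazy ByteString. 'saveBody' and 'saveVerified' are hypothetical helpers,
-- not functions from cabal-install or the HTTP package.
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import System.IO (Handle, IOMode (WriteMode), withBinaryFile)

-- | Write a lazy body to a file one chunk at a time. Each chunk is demanded
-- only when it is about to be written, so the body is consumed as it is
-- written rather than left partly unevaluated. Returns the byte count.
saveBody :: FilePath -> BL.ByteString -> IO Integer
saveBody path body =
  withBinaryFile path WriteMode $ \h -> go h 0 (BL.toChunks body)
  where
    go :: Handle -> Integer -> [BS.ByteString] -> IO Integer
    go _ n [] = return n
    go h n (c : cs) = do
      BS.hPut h c
      let n' = n + fromIntegral (BS.length c)
      n' `seq` go h n' cs  -- keep the running total evaluated

-- | Save the body, then raise the same user error ("sizes differ") that
-- proxy-POC.hs reports if the bytes written differ from Content-Length.
saveVerified :: FilePath -> Integer -> BL.ByteString -> IO ()
saveVerified path expected body = do
  written <- saveBody path body
  if written == expected
    then return ()
    else ioError (userError "sizes differ")
```

Comparing the count after writing, rather than trusting that the stream ended cleanly, is what turns a silently truncated download into an explicit error before zlib or tar ever see the file.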

On Sat, Sep 19, 2009 at 3:17 AM, Duncan Coutts
> But that is exactly what makes me think it's a bug in the HTTP library. The HTTP library provides instances for String, strict ByteString and lazy ByteString. With one instance provided by the HTTP library your test program fails and with the other it works.
I see...
> It's not completely implausible that it could be the fault of the way we use the HTTP library. For example, if we were holding onto the lazy ByteString for a long period without demanding all of it, then perhaps that could upset the network flow by causing timeouts or something. However, I don't think anything like that is going on here: the code pretty swiftly takes the response and writes the content out to disk.
> As I mentioned in the Cabal ticket, I'd be very interested in hearing Sigbjorn's diagnosis before considering whether we want to work around the problem by switching to strict ByteString. If at all possible, I would prefer to stick to lazy ByteString.
Okay, let's wait then. Thank you, Duncan. -- vvv
participants (3): Duncan Coutts, Valery V. Vorotyntsev, valery.vv@gmail.com