
On Sunday 28 January 2007 09:14, Neil Mitchell wrote:
> Hi Alistair,
>
> > > Is there a simple way to get the contents of a webpage using Haskell on a Windows box?
> >
> > This isn't exactly what you want, but it gets you partway there. Not sure if LineBuffering or NoBuffering is the best option. Line buffering should be fine for plain text output, but if you request a binary object (like an image) then you have to read exactly the number of bytes specified, and no more.
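(For illustration: reading an exact byte count, as described above, could look like the hypothetical helper below. readBody is a made-up name, not part of the code in this thread, and it assumes the Content-Length header has already been parsed into n.)

    import Control.Monad (replicateM)
    import System.IO

    -- Hypothetical sketch: read exactly n bytes of a binary body.
    -- hSetBinaryMode stops Windows newline translation, which would
    -- otherwise corrupt binary data such as images.
    readBody :: Handle -> Int -> IO String
    readBody h n = do
        hSetBinaryMode h True
        replicateM n (hGetChar h)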
> This works great for haskell.org; unfortunately it doesn't work as well with the rest of the web universe.
>
> With www.google.com I get: Program error: <handle>: IO.hGetChar: illegal operation
> With www.slashdot.org I get a 501 Not Implemented response.
> www.msnbc.msn.com works fine.
>
> Any ideas why?
At the very least it's missing the HTTP version on the request line, and you almost always need to send a Host header. For a start you could try changing client to:

    client server port page = do
        h <- connectTo server (PortNumber port)
        hSetBuffering h NoBuffering
        putStrLn "send request"
        hPutStrLn h ("GET " ++ page ++ " HTTP/1.1\r")
        hPutStrLn h ("Host: " ++ server ++ "\r")
        hPutStrLn h "\r"   -- blank line ends the request headers
        putStrLn "wait for response"
        readResponse h
        putStrLn ""

Note that I haven't tried this, or the rest of Alistair's code, at all, so the usual 30 day money back guarantee doesn't apply. It certainly won't handle redirects.
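(readResponse comes from Alistair's earlier message and isn't shown in this excerpt; purely as a sketch, a minimal stand-in that echoes the raw response could look like the following. Note that with HTTP/1.1 the server keeps the connection open by default, so reading to end-of-file will block unless a "Connection: close\r" header is also sent with the request.)

    import System.IO

    -- Hypothetical stand-in for readResponse: print the raw response
    -- until the server closes the connection. Assumes the request asked
    -- for the connection to be closed; otherwise an HTTP/1.1 server may
    -- hold the socket open and this will block until it times out.
    readResponse :: Handle -> IO ()
    readResponse h = do
        eof <- hIsEOF h
        if eof
            then hClose h
            else do
                c <- hGetChar h
                putChar c
                readResponse h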
> Are there any alternatives for reading a file off the internet (i.e. wget, but as a library)?
The http library sort of works most of the time, but there are several bugs that cause it to fail on many 'in the wild' webservers. HXT has a wrapper around a command-line invocation of cURL, and it works better. There is still a problem with redirects, but that's an easy enough fix. I doubt it would be very easy to extract from the surrounding HXT framework, though. It would be nice to have a binding to libcurl.

Daniel
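(As a rough sketch of the shell-out approach HXT takes, something like this works as a stopgap wherever a curl binary is on the PATH; fetchURL is a made-up name, not HXT's API.)

    import System.Process (readProcess)

    -- Sketch of the shell-out approach: fetch a URL by running the curl
    -- binary. "-s" silences the progress meter and "-L" follows redirects,
    -- which the raw socket code above does not.
    fetchURL :: String -> IO String
    fetchURL url = readProcess "curl" ["-s", "-L", url] ""

For example, fetchURL "http://www.haskell.org/" >>= putStr prints the page body.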