[GSoC] WWW::Mechanize-like package for Haskell

Hi,

I'm interested in working on a library for stateful web browsing in Haskell during Google Summer of Code. The basic idea is described at http://hackage.haskell.org/trac/summer-of-code/ticket/1107. WWW::Mechanize is a ready-to-use library written in Perl, though when I wrote some simple scripts I used Python's mechanize (http://wwwsearch.sourceforge.net/mechanize/), which provides a much cleaner interface. Either way, it gives a simple and convenient way to retrieve websites, handle cookies and history, and process the retrieved content and forms. The basics of this already exist in the Network.Browser module of Haskell's HTTP library (http://hackage.haskell.org/packages/archive/HTTP/3001.0.4/doc/html/Network-B...), but it's ugly (it uses unsafePerformIO for error reporting) and lacks most of the needed functionality.

My aim is to greatly improve the Network.Browser module and to make it possible to write small scripts with it in a more functional way. At the moment it uses the BrowserAction state monad. Though the deadline is approaching, I'm still looking for ways to improve my proposal. So here are my questions: besides a simple state monad, are there any other data structures that would make programming with this library more convenient? Should we devise a more sophisticated system with other, separate data structures? What other improvements would you like to see?

Thanks in advance for any advice.

-- Max.
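The "simple state monad" approach Max describes can be illustrated with a minimal sketch. It is not the real Network.Browser API: the names BrowserState, Browse, visit, back, and setCookie are hypothetical, and no HTTP requests are made. The sketch only shows how the current URL, the back history, and a cookie jar can be threaded through a State computation from the transformers package.

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify)
import qualified Data.Map as Map

-- Hypothetical browser state: current page, back history, cookie jar.
data BrowserState = BrowserState
  { currentUrl :: Maybe String
  , history    :: [String]              -- previously visited URLs, most recent first
  , cookies    :: Map.Map String String -- cookie name -> value
  } deriving Show

emptyBrowser :: BrowserState
emptyBrowser = BrowserState Nothing [] Map.empty

-- A browsing script is a state computation over BrowserState
-- (in the spirit of BrowserAction, but not the library's type).
type Browse = State BrowserState

-- Navigate to a URL, pushing the current page (if any) onto the history.
visit :: String -> Browse ()
visit url = modify $ \s -> s
  { currentUrl = Just url
  , history    = maybe (history s) (: history s) (currentUrl s)
  }

-- Go back one page; does nothing if the history is empty.
back :: Browse ()
back = modify $ \s -> case history s of
  []       -> s
  (u : us) -> s { currentUrl = Just u, history = us }

-- Store a cookie, overwriting any previous value with the same name.
setCookie :: String -> String -> Browse ()
setCookie k v = modify $ \s -> s { cookies = Map.insert k v (cookies s) }

main :: IO ()
main = print $ flip evalState emptyBrowser $ do
  visit "http://example.org/login"
  setCookie "session" "abc123"
  visit "http://example.org/home"
  back
  (,) <$> gets currentUrl <*> gets (Map.toList . cookies)
-- prints (Just "http://example.org/login",[("session","abc123")])
```

A plain State monad like this cannot perform IO or report failures on its own, which is where layering it with other monads comes in.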

On 7 apr 2008, at 14.46, Max Desyatov wrote:
> Hi,
>
> I'm interested in working on a library for stateful web browsing in Haskell during Google Summer of Code. The basic idea is described at http://hackage.haskell.org/trac/summer-of-code/ticket/1107. WWW::Mechanize is a ready-to-use library written in Perl, though when I wrote some simple scripts I used Python's mechanize (http://wwwsearch.sourceforge.net/mechanize/), which provides a much cleaner interface. Either way, it gives a simple and convenient way to retrieve websites, handle cookies and history, and process the retrieved content and forms. The basics of this already exist in the Network.Browser module of Haskell's HTTP library (http://hackage.haskell.org/packages/archive/HTTP/3001.0.4/doc/html/Network-Browser.html), but it's ugly (it uses unsafePerformIO for error reporting) and lacks most of the needed functionality.
>
> My aim is to greatly improve the Network.Browser module and to make it possible to write small scripts with it in a more functional way. At the moment it uses the BrowserAction state monad. Though the deadline is approaching, I'm still looking for ways to improve my proposal. So here are my questions: besides a simple state monad, are there any other data structures that would make programming with this library more convenient? Should we devise a more sophisticated system with other, separate data structures? What other improvements would you like to see?

It doesn't have to be perfect. Make sure you know how to use monad transformers. Also take a look at TagSoup and the various HTML/XML parsers. I'm sure there's plenty to work on.

My suggestion would be to write non-trivial example applications and see what is needed. For example, you could write a script that downloads/uploads a Haskell wiki page, logging in if necessary. Look at what other WWW::Mechanize packages are used for. That kind of stuff.

Also, in a GSoC proposal you should try to convince the mentors that your project is useful for Haskell in general. So maybe you have some more arguments there, too.

/ Thomas
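The pointer to monad transformers also suggests one way to replace the unsafePerformIO-based error reporting mentioned above: layer an explicit error monad under the browser state, so failures become ordinary values. This is a sketch under stated assumptions: BrowserError, Browse, fakeWeb, fetch, and runBrowse are hypothetical names, and the "network" is an in-memory Map so the example runs without any HTTP library.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (Except, runExcept, throwE)
import Control.Monad.Trans.State.Strict (StateT, get, put, runStateT)
import qualified Data.Map as Map

-- Hypothetical error type: failures are returned, not hidden behind unsafePerformIO.
data BrowserError = NotFound String
  deriving (Show, Eq)

-- Browser state here is just the visited URLs (most recent first),
-- layered over an error monad.
type Browse = StateT [String] (Except BrowserError)

-- Stand-in for the network: a fixed set of pages.
fakeWeb :: Map.Map String String
fakeWeb = Map.fromList
  [ ("http://example.org/",     "<html>home</html>")
  , ("http://example.org/wiki", "<html>wiki</html>")
  ]

-- Fetch a page, recording it in the history, or fail with NotFound.
fetch :: String -> Browse String
fetch url = case Map.lookup url fakeWeb of
  Nothing   -> lift (throwE (NotFound url))
  Just body -> do
    visited <- get
    put (url : visited)
    return body

-- Run a script from an empty history, returning its result and the final history.
runBrowse :: Browse a -> Either BrowserError (a, [String])
runBrowse script = runExcept (runStateT script [])

main :: IO ()
main = do
  print (runBrowse (fetch "http://example.org/" >> fetch "http://example.org/wiki"))
  -- prints Right ("<html>wiki</html>",["http://example.org/wiki","http://example.org/"])
  print (runBrowse (fetch "http://example.org/missing"))
  -- prints Left (NotFound "http://example.org/missing")
```

With a real HTTP backend, the base monad would presumably be ExceptT BrowserError IO instead of the pure Except, and fetch would perform the request in IO.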

Max Desyatov wrote:
> I'm interested in working on a library for stateful web browsing in Haskell during Google Summer of Code.

Thomas Schilling wrote:
> Also, in a GSoC proposal you should try to convince the mentors that your project is useful for Haskell in general. So maybe you have some more arguments there, too.

There are obviously zillions of uses for automating interaction with web pages. Perhaps you are asking why not just do it in Perl or Python?

One classic application is unit testing and stress testing of web-based applications. There are also various lambdabot plugins that could be vastly cleaned up and extended.

So it would be extremely useful to have an improved library for this in Haskell.

The next step would be to build various nice testing libraries on top of this for QuickCheck and various web programming frameworks, but that's already too much for one GSoC project. Something we could look forward to, though.

Regards,
Yitz

On 7 apr 2008, at 15.36, Yitzchak Gale wrote:
> Max Desyatov wrote:
>> I'm interested in working on a library for stateful web browsing in Haskell during Google Summer of Code.
>
> Thomas Schilling wrote:
>> Also, in a GSoC proposal you should try to convince the mentors that your project is useful for Haskell in general. So maybe you have some more arguments there, too.
>
> There are obviously zillions of uses for automating interaction with web pages. Perhaps you are asking why not just do it in Perl or Python?
>
> One classic application is unit testing and stress testing of web-based applications. There are also various lambdabot plugins that could be vastly cleaned up and extended.
>
> So it would be extremely useful to have an improved library for this in Haskell.
>
> The next step would be to build various nice testing libraries on top of this for QuickCheck and various web programming frameworks, but that's already too much for one GSoC project. Something we could look forward to, though.

I proposed this project last year, so I believe in its usefulness. But you just gave some very good reasons why mentors should rank Max's proposal highly. So I hope Max will incorporate the above motivations into his proposal. :)

/ Thomas

On Mon, Apr 7, 2008 at 4:11 PM, Thomas Schilling wrote:
> It doesn't have to be perfect. Make sure you know how to use monad transformers. Also take a look at TagSoup and the various HTML/XML parsers. I'm sure there's plenty to work on.
>
> My suggestion would be to write non-trivial example applications and see what is needed. For example, you could write a script that downloads/uploads a Haskell wiki page, logging in if necessary. Look at what other WWW::Mechanize packages are used for. That kind of stuff.
>
> Also, in a GSoC proposal you should try to convince the mentors that your project is useful for Haskell in general. So maybe you have some more arguments there, too.
>
> / Thomas

There are many benefits to having such a library in Haskell: improved automated testing (as Yitzchak Gale mentioned) thanks to the pure nature of the inner algorithms (BrowserAction can be pure and be turned into IO only on demand), and static typing (I just hate the bunch of stupid bugs I run into while writing all those scripts in Python/Perl). We can also use the powerful HTML/XML parsers available in Haskell (HXT with its "arrowed" XML filters).

The Haskell community will definitely benefit from such a library. First, as I see it, indirectly: I know many people who don't want to use or learn more about Haskell, saying it lacks libraries for their everyday work. The network libraries still aren't "cool" enough, and personally I want to improve them at least to the point where I can say "look! here are the network libraries, and they aren't worse than yours; they're even better: pure and checked!" :) Second, new libraries are useful to the community directly: there's the aforementioned lambdabot, and I'd like to write some bots that watch for new changes on HaskellWiki or something like that, which I'm currently doomed to code in sh&curl/Perl/Python.

-- Max
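One possible reading of "BrowserAction can be pure and be turned into IO only on demand" is to represent a browsing script as a plain data structure and interpret it later, in the style of a free monad. This is a swapped-in technique, not the existing Network.Browser design: Script, fetch, runPure, runIO, and twoPages are hypothetical names, and the IO interpreter takes the actual fetching function as a parameter instead of assuming any particular HTTP library.

```haskell
import qualified Data.Map as Map

-- A browsing script as pure data: either a finished result, or a request
-- for a URL plus a continuation that receives the page body.
data Script a
  = Done a
  | Fetch String (String -> Script a)

instance Functor Script where
  fmap f (Done a)    = Done (f a)
  fmap f (Fetch u k) = Fetch u (fmap f . k)

instance Applicative Script where
  pure = Done
  sf <*> sa = sf >>= \f -> fmap f sa

instance Monad Script where
  Done a    >>= f = f a
  Fetch u k >>= f = Fetch u (\body -> k body >>= f)

-- Request a page; the interpreter decides how to obtain it.
fetch :: String -> Script String
fetch url = Fetch url Done

-- Pure interpreter: answers requests from an in-memory "web" (for tests).
runPure :: Map.Map String String -> Script a -> Maybe a
runPure _   (Done a)    = Just a
runPure web (Fetch u k) = Map.lookup u web >>= runPure web . k

-- IO interpreter: the caller supplies the real fetching function.
runIO :: (String -> IO String) -> Script a -> IO a
runIO _      (Done a)    = return a
runIO getUrl (Fetch u k) = getUrl u >>= runIO getUrl . k

-- Example script: total length of two pages.
twoPages :: Script Int
twoPages = do
  a <- fetch "http://example.org/a"
  b <- fetch "http://example.org/b"
  return (length a + length b)

main :: IO ()
main = do
  let web = Map.fromList
        [ ("http://example.org/a", "hello")
        , ("http://example.org/b", "world!") ]
  print (runPure web twoPages)
  -- a stub fetcher that logs each request and returns a fixed body
  n <- runIO (\u -> putStrLn ("GET " ++ u) >> return "stub") twoPages
  print n
```

Running main prints Just 11 (5 + 6 characters), then the two GET lines from the stub, then 8. The same twoPages script can thus be unit-tested against canned pages and run for real by passing an IO fetcher.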
participants (3)
- Max Desyatov
- Thomas Schilling
- Yitzchak Gale