Re: [Haskell-cafe] Get data from HTML pages

1 Sep 2009


José Romildo Malaquias wrote:
...
Currently the application can import movie data from web pages only
indirectly. The user first opens the page in a web browser, copies the
rendered text into an import window in my application, and clicks an
"import" button. The application then parses the given text with regular
expressions and collects any relevant data it recognizes.

For instance, to get the director information for a movie from the
AllCenter web site, I use the following regular expression:

    ^Direção:\s+(.+)$

I want to modify this scheme to eliminate the need to copy the rendered
text from a web browser. Instead, my application should download and
parse the HTML page directly.

Which libraries are available in Haskell that would make it easy to
extract content from an HTML document in the way described above?

To parse HTML documents, I've had success with TagSoup in the past. To
download the HTML from the server, take a look at the HTTP package. Both
packages are available from Hackage.
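
A minimal sketch of how the two packages might fit together, assuming
the label and its value end up on the same line once the tags are
stripped. The helper names (trim, fieldAfter, fetchPage) and the sample
HTML are made up for illustration; they are not part of either package.
Note that TagSoup's innerText only concatenates the text nodes, so the
result is not identical to what a browser renders, and the plain-String
interface of the HTTP package does no character-set decoding.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe)
import Network.HTTP (getRequest, getResponseBody, simpleHTTP)
import Text.HTML.TagSoup (innerText, parseTags)

-- Drop leading and trailing whitespace.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Hypothetical helper: strip the tags, then return the text that
-- follows the label on the first line starting with it. This roughly
-- mirrors the regular expression ^Direção:\s+(.+)$ from the question.
fieldAfter :: String -> String -> Maybe String
fieldAfter label html =
  listToMaybe
    [ value
    | line <- lines (innerText (parseTags html))
    , Just rest <- [stripPrefix label (trim line)]
    , let value = trim rest
    , not (null value)
    ]

-- Hypothetical helper: download a page over plain HTTP. The HTTP
-- package does not handle https URLs.
fetchPage :: String -> IO String
fetchPage url = simpleHTTP (getRequest url) >>= getResponseBody

main :: IO ()
main = do
  -- Made-up sample input; a real page would come from fetchPage.
  let sample = "<html><body>\n<p><b>Direção:</b> Fulano de Tal</p>\n</body></html>"
  print (fieldAfter "Direção:" sample)
```

For a real page you would pass the result of fetchPage to fieldAfter.
If the markup puts the label and the value in separate block elements,
matching on the tag list itself (rather than on the flattened text) is
probably more robust.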

HTH, Jochem

-- 
Jochem Berndsen | jochem@functor.nl