HTML library with DOM? - Haskell-Cafe - Haskell.org


HTML library with DOM?


Günther Schmidt

7 Oct 2010

3 a.m.

Hi all,

is there an HTML parsing library that creates a DOM from a page?

Günther


Johannes Waldmann

7 Oct

3:11 a.m.

> is there an HTML parsing library that creates a DOM from a page?

tagsoup produces trees (http://hackage.haskell.org/package/tagsoup). I use it with hxt (http://hackage.haskell.org/package/hxt) to tree-walk HTML pages.

J.W.


Gregory Collins

5:14 a.m.

Günther Schmidt writes:

> Hi all,
>
> is there an HTML parsing library that creates a DOM from a page?

I've got the month of October off, and one of the things I've been planning on working on is a compliant HTML5 parser for Haskell -- something which is sorely needed! I will ping the list back if/when I get it finished.

G
--
Gregory Collins


Edward Z. Yang

1:50 p.m.

Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:

> I've got the month of October off, and one of the things I've been planning on working on is a compliant HTML5 parser for Haskell -- something which is sorely needed! I will ping the list back if/when I get it finished.

I've heard that some of the existing HTML parsers in Haskell were already HTML5 compliant (this topic came up when I was complaining that there were some algorithms that you absolutely had to have state for, because that was how they were specified.) I never verified this assertion though.

Edward


Gregory Collins

2:05 p.m.

"Edward Z. Yang" writes:

> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>
>> I've got the month of October off, and one of the things I've been planning on working on is a compliant HTML5 parser for Haskell -- something which is sorely needed! I will ping the list back if/when I get it finished.
>
> I've heard that some of the existing HTML parsers in Haskell were already HTML5 compliant (this topic came up when I was complaining that there were some algorithms that you absolutely had to have state for, because that was how they were specified.) I never verified this assertion though.

If there's already a library which *correctly* parses html5 documents into DOM trees, could someone please let me know so I can use it instead of wasting a bunch of time writing one?

Thanks,
G
--
Gregory Collins


Michael Snoyman

6:07 p.m.

2010/10/7 Gregory Collins:

> "Edward Z. Yang" writes:
>
>> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>>
>>> I've got the month of October off, and one of the things I've been planning on working on is a compliant HTML5 parser for Haskell -- something which is sorely needed! I will ping the list back if/when I get it finished.
>>
>> I've heard that some of the existing HTML parsers in Haskell were already HTML5 compliant (this topic came up when I was complaining that there were some algorithms that you absolutely had to have state for, because that was how they were specified.) I never verified this assertion though.
>
> If there's already a library which *correctly* parses html5 documents into DOM trees, could someone please let me know so I can use it instead of wasting a bunch of time writing one?

As far as I know, Neil Mitchell's tagsoup[1] parses according to the HTML 5 parsing rules, but it just generates a list of Tags[2], so you'd have to build the DOM tree up from there. I personally have had great experience with tagsoup. It's even the core of the HTML-scraping technology powering searchonce[3].

Michael

[1] http://hackage.haskell.org/package/tagsoup
[2] http://hackage.haskell.org/packages/archive/tagsoup/0.11.1/doc/html/Text-HTM...
[3] http://www.search-once.com/
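To make the "build the DOM tree up from there" step concrete, here is a minimal, dependency-free sketch. It is not tagsoup's API and not the HTML5 tree-construction algorithm: the `Tag` type below is a simplified, hypothetical stand-in for tagsoup's tag list (attributes, comments and warnings are left out), and `Node` and `buildForest` are names invented for this illustration. tagsoup itself provides a `Text.HTML.TagSoup.Tree` module for this job; the sketch avoids it only so that it stays self-contained.

```haskell
-- Simplified stand-in for a flat tag stream (assumption: no attributes,
-- comments, or position information, unlike tagsoup's real Tag type).
data Tag
  = TagOpen String
  | TagClose String
  | TagText String
  deriving (Eq, Show)

-- Hypothetical DOM-like tree: an element with children, or a text leaf.
data Node
  = Element String [Node]
  | Text String
  deriving (Eq, Show)

-- | Fold a flat list of tags into a forest of nodes.
--
-- Naive recovery rules (assumptions of this sketch, not the HTML5 spec):
--   * an element left open at the end of input is closed implicitly;
--   * a close tag matching an enclosing element also closes everything
--     opened inside that element;
--   * a close tag matching no open element is ignored.
buildForest :: [Tag] -> [Node]
buildForest = fst . go []
  where
    -- 'open' holds the names of the currently open elements, innermost first.
    -- The result is the sibling nodes parsed at this level plus the unread tags.
    go :: [String] -> [Tag] -> ([Node], [Tag])
    go _ [] = ([], [])
    go open (TagText s : rest) =
      let (siblings, rest') = go open rest
      in (Text s : siblings, rest')
    go open (TagOpen name : rest) =
      let (children, afterChildren) = go (name : open) rest
          -- Consume our own close tag; leave an ancestor's close tag in place
          -- so that the ancestor can consume it.
          afterClose = case afterChildren of
            (TagClose n : r) | n == name -> r
            other -> other
          (siblings, rest') = go open afterClose
      in (Element name children : siblings, rest')
    go open tags@(TagClose name : rest)
      | name `elem` open = ([], tags)   -- ends this element or an ancestor
      | otherwise = go open rest        -- stray close tag: skip it
```

For example, `buildForest [TagOpen "p", TagText "hi", TagClose "p"]` evaluates to `[Element "p" [Text "hi"]]`. A compliant HTML5 parser needs considerably more than this (void elements, implied end tags, and the spec's stateful insertion modes), which is the gap discussed in this thread.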


Gregory Collins

6:11 p.m.

Michael Snoyman writes:

> As far as I know, Neil Mitchell's tagsoup[1] parses according to the HTML 5 parsing rules, but it just generates a list of Tags[2], so you'd have to build the DOM tree up from there. I personally have had great experience with tagsoup. It's even the core of HTML-scraping technology powering searchonce[3].

Yep, someone else wrote me privately to say this (that tagsoup respects the html5 lexing rules). So I'll be using this as the basis of an html5 DOM parser. Stay tuned!

G
--
Gregory Collins


Neil Mitchell

8 Oct

3:04 a.m.

Yes, I don't think I've officially announced a version of TagSoup that has HTML 5 parsing, but it has done so as standard for the last few releases. The HTML 5 spec is still changing, so it's entirely possible something is incorrect in a corner case, but please let me know and I'll fix it.

Thanks, Neil

2010/10/7 Gregory Collins:

> Michael Snoyman writes:
>
>> As far as I know, Neil Mitchell's tagsoup[1] parses according to the HTML 5 parsing rules, but it just generates a list of Tags[2], so you'd have to build the DOM tree up from there. I personally have had great experience with tagsoup. It's even the core of HTML-scraping technology powering searchonce[3].
>
> Yep, someone else wrote me privately to say this (that tagsoup respects the html5 lexing rules). So I'll be using this as the basis of an html5 DOM parser. Stay tuned!
>
> G
> --
> Gregory Collins
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe


