Haskell and XML, need some tips from practitioners

Hi everyone,

I had to transform a 136-page Word document into an XML document so that it can be imported into an application. The Word document contained records in nested tables. Anyway, through a very, very tedious process of XSLT transformations I finally have the XML document I need.

But now I need to amend the attributes of some elements with values looked up from outside the document; the lookup key is a particular attribute value of the nodes. This I cannot do through XSLT processing, as the information needed is not within the XML document. I was thus going to use Haskell instead of an XSLT processor for this final step.

My question to those with experience of the Haskell XML tools: which one should I use?

Günther

From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Günther Schmidt
Anyway through a very, very tedious process of xslt-transformations I finally have the XML document I need.
But now I need to amend the attributes of some elements with looked up values from the outside, the lookup-key is a particular attribute value of the nodes. This I cannot do through xslt processing as the information needed is not within the xml document.
Not the answer you were looking for, but...
Is the lookup table in another XML document? If so then you might well be able to use the document() function, if your xslt processor supports it e.g.

Dear Alistair, after working intensely with XSLT again (with a break for several years), I wholeheartedly concur. You guys are right though, through the document function it would be possible. So any particular tool-set you could recommend?
Günther

On 25.02.10 15:01, Bayley, Alistair wrote:
That said, if you can, use Haskell to do all the transformations i.e. avoid xslt altogether. I despise xslt.
Alistair ***************************************************************** Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. *****************************************************************

From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Günther Schmidt
You guys are right though, through the document function it would be possible.
So any particular tool-set you could recommend?
On Windows, the Microsoft xslt processor supports document(), I believe. Other than that, I have no recommendations.
Alistair

Hi Alistair, sorry, misunderstanding. I meant which *Haskell* XML tools.
Günther

On 25.02.10 15:44, Bayley, Alistair wrote:
On Windows, the Microsoft xslt processor supports document(), I believe. Other than that, I have no recommendations.

"Günther" == Günther Schmidt
writes:

Günther> Dear Alistair, after working intensely with XSLT again
Günther> (with a break for several years), I wholeheartedly concur.
Günther> You guys are right though, through the document function it
Günther> would be possible.
Günther> So any particular tool-set you could recommend?

Saxon is by far the best, if you're happy with a Java program.

(I can hardly recommend my own. Not through modesty, of which I am not over-endowed, but because I refuse to support it since the W3C abolished the concept of XML as self-describing data.)

--
Colin Adams
Preston Lancashire

"Günther" == Günther Schmidt
writes:

Günther> But now I need to amend the attributes of some elements
Günther> with looked up values from the outside, the lookup-key is a
Günther> particular attribute value of the nodes. This I cannot do
Günther> through xslt processing as the information needed is not
Günther> within the xml document.

You probably can. Via implementing some custom URI resolver, or an extension function, or such like. Depending upon the xslt implementation.

Not that I'm discouraging you from doing it in Haskell instead.

--
Colin Adams
Preston Lancashire

Günther Schmidt
My question to those with experience of the Haskell-XML tools: which one should I use?
You'll need to evaluate which one fits your needs best; in my mind the
contenders are:
------------------------------------------------------------------------------
xml: http://hackage.haskell.org/package/xml
Small, simple, comprehensible DOM interface with some simple
search + cursor functions, uses String internally. If performance is not
a concern this one is the "nicest" in my opinion.
------------------------------------------------------------------------------
hexpat: http://hackage.haskell.org/package/hexpat
A binding to the expat C library; super-fast as a result, most of the
useful functions from xml have been ported over here. Has support for
SAX parsing. This is the one I usually use when I don't need things like
DTD validation or XPath support (i.e. 100% of the time). Not much in the
way of docs (haddock only) but it's small enough to be comprehensible.
------------------------------------------------------------------------------
HXT: http://hackage.haskell.org/package/hxt
Dauntingly enormous, oodles of features, you need to grok arrows. Uses
String internally. The documentation is pretty iffy -- individual
modules are haddocked pretty well but there are a zillion of them and a
table of contents is sorely needed. Website docs/manuals are of the
"read this wiki page, this paper, and my master's thesis" variety, but
the wiki page is actually pretty good. This is the one I use when I need
a feature hexpat doesn't have, but normally I avoid it if I can because
arrows cause me to "grind the gears".
------------------------------------------------------------------------------
HaXml: http://hackage.haskell.org/package/HaXml
Lots of modules and features here, uses String internally, has the same
"table of contents" issue as HXT, manual seems to consist of an ICFP
paper from 1999, Haddock is a little terse/spotty.
Hope this helps,
G
--
Gregory Collins
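
For what it's worth, the lookup step from the original question might look roughly like this with the xml package (module Text.XML.Light). This is only a sketch under assumptions: the attribute names "key" and "label" and the contents of the lookup table are made up for illustration, since the thread never says what the real ones are.

```haskell
import qualified Data.Map as Map
import Text.XML.Light

-- Hypothetical lookup table: value of the key attribute -> value to add.
type Lookup = Map.Map String String

-- Set (or overwrite) an unqualified attribute on a single element.
setAttr :: String -> String -> Element -> Element
setAttr k v el =
  el { elAttribs = Attr (unqual k) v
                 : filter ((/= unqual k) . attrKey) (elAttribs el) }

-- Walk the whole tree; wherever an element carries a "key" attribute
-- that is present in the table, add a "label" attribute with the
-- looked-up value. Both attribute names are assumptions.
amend :: Lookup -> Element -> Element
amend table el = el' { elContent = map go (elContent el') }
  where
    el' = case findAttr (unqual "key") el >>= (`Map.lookup` table) of
            Just v  -> setAttr "label" v el
            Nothing -> el
    go (Elem e) = Elem (amend table e)
    go other    = other

main :: IO ()
main = do
  let table = Map.fromList [("a1", "Alpha")]
      src   = "<records><record key=\"a1\"/><record key=\"zz\"/></records>"
  case parseXMLDoc src of
    Nothing  -> putStrLn "could not parse input"
    Just doc -> putStrLn (showTopElement (amend table doc))
```

Elements whose key is missing from the table are passed through unchanged; whether that or a hard error is the right behaviour depends on the data.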

Hello Gregory, we have pretty much come to the same conclusion :)
Günther

On 25.02.10 17:17, Gregory Collins wrote:
You'll need to evaluate which one fits your needs best; in my mind the contenders are:
[...]

Gregory Collins writes:
xml: http://hackage.haskell.org/package/xml
hexpat: http://hackage.haskell.org/package/hexpat
HXT: http://hackage.haskell.org/package/hxt
HaXml: http://hackage.haskell.org/package/HaXml

After experimenting with a couple of the above, I ended up using tagsoup, which is relatively fast and (very) simple - but useful for just extracting data without validation or any "real" XML stuff. In my case, the files were fairly large, which didn't go down well with the more proper XML parsers I tried. (This may have changed in later times, of course.)

-k
--
If I haven't seen further, it is by standing in the footprints of giants
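
A rough idea of what the tagsoup route could look like for the attribute lookup from the original question, as a sketch only. The message above describes using tagsoup for extracting data; rewriting attributes on the tag stream is an extrapolation, and the attribute names "key" and "label" as well as the table contents are hypothetical.

```haskell
import qualified Data.Map as Map
import Text.HTML.TagSoup

-- Add a "label" attribute to every opening tag whose "key" attribute
-- is found in the (hypothetical) lookup table; leave other tags alone.
amendTag :: Map.Map String String -> Tag String -> Tag String
amendTag table (TagOpen name attrs)
  | Just v <- lookup "key" attrs >>= (`Map.lookup` table)
  = TagOpen name (filter ((/= "label") . fst) attrs ++ [("label", v)])
amendTag _ tag = tag

main :: IO ()
main = do
  let table = Map.fromList [("a1", "Alpha")]
      src   = "<records><record key=\"a1\"></record></records>"
  -- parseTags yields a flat list of tags, so no tree traversal is needed.
  putStrLn (renderTags (map (amendTag table) (parseTags src)))
```

Since tagsoup does no well-formedness checking, the output is only as valid as the input was.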
participants (5)
- Bayley, Alistair
- Colin Paul Adams
- Gregory Collins
- Günther Schmidt
- Ketil Malde