HXT: Replace an element with its text

I would like to replace,

  <body><a href="#">foo</a></body>

with,

  <body>foo</body>

using HXT. So far, the closest I've come is to parse the HTML and apply the following:

  is_link :: (ArrowXml a) => a XmlTree XmlTree
  is_link = hasName "a"

  replace_links_with_their_text :: (ArrowXml a) => a XmlTree XmlTree
  replace_links_with_their_text =
    processTopDown $ (getText >>> mkText) `when` is_link

Unfortunately, this just removes the "a" element and its text entirely. The next-closest solution is,

  replace_links_with_their_text :: (ArrowXml a) => a XmlTree XmlTree
  replace_links_with_their_text =
    processTopDown $ (txt "foo") `when` is_link

Of course, I don't want to hard-code the value "foo", and I can't figure out a way to feed the element's text back into 'txt'.

Has anyone tried this before?

Hi,
Your code fails because a link is not a node of kind Text, I think: getText only succeeds on text nodes, so when it is applied to the <a> element itself it yields nothing and the element disappears.
What you want is to get the text from a child node of the anchor node.
I think the following should work:

  is_link :: (ArrowXml a) => a XmlTree XmlTree
  is_link = hasName "a"

  process_link :: (ArrowXml a) => a XmlTree XmlTree
  process_link = getChildren >>> getText >>> mkText

  replace_links_with_their_text :: (ArrowXml a) => a XmlTree XmlTree
  replace_links_with_their_text =
    processTopDown $ process_link `when` is_link

Cheers,
Ivan.
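
For anyone who wants to try both variants side by side, here is a minimal, self-contained sketch. It assumes the hxt package is installed. The names first_attempt and render, and the use of xread and runLA to run the arrows on a string fragment without any I/O, are choices made for this illustration only and do not come from the messages in this thread.

```haskell
module Main where

import Text.XML.HXT.Core

is_link :: (ArrowXml a) => a XmlTree XmlTree
is_link = hasName "a"

-- First attempt from the question: getText runs on the <a> element itself.
-- getText only succeeds on text nodes, so the arrow yields no result here.
first_attempt :: (ArrowXml a) => a XmlTree XmlTree
first_attempt = processTopDown $ (getText >>> mkText) `when` is_link

-- Suggested fix: step down to the children of <a> before calling getText.
process_link :: (ArrowXml a) => a XmlTree XmlTree
process_link = getChildren >>> getText >>> mkText

replace_links_with_their_text :: (ArrowXml a) => a XmlTree XmlTree
replace_links_with_their_text = processTopDown $ process_link `when` is_link

-- Hypothetical helper (not from the thread): parse a fragment with xread,
-- apply the transformation, and serialise every resulting tree.
render :: LA XmlTree XmlTree -> String -> [String]
render f = runLA (xread >>> f >>> xshow this)

main :: IO ()
main = do
  let input = "<body><a href=\"#\">foo</a></body>"
  mapM_ putStrLn (render first_attempt input)
  mapM_ putStrLn (render replace_links_with_their_text input)
```

With the first variant the link and its text should be missing from the printed body, and with the second the text "foo" should remain directly inside <body>.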
On 26 June 2012 06:58, Michael Orlitzky wrote:
> I would like to replace,
> <body><a href="#">foo</a></body>
> with,
> <body>foo</body>
> using HXT. [...]

(And just to be as precise as I can and avoid confusion, when I said
"link" I meant "unnamed anchor node with an href attribute")
On 26 June 2012 10:15, Ivan Perez wrote:
> Your code fails because a link is not a node of kind Text, I think.
> [...]

On 06/26/12 05:15, Ivan Perez wrote:
> Hi, your code fails because a link is not a node of kind Text, I think.
> What you want is to get the text from a child node of an anchor node.
> I think the following should work:

Yes, thank you. That makes sense now.

>   process_link :: (ArrowXml a) => a XmlTree XmlTree
>   process_link = getChildren >>> getText >>> mkText

This works!

Michael Orlitzky wrote:
> I would like to replace,
> <body><a href="#">foo</a></body>
> with,
> <body>foo</body>
> using HXT. So far, the closest I've come is to parse the HTML and apply the following stuff:
>   is_link :: (ArrowXml a) => a XmlTree XmlTree
>   is_link = hasName "a"
>   replace_links_with_their_text :: (ArrowXml a) => a XmlTree XmlTree
>   replace_links_with_their_text = processTopDown $ (getText >>> mkText) `when` is_link

  processTopDown $ (deep getText >>> mkText) `when` is_link

should do it. The "deep getText" will find all text nodes, independent of the nesting of elements in the <a>...</a> element. If you then write the result into a document, everything is fine.

One small problem can occur when the content of the <a> element has, e.g., the form

  <body><a href="#">foo<b>bar</b></a></body>

The resulting DOM then still contains two text nodes, one for "foo" and one for "bar". If you later search for the text "foobar", you won't find a node. Adjacent text nodes can be merged with

  ... (xshow (deep isText) >>> mkText) ...

Cheers,
Uwe

--
Uwe Schmidt
FH Wedel
Web: http://www.fh-wedel.de/~si/
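
The difference between the two variants can be checked with a small sketch like the one below. It assumes the hxt package is installed. The names separate_texts, merged_text and text_children are made up for this illustration. It also assumes that the arrow passed to xshow has to produce trees, so the merging variant selects the text nodes with isText instead of extracting strings with getText.

```haskell
module Main where

import Text.XML.HXT.Core

is_link :: (ArrowXml a) => a XmlTree XmlTree
is_link = hasName "a"

-- One new text node for every text node found anywhere below <a>.
separate_texts :: (ArrowXml a) => a XmlTree XmlTree
separate_texts = processTopDown $ (deep getText >>> mkText) `when` is_link

-- All text below <a> collected into a single text node.
-- Assumption: xshow takes a tree-producing arrow, hence isText here.
merged_text :: (ArrowXml a) => a XmlTree XmlTree
merged_text = processTopDown $ (xshow (deep isText) >>> mkText) `when` is_link

-- Hypothetical helper (not from the thread): run a transformation on a
-- fragment parsed by xread and list the text children of the root element.
text_children :: LA XmlTree XmlTree -> String -> [String]
text_children f = runLA (xread >>> f >>> getChildren >>> getText)

main :: IO ()
main = do
  let input = "<body><a href=\"#\">foo<b>bar</b></a></body>"
  print (text_children separate_texts input)  -- two text nodes
  print (text_children merged_text input)     -- one text node
```

Looking at the text children of <body> rather than at the serialised output is deliberate: both variants serialise to the same string, and the difference only shows up in the number of text nodes.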

On 06/26/12 10:39, Uwe Schmidt wrote:
>   processTopDown $ (deep getText >>> mkText) `when` is_link
>
> should do it. The "deep getText" will find all text nodes, independent of the nesting of elements in the <a>...</a> element. If you then write the result into a document, everything is fine.
>
> One small problem can occur when the content of the <a> element has, e.g., the form
>
>   <body><a href="#">foo<b>bar</b></a></body>
>
> The resulting DOM then still contains two text nodes, one for "foo" and one for "bar". If you later search for the text "foobar", you won't find a node. Adjacent text nodes can be merged with
>
>   ... (xshow (deep isText) >>> mkText) ...

Thanks for elaborating. This is just for display purposes, so hopefully it won't ever be a problem. I'm parsing somebody else's HTML, though, so who knows. I'll make a note in a comment. Thanks again.